Make confident decisions with probabilities instead of p-values, and see how likely each variant is to be the winner
Posterior-focused analysis and decision metrics.
Compute the posterior for each variant (conjugate Beta updates for binary outcomes), with credible intervals.
Draw posterior samples to estimate P(best), pairwise probabilities, and expected loss.
Summarize absolute/relative uplift vs control with credible intervals.
Provide decision metrics: recommendation, confidence, expected risk, sample size status.
Direct mapping to the analysis module’s result object.
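As an illustration of the steps above, here is a minimal sketch. It is not the analysis module itself, and it assumes NumPy, a uniform Beta(1, 1) prior shared by all variants, and a hypothetical `analyze_variants` function that takes `{variant: (conversions, trials)}`. Each variant gets a Beta posterior. Monte Carlo draws from those posteriors then give P(best), P(beats control), expected loss, and equal-tailed 95% credible intervals for the rate and for absolute and relative uplift.

```python
import numpy as np


def analyze_variants(counts, control, prior=(1.0, 1.0), draws=100_000, seed=0):
    """Hypothetical posterior summary for binary conversion data (Beta-Binomial model).

    counts:  dict of variant name -> (conversions, trials)
    control: name of the control variant (must be a key of counts)
    prior:   (alpha, beta) of a Beta prior shared by every variant (assumed uniform)
    """
    rng = np.random.default_rng(seed)
    a0, b0 = prior
    names = list(counts)
    # Conjugate update: Beta(a0 + conversions, b0 + non-conversions), one column per variant.
    samples = np.column_stack(
        [rng.beta(a0 + c, b0 + n - c, size=draws) for c, n in (counts[v] for v in names)]
    )
    best_idx = samples.argmax(axis=1)  # winning variant in each posterior draw
    best_rate = samples.max(axis=1)    # conversion rate of that draw's winner
    ctrl = samples[:, names.index(control)]

    results = {}
    for j, v in enumerate(names):
        s = samples[:, j]
        abs_uplift = s - ctrl
        rel_uplift = s / ctrl - 1.0
        results[v] = {
            "mean": float(s.mean()),
            "ci95": tuple(np.quantile(s, [0.025, 0.975])),  # equal-tailed credible interval
            "p_best": float(np.mean(best_idx == j)),
            # Expected loss: average conversion rate given up by shipping v,
            # i.e. E[max over variants - rate of v] (zero in draws where v wins).
            "expected_loss": float(np.mean(best_rate - s)),
            "p_beats_control": float(np.mean(s > ctrl)),
            "abs_uplift_ci95": tuple(np.quantile(abs_uplift, [0.025, 0.975])),
            "rel_uplift_ci95": tuple(np.quantile(rel_uplift, [0.025, 0.975])),
        }
    return results


# Example usage with made-up counts: 120/1000 conversions for A, 150/1000 for B.
res = analyze_variants({"A": (120, 1000), "B": (150, 1000)}, control="A")
print(res["B"]["p_best"], res["B"]["expected_loss"])
```

Sampling is used instead of closed-form integrals because P(best) and expected loss across three or more variants have no convenient closed form. A fixed `seed` keeps reports reproducible.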
Best-performing variant with its probability of being best, expected loss if the choice is wrong, sample sizes analyzed, confidence levels, and clear go/no-go recommendations
Conversion rates with credible intervals, probability that each variant is best, head-to-head comparisons, uplift vs control with credible bands, decision recommendations
Posterior probability distributions, credible interval plots, Monte Carlo simulation results, uplift charts showing relative and absolute improvements
Posterior distributions, credible intervals, and P(B > A) for uplift decisions.
Decision rules based on expected loss or utility, with a ROPE (region of practical equivalence) so that differences too small to matter are not treated as wins.
Optional sequential monitoring without p‑hacking: stop early once the decision rule is met.
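The decision rule above can be sketched as follows. This is a hypothetical `decide` function, not the product's actual rule. It assumes NumPy and equal-length arrays of posterior draws for control and treatment. The default `loss_threshold`, `rope` half-width, and `rope_mass` cutoff are illustrative values only. The function checks the ROPE first, then ships whichever action has an expected loss under the threshold, and otherwise asks for more data.

```python
import numpy as np


def decide(ctrl, treat, loss_threshold=0.001, rope=0.005, rope_mass=0.95):
    """Hypothetical go/no-go rule from posterior draws of two conversion rates.

    ctrl, treat:    equal-length arrays of posterior samples (e.g. Beta draws)
    loss_threshold: largest expected loss (absolute conversion rate) you accept
    rope:           half-width of the region of practical equivalence around zero uplift
    rope_mass:      posterior mass inside the ROPE needed to call the variants equivalent
    """
    diff = treat - ctrl
    # Expected loss of each action: the average shortfall over the posterior
    # scenarios where that action turns out to be the worse one.
    loss_ship = float(np.mean(np.maximum(-diff, 0.0)))
    loss_keep = float(np.mean(np.maximum(diff, 0.0)))
    # Posterior probability that the true uplift is too small to matter.
    p_rope = float(np.mean(np.abs(diff) < rope))

    if p_rope >= rope_mass:
        decision = "practically equivalent"
    elif loss_ship < loss_threshold:
        decision = "ship treatment"
    elif loss_keep < loss_threshold:
        decision = "keep control"
    else:
        decision = "keep collecting data"
    return {"decision": decision, "loss_ship": loss_ship,
            "loss_keep": loss_keep, "p_rope": p_rope}
```

The threshold is expressed in conversion-rate units, so it reads as "the most conversion rate I am willing to give up if I am wrong". That makes it easy to tie to business risk tolerance.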
Your data needs a group_column (variant labels such as A/B or control/treatment) and an outcome_column (binary 0/1 for conversion). Each row represents one user or session.
Data format: A simple table with variant labels and conversion outcomes. Multiple treatment variants can be compared against a control. Binary outcomes only (converted/not converted).
Minimum requirements: At least 100 conversions per variant is recommended, ideally with 1,000+ samples per variant, so that the posterior distributions are precise enough to act on.
What you get: Probability that each variant is best, expected loss if you choose wrong, credible intervals showing uncertainty, and clear recommendations based on your risk tolerance.
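Row-level data in the format described above can be reduced to per-variant counts before any posterior is computed. The sketch below uses plain Python and a hypothetical `summarize_rows` function. It assumes rows are dicts and that the column names default to `variant` and `converted`. It rejects non-binary outcomes and flags variants below the recommended minimums (100 conversions, 1,000 samples).

```python
from collections import defaultdict


def summarize_rows(rows, group_column="variant", outcome_column="converted",
                   min_conversions=100, min_samples=1000):
    """Hypothetical helper: aggregate one-row-per-user data into per-variant counts."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [conversions, trials]
    for row in rows:
        y = row[outcome_column]
        if y not in (0, 1):
            raise ValueError(f"outcome must be binary 0/1, got {y!r}")
        t = totals[row[group_column]]
        t[0] += int(y)
        t[1] += 1
    return {
        v: {
            "conversions": c,
            "trials": n,
            # Sample size status against the recommended minimums.
            "enough_data": c >= min_conversions and n >= min_samples,
        }
        for v, (c, n) in totals.items()
    }
```

The resulting `(conversions, trials)` pairs are all a Beta-Binomial update needs. The posterior for a variant is Beta(alpha + conversions, beta + trials - conversions).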
From priors to decision thresholds
Define metrics and priors; clean data, check randomization, and stratify if needed.
Compute posterior distributions; summarize uplift, credible intervals, and P(best).
Apply expected‑loss rules and the ROPE; optionally monitor sequentially and stop early once a rule is met.
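The monitoring step in this workflow can be sketched as a loop that refreshes the posteriors at each look and stops as soon as one action's expected loss drops below a threshold. The `monitor` function, its batch format `((conv_control, n_control), (conv_treatment, n_treatment))`, and the default threshold are all hypothetical. The sketch assumes NumPy and a uniform Beta(1, 1) prior.

```python
import numpy as np


def monitor(batches, loss_threshold=0.001, prior=(1.0, 1.0), draws=50_000, seed=0):
    """Hypothetical look-by-look monitoring with an expected-loss stopping rule.

    batches: iterable of ((conv_c, n_c), (conv_t, n_t)) increments, one per look.
    Returns (number_of_looks_used, decision).
    """
    rng = np.random.default_rng(seed)
    a0, b0 = prior
    cc = nc = ct = nt = 0  # cumulative conversions / trials for control and treatment
    look = 0
    for look, ((dc, dnc), (dt, dnt)) in enumerate(batches, start=1):
        cc, nc, ct, nt = cc + dc, nc + dnc, ct + dt, nt + dnt
        # Refresh both Beta posteriors with all data seen so far.
        ctrl = rng.beta(a0 + cc, b0 + nc - cc, size=draws)
        treat = rng.beta(a0 + ct, b0 + nt - ct, size=draws)
        loss_ship = np.mean(np.maximum(ctrl - treat, 0.0))
        loss_keep = np.mean(np.maximum(treat - ctrl, 0.0))
        if loss_ship < loss_threshold:
            return look, "ship treatment"
        if loss_keep < loss_threshold:
            return look, "keep control"
    return look, "no decision yet"
```

Here the stopping criterion bounds the expected loss of the chosen action at the moment of stopping. It is a different guarantee from a fixed false-positive rate, which is why the threshold should be chosen from the cost of a wrong call.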
Bayesian testing gives direct, intuitive probabilities and supports principled monitoring, speeding up decisions while reducing false stops.
Move from “Is it statistically significant?” to “How likely is it to be better, by how much, and what’s the downside risk?” Stakeholders get credible intervals and expected loss to make confident go/no‑go calls.
Note: Good randomization and metric definitions are critical. For complex heterogeneity or many variants, consider hierarchical models or adaptive designs.
Make probability‑based decisions with clear risk tradeoffs
Read the article: Bayesian A/B Testing