Bayesian A/B testing reframes decisions in terms of probabilities and costs: “How likely is it that B is better than A, by how much, and what’s my expected loss if I choose wrong?”
Quick Overview
Inputs
- Dataset: tabular data with a group column and an outcome column (minimal example after this list)
- Columns: group_column, outcome_column, control_name, treatment_names
- Priors: prior_alpha, prior_beta (defaults 1, 1)
- Analysis: credible_interval (default 0.95), n_simulations (default 10000)
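A minimal example of the expected input shape, assuming a pandas DataFrame with one row per subject (the column names here are illustrative; they map onto the group_column and outcome_column inputs above):

```python
import pandas as pd

# One row per subject: which variant they saw and the binary outcome.
df = pd.DataFrame({
    "variant": ["A", "A", "B", "B", "B", "A"],  # group_column
    "converted": [0, 1, 1, 0, 1, 0],            # outcome_column (0/1)
})
# Typical settings: control_name="A", treatment_names=["B"],
# prior_alpha=1, prior_beta=1, credible_interval=0.95, n_simulations=10000
```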
What
- Update Beta posteriors per variant; compute means and credible intervals
- Simulate to estimate P(best), pairwise P(A > B), expected loss
- Summarize absolute/relative uplift vs control with credible intervals
- Produce decision metrics with a recommended action and quantified risk (see the sketch below)
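For binary outcomes, these steps reduce to conjugate Beta-Binomial updates plus Monte Carlo simulation. A minimal sketch assuming NumPy/SciPy and illustrative counts (not the tool's actual API):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
prior_alpha, prior_beta, n_simulations = 1, 1, 10_000

# Observed (conversions, trials) per variant -- illustrative numbers
data = {"A": (120, 1000), "B": (145, 1000)}

# Conjugate update: Beta(alpha + successes, beta + failures)
posteriors = {
    v: stats.beta(prior_alpha + s, prior_beta + n - s)
    for v, (s, n) in data.items()
}

# Posterior means and 95% credible intervals
for v, post in posteriors.items():
    lo, hi = post.interval(0.95)
    print(f"{v}: mean={post.mean():.4f}, 95% CrI=({lo:.4f}, {hi:.4f})")

# Monte Carlo: P(best) and expected loss of choosing each variant
samples = np.column_stack(
    [post.rvs(n_simulations, random_state=rng) for post in posteriors.values()]
)
best = samples.argmax(axis=1)
p_best = {v: (best == i).mean() for i, v in enumerate(posteriors)}
expected_loss = {
    v: (samples.max(axis=1) - samples[:, i]).mean()
    for i, v in enumerate(posteriors)
}
print(p_best, expected_loss)
```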
Why
- Decision‑focused probabilities, not p‑values
- Transparent risk quantification via expected loss
- Supports principled sequential monitoring
Outputs
- Metrics: n_variants, total_samples, best_variant, best_probability, best_expected_loss, credible_interval, n_simulations
- Tables: variant_summary, probability_analysis, pairwise_probabilities, uplift_analysis, decision_metrics
- Datasets: posterior_distributions, credible_intervals, simulation_samples, uplift_data (illustrative shape below)
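The metrics above might be bundled along these lines; a hypothetical shape with illustrative values, not the tool's guaranteed return format:

```python
# Hypothetical result bundle (names and values illustrative)
result = {
    "metrics": {
        "n_variants": 2,
        "total_samples": 2000,
        "best_variant": "B",
        "best_probability": 0.95,       # P(best) from simulation
        "best_expected_loss": 0.0003,   # cost of shipping B if it is worse
        "credible_interval": 0.95,
        "n_simulations": 10_000,
    },
    # "tables" and "datasets" would carry the DataFrames listed above
}
```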
Setup
- Define the metric (binary conversion, revenue, or continuous KPI)
- Choose weakly informative priors (the default) or encode prior knowledge (example after this list)
- Ensure randomization and guard against allocation or metric leakage
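One common way to encode prior knowledge for a binary metric is to match the Beta prior's mean to a historical rate and set its pseudo-count to reflect your confidence; a sketch with illustrative numbers:

```python
# Weakly informative default: Beta(1, 1), i.e. uniform over [0, 1]
prior_alpha, prior_beta = 1.0, 1.0

# Encoding prior knowledge: historical conversion ~3%, with confidence
# worth roughly 200 pseudo-observations
historical_rate, pseudo_n = 0.03, 200
prior_alpha = historical_rate * pseudo_n        # 6.0
prior_beta = (1 - historical_rate) * pseudo_n   # 194.0
```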
Key Outputs
- P(B > A) and the posterior uplift distribution with credible intervals (see the sketch after this list)
- Expected loss for each decision (ship A, ship B, continue)
- ROPE (region of practical equivalence) to ignore negligible effects
- Optional P(best) across 3+ variants
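A sketch of how these quantities fall out of posterior draws, continuing the illustrative two-variant example (the ±0.5 percentage-point ROPE is an assumption you would set per metric):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = stats.beta(1 + 120, 1 + 880).rvs(10_000, random_state=rng)  # control draws
b = stats.beta(1 + 145, 1 + 855).rvs(10_000, random_state=rng)  # treatment draws

p_b_better = (b > a).mean()              # P(B > A)
uplift = b - a                           # absolute uplift distribution
ci = np.percentile(uplift, [2.5, 97.5])  # 95% credible interval

# ROPE: effects within +/-0.5 percentage points are "practically zero"
p_in_rope = (np.abs(uplift) < 0.005).mean()

# Expected loss of shipping B: average cost when B is actually worse
loss_ship_b = np.maximum(a - b, 0).mean()
print(p_b_better, ci, p_in_rope, loss_ship_b)
```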
Sequential Monitoring
- Bayesian updating supports principled peeking; stop when posterior criteria are met
- Define thresholds on P(B > A), expected loss, or ROPE inclusion in advance (see the sketch below)
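A possible stopping rule combining these criteria; the threshold values are illustrative and should be fixed before the test starts:

```python
def should_stop(p_b_better, loss_ship_b, p_in_rope,
                p_threshold=0.95, loss_threshold=0.001, rope_threshold=0.95):
    """Illustrative sequential stopping rule over pre-registered thresholds."""
    if p_b_better >= p_threshold and loss_ship_b <= loss_threshold:
        return "ship B"
    if (1 - p_b_better) >= p_threshold:
        return "keep A"
    if p_in_rope >= rope_threshold:
        return "stop: practically equivalent"
    return "continue"
```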
Communication
- Lead with posterior probability and credible interval of uplift
- Include the expected loss under the chosen decision to quantify risk
- Document the priors used and report sensitivity to alternative prior choices (see the sketch below)
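A quick sensitivity check is to rerun the comparison under a few plausible priors and report whether the conclusion moves; a sketch with illustrative priors and counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = {"A": (120, 1000), "B": (145, 1000)}  # (conversions, trials)
priors = {"uniform": (1, 1), "Jeffreys": (0.5, 0.5), "informative": (6, 194)}

for name, (a0, b0) in priors.items():
    draws = {
        v: stats.beta(a0 + s, b0 + n - s).rvs(10_000, random_state=rng)
        for v, (s, n) in data.items()
    }
    print(f"{name}: P(B > A) = {(draws['B'] > draws['A']).mean():.3f}")
```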