Did Your A/B Test
Actually Win?
Stop guessing. Upload your experiment data and get definitive statistical analysis with p-values, confidence intervals, effect sizes, and plain-English interpretation of whether your test is a winner.
No credit card required. Free for up to 100,000 data points.
Quick Significance Calculator
Enter your control and variant numbers to check statistical significance instantly.
A Control
B Variant
Want deeper analysis? Upload your full dataset for Bayesian probability, effect sizes, and AI interpretation.
Analyze Your Full A/B Test Free →
What is A/B Testing?
A/B testing (also called split testing or bucket testing) is a method of comparing two versions of a webpage, email, app feature, or any other digital experience to determine which one performs better. By randomly showing different versions to different users and measuring their behavior, you can make data-driven decisions instead of relying on intuition.
The challenge is knowing when the difference you observe is real versus just random variation. That's where statistical analysis comes in. Without proper analysis, you might ship a change that doesn't actually improve anything (false positive) or abandon a winning variation too early (false negative).
Why Statistical Significance Matters
Every A/B test has natural variation. If you flip a fair coin 100 times, you won't always get exactly 50 heads. Similarly, if two identical landing pages show conversion rates of 4.8% and 5.2%, that difference might just be noise.
Statistical significance tells you how unlikely your observed difference would be if there were no real effect at all. A p-value of 0.05 means that if the control and variant truly performed the same, a gap this large would show up only 5% of the time. The lower the p-value, the stronger the evidence that your variant genuinely outperforms the control; the short calculation sketch after the thresholds below shows how this number is computed.
- p < 0.05: Standard threshold - 95% confidence level for most business decisions
- p < 0.01: High confidence - 99% confidence level for high-stakes decisions
- p < 0.001: Very high confidence - often used in scientific research
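If you want to see where these numbers come from, here is a minimal Python sketch of the standard two-proportion z-test behind this kind of significance check. The 480 vs 520 conversion counts are hypothetical figures chosen to match the 4.8% vs 5.2% example above, and this illustrates the textbook method, not necessarily the exact implementation MCP Analytics uses.

```python
from scipy.stats import norm

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert identically
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))       # two-sided p-value
    return z, p_value

z, p = two_proportion_z_test(480, 10_000, 520, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")      # p ≈ 0.19 here: the 4.8% vs 5.2% gap could easily be noise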
Complete A/B Test Analysis
Everything you need to make confident decisions about your experiments
Statistical Significance
Know with confidence whether your results are real or random chance. Get p-values and significance levels explained in plain English.
Confidence Intervals
Understand the range of likely true effect sizes. See best-case and worst-case scenarios for your experiment's impact.
Effect Size & Lift
Measure the practical significance of your results. Statistical significance doesn't mean business significance—we show both.
Sample Size Validation
Know if your test has enough data to detect the effect you're looking for. Avoid underpowered experiments that waste time.
Bayesian Analysis
Get probability statements like "94% chance B beats A" that are more intuitive than p-values for decision making.
AI Recommendations
Get clear recommendations: "Ship variant B" or "Keep testing"—with the reasoning explained step by step.
How It Works
Upload Data
CSV file with variant assignment and outcomes
Select Metric
Conversions, revenue, or any measurable outcome
Run Analysis
Automated statistical tests with visualizations
Get Answer
Clear recommendation with full statistical backing
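As a rough illustration of what happens behind those four steps, here is a hedged Python sketch using pandas and statsmodels. The `variant` and `converted` column names, and the outcome counts, are assumptions made for the example, not a required upload schema or MCP Analytics' internal pipeline.

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# In practice you'd load your uploaded file, e.g. pd.read_csv("experiment.csv").
# A small frame stands in for it here; "variant" and "converted" are assumed column names.
df = pd.DataFrame({
    "variant":   ["A"] * 1000 + ["B"] * 1000,
    "converted": [1] * 48 + [0] * 952 + [1] * 61 + [0] * 939,   # hypothetical outcomes
})

# Select the metric, run the test, and read off the answer
summary = df.groupby("variant")["converted"].agg(["sum", "count"])
z_stat, p_value = proportions_ztest(count=summary["sum"], nobs=summary["count"])

rate_a, rate_b = summary["sum"] / summary["count"]
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  lift: {rate_b - rate_a:+.1%}  p-value: {p_value:.3f}")
```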
Compare A/B Testing Solutions
See how MCP Analytics stacks up against alternatives
| Feature | MCP Analytics | Optimizely | Google Optimize | Manual (Excel) |
|---|---|---|---|---|
| Statistical Significance | Yes | Yes | Yes | Manual |
| Bayesian Analysis | Yes | Yes | Yes | No |
| AI Interpretation | Yes | No | No | No |
| Upload Your Own Data | Yes | No | No | Yes |
| Sample Size Calculator | Yes | Yes | Limited | No |
| Multi-Variant Testing | Yes | Yes | Yes | Complex |
| Confidence Intervals | Yes | Yes | Limited | Manual |
| Effect Size Analysis | Yes | Basic | Basic | No |
| Free Tier | 100K rows | No | Sunset | Yes |
| Pricing | $20/mo | $79+/mo | Discontinued | Free (time cost) |
* Google Optimize was sunset in September 2023. Optimizely pricing varies by plan.
Trusted by Data-Driven Teams
Rigorous statistical methods you can rely on
- Calculations validated against R & Python
- Handles 1M+ row datasets
- Both frequentist & Bayesian methods
- Used by teams worldwide
Our statistical engine uses the same methods published in peer-reviewed journals and used by leading tech companies. Every calculation is validated against industry-standard tools like R and scipy to ensure accuracy.
What You Can A/B Test
Analyze any type of experiment or split test
Conversion Rates
Landing pages, checkout flows, sign-up forms. Test which version converts more visitors into customers.
Email Campaigns
Subject lines, send times, content variations. Find what drives opens, clicks, and conversions.
Pricing Tests
Price points, discount strategies, bundle offers. Optimize for revenue, not just conversion rate.
Product Features
New features, UI changes, onboarding flows. Measure impact on engagement, retention, and satisfaction.
Ad Creatives
Headlines, images, CTAs, audiences. Optimize your ad spend by finding the best performing variations.
Any Metric
Revenue per user, time on site, support tickets, NPS scores. If you can measure it, you can A/B test it.
Frequently Asked Questions
Everything you need to know about A/B testing analysis
Statistical significance in A/B testing indicates whether the difference between your control and variant is likely due to actual performance differences rather than random chance.
A test is typically considered statistically significant when the p-value is below 0.05 (95% confidence level), meaning a difference at least as large as the one you observed would occur less than 5% of the time if there were truly no underlying difference.
MCP Analytics calculates this automatically using t-tests for continuous metrics (like revenue) and z-tests for proportions (like conversion rates), and provides plain-English interpretation so you don't need a statistics degree to understand the results.
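For a continuous metric such as revenue per user, the underlying comparison looks roughly like the Welch's t-test sketch below. The simulated revenue figures are purely illustrative, and this shows the general technique rather than MCP Analytics' exact implementation.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# Simulated revenue-per-user for each variant: most users spend nothing, a few convert and spend
revenue_a = rng.exponential(scale=20, size=5000) * rng.binomial(1, 0.050, size=5000)
revenue_b = rng.exponential(scale=22, size=5000) * rng.binomial(1, 0.055, size=5000)

# Welch's t-test compares the two means without assuming equal variances
t_stat, p_value = ttest_ind(revenue_b, revenue_a, equal_var=False)
print(f"mean A = {revenue_a.mean():.2f}, mean B = {revenue_b.mean():.2f}, p = {p_value:.3f}")
```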
You should stop an A/B test when you meet one of these criteria:
- Statistical significance reached: You've hit p < 0.05 AND collected your pre-determined minimum sample size
- Maximum duration reached: You've hit the maximum test duration you set beforehand (usually 2-4 weeks)
- Sequential analysis allows early stopping: Methods like Bayesian sequential testing indicate you can stop with high confidence
Warning: Never stop a test just because it looks like a winner early on. This practice, called "peeking," dramatically inflates false positive rates. If you check results 10 times during a test at a nominal p < 0.05 threshold, your actual false positive rate can climb to roughly 20-30%, as the quick simulation below illustrates.
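You can see the inflation yourself by simulating A/A tests (identical control and variant) that are checked at ten interim points. The sample sizes and peek schedule below are arbitrary choices for illustration, and the exact inflation depends on how often and when you peek.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_tests=2000, peeks=10, n_per_peek=1000, p=0.05, alpha=0.05):
    """Simulate A/A tests (no real difference) that are checked at several interim points."""
    false_positives = 0
    z_crit = norm.ppf(1 - alpha / 2)
    for _ in range(n_tests):
        a = rng.binomial(1, p, peeks * n_per_peek)
        b = rng.binomial(1, p, peeks * n_per_peek)
        for k in range(1, peeks + 1):
            n = k * n_per_peek
            pooled = (a[:n].sum() + b[:n].sum()) / (2 * n)
            se = np.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(a[:n].mean() - b[:n].mean()) / se > z_crit:
                false_positives += 1   # declared a "winner" even though A and B are identical
                break
    return false_positives / n_tests

print(peeking_false_positive_rate())   # typically around 0.2, roughly 4x the nominal 0.05
```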
MCP Analytics provides guidance on when your test has sufficient data to make a reliable decision.
Sample size depends on four key factors:
- Baseline conversion rate: Your current metric (e.g., 5% conversion rate)
- Minimum detectable effect (MDE): The smallest improvement worth detecting (e.g., 10% relative lift)
- Statistical power: Typically 80% - the probability of detecting a real effect
- Significance level: Typically 95% (alpha = 0.05)
Example: To detect a 10% relative lift from a 5% baseline conversion rate with 80% power and 95% confidence, you need approximately 31,000 visitors per variant (62,000 total).
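That figure comes from the standard normal-approximation formula for comparing two proportions; the sketch below reproduces it in Python. This is the textbook approximation, not necessarily the exact method the built-in calculator uses.

```python
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, relative_lift, power=0.80, alpha=0.05):
    """Approximate per-variant sample size for a two-sided test of two proportions."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)    # e.g. 5% -> 5.5% for a 10% relative lift
    z_alpha = norm.ppf(1 - alpha / 2)           # 1.96 for a 95% confidence level
    z_power = norm.ppf(power)                   # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2

print(round(sample_size_per_variant(0.05, 0.10)))   # ~31,000 visitors per variant
```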
MCP Analytics includes a built-in sample size calculator that accounts for all these factors.
Frequentist A/B testing (traditional approach):
- Uses p-values and confidence intervals
- Answers: "Is there a statistically significant difference?"
- Requires fixed sample size determined in advance
- Can't be "peeked" at without inflating false positives
Bayesian A/B testing:
- Calculates the probability that one variant beats another
- Answers: "What's the probability B is better than A?" (e.g., 94%)
- Allows continuous monitoring without peeking problems
- Provides more intuitive results for business decisions
MCP Analytics provides both approaches so you can make the best decision for your situation. Bayesian is often preferred for its intuitive probability statements.
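If you're curious how a statement like "94% chance B beats A" is produced, one common approach is a Beta-Binomial model with Monte Carlo sampling, sketched below. The conversion counts are hypothetical and the uniform priors are an illustrative choice, not necessarily the model MCP Analytics uses.

```python
import numpy as np

rng = np.random.default_rng(1)

def probability_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Beta-Binomial model with uniform Beta(1, 1) priors on each conversion rate."""
    posterior_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, draws)
    posterior_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, draws)
    return (posterior_b > posterior_a).mean()

# Hypothetical counts: 4.80% vs 5.29% conversion on 10,000 visitors per variant
print(f"P(B beats A) ≈ {probability_b_beats_a(480, 10_000, 529, 10_000):.0%}")   # roughly 94%
```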
The standard threshold is p < 0.05, which corresponds to a 95% confidence level. However, the appropriate threshold depends on your situation:
- p < 0.10 (90% confidence): Acceptable for low-risk changes or exploratory tests
- p < 0.05 (95% confidence): Standard threshold for most business decisions
- p < 0.01 (99% confidence): Recommended for high-stakes decisions like major redesigns or pricing changes
Important: Statistical significance alone doesn't guarantee practical significance. A test might show p=0.01 but only a 0.1% improvement - technically significant but not worth implementing. Always consider effect size and business impact alongside p-values.
A 95% confidence interval shows the range where the true effect likely falls. For example, if your A/B test shows a lift of +15% with a 95% CI of [+8%, +22%], you can be confident the real improvement is between 8% and 22%.
Key interpretations:
- If the CI doesn't include zero (or 1.0 for ratios), the result is statistically significant
- The width indicates precision - narrower intervals mean more reliable estimates
- For business decisions, focus on the lower bound as a conservative estimate
Example: If CI = [+2%, +25%], you can be confident you'll see at least a 2% improvement. If CI = [-5%, +10%], the result is not significant because the interval includes zero (no effect).
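Here is a small Python sketch of the normal-approximation confidence interval for the absolute difference between two conversion rates. The counts are hypothetical, and this shows the standard calculation rather than MCP Analytics' exact implementation.

```python
from scipy.stats import norm

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Normal-approximation CI for the absolute difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = norm.ppf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts; if both bounds are positive, the lift is significant at this level
low, high = lift_confidence_interval(480, 10_000, 560, 10_000)
print(f"absolute lift: 95% CI [{low:+.2%}, {high:+.2%}]")
```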
Yes, but with caution. Running multiple tests on the same users creates interaction effects that can skew results. Best practices include:
- Mutually exclusive traffic: Allocate different user segments to different tests that might interact
- Multiple comparison corrections: Apply Bonferroni or FDR corrections when testing many variants
- Track experiment exposure: Know which users are in which experiments
- Consider multivariate testing: Use MVT for related changes to measure interactions
MCP Analytics supports multi-variant testing with up to 10 variants and automatically adjusts for multiple comparisons to maintain valid statistical conclusions.
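For reference, applying the corrections mentioned above is straightforward with statsmodels. The p-values below are hypothetical, and the snippet shows the general technique rather than MCP Analytics' internal adjustment.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from comparing three variants against the same control
p_values = [0.012, 0.034, 0.049]

for method in ("bonferroni", "fdr_bh"):
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    adjusted = [f"{p:.3f}" for p in p_adjusted]
    print(f"{method}: adjusted p-values {adjusted}, reject null? {list(reject)}")
```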
Related Resources
Deep-dive guides to help you run better experiments
A/B Testing Statistical Significance Made Simple
From t-tests to advanced experimental design - a comprehensive guide to running statistically sound A/B tests.
T-Test Practical Guide for Data-Driven Decisions
When and how to use t-tests for comparing means between groups in your experiments.
Chi-Square Test for Conversion Rates
The right statistical test for comparing conversion rates and categorical outcomes in A/B tests.
Bayesian Methods for A/B Testing
Move beyond p-values with Bayesian analysis for more intuitive probability statements.
Bonferroni Correction for Multiple Tests
How to maintain valid statistics when running multiple tests or comparing multiple variants.
Cohort Analysis for Experiment Follow-Up
Track how your A/B test winners perform over time with cohort-based analysis.
Stop Running Inconclusive Tests
Upload your experiment data and get definitive answers in under 2 minutes. Free to start, no credit card required.