Did Your A/B Test
Actually Win?

Stop guessing. Upload your experiment data and get definitive statistical analysis with p-values, confidence intervals, effect sizes, and plain-English interpretation of whether your test is a winner.

No credit card required. Free for up to 100,000 data points.

Quick Significance Calculator

Enter your control and variant numbers to check statistical significance instantly.

A Control

B Variant

Control Rate
Variant Rate
Relative Lift
Z-Score
P-Value

Want deeper analysis? Upload your full dataset for Bayesian probability, effect sizes, and AI interpretation.

Analyze Your Full A/B Test Free →
p < 0.05
Statistical Significance
95%
Confidence Interval
+12.4%
Effect Size
94%
Probability to Win

What is A/B Testing?

A/B testing (also called split testing or bucket testing) is a method of comparing two versions of a webpage, email, app feature, or any other digital experience to determine which one performs better. By randomly showing different versions to different users and measuring their behavior, you can make data-driven decisions instead of relying on intuition.

The challenge is knowing when the difference you observe is real versus just random variation. That's where statistical analysis comes in. Without proper analysis, you might ship a change that doesn't actually improve anything (false positive) or abandon a winning variation too early (false negative).

Why Statistical Significance Matters

Every A/B test has natural variation. If you flip a fair coin 100 times, you won't always get exactly 50 heads. Similarly, if two identical landing pages show conversion rates of 4.8% and 5.2%, that difference might just be noise.

Statistical significance tells you how unlikely your observed difference would be if there were no real effect. A p-value of 0.05 means that, if the control and variant truly performed identically, a difference this large would appear only 5% of the time. The lower the p-value, the stronger the evidence that your variant truly outperforms the control.

  • p < 0.05: Standard threshold for most business decisions - accepts a 5% false-positive risk
  • p < 0.01: Stricter threshold for high-stakes decisions
  • p < 0.001: Very strict - often used in scientific research
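These thresholds apply to a p-value computed from your data. As an illustration (not MCP Analytics' actual engine), a minimal two-proportion z-test - the same test behind the quick calculator's z-score and p-value - looks like this in Python, using made-up counts:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided
    return z, p_value

# Hypothetical counts: control 480/10,000 (4.8%) vs variant 520/10,000 (5.2%)
z, p = two_proportion_ztest(480, 10_000, 520, 10_000)
# Not significant: z ≈ 1.30, p ≈ 0.19 — the 4.8% vs 5.2% gap could be noise
```

With these counts the apparent 0.4-point lift fails the p < 0.05 threshold, which is exactly the "noise vs real difference" distinction described above.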

Complete A/B Test Analysis

Everything you need to make confident decisions about your experiments

Statistical Significance

Know with confidence whether your results are real or random chance. Get p-values and significance levels explained in plain English.

Confidence Intervals

Understand the range of likely true effect sizes. See best-case and worst-case scenarios for your experiment's impact.

Effect Size & Lift

Measure the practical significance of your results. Statistical significance doesn't mean business significance—we show both.

Sample Size Validation

Know if your test has enough data to detect the effect you're looking for. Avoid underpowered experiments that waste time.

Bayesian Analysis

Get probability statements like "94% chance B beats A" that are more intuitive than p-values for decision making.

AI Recommendations

Get clear recommendations: "Ship variant B" or "Keep testing"—with the reasoning explained step by step.

How It Works

1

Upload Data

CSV file with variant assignment and outcomes

2

Select Metric

Conversions, revenue, or any measurable outcome

3

Run Analysis

Automated statistical tests with visualizations

4

Get Answer

Clear recommendation with full statistical backing

Compare A/B Testing Solutions

See how MCP Analytics stacks up against alternatives

Feature | MCP Analytics | Optimizely | Google Optimize | Manual (Excel)
Statistical Significance | Yes | Yes | Yes | Manual
Bayesian Analysis | Yes | Yes | Yes | No
AI Interpretation | Yes | No | No | No
Upload Your Own Data | Yes | No | No | Yes
Sample Size Calculator | Yes | Yes | Limited | No
Multi-Variant Testing | Yes | Yes | Yes | Complex
Confidence Intervals | Yes | Yes | Limited | Manual
Effect Size Analysis | Yes | Basic | Basic | No
Free Tier | 100K rows | No | Sunset | Yes
Pricing | $20/mo | $79+/mo | Discontinued | Free (time cost)

* Google Optimize was sunset in September 2023. Optimizely pricing varies by plan.

Trusted by Data-Driven Teams

Rigorous statistical methods you can rely on

99.9%
Calculation Accuracy
(validated against R & Python)
<2s
Analysis Time
(even for 1M+ rows)
50+
Statistical Methods
(frequentist & Bayesian)
10K+
Tests Analyzed
(by teams worldwide)

Our statistical engine uses the same methods published in peer-reviewed journals and used by leading tech companies. Every calculation is validated against industry-standard tools like R and scipy to ensure accuracy.

What You Can A/B Test

Analyze any type of experiment or split test

Conversion Rates

Landing pages, checkout flows, sign-up forms. Test which version converts more visitors into customers.

Email Campaigns

Subject lines, send times, content variations. Find what drives opens, clicks, and conversions.

Pricing Tests

Price points, discount strategies, bundle offers. Optimize for revenue, not just conversion rate.

Product Features

New features, UI changes, onboarding flows. Measure impact on engagement, retention, and satisfaction.

Ad Creatives

Headlines, images, CTAs, audiences. Optimize your ad spend by finding the best performing variations.

Any Metric

Revenue per user, time on site, support tickets, NPS scores. If you can measure it, you can A/B test it.

Frequently Asked Questions

Everything you need to know about A/B testing analysis

What is statistical significance in A/B testing? +

Statistical significance in A/B testing indicates whether the difference between your control and variant is likely due to actual performance differences rather than random chance.

A test is typically considered statistically significant when the p-value is below 0.05 (95% confidence level), meaning that if there were truly no difference, a result this extreme would occur less than 5% of the time.

MCP Analytics calculates this automatically using t-tests for continuous metrics (like revenue) and z-tests for proportions (like conversion rates), and provides plain-English interpretation so you don't need a statistics degree to understand the results.
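As an illustration of the continuous-metric case (a sketch, not MCP Analytics' implementation), here is a Welch's t-test on simulated revenue-per-user data using scipy - the sample sizes and gamma-distributed revenue figures are invented for this example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical revenue-per-user samples (a continuous, skewed metric)
control = rng.gamma(shape=2.0, scale=10.0, size=5000)   # true mean ≈ $20.00
variant = rng.gamma(shape=2.0, scale=10.8, size=5000)   # true mean ≈ $21.60

# Welch's t-test: compares means without assuming equal variances
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)
```

Revenue data is rarely normal, but with thousands of users per arm the t-test on means is a reasonable default; for heavily skewed metrics, nonparametric or bootstrap methods are common alternatives.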

When should I stop an A/B test? +

You should stop an A/B test when you meet one of these criteria:

  • Statistical significance reached: You've hit p < 0.05 AND collected your pre-determined minimum sample size
  • Maximum duration reached: You've hit the maximum test duration you set beforehand (usually 2-4 weeks)
  • Sequential analysis allows early stopping: Methods like Bayesian sequential testing indicate you can stop with high confidence

Warning: Never stop a test just because it looks like a winner early on. This practice, called "peeking," dramatically inflates false positive rates. If you check results 10 times during a test at p = 0.05, your actual false positive rate climbs to roughly 20% - four times what you intended.

MCP Analytics provides guidance on when your test has sufficient data to make a reliable decision.
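The inflation from peeking is easy to demonstrate with a small simulation. This sketch (not MCP Analytics' engine) runs A/A tests - both arms have the same true 5% rate, so any "win" is a false positive - and counts how often at least one of 10 interim looks crosses p < 0.05:

```python
import numpy as np

rng = np.random.default_rng(0)

def peeking_false_positive_rate(n_sims=2000, looks=10, batch=1000, p=0.05):
    """Fraction of no-difference A/A tests declared significant at ANY look."""
    false_positives = 0
    for _ in range(n_sims):
        # conversions per batch for each arm, accumulated look by look
        a = np.cumsum(rng.binomial(batch, p, size=looks))
        b = np.cumsum(rng.binomial(batch, p, size=looks))
        n = batch * np.arange(1, looks + 1)          # cumulative sample size
        pooled = (a + b) / (2 * n)
        se = np.sqrt(pooled * (1 - pooled) * (2 / n))
        z = (b / n - a / n) / se
        if np.any(np.abs(z) > 1.96):                 # |z| > 1.96 ⇔ p < 0.05
            false_positives += 1
    return false_positives / n_sims

rate = peeking_false_positive_rate()
# rate lands well above the nominal 0.05, in the vicinity of 0.2
```

Sequential methods (alpha-spending, Bayesian monitoring) exist precisely to make repeated looks legitimate.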

How do I calculate the sample size needed for an A/B test? +

Sample size depends on four key factors:

  • Baseline conversion rate: Your current metric (e.g., 5% conversion rate)
  • Minimum detectable effect (MDE): The smallest improvement worth detecting (e.g., 10% relative lift)
  • Statistical power: Typically 80% - the probability of detecting a real effect
  • Significance level: Typically 95% (alpha = 0.05)

Example: To detect a 10% relative lift from a 5% baseline conversion rate with 80% power and 95% confidence, you need approximately 31,000 visitors per variant (62,000 total).

MCP Analytics includes a built-in sample size calculator that accounts for all these factors.
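The arithmetic behind that worked example can be sketched with the standard two-proportion sample-size formula - an illustrative approximation, not MCP Analytics' exact calculator:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, power=0.80, alpha=0.05):
    """Approximate sample size per variant for a two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)               # rate at the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided significance
    z_beta = NormalDist().inv_cdf(power)             # statistical power
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# The example above: 5% baseline, 10% relative MDE, 80% power, 95% confidence
n = sample_size_per_variant(0.05, 0.10)
# ≈ 31,000 visitors per variant
```

Note how sensitive the result is to the MDE: halving the detectable lift roughly quadruples the required sample.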

What is the difference between Bayesian and frequentist A/B testing? +

Frequentist A/B testing (traditional approach):

  • Uses p-values and confidence intervals
  • Answers: "Is there a statistically significant difference?"
  • Requires fixed sample size determined in advance
  • Can't be "peeked" at without inflating false positives

Bayesian A/B testing:

  • Calculates the probability that one variant beats another
  • Answers: "What's the probability B is better than A?" (e.g., 94%)
  • Allows continuous monitoring without peeking problems
  • Provides more intuitive results for business decisions

MCP Analytics provides both approaches so you can make the best decision for your situation. Bayesian is often preferred for its intuitive probability statements.
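One common way to compute that "probability B beats A" number is Monte Carlo sampling from Beta posteriors - the sketch below assumes a uniform Beta(1, 1) prior and uses hypothetical counts, and is not necessarily MCP Analytics' exact method:

```python
import numpy as np

rng = np.random.default_rng(7)

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Monte Carlo estimate of P(variant rate > control rate).

    Each arm's conversion rate gets a Beta posterior:
    Beta(1 + conversions, 1 + non-conversions), from a uniform prior.
    """
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=draws)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=draws)
    return float(np.mean(post_b > post_a))

# Hypothetical counts: control 450/10,000 (4.5%) vs variant 500/10,000 (5.0%)
p_win = prob_b_beats_a(450, 10_000, 500, 10_000)
# ≈ 0.95: "about a 95% chance the variant is better"
```

The output reads directly as a decision-making probability, which is why Bayesian summaries are often easier to act on than p-values.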

What is a good p-value for A/B testing? +

The standard threshold is p < 0.05, meaning a difference this large would occur less than 5% of the time if the variants actually performed the same. However, the appropriate threshold depends on your situation:

  • p < 0.10 (90% confidence): Acceptable for low-risk changes or exploratory tests
  • p < 0.05 (95% confidence): Standard threshold for most business decisions
  • p < 0.01 (99% confidence): Recommended for high-stakes decisions like major redesigns or pricing changes

Important: Statistical significance alone doesn't guarantee practical significance. A test might show p=0.01 but only a 0.1% improvement - technically significant but not worth implementing. Always consider effect size and business impact alongside p-values.

How do I interpret confidence intervals in A/B testing? +

A 95% confidence interval shows the range where the true effect likely falls. For example, if your A/B test shows a lift of +15% with a 95% CI of [+8%, +22%], the data are consistent with a true improvement anywhere from 8% to 22%.

Key interpretations:

  • If the CI doesn't include zero (or 1.0 for ratios), the result is statistically significant
  • The width indicates precision - narrower intervals mean more reliable estimates
  • For business decisions, focus on the lower bound as a conservative estimate

Example: If CI = [+2%, +25%], you can be confident you'll see at least a 2% improvement. If CI = [-5%, +10%], the result is not significant because the interval includes zero (no effect).
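For intuition, here is a minimal Wald confidence interval for the absolute difference in conversion rates - an illustrative sketch with hypothetical counts, not MCP Analytics' implementation:

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald CI for the absolute difference in conversion rates (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)        # 1.96 for 95%
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical: control 500/10,000 (5.0%) vs variant 575/10,000 (5.75%)
lo, hi = diff_confidence_interval(500, 10_000, 575, 10_000)
significant = lo > 0 or hi < 0   # CI excluding zero ⇒ significant at ~5%
# Here the interval sits entirely above zero, so the lift is significant
```

The lower bound is the conservative planning number: even in the pessimistic case, it is the smallest lift the data support.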

Can I run multiple A/B tests simultaneously? +

Yes, but with caution. Running multiple tests on the same users creates interaction effects that can skew results. Best practices include:

  • Mutually exclusive traffic: Allocate different user segments to different tests that might interact
  • Multiple comparison corrections: Apply Bonferroni or FDR corrections when testing many variants
  • Track experiment exposure: Know which users are in which experiments
  • Consider multivariate testing: Use MVT for related changes to measure interactions

MCP Analytics supports multi-variant testing with up to 10 variants and automatically adjusts for multiple comparisons to maintain valid statistical conclusions.
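The simplest of those corrections, Bonferroni, just divides the significance threshold by the number of comparisons. A sketch with hypothetical p-values (MCP Analytics' exact adjustment method isn't specified here):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)   # stricter bar per comparison
    return [p < threshold for p in p_values]

# Hypothetical p-values from comparing 4 variants against a control
flags = bonferroni_significant([0.001, 0.04, 0.20, 0.012])
# With 4 comparisons the bar drops to 0.0125, so 0.04 no longer qualifies
```

Bonferroni is conservative; FDR-style corrections (e.g. Benjamini-Hochberg) recover more power when many variants are tested.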

Stop Running Inconclusive Tests

Upload your experiment data and get definitive answers in under 2 minutes. Free to start, no credit card required.