Experiment Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| position_adjustment | true |
| confidence_level | 0.95 |
| min_impressions | 10 |
| decision_threshold | 0.05 |
Purpose
This analysis evaluates whether title changes improved click-through rates (CTR) across 7 pages, with statistical adjustment for average position shifts. The test compares a control group (12 initial observations) against a treatment group (7 initial observations), examining whether position-adjusted CTR improvements are statistically significant and practically meaningful.
Key Findings
- Average Adjusted CTR Lift: 0.27x across all pages—treatment variant shows 0.419 adjusted CTR versus 0.148 for control, representing a 183% relative improvement after position normalization
- Statistical Significance: Only 1 of 7 pages (14.3%) achieved statistical significance; the tutorial page showed a 0.72x adjusted lift with a Bonferroni-adjusted p-value of 0.012
- Position Improvement: Treatment pages ranked 2.2 positions higher on average (9.53 vs 11.76), suggesting title changes may have improved search visibility
- Inconclusive Results: 6 pages (85.7%) remain inconclusive despite positive adjusted lifts, indicating insufficient statistical power (median power = 0.165)
Interpretation
While the treatment variant demonstrates a substantial adjusted CTR improvement and consistent position gains, the analysis lacks statistical power to confirm most improvements are real rather than random variation. The single promoted page shows genuine significance, but the remaining six pages need extended data collection (2-4 additional weeks, per the recommendations) before any broader rollout decision.
Data preprocessing and column mapping
Purpose
This section documents the data filtering applied before statistical analysis of the A/B test results. The 63.2% removal rate reflects quality control measures necessary to ensure only valid page-level comparisons enter the analysis. Understanding retention is critical because it directly impacts the reliability of conclusions drawn about treatment effectiveness across the tested pages.
Key Findings
- Retention Rate: 36.8% (7 of 19 rows retained) - Indicates substantial filtering, likely removing pages with insufficient impressions or incomplete control/treatment pairs
- Rows Removed: 12 observations excluded from analysis - Suggests strict minimum impression thresholds (10 impressions noted in decision parameters) were enforced
- Final Dataset: 7 pages analyzed - Matches the "pages_tested" metric, confirming all retained rows represent valid page-level comparisons with both control and treatment variants
Interpretation
The aggressive filtering ensures statistical validity by excluding underpowered comparisons. With only 7 pages retained, the analysis focuses on pages meeting minimum data quality standards. This explains the low average statistical power (0.165) observed across tests—even retained pages have limited treatment impressions (mean: 238 vs. control: 1,362), creating inherent power constraints that directly contribute to the 85.7% inconclusive verdict rate.
Context
No train/test split is applied; the analysis operates on aggregate page-level search metrics, so the minimum-impression filter and control/treatment pairing described above constitute the full preprocessing step.
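The filtering rules described in this section (minimum-impression floor, complete control/treatment pairs) can be sketched as below. The field names (`page`, `variant`, `impressions`) are assumptions for illustration; the report does not publish the pipeline code.

```python
from collections import defaultdict

MIN_IMPRESSIONS = 10  # from the decision parameters table

def filter_pages(rows):
    """Keep only pages where BOTH variants meet the impression floor."""
    by_page = defaultdict(dict)
    for r in rows:
        if r["impressions"] >= MIN_IMPRESSIONS:
            by_page[r["page"]][r["variant"]] = r
    # A page survives only with a complete control/treatment pair.
    return {page: variants for page, variants in by_page.items()
            if {"control", "treatment"} <= variants.keys()}

rows = [
    {"page": "a.html", "variant": "control",   "impressions": 1303},
    {"page": "a.html", "variant": "treatment", "impressions": 282},
    {"page": "b.html", "variant": "control",   "impressions": 919},
    {"page": "b.html", "variant": "treatment", "impressions": 4},  # below floor
]
kept = filter_pages(rows)  # only a.html retains both variants
```

Pages dropped by either rule (too few impressions, or a missing variant) account for the 63.2% removal rate reported above.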
Executive Summary
| finding | value |
|---|---|
| Overall Verdict | 1 promote, 0 rollback, 0 neutral, 6 keep running |
| Pages Tested | 7 |
| Statistically Significant | 1 (14.3%) |
| Win Rate | 100.0% (1/1) |
| Biggest Winner | linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html (+1.44x) |
| Biggest Loser | support-vector-machine-svm-practical-guide-for-data-driven-decisions.html (-0.31x) |
| Average Adjusted CTR Lift | 0.27x |
| Inconclusive (Need More Data) | 6 (85.7%) |
Key Findings:
• 1 page to PROMOTE - Significant positive CTR lift, deploy new titles
• 0 pages to ROLLBACK - Significant negative CTR lift, revert to old titles
• 6 pages INCONCLUSIVE - Need more data (extend experiment 2-4 weeks)
Recommendation: Promote the winning title immediately and extend the inconclusive experiments by 2-4 weeks; no rollbacks are needed in this run. The title testing strategy shows a 100.0% win rate among significant results, though that rests on a single significant page.
Purpose
This section synthesizes the overall A/B testing results across 7 pages to assess whether the SEO title optimization strategy achieved its business objective. It provides decision-makers with a clear bottom-line assessment of test performance, statistical confidence, and readiness for deployment.
Key Findings
- Pages Tested: 7 total pages evaluated for title optimization impact
- Statistically Significant Results: 1 page (14.3%) showed conclusive evidence of improvement
- Win Rate Among Significant Pages: 100% (1 winner, 0 losers)
- Average Adjusted CTR Lift: 0.27x across all tested pages, indicating position-normalized performance gains
- Inconclusive Pages: 6 pages (85.7%) lack sufficient statistical power to draw conclusions
Interpretation
The experiment identified one clear winner: the product bundle affinity tutorial achieved a 72% adjusted CTR lift with a Bonferroni-adjusted p-value of 0.012. However, 86% of tested pages remain inconclusive due to low statistical power (average power: 0.165), suggesting insufficient sample sizes or effect sizes too small to detect reliably. The 100% win rate reflects only one significant result, limiting confidence in the broader strategy's effectiveness.
Context
The treatment group received substantially fewer impressions than control (1,669 vs 9,534, a 5.7× imbalance), which is the primary driver of the low statistical power across pages.
Position-Adjusted CTR Lift
Position-Adjusted CTR Lift per Page with 95% Confidence Intervals
Purpose
This forest plot visualizes position-adjusted CTR lift across 7 tested pages, isolating the treatment effect from natural position-based click variations. By adjusting for search position, the analysis reveals whether observed CTR changes reflect genuine content improvements or simply result from ranking shifts. This is critical for understanding whether the treatment causally improved user engagement.
Key Findings
- Average Adjusted CTR Lift: 0.27x across all pages—a modest positive signal, but heavily influenced by outliers
- Statistical Significance: Only 1 of 7 pages (14.3%) achieved significance; the Shopify tutorial showed a 0.72x lift with a Bonferroni-adjusted p-value of 0.012
- Effect Size Range: Spans from -0.31x (SVM article) to +1.44x (LDA article), indicating highly variable treatment response across content types
- Confidence Interval Width: Most bars cross zero, reflecting insufficient sample sizes to distinguish signal from noise
Interpretation
The experiment reveals mixed results when controlling for position effects. While the average lift appears positive, 86% of pages remain inconclusive—their confidence intervals encompass zero, meaning observed differences could plausibly be due to chance. The single significant winner (Shopify tutorial) demonstrates the treatment can work, but most pages require additional data to confirm whether improvements are real or statistical artifacts.
Raw Performance Comparison
Side-by-Side Control vs Treatment Metrics (Before Position Adjustment)
| page | control_impressions | control_clicks | control_ctr | control_position | treatment_impressions | treatment_clicks | treatment_ctr | treatment_position | raw_ctr_lift | position_change |
|---|---|---|---|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | 1303 | 2 | 0.0015 | 15.91 | 282 | 0 | 0 | 10.56 | -0.0015 | -5.35 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | 919 | 1 | 0.0011 | 12.73 | 291 | 1 | 0.0034 | 10.66 | 0.0023 | -2.07 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | 1082 | 2 | 0.0018 | 15.63 | 195 | 3 | 0.0154 | 11.61 | 0.0136 | -4.02 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | 1567 | 1 | 0.0006 | 9.22 | 422 | 1 | 0.0024 | 8.24 | 0.0018 | -0.98 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | 1639 | 0 | 0 | 6.21 | 262 | 0 | 0 | 8.27 | 0 | 2.06 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | 1331 | 3 | 0.0023 | 13.54 | 176 | 0 | 0 | 11.14 | -0.0023 | -2.4 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | 1693 | 0 | 0 | 9.1 | 41 | 1 | 0.0244 | 6.26 | 0.0244 | -2.84 |
Purpose
This section presents raw, unadjusted performance metrics to establish a baseline comparison between control and treatment periods. It serves as the foundation for understanding why position-adjusted analysis is necessary—the treatment variant achieved a higher raw CTR (0.65% vs 0.10%), but this difference may be partially or entirely attributable to improved search ranking (avg_position improved by 2.23 positions) rather than title quality alone.
Key Findings
- Raw CTR Lift: Treatment shows 0.55 percentage points higher CTR (0.65% vs 0.10%, unweighted page averages), but this is confounded by ranking improvements
- Position Improvement: Treatment pages ranked 2.23 positions higher on average (9.53 vs 11.76), which naturally drives more clicks regardless of title effectiveness
- Impression Imbalance: Control received 5.7× more impressions (9,534 vs 1,669), reflecting the unequal traffic split during the test period
- Click Volume: Treatment generated 6 clicks from fewer impressions, while control generated 9 clicks from substantially more exposure
Interpretation
The raw metrics suggest treatment outperforms, but this comparison conflates two distinct factors: title quality and search ranking. Since treatment pages ranked higher, they received more favorable visibility, making it impossible to isolate whether improved CTR stems
Position vs CTR Analysis
Position vs CTR Scatter with Expected CTR Curve
Purpose
This scatter plot isolates the relationship between search position and click-through rate to determine whether the treatment's CTR improvements stem from better positioning or from title/content changes that drive clicks independent of position. By overlaying observed data against the expected CTR curve, it reveals whether the treatment variant is outperforming or underperforming industry benchmarks at its achieved positions.
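The expected CTR curve underlying this comparison maps each search position to a benchmark click-through rate. A minimal sketch, using linear interpolation between integer positions, is shown below; the benchmark values here are hypothetical stand-ins, since the report does not publish the curve it actually used.

```python
# Hypothetical position -> expected-CTR benchmark (illustrative values only).
BENCHMARK = {1: 0.284, 2: 0.147, 3: 0.103, 4: 0.073, 5: 0.053,
             6: 0.040, 7: 0.031, 8: 0.025, 9: 0.020, 10: 0.016,
             11: 0.013, 12: 0.011, 13: 0.009, 14: 0.008, 15: 0.007,
             16: 0.006}

def expected_ctr(position: float) -> float:
    """Interpolate the benchmark CTR for a fractional average position."""
    lo = max(min(BENCHMARK), min(int(position), max(BENCHMARK)))
    hi = min(lo + 1, max(BENCHMARK))
    frac = max(0.0, position - lo)  # clamp below the first benchmark entry
    return BENCHMARK[lo] + frac * (BENCHMARK[hi] - BENCHMARK[lo])
```

Plotting `expected_ctr` over the observed position range gives the reference curve that the scatter points are compared against.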
Key Findings
- Position Improvement: Treatment variant achieved mean position of 9.53 vs. control's 11.76 (−2.23 position change), moving closer to top results where higher CTR is expected
- CTR vs. Expected: Treatment points cluster near or slightly above the expected CTR curve, suggesting performance aligns with position-based benchmarks rather than exceptional title/content quality
- Curve Alignment Pattern: Most pages show treatment points tracking parallel to the expected curve, indicating position changes drive CTR gains rather than independent title optimization
Interpretation
The treatment's +0.27x adjusted CTR lift appears largely attributable to improved search positioning rather than superior title or content relevance. With treatment pages ranking ~2.2 positions higher on average, they naturally capture more clicks according to industry expectations. Only the promoted tutorial page (0.72x lift) shows meaningful divergence from the curve, suggesting genuine title/content superiority beyond positional advantage.
Context
Low absolute click counts (6 treatment clicks vs 9 for control) mean each point on the scatter carries high variance, so comparisons against the expected curve should be read cautiously.
Position Adjustment Detail
Position Adjustment Calculation Breakdown
| page | variant | raw_ctr | position_val | expected_ctr | adjusted_ctr | adjustment_factor |
|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | control | 0.0015 | 15.91 | 0.0059 | 0.2558 | 0.2558 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | control | 0.0011 | 12.73 | 0.0081 | 0.136 | 0.136 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | control | 0.0018 | 15.63 | 0.0061 | 0.2971 | 0.2971 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | control | 0.0006 | 9.22 | 0.015 | 0.04 | 0.04 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | control | 0 | 6.21 | 0.034 | 0 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | control | 0.0023 | 13.54 | 0.0075 | 0.3058 | 0.3058 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | control | 0 | 9.1 | 0.015 | 0 | 0 |
| articles/arima-practical-guide-for-data-driven-decisions.html | treatment | 0 | 10.56 | 0.0096 | 0 | 0 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | treatment | 0.0034 | 10.66 | 0.0095 | 0.3565 | 0.3565 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | treatment | 0.0154 | 11.61 | 0.0089 | 1.736 | 1.736 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | treatment | 0.0024 | 8.24 | 0.02 | 0.12 | 0.12 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | treatment | 0 | 8.27 | 0.02 | 0 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | treatment | 0 | 11.14 | 0.0092 | 0 | 0 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | treatment | 0.0244 | 6.26 | 0.034 | 0.7176 | 0.7176 |
Purpose
This section isolates the title/content quality impact from ranking position effects by normalizing actual CTR against industry benchmarks for each position. Since the treatment variant achieved better average positions (9.53 vs 11.76), adjusted CTR reveals whether improved clicks stem from better rankings alone or from genuinely stronger title performance. This is critical for evaluating whether the treatment represents a true quality improvement.
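Each adjusted value in the table is the ratio of raw CTR to the expected CTR at the page's position (e.g., 0.0244 / 0.034 ≈ 0.7176 for the tutorial's treatment row). A minimal sketch of that normalization:

```python
def adjust_ctr(raw_ctr: float, expected_ctr: float) -> float:
    """Normalize raw CTR by the benchmark CTR for the page's position.
    1.0 means the page earns exactly the clicks its position predicts;
    values above 1.0 mean it over-performs its ranking."""
    if expected_ctr == 0:
        return 0.0  # guard; zero-click rows also adjust to 0
    return raw_ctr / expected_ctr

# Rows from the table above:
tutorial = adjust_ctr(0.0244, 0.034)    # tutorial treatment, ~0.7176
one_class = adjust_ctr(0.0006, 0.015)   # one-class SVM control, ~0.04
```

Small discrepancies against the table (e.g., 0.2558 vs 0.2542 for the ARIMA control row) come from the expected-CTR values being rounded to four decimals in the display.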
Key Findings
- Adjusted CTR Range: 0 to 1.74x across all observations, with mean of 0.28x—indicating most pages underperform position-adjusted benchmarks
- Treatment Outperformance: Treatment variant shows higher adjusted CTR (0.419x avg) versus control (0.148x avg), a 0.271x lift—suggesting title changes drive engagement beyond position gains
- Extreme Outlier: Linear Discriminant Analysis treatment achieves 1.74x adjusted CTR, dramatically exceeding position-based expectations
- Position Dependency: Expected CTR varies from roughly 0.006 to 0.034 across the observed positions (6.2 to 15.9); treatment's improved ranking (−2.23 positions) would naturally boost raw CTR, but adjusted metrics show additional quality gains
Interpretation
The adjustment mechanism reveals that treatment pages don't merely benefit from ranking improvements; they demonstrate stronger intrinsic appeal relative to their positions. The LDA article is the clearest case: its treatment variant earns 1.74x its position benchmark versus 0.30x for its control, and the tutorial's 0.72x treatment ratio similarly exceeds its zero-click control.
Statistical Summary
Significance Testing Results with P-Values and Power Analysis
| page | p_value | adjusted_p_value | ci_lower | ci_upper | power | is_significant | sample_size_adequate |
|---|---|---|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | 1 | 1 | -0.0052 | 0.0021 | 0.1536 | False | True |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | 0.9749 | 1 | -0.007 | 0.0117 | 0.0949 | False | True |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | 0.0305 | 0.2137 | -0.007 | 0.034 | 0.3621 | False | True |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | 0.8958 | 1 | -0.0046 | 0.008 | 0.1046 | False | True |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | 1 | 1 | 0 | 0 | 0 | False | True |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | 1 | 1 | -0.0071 | 0.0025 | 0.1447 | False | True |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | 0.0017 | 0.012 | -0.0353 | 0.0841 | 0.295 | True | True |
Purpose
This section evaluates whether observed treatment effects are statistically reliable or likely due to chance. Using two-proportion z-tests with Bonferroni correction, it controls for false positives when testing multiple pages simultaneously. Understanding statistical significance is critical for distinguishing genuine improvements from noise in the experiment.
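The test can be sketched as below. This version runs on raw click proportions, whereas the report's p-values were computed on position-adjusted rates, so it will not reproduce the table exactly; the Bonferroni step, however, matches the table (e.g., 0.0017 × 7 ≈ 0.012 for the tutorial row).

```python
import math

def two_prop_ztest(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

def bonferroni(p_value: float, n_tests: int = 7) -> float:
    """Family-wise correction for the 7 simultaneous page tests."""
    return min(p_value * n_tests, 1.0)

# LDA article, raw counts: control 2/1082 clicks, treatment 3/195
z, p = two_prop_ztest(2, 1082, 3, 195)
adj_p = bonferroni(p)
```

Multiplying each p-value by the number of tests (capped at 1.0) is the simplest Bonferroni form; it is conservative, which is why the LDA page's raw p of 0.0305 fails the corrected threshold (0.2137).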
Key Findings
- Significance Rate: 14.3% (1 of 7 pages) achieved statistical significance at the 0.05 threshold—only the Shopify tutorial showed a reliable effect
- Median P-Value: 0.975 indicates most pages show no meaningful difference between control and treatment variants
- Average Statistical Power: 0.165 (well below the 0.8 benchmark) reveals the experiment is severely underpowered to detect real effects
- Bonferroni Adjustment: Protects against false positives by raising the effective significance bar when testing multiple pages
Interpretation
The low power (16.5%) means the experiment lacks sufficient sample size to reliably detect treatment effects across most pages. Six of seven pages remain inconclusive—not because the treatment failed, but because the data volume is insufficient to distinguish signal from noise. The single significant result (Shopify tutorial, adjusted p = 0.012) passed the corrected threshold, but the broader pattern suggests most observed differences are statistically indistinguishable from zero.
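A rough normal-approximation power calculation (a sketch, not necessarily the report's formula, so its outputs will differ from the table's power column) illustrates why these sample sizes are underpowered and how more traffic would fix it:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def approx_power(p1: float, p2: float, n1: int, n2: int,
                 z_crit: float = 1.96) -> float:
    """Approximate power of a two-sided two-proportion test at alpha=0.05."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        return 0.0
    return normal_cdf(abs(p2 - p1) / se - z_crit)

# LDA article CTRs at current traffic vs. 10x the impressions
now = approx_power(0.0018, 0.0154, 1082, 195)
later = approx_power(0.0018, 0.0154, 10820, 1950)
```

With the current counts the approximation lands near one-third power; scaling impressions tenfold pushes it well past the conventional 0.8 benchmark, which is the statistical argument behind the "extend the experiment" verdicts.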
Recommendations
Per-Page Recommendations with Estimated Click Uplift
| page | verdict | adjusted_ctr_lift | p_value | estimated_monthly_click_uplift |
|---|---|---|---|---|
| articles/arima-practical-guide-for-data-driven-decisions.html | keep_running | -0.2558 | 1 | 0 |
| articles/association-rules-apriori-practical-guide-for-data-driven-decisions.html | keep_running | 0.2205 | 1 | 0 |
| articles/linear-discriminant-analysis-lda-practical-guide-for-data-driven-decisions.html | keep_running | 1.438 | 0.2137 | 0 |
| articles/one-class-svm-practical-guide-for-data-driven-decisions.html | keep_running | 0.08 | 1 | 0 |
| articles/session-based-recommendations-practical-guide-for-data-driven-decisions.html | keep_running | 0 | 1 | 0 |
| articles/support-vector-machine-svm-practical-guide-for-data-driven-decisions.html | keep_running | -0.3058 | 1 | 0 |
| tutorials/how-to-use-product-bundle-affinity-analysis-in-shopify-step-by-step-tutorial.html | promote | 0.7176 | 0.012 | 30 |
Purpose
This section synthesizes per-page experiment results into clear verdicts based on statistical significance and effect size. It translates raw statistical findings into actionable categories—promote, rollback, neutral, or keep running—enabling stakeholders to make informed decisions about which title treatments should be deployed, reverted, or extended with additional data collection.
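The verdict rules implied by the table (a significance gate at the configured decision_threshold, then the sign of the adjusted lift) can be sketched as below; any additional rules the pipeline applies, such as a minimum-effect filter, are not published, so this is an assumption.

```python
ALPHA = 0.05  # decision_threshold from the configuration table

def verdict(adjusted_p: float, lift: float) -> str:
    """Map a page's corrected p-value and adjusted-CTR lift to an action."""
    if adjusted_p < ALPHA:
        if lift > 0:
            return "promote"
        if lift < 0:
            return "rollback"
        return "neutral"
    return "keep_running"

# Rows from the table above:
v1 = verdict(0.012, 0.7176)   # tutorial page -> "promote"
v2 = verdict(0.2137, 1.438)   # LDA article  -> "keep_running"
```

Note that effect size alone never triggers a verdict: the LDA article's +1.44x lift stays "keep_running" because its corrected p-value (0.2137) fails the gate.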
Key Findings
- Pages Winning: 1 page (14.3%) achieved statistical significance with a positive effect: the Shopify bundle affinity tutorial with +0.72x adjusted CTR lift and a Bonferroni-adjusted p-value of 0.012
- Pages Inconclusive: 6 pages (85.7%) lack sufficient statistical power (mean power = 0.165) to draw definitive conclusions, despite showing mixed effect directions (range: -0.31x to +1.44x)
- Estimated Monthly Click Uplift: Only the winning page projects measurable impact (30 additional clicks/month); all others show zero uplift due to non-significance
- No Negative Outcomes: Zero pages met rollback criteria, indicating the treatment variants caused no statistically detectable harm
Interpretation
The experiment reveals a highly imbalanced power distribution: one clear winner emerged from statistical testing, but the majority of pages remain underpowered. The wide range of adjusted CTR lifts (-0.31x to +1.44x) points to heterogeneous treatment effects; extending the experiment by the recommended 2-4 weeks should give the inconclusive pages enough data to resolve.