Overview

Analysis Overview

Holm-Bonferroni Multiple Comparison Correction

Analysis overview and configuration

Configuration

Analysis Type: Holm Bonferroni
Company: Education Research Institute
Objective: Apply Holm-Bonferroni and other multiple comparison corrections to identify which hypothesis tests remain significant after controlling the family-wise error rate
Analysis Date: 2026-03-05
Processing Id: test_1772775794
Total Observations: 84

Module Parameters

Parameter | Value
significance_level | 0.05
primary_method | holm
comparison_methods | bonferroni,BH,hochberg
min_effect_size | 0.2
Holm Bonferroni analysis for Education Research Institute
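
As an illustration, the module parameters could be held in a small validated configuration object. This is a minimal sketch: the `CorrectionConfig` class and the `parse_methods` helper are hypothetical names introduced here, not part of the tool that produced this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionConfig:
    """Hypothetical container mirroring the module parameters of this report."""
    significance_level: float = 0.05
    primary_method: str = "holm"
    comparison_methods: tuple = ("bonferroni", "BH", "hochberg")
    min_effect_size: float = 0.2

    def __post_init__(self):
        # Reject settings that cannot be interpreted as an alpha level or an effect size.
        if not 0.0 < self.significance_level < 1.0:
            raise ValueError("significance_level must lie strictly between 0 and 1")
        if self.min_effect_size < 0:
            raise ValueError("min_effect_size must be non-negative")


def parse_methods(raw: str) -> tuple:
    """Split a comma-separated method string such as 'bonferroni,BH,hochberg'."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


config = CorrectionConfig(comparison_methods=parse_methods("bonferroni,BH,hochberg"))
```

Validating at construction time means a malformed significance level fails early, before any correction is computed.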

Interpretation

Purpose

This analysis applies multiple comparison correction methods to 84 educational hypothesis tests comparing student performance across demographic groups (lunch type, test prep, parental education, race/ethnicity). The objective is to control family-wise error rate (FWER) and identify which differences remain statistically significant after accounting for the inflated Type I error risk inherent in conducting 84 simultaneous tests.

Key Findings

  • Uncorrected FWER: 0.987 – Without correction, and treating the 84 tests as independent, there is a 98.7% probability of at least one false positive, which shows why adjustment is needed
  • Holm-Bonferroni Rejections: 36 significant findings – Holm method retains 36 rejections while controlling FWER at α=0.05
  • Bonferroni Rejections: 29 findings – More conservative; Holm rejects 7 more tests, a 24% increase over Bonferroni
  • BH (FDR) Rejections: 56 findings – Most liberal method, controlling false discovery rate rather than FWER
  • Practical Significance Alignment: All 36 Holm-rejected tests also meet the practical-significance threshold (effect size ≥0.2), so every statistically significant finding is also practically significant


Data Preparation

Data Preprocessing

P-value Validation & Quality Checks

Data preprocessing and column mapping

Data Quality

Initial Rows: 84
Final Rows: 84
Rows Removed: 0
Retention Rate: 100%

Processed 84 observations, retained 84 (100.0%) after cleaning

Interpretation

Purpose

This section documents the data preprocessing pipeline for 84 hypothesis tests undergoing multiple comparison corrections. Perfect data retention (100%) indicates no observations were excluded during cleaning, which is critical for maintaining the integrity of family-wise error rate (FWER) control calculations that depend on the complete test set.

Key Findings

  • Initial & Final Rows: 84 observations retained across both stages—no data loss occurred during preprocessing
  • Retention Rate: 100% indicates complete dataset integrity with zero exclusions
  • Rows Removed: 0 deletions suggests either pristine input data or that no quality thresholds triggered removal criteria
  • Train/Test Split: Not applicable, as this is a multiple comparison correction analysis rather than a predictive modeling task

Interpretation

The perfect retention rate ensures that all 84 tests remain eligible for correction methods (Holm, Bonferroni, BH, Hochberg). This is essential because FWER calculations depend on the complete family of tests; removing even one test would alter the stepdown thresholds and rejection decisions. The absence of preprocessing transformations preserves the raw p-values and effect sizes needed for accurate multiple comparison control.
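
The kind of quality check described above can be sketched as follows. The report does not state which checks were actually run, so the rule used here (keep finite values in [0, 1]) and the function names `validate_pvalues` and `retention_rate` are assumptions for illustration only.

```python
import math


def validate_pvalues(pvalues):
    """Split inputs into (clean, removed); only finite numbers in [0, 1] are kept."""
    clean, removed = [], []
    for p in pvalues:
        is_number = isinstance(p, (int, float)) and not isinstance(p, bool)
        ok = is_number and math.isfinite(p) and 0.0 <= p <= 1.0
        (clean if ok else removed).append(p)
    return clean, removed


def retention_rate(initial_rows, final_rows):
    """Percentage of rows that survive cleaning."""
    return 100.0 * final_rows / initial_rows if initial_rows else 0.0


# Made-up inputs: two valid p-values and three invalid entries.
clean, removed = validate_pvalues([0.01, 0.5, float("nan"), 1.2, -0.1])
rate = retention_rate(84, 84)  # the row counts reported in this section
```

With the reported counts (84 initial rows, 84 final rows) the retention rate evaluates to 100.0, matching the Data Quality figures.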

Context

No train/test split is applicable here since the objective is statistical inference across a fixed test family, not predictive generalization. The analysis assumes all 84 tests

Executive Summary

Key Findings & Recommendations

Key Metrics

total_tests: 84
holm_rejections: 36
bonferroni_rejections: 29
uncorrected_fwer: 98.7%
power_gain: 24.1%

Key Findings

finding | value | interpretation
Total Tests | 84 | Number of hypothesis tests corrected
Holm Rejections | 36 | Tests significant after Holm correction
Bonferroni Rejections | 29 | Tests significant after Bonferroni (baseline)
BH Rejections | 56 | Tests significant under FDR control
Uncorrected FWER | 98.7% | Probability of at least one false positive without correction (assuming independent tests)
Holm Power Gain vs Bonferroni | 24.1% | Relative increase in rejections for Holm over Bonferroni (36 vs 29)
Marginal Test | Writing: associates degree vs Some high school | Last test rejected by Holm procedure

Summary

Bottom Line: Applied Holm-Bonferroni correction to 84 hypothesis tests at α=0.05. Without correction, FWER would be 98.7% (almost certain false positives). Holm rejected 36 tests while controlling FWER at 5%.

Key Findings:
• Holm rejected 7 additional tests vs Bonferroni (24.1% power gain)
• BH (FDR control) rejected 56 tests (less conservative than FWER methods)
• Marginal test: Writing: associates degree vs Some high school, the last test rejected before the Holm procedure stopped

Recommendation: Focus on the 36 Holm-significant tests for follow-up. These results are statistically robust, with the family-wise error rate controlled at 5%.

Interpretation

EXECUTIVE SUMMARY: MULTIPLE COMPARISON CORRECTION ANALYSIS

Purpose

This analysis applied four statistical correction methods to 84 hypothesis tests comparing student performance across demographic groups (lunch type, test prep, race/ethnicity, parental education). The objective was to identify which findings remain statistically significant after controlling for multiple testing inflation, which would otherwise produce a 98.7% probability of at least one false positive.

Key Findings

  • Uncorrected FWER: 98.7% — Without correction, the analysis would almost certainly report false positives across the 84 tests
  • Holm-Bonferroni Rejections: 36 tests — Maintains family-wise error rate at 5% while rejecting 7 more tests than strict Bonferroni (24.1% power gain)
  • Bonferroni Rejections: 29 tests — Most conservative FWER method; rejects only the strongest signals
  • BH (FDR) Rejections: 56 tests. Less stringent control: the expected proportion of false discoveries among rejected tests is held at or below 5%, rather than protecting against any single false positive
  • Effect Size Alignment: All 36 Holm-rejected tests have practically significant effects (≥0.2), so statistical and practical significance coincide for these findings

Interpretation

The analysis successfully

Figure 4

Holm Step-Down Procedure

Sequential Rejection with Decreasing Thresholds

Holm step-down sequential rejection procedure with decreasing thresholds

Interpretation

Purpose

The Holm step-down procedure controls family-wise error rate (FWER) by sequentially testing hypotheses in order of increasing p-value, with thresholds that become progressively more lenient. This section demonstrates how Holm allocates the significance budget across 84 tests while maintaining strict control over false positives—critical for identifying genuinely significant effects amid multiple comparisons.

Key Findings

  • Holm Rejections: 36 of 84 tests rejected (42.9%); the remaining 48 null hypotheses (57.1%) are retained, reflecting conservative FWER control
  • Threshold Range: from α/m = 0.05/84 ≈ 0.0006 for the smallest p-value up to α = 0.05 for the largest, following α/(m-i+1) at rank i; the staircase is strictest for the earliest tests and loosens with each step
  • Marginal Test: Writing: associates degree vs Some high school (p=0.0009) marks the rejection boundary; the next test in the ordering fails its threshold, so the procedure stops and all remaining tests are retained
  • Raw p-value Distribution: Mean=0.09, median≈0.005 (shown as 0 after rounding), indicating strong clustering of significant results at the lower tail
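
The step-down logic summarized in these findings can be sketched in a few lines. This is a minimal illustration of the standard Holm procedure, not the code that generated the report; `holm_threshold` and `holm_step_down` are names chosen here, and the example p-values are made up.

```python
def holm_threshold(rank, m, alpha=0.05):
    """Threshold for the rank-th smallest p-value (rank starts at 1)."""
    return alpha / (m - rank + 1)


def holm_step_down(pvalues, alpha=0.05):
    """Return rejection decisions (True = reject) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= holm_threshold(rank, m, alpha):
            reject[idx] = True
        else:
            break  # the first failure stops the procedure; later tests are retained
    return reject


# Made-up example: every test clears its own threshold, so all four are rejected.
all_rejected = holm_step_down([0.04, 0.001, 0.02, 0.015])
# Here the second-smallest p-value (0.02 > 0.05/3) fails, so only one test is rejected.
one_rejected = holm_step_down([0.001, 0.02, 0.03, 0.04])
```

With m = 84, the threshold at rank 36 is 0.05/49 ≈ 0.00102, which the marginal test (p=0.0009) passes, while rank 37 (p=0.0015) exceeds 0.05/48 ≈ 0.00104. That reproduces the rejection boundary reported above.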

Interpretation

Holm's sequential rejection strategy is more powerful than Bonferroni (24% power gain) because it relaxes thresholds for weaker tests while maintaining FWER at α=0.05. The 36 rejections represent findings robust to multiple comparison correction. The marginal test at

Figure 5

Method Comparison: Adjusted P-values

Raw vs Adjusted P-values Across Correction Methods

Raw vs adjusted p-values across all correction methods

Interpretation

Purpose

This section compares how four multiple-comparison correction methods adjust raw p-values to control error rates across 84 simultaneous hypothesis tests. It demonstrates the trade-off between statistical conservatism (fewer false positives) and statistical power (ability to detect true effects). Understanding these differences is critical for determining which findings remain credible after correcting for multiple testing.

Key Findings

  • Holm vs. Bonferroni: Holm rejected 36 tests versus Bonferroni's 29—a 24.1% power gain while maintaining identical FWER control, demonstrating Holm's superiority as a step-down procedure.
  • BH (FDR) Method: Rejected 56 tests, the most permissive approach, because it controls False Discovery Rate rather than Family-Wise Error Rate, allowing more discoveries at the cost of accepting some false positives.
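  • Hochberg Performance: Matched Holm's 36 rejections. Hochberg is a step-up procedure that is never less powerful than Holm but relies on a stronger assumption (independence or positive dependence among test statistics); in this dataset that extra assumption yielded no additional rejections.
  • Uncorrected FWER: At 0.987 (assuming independent tests), at least one false positive without correction is nearly certain across 84 tests.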
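
To make the comparison concrete, the four adjustments can be written out in plain Python. This is a sketch of the textbook definitions under the assumption that the report uses the standard formulas; the function names are illustrative and the input p-values are invented.

```python
def _sorted_with_order(pvalues):
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    return order, [pvalues[i] for i in order]


def _restore(order, adjusted_sorted):
    # Put adjusted values back in input order and cap them at 1.
    out = [0.0] * len(order)
    for pos, idx in enumerate(order):
        out[idx] = min(1.0, adjusted_sorted[pos])
    return out


def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    m = len(pvalues)
    order, ps = _sorted_with_order(pvalues)
    adj, running = [], 0.0
    for i, p in enumerate(ps):  # step-down: cumulative maximum from the smallest p
        running = max(running, (m - i) * p)
        adj.append(running)
    return _restore(order, adj)


def hochberg(pvalues):
    m = len(pvalues)
    order, ps = _sorted_with_order(pvalues)
    adj, running = [0.0] * m, 1.0
    for i in range(m - 1, -1, -1):  # step-up: cumulative minimum from the largest p
        running = min(running, (m - i) * ps[i])
        adj[i] = running
    return _restore(order, adj)


def benjamini_hochberg(pvalues):
    m = len(pvalues)
    order, ps = _sorted_with_order(pvalues)
    adj, running = [0.0] * m, 1.0
    for i in range(m - 1, -1, -1):  # step-up with the FDR multiplier m / rank
        running = min(running, m * ps[i] / (i + 1))
        adj[i] = running
    return _restore(order, adj)


raw = [0.01, 0.02, 0.03, 0.04]  # invented p-values
adjusted = {f.__name__: f(raw) for f in (bonferroni, holm, hochberg, benjamini_hochberg)}
```

For any input, the adjusted values satisfy BH ≤ Hochberg ≤ Holm ≤ Bonferroni element by element, which is why the rejection counts in this report are ordered the same way.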

Interpretation

The analysis reveals a clear hierarchy: Bonferroni is most conservative, Holm and

Figure 6

Decision Matrix

Which Tests Survive Under Which Methods

Which tests survive under which correction methods

Interpretation

Purpose

This section visualizes which of the 84 hypothesis tests remain statistically significant under four different multiple comparison correction methods. It reveals the power-conservativeness tradeoff: the FWER-controlling methods (Bonferroni, Holm, Hochberg) reject fewer tests but provide stronger Type I error control, while BH detects more signals by controlling a different error rate (FDR). Understanding method agreement identifies robust findings versus marginal discoveries.

Key Findings

  • Bonferroni Rejections: 29 tests (35%) — most conservative FWER control; highest false negative risk
  • Holm Rejections: 36 tests (43%) — 24% power gain over Bonferroni while maintaining FWER control
  • BH Rejections: 56 tests (67%), the most liberal method; it controls FDR rather than FWER and yields 93% more rejections than Bonferroni
  • Hochberg Rejections: 36 tests (43%), matching Holm; a step-up procedure that improves on Bonferroni
  • Perfect Agreement: The 29 tests with the smallest p-values are rejected by all four methods and the 28 with the largest are retained by all four, indicating clear signal/noise separation at the distribution extremes
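
Agreement across methods can be tallied directly from the per-method decisions. The sketch below rebuilds the decision matrix from the reported rejection counts, assuming (as the results table shows) that each method rejects a contiguous block of the smallest p-values; `agreement_summary` is a hypothetical helper name.

```python
def agreement_summary(decisions):
    """decisions maps method name -> list of booleans (True = reject), one per test."""
    per_test = list(zip(*decisions.values()))  # one tuple of decisions per test
    rejected_by_all = sum(all(row) for row in per_test)
    retained_by_all = sum(not any(row) for row in per_test)
    return {
        "rejected_by_all": rejected_by_all,
        "retained_by_all": retained_by_all,
        "mixed": len(per_test) - rejected_by_all - retained_by_all,
    }


total_tests = 84
rejection_counts = {"bonferroni": 29, "holm": 36, "hochberg": 36, "BH": 56}
# Tests are ordered by p-value, so a method with k rejections rejects the first k tests.
decisions = {
    name: [rank < k for rank in range(total_tests)]
    for name, k in rejection_counts.items()
}
summary = agreement_summary(decisions)
```

Under that assumption, 29 tests are rejected by every method, 28 are retained by every method, and 27 receive mixed decisions.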

Interpretation

The decision matrix demonstrates that 36 tests form a robust consensus across FWER-controlling methods

Figure 7

Method Comparison: Rejection Counts

Number of Significant Tests Per Method

Number of rejections per correction method

Interpretation

Purpose

This section compares the statistical power and rejection rates across four multiple comparison correction methods applied to the same 84 hypothesis tests. It demonstrates the trade-off between controlling false positives (Type I error) and detecting true effects (statistical power), which is central to choosing an appropriate correction strategy for the analysis objective.

Key Findings

  • Benjamini-Hochberg (BH) Rejections: 56 tests (66.7%) — highest rejection rate with 1.93× power relative to Bonferroni, controls False Discovery Rate rather than Family-Wise Error Rate
  • Holm & Hochberg Rejections: 36 tests each (42.9%) — equivalent rejection counts despite different algorithmic approaches; Holm uses step-down logic while Hochberg uses step-up
  • Bonferroni Rejections: 29 tests (34.5%) — most conservative method with lowest power; provides strongest control but sacrifices detection ability
  • Power Differential: 27-test gap between BH (most powerful) and Bonferroni (least powerful) reflects fundamental difference between FDR and FWER control philosophies
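
The percentages and ratios quoted in this list follow from simple arithmetic on the rejection counts. A short sketch (variable names are illustrative):

```python
total_tests = 84
rejections = {"bonferroni": 29, "holm": 36, "hochberg": 36, "BH": 56}
baseline = rejections["bonferroni"]

summary = {
    method: {
        "rejection_rate_pct": round(100 * count / total_tests, 1),
        "relative_to_bonferroni": round(count / baseline, 3),
    }
    for method, count in rejections.items()
}
gap_bh_vs_bonferroni = rejections["BH"] - baseline  # the 27-test gap
```

Note that "relative to Bonferroni" is a ratio of rejection counts on this dataset; it measures realized rejections, not statistical power in the formal sense.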

Interpretation

The 93% increase in rejections from Bonferroni (29) to BH (56) illustrates why method selection depends on study context.

Figure 8

Effect Size Analysis

Practical vs Statistical Significance

Practical significance alongside statistical significance

Interpretation

Purpose

This section validates whether statistically significant findings (after Holm-Bonferroni correction) also represent meaningful, real-world differences. Statistical significance alone can be misleading with large sample sizes; effect size analysis ensures that rejected hypotheses reflect substantive rather than trivial differences. This bridges the gap between p-values and practical importance.

Key Findings

  • Practical Significance Alignment: 36 of 84 tests (43%) are both statistically significant (Holm-corrected) AND practically significant (Cohen's d ≥ 0.20), indicating robust, meaningful effects.
  • Mean Effect Size: 0.35 (between Cohen's conventional small, 0.2, and medium, 0.5, benchmarks), with median 0.34 and range 0.02–0.94, showing considerable variation in effect magnitudes across comparisons.
  • Confidence Interval Coverage: 69% of tests flagged as "Practically significant" have confidence intervals excluding zero, strengthening evidence of real differences.
  • Small Effect Concern: 26 tests show small effects (d < 0.20). None of them is rejected by Holm (every Holm-rejected test in the results table has d ≥ 0.33), so the small-effect comparisons are neither statistically nor practically significant here.
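
A sketch of how an effect size and a combined label might be computed. The report does not state which variant of Cohen's d it uses, so the pooled-standard-deviation form below is an assumption, and the `classify` helper and its labels other than "Practically significant" are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Absolute standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return abs(mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def classify(holm_reject, d, min_effect_size=0.2):
    """Combine the Holm decision with the effect-size threshold."""
    if holm_reject and d >= min_effect_size:
        return "Practically significant"
    if holm_reject:
        return "Statistically significant only"
    return "Not significant"


# Made-up scores: means differ by 1 and both sample variances equal 1, so d = 1.0.
d_example = cohens_d([1, 2, 3], [2, 3, 4])
label = classify(holm_reject=True, d=d_example)
```

Keeping the statistical decision and the effect-size check as separate inputs makes it easy to see which of the two conditions a given test fails.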

Interpretation

The analysis reveals a healthy concordance between statistical and practical significance for the majority of findings. The 36 practically significant results represent genuine, meaningful differences in student outcomes across

Figure 9

FWER Accumulation Curve

Why Correction Is Needed

Why correction is needed: FWER accumulation without correction

Interpretation

Purpose

This section quantifies the multiple testing problem: conducting 84 independent hypothesis tests at α=0.05 without correction inflates the probability of at least one false positive to 98.7%. This demonstrates why statistical correction methods are essential and motivates the application of Holm-Bonferroni and alternative procedures in the overall analysis.

Key Findings

  • Uncorrected FWER: 98.7% — nearly certain to observe at least one false positive by chance alone
  • Mathematical Driver: FWER = 1 − (0.95)^84 = 0.987, showing exponential accumulation of error across tests
  • Critical Threshold: Even 5 tests yield 23% FWER; by 80+ tests, false positive risk approaches certainty
  • Correction Necessity: Holm-Bonferroni holds FWER at or below the nominal 5% level by comparing the ordered p-values against adjusted thresholds
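
The accumulation formula above is easy to check numerically. A minimal sketch, valid under the independence assumption stated in this section (`uncorrected_fwer` is an illustrative name):

```python
def uncorrected_fwer(m, alpha=0.05):
    """P(at least one false positive) across m independent tests with all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


# Points along the accumulation curve, including the two cases cited above.
curve = {m: round(uncorrected_fwer(m), 3) for m in (1, 5, 10, 20, 84)}
fwer_84 = curve[84]
fwer_5 = curve[5]
```

Evaluating the function at m = 5 and m = 84 gives about 0.226 and 0.987, matching the 23% and 98.7% figures quoted in the findings.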

Interpretation

Without correction, the 84 comparisons across Math, Reading, and Writing assessments would produce a 98.7% probability of spurious significance—rendering raw p-values unreliable for decision-making. The FWER curve illustrates that error accumulation is non-linear: the first 10 tests contribute modest inflation, but by test

Table 10

Complete Results Table

All P-values and Decisions

Complete numerical results with all p-values and decisions

test_label | raw_pvalue | cohens_d | n_per_group | family | statistical_test | holm_adjusted_p | holm_reject | bonferroni_adjusted_p | bonferroni_reject | BH_adjusted_p | BH_reject | hochberg_adjusted_p | hochberg_reject
Math: free/reduced vs standard | 0 | 0.7823 | 355 | Math by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Reading: free/reduced vs standard | 0 | 0.4924 | 355 | Reading by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: free/reduced vs standard | 0 | 0.5293 | 355 | Writing by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Reading: completed vs none | 0 | 0.5192 | 358 | Reading by Test Prep | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: completed vs none | 0 | 0.6866 | 358 | Writing by Test Prep | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Reading: female vs male | 0 | 0.5037 | 482 | Reading by Gender | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: female vs male | 0 | 0.6316 | 482 | Writing by Gender | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: bachelors degree vs high school | 2.00e-10 | 0.7629 | 118 | Writing by Parental Education | welch_t_test | 1.46e-08 | True | 1.68e-08 | True | 1.40e-09 | True | 1.46e-08 | True
Writing: high school vs masters degree | 9.00e-10 | 0.9446 | 59 | Writing by Parental Education | welch_t_test | 6.48e-08 | True | 7.56e-08 | True | 5.82e-09 | True | 6.48e-08 | True
Math: Group C vs Group E | 1.90e-09 | 0.6212 | 140 | Math by Race/Ethnicity | welch_t_test | 1.35e-07 | True | 1.60e-07 | True | 1.14e-08 | True | 1.35e-07 | True
Math: Group B vs Group E | 5.00e-09 | 0.6691 | 140 | Math by Race/Ethnicity | welch_t_test | 3.50e-07 | True | 4.20e-07 | True | 2.80e-08 | True | 3.50e-07 | True
Math: Group A vs Group E | 1.08e-08 | 0.8048 | 89 | Math by Race/Ethnicity | welch_t_test | 7.45e-07 | True | 9.07e-07 | True | 5.67e-08 | True | 7.45e-07 | True
Math: completed vs none | 1.54e-08 | 0.3763 | 358 | Math by Test Prep | welch_t_test | 1.05e-06 | True | 1.29e-06 | True | 7.61e-08 | True | 1.05e-06 | True
Math: female vs male | 9.12e-08 | 0.3407 | 482 | Math by Gender | welch_t_test | 6.11e-06 | True | 7.66e-06 | True | 4.26e-07 | True | 6.11e-06 | True
Writing: associates degree vs high school | 1.46e-07 | 0.5242 | 196 | Writing by Parental Education | welch_t_test | 9.67e-06 | True | 0 | True | 6.48e-07 | True | 9.67e-06 | True
Reading: high school vs masters degree | 6.26e-07 | 0.7593 | 59 | Reading by Parental Education | welch_t_test | 0 | True | 1.00e-04 | True | 2.63e-06 | True | 0 | True
Reading: bachelors degree vs high school | 8.80e-07 | 0.5846 | 118 | Reading by Parental Education | welch_t_test | 1.00e-04 | True | 1.00e-04 | True | 3.52e-06 | True | 1.00e-04 | True
Writing: masters degree vs Some high school | 4.28e-06 | 0.7067 | 59 | Writing by Parental Education | welch_t_test | 3.00e-04 | True | 4.00e-04 | True | 0 | True | 3.00e-04 | True
Writing: bachelors degree vs Some high school | 4.63e-06 | 0.5535 | 118 | Writing by Parental Education | welch_t_test | 3.00e-04 | True | 4.00e-04 | True | 0 | True | 3.00e-04 | True
Reading: associates degree vs high school | 7.44e-06 | 0.4448 | 196 | Reading by Parental Education | welch_t_test | 5.00e-04 | True | 6.00e-04 | True | 0 | True | 5.00e-04 | True
Writing: high school vs Some college | 9.28e-06 | 0.4381 | 196 | Writing by Parental Education | welch_t_test | 6.00e-04 | True | 8.00e-04 | True | 0 | True | 6.00e-04 | True
Math: Group D vs Group E | 0 | 0.4483 | 140 | Math by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Math: bachelors degree vs high school | 0 | 0.4936 | 118 | Math by Parental Education | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: Group A vs Group E | 0 | 0.5726 | 89 | Writing by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Writing: Group A vs Group D | 0 | 0.5099 | 89 | Writing by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True
Reading: Group A vs Group E | 1.00e-04 | 0.5519 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.0059 | True | 0.0084 | True | 3.00e-04 | True | 0.0058 | True
Math: associates degree vs high school | 1.00e-04 | 0.387 | 196 | Math by Parental Education | welch_t_test | 0.0059 | True | 0.0084 | True | 3.00e-04 | True | 0.0058 | True
Reading: masters degree vs Some high school | 2.00e-04 | 0.5594 | 59 | Reading by Parental Education | welch_t_test | 0.0114 | True | 0.0168 | True | 6.00e-04 | True | 0.0114 | True
Math: high school vs Some college | 4.00e-04 | 0.3461 | 196 | Math by Parental Education | welch_t_test | 0.0224 | True | 0.0336 | True | 0.0012 | True | 0.0224 | True
Math: high school vs masters degree | 6.00e-04 | 0.5182 | 59 | Math by Parental Education | welch_t_test | 0.033 | True | 0.0504 | False | 0.0016 | True | 0.0324 | True
Reading: high school vs Some college | 6.00e-04 | 0.3375 | 196 | Reading by Parental Education | welch_t_test | 0.033 | True | 0.0504 | False | 0.0016 | True | 0.0324 | True
Reading: bachelors degree vs Some high school | 8.00e-04 | 0.4036 | 118 | Reading by Parental Education | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True
Reading: Group B vs Group E | 8.00e-04 | 0.3771 | 140 | Reading by Race/Ethnicity | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True
Writing: Group B vs Group E | 8.00e-04 | 0.3768 | 140 | Writing by Race/Ethnicity | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True
Math: Group A vs Group D | 9.00e-04 | 0.4106 | 89 | Math by Race/Ethnicity | welch_t_test | 0.045 | True | 0.0756 | False | 0.0021 | True | 0.0441 | True
Writing: associates degree vs Some high school | 9.00e-04 | 0.3347 | 179 | Writing by Parental Education | welch_t_test | 0.045 | True | 0.0756 | False | 0.0021 | True | 0.0441 | True
Writing: Group B vs Group D | 0.0015 | 0.3049 | 190 | Writing by Race/Ethnicity | welch_t_test | 0.072 | False | 0.126 | False | 0.0033 | True | 0.0705 | False
Math: bachelors degree vs Some high school | 0.0015 | 0.3791 | 118 | Math by Parental Education | welch_t_test | 0.072 | False | 0.126 | False | 0.0033 | True | 0.0705 | False
Writing: masters degree vs Some college | 0.0017 | 0.4633 | 59 | Writing by Parental Education | welch_t_test | 0.0782 | False | 0.1428 | False | 0.0037 | True | 0.0782 | False
Reading: Group A vs Group D | 0.0025 | 0.3738 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.1125 | False | 0.21 | False | 0.0052 | True | 0.1125 | False
Reading: masters degree vs Some college | 0.0042 | 0.4223 | 59 | Reading by Parental Education | welch_t_test | 0.1848 | False | 0.3528 | False | 0.0086 | True | 0.1848 | False
Writing: Group A vs Group C | 0.0046 | 0.3415 | 89 | Writing by Race/Ethnicity | welch_t_test | 0.1978 | False | 0.3864 | False | 0.0092 | True | 0.1978 | False
Math: Group B vs Group D | 0.0049 | 0.2695 | 190 | Math by Race/Ethnicity | welch_t_test | 0.2058 | False | 0.4116 | False | 0.0095 | True | 0.205 | False
Math: associates degree vs Some high school | 0.005 | 0.2833 | 179 | Math by Parental Education | welch_t_test | 0.2058 | False | 0.42 | False | 0.0095 | True | 0.205 | False
Writing: associates degree vs masters degree | 0.0058 | 0.4074 | 59 | Writing by Parental Education | welch_t_test | 0.232 | False | 0.4872 | False | 0.0108 | True | 0.232 | False
Reading: associates degree vs Some high school | 0.0068 | 0.2731 | 179 | Reading by Parental Education | welch_t_test | 0.2652 | False | 0.5712 | False | 0.0123 | True | 0.2622 | False
Reading: Group C vs Group E | 0.0069 | 0.2751 | 140 | Reading by Race/Ethnicity | welch_t_test | 0.2652 | False | 0.5796 | False | 0.0123 | True | 0.2622 | False
Writing: bachelors degree vs Some college | 0.0077 | 0.3044 | 118 | Writing by Parental Education | welch_t_test | 0.2849 | False | 0.6468 | False | 0.0135 | True | 0.2849 | False
Math: masters degree vs Some high school | 0.0087 | 0.397 | 59 | Math by Parental Education | welch_t_test | 0.3132 | False | 0.7308 | False | 0.0149 | True | 0.3132 | False
Writing: Some college vs Some high school | 0.0104 | 0.2577 | 179 | Writing by Parental Education | welch_t_test | 0.364 | False | 0.8736 | False | 0.0171 | True | 0.3536 | False
Reading: Group A vs Group C | 0.0104 | 0.3087 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.364 | False | 0.8736 | False | 0.0171 | True | 0.3536 | False
Math: Group C vs Group D | 0.0159 | 0.2017 | 262 | Math by Race/Ethnicity | welch_t_test | 0.5247 | False | 1 | False | 0.0257 | True | 0.5216 | False
Math: Some college vs Some high school | 0.0163 | 0.2413 | 179 | Math by Parental Education | welch_t_test | 0.5247 | False | 1 | False | 0.0258 | True | 0.5216 | False
Writing: Group C vs Group E | 0.0192 | 0.2383 | 140 | Writing by Race/Ethnicity | welch_t_test | 0.5952 | False | 1 | False | 0.0299 | True | 0.5952 | False
Reading: bachelors degree vs Some college | 0.0281 | 0.2504 | 118 | Reading by Parental Education | welch_t_test | 0.843 | False | 1 | False | 0.0429 | True | 0.843 | False
Reading: associates degree vs masters degree | 0.0293 | 0.3209 | 59 | Reading by Parental Education | welch_t_test | 0.8497 | False | 1 | False | 0.044 | True | 0.8497 | False
Writing: associates degree vs bachelors degree | 0.0351 | 0.2411 | 118 | Writing by Parental Education | welch_t_test | 0.9828 | False | 1 | False | 0.0517 | False | 0.882 | False
Reading: Group D vs Group E | 0.045 | 0.2105 | 140 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.0652 | False | 0.882 | False
Reading: Group B vs Group D | 0.0524 | 0.1854 | 190 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.0746 | False | 0.882 | False
Writing: Group C vs Group D | 0.0593 | 0.1576 | 262 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.083 | False | 0.882 | False
Reading: Some college vs Some high school | 0.0873 | 0.1715 | 179 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1202 | False | 0.882 | False
Math: Group A vs Group C | 0.1104 | 0.1918 | 89 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.148 | False | 0.882 | False
Writing: Group B vs Group C | 0.111 | 0.1463 | 190 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.148 | False | 0.882 | False
Writing: high school vs Some high school | 0.1141 | 0.1638 | 179 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1498 | False | 0.882 | False
Writing: Group A vs Group B | 0.1448 | 0.1878 | 89 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.1843 | False | 0.882 | False
Reading: high school vs Some high school | 0.1448 | 0.1511 | 179 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1843 | False | 0.882 | False
Math: bachelors degree vs Some college | 0.1715 | 0.1556 | 118 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2148 | False | 0.882 | False
Reading: Group A vs Group B | 0.1739 | 0.1751 | 89 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.2148 | False | 0.882 | False
Reading: Group B vs Group C | 0.1867 | 0.1212 | 190 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.2273 | False | 0.882 | False
Reading: associates degree vs bachelors degree | 0.1952 | 0.1479 | 118 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2342 | False | 0.882 | False
Math: masters degree vs Some college | 0.2176 | 0.1806 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2574 | False | 0.882 | False
Reading: associates degree vs Some college | 0.2666 | 0.1051 | 222 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.311 | False | 0.882 | False
Reading: bachelors degree vs masters degree | 0.2933 | 0.1681 | 59 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.3375 | False | 0.882 | False
Writing: bachelors degree vs masters degree | 0.3188 | 0.1594 | 59 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.3619 | False | 0.882 | False
Math: Group A vs Group B | 0.3503 | 0.1202 | 89 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.3923 | False | 0.882 | False
Math: associates degree vs bachelors degree | 0.3802 | 0.1001 | 118 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4202 | False | 0.882 | False
Math: high school vs Some high school | 0.3881 | 0.0893 | 179 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4234 | False | 0.882 | False
Math: associates degree vs masters degree | 0.401 | 0.1232 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4318 | False | 0.882 | False
Writing: Group D vs Group E | 0.4104 | 0.0863 | 140 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4364 | False | 0.882 | False
Reading: Group C vs Group D | 0.4258 | 0.0665 | 262 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4471 | False | 0.882 | False
Writing: associates degree vs Some college | 0.4467 | 0.072 | 222 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4632 | False | 0.882 | False
Math: Group B vs Group C | 0.4648 | 0.067 | 190 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4761 | False | 0.882 | False
Math: associates degree vs Some college | 0.5876 | 0.0513 | 222 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.5947 | False | 0.882 | False
Math: bachelors degree vs masters degree | 0.882 | 0.0237 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.882 | False | 0.882 | False

Interpretation

Purpose

This section presents the complete numerical results from all 84 hypothesis tests with raw and adjusted p-values across four correction methods. It enables users to identify which findings remain statistically significant after controlling for multiple comparisons—the core objective of the analysis. By displaying all rejection decisions side-by-side, it facilitates comparison of method stringency and power.

Key Findings

  • Holm Rejections: 36 of 84 – Conservative stepdown procedure rejects 43% of the tests while controlling family-wise error rate
  • BH Rejections: 56 of 84 – False discovery rate control yields 93% more rejections than Bonferroni, reflecting its higher power
  • Effect Size Alignment: 36 tests – Every Holm-rejected test also has a practically significant effect (Cohen's d ≥ 0.2), so statistical and practical significance align
  • Uncorrected FWER: 0.987 – Without correction, 98.7% probability of at least one false positive across 84 tests, justifying the correction approach

Interpretation

The analysis reveals a clear hierarchy: Bonferroni (most conservative, 29 rejections) < Holm/Hochberg (moderate, 36 rejections) < BH (least conservative, 56

Table 11

Methodology Summary

Technical Details and Method Comparison

Technical details and method comparison

method | rejections | controls | assumptions | power_rank | power_relative_to_bonferroni
holm | 36 | FWER | General (weak assumptions) | 3 | 1.241
bonferroni | 29 | FWER | General (weak assumptions) | 4 | 1
BH | 56 | FDR | Independence or positive dependence | 1 | 1.931
hochberg | 36 | FWER | Independence or positive dependence | 2 | 1.241

Interpretation

Purpose

This section compares four multiple comparison correction methods to demonstrate why Holm-Bonferroni was selected as the primary approach. It shows the trade-off between statistical power (ability to detect true effects) and error control (preventing false positives) across methods with different assumptions and objectives. Understanding these differences is critical for interpreting which of the 36 Holm-rejected hypotheses represent genuine findings versus false discoveries.

Key Findings

  • Bonferroni (29 rejections): Most conservative FWER method; rejects 7 fewer tests than Holm (about 19% fewer) and requires no dependence assumptions
  • Holm (36 rejections): 24% more rejections than Bonferroni with identical FWER control; like Bonferroni, it is valid under arbitrary dependence
  • Hochberg (36 rejections): Matches Holm's rejection count but requires stronger assumptions (independence or positive dependence); ranks 2nd in power
  • Benjamini-Hochberg (56 rejections): 93% more rejections than Bonferroni, but controls FDR (a looser standard) rather than FWER; the expected share of false discoveries among rejected tests is held at or below 5%
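
The difference between FDR and FWER control shows up in the decision rule itself. Below is a sketch of the standard Benjamini-Hochberg step-up rule; the function name and the example p-values are illustrative and not taken from the report's code.

```python
def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Return rejection decisions (True = reject) in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank  # step-up: remember the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True  # reject everything up to and including that rank
    return reject


# Made-up p-values: BH rejects all four, whereas Holm would stop after the first
# because 0.02 exceeds its second threshold of 0.05/3.
bh_decisions = benjamini_hochberg_reject([0.001, 0.02, 0.03, 0.04])
```

Because BH compares the rank-k p-value with k·α/m instead of α/(m-k+1), its thresholds grow much faster than Holm's, which is consistent with BH rejecting 56 tests here against Holm's 36.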

Interpretation

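The 7-test gap between Holm (36) and Bonferroni (29) represents the power gain from Holm's sequential thresholds: after each rejection the remaining tests face a less severe correction, and no assumption about the dependence structure is needed.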
