Overview

Analysis Overview

Analysis overview and configuration

Analysis Type: Holm-Bonferroni

Company: Education Research Institute

Objective: Apply Holm-Bonferroni and other multiple comparison corrections to identify which hypothesis tests remain significant after controlling the family-wise error rate

Analysis Date: 2026-03-05

Processing ID: test_1772775794

Total Observations: 84

Parameter	Value
significance_level	0.05
primary_method	holm
comparison_methods	bonferroni,BH,hochberg
min_effect_size	0.2
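The four parameters can be carried through an analysis as one validated object. The sketch below is a hypothetical Python representation: `CorrectionConfig` and `parse_config` are illustrative names, not part of any tool used for this report, and it assumes `comparison_methods` is stored as a comma-separated string, as shown in the table.

```python
from dataclasses import dataclass


# Hypothetical container mirroring the parameter table; the field names come
# from the table, but the class and parse_config() are illustrative only.
@dataclass(frozen=True)
class CorrectionConfig:
    significance_level: float = 0.05
    primary_method: str = "holm"
    comparison_methods: tuple = ("bonferroni", "BH", "hochberg")
    min_effect_size: float = 0.2

    def __post_init__(self):
        # Reject values that would make the corrections meaningless.
        if not 0.0 < self.significance_level < 1.0:
            raise ValueError("significance_level must lie in (0, 1)")
        if self.min_effect_size < 0:
            raise ValueError("min_effect_size must be non-negative")

    def all_methods(self):
        # Primary method first, then the comparison methods, without duplicates.
        ordered = []
        for method in (self.primary_method, *self.comparison_methods):
            if method not in ordered:
                ordered.append(method)
        return ordered


def parse_config(pairs):
    """Build a CorrectionConfig from (parameter, value) string pairs."""
    raw = dict(pairs)
    return CorrectionConfig(
        significance_level=float(raw["significance_level"]),
        primary_method=raw["primary_method"],
        # comparison_methods is stored as a comma-separated string.
        comparison_methods=tuple(raw["comparison_methods"].split(",")),
        min_effect_size=float(raw["min_effect_size"]),
    )


cfg = parse_config([
    ("significance_level", "0.05"),
    ("primary_method", "holm"),
    ("comparison_methods", "bonferroni,BH,hochberg"),
    ("min_effect_size", "0.2"),
])
print(cfg.all_methods())  # ['holm', 'bonferroni', 'BH', 'hochberg']
```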

Interpretation

Purpose

This analysis applies multiple comparison correction methods to 84 educational hypothesis tests comparing student performance across demographic groups (lunch type, test prep, gender, parental education, race/ethnicity). The objective is to control the family-wise error rate (FWER) and identify which differences remain statistically significant after accounting for the inflated Type I error risk of running 84 simultaneous tests.

Key Findings

Uncorrected FWER: 0.987 – Without correction, and assuming independent tests with every null hypothesis true, there is a 98.7% probability of at least one false positive across the 84 tests, which demonstrates a critical need for adjustment
Holm-Bonferroni Rejections: 36 significant findings – The Holm method rejects 36 hypotheses while controlling FWER at α=0.05
Bonferroni Rejections: 29 findings – More conservative; Holm rejects 7 more tests, a 24.1% gain over Bonferroni
BH (FDR) Rejections: 56 findings – The most liberal method, controlling the false discovery rate rather than FWER
Practical Significance Alignment: all 36 Holm rejections also have an effect size of at least 0.2 (the smallest is d = 0.3347), so every statistically significant finding clears the practical-significance threshold; 22 further tests reach d ≥ 0.2 without surviving Holm correction


Data preprocessing and column mapping

Initial Rows: 84

Final Rows: 84

Rows Removed: 0

Retention Rate: 100%

Interpretation

Purpose

This section documents the data preprocessing pipeline for 84 hypothesis tests undergoing multiple comparison corrections. Perfect data retention (100%) indicates no observations were excluded during cleaning, which is critical for maintaining the integrity of family-wise error rate (FWER) control calculations that depend on the complete test set.

Key Findings

Initial & Final Rows: 84 observations retained across both stages—no data loss occurred during preprocessing
Retention Rate: 100% indicates complete dataset integrity with zero exclusions
Rows Removed: 0 deletions, suggesting either clean input data or that no row met the removal criteria
Train/Test Split: Not applicable, as this is a multiple comparison correction analysis rather than a predictive modeling task

Interpretation

The perfect retention rate ensures that all 84 tests remain eligible for correction methods (Holm, Bonferroni, BH, Hochberg). This is essential because FWER calculations depend on the complete family of tests; removing even one test would alter the stepdown thresholds and rejection decisions. The absence of preprocessing transformations preserves the raw p-values and effect sizes needed for accurate multiple comparison control.
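A minimal Python sketch of why the family size matters: Holm's thresholds are α/(m − i + 1), so dropping a single test changes every threshold. `holm_thresholds` is a hypothetical helper written for illustration, not part of the report's pipeline.

```python
def holm_thresholds(m, alpha=0.05):
    """Holm step-down thresholds alpha / (m - i + 1) for ranks i = 1..m."""
    return [alpha / (m - i + 1) for i in range(1, m + 1)]


full = holm_thresholds(84)      # the complete family in this report
reduced = holm_thresholds(83)   # the same family with one test dropped

# Rank 1 (smallest p-value) gets the strictest threshold.
print(round(full[0], 6), round(reduced[0], 6))    # 0.000595 0.000602
# Rank 36 is where the report's Holm procedure makes its last rejection.
print(round(full[35], 6), round(reduced[35], 6))  # 0.00102 0.001042
```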

Context

No train/test split is applicable here since the objective is statistical inference across a fixed test family, not predictive generalization. The analysis assumes all 84 tests

Executive Summary

Executive summary of multiple comparison correction results


finding	value	interpretation
Total Tests	84	Number of hypothesis tests corrected
Holm Rejections	36	Tests significant after Holm correction
Bonferroni Rejections	29	Tests significant after Bonferroni (baseline)
BH Rejections	56	Tests significant under FDR control
Uncorrected FWER	98.7%	Probability of at least one false positive without correction (independent tests, all nulls true)
Holm Power Gain vs Bonferroni	24.1%	Additional rejections by Holm relative to Bonferroni (36 vs 29)
Marginal Test	Writing: associates degree vs Some high school	Last test rejected by Holm procedure

Bottom Line: Applied Holm-Bonferroni correction to 84 hypothesis tests at α=0.05. Without correction, FWER would be 98.7% (at least one false positive is almost certain). Holm rejected 36 tests while controlling FWER at 5%.

Key Findings:
• Holm rejected 7 additional tests vs Bonferroni (24.1% power gain)
• BH (FDR control) rejected 56 tests (less conservative than FWER methods)
• Marginal test: Writing: associates degree vs Some high school — the last test where Holm stopped rejecting

Recommendation: Focus follow-up on the 36 Holm-significant tests. These results are statistically robust, with the family-wise error rate controlled at 5%.

Interpretation

EXECUTIVE SUMMARY: MULTIPLE COMPARISON CORRECTION ANALYSIS

Purpose

This analysis applied four statistical correction methods to 84 hypothesis tests comparing student performance across demographic groups (lunch type, test prep, gender, race/ethnicity, parental education). The objective was to identify which findings remain statistically significant after correcting for multiple testing, without which the probability of at least one false positive would be 98.7%.

Key Findings

Uncorrected FWER: 98.7% — Without correction, the analysis would almost certainly report false positives across the 84 tests
Holm-Bonferroni Rejections: 36 tests — Maintains family-wise error rate at 5% while rejecting 7 more tests than strict Bonferroni (24.1% power gain)
Bonferroni Rejections: 29 tests — Most conservative FWER method; rejects only the strongest signals
BH (FDR) Rejections: 56 tests – Less stringent control; keeps the expected proportion of false discoveries among rejected tests at or below 5% rather than guarding against any single false positive
Effect Size Alignment: all 36 Holm rejections have practically significant effects (d ≥ 0.2), so statistical and practical significance agree for every Holm finding

Interpretation

The analysis successfully

Visualization

Holm Step-Down Procedure

Holm step-down sequential rejection procedure with thresholds α/(m − i + 1) that loosen at each step

Interpretation

Purpose

The Holm step-down procedure controls family-wise error rate (FWER) by sequentially testing hypotheses in order of increasing p-value, with thresholds that become progressively more lenient. This section demonstrates how Holm allocates the significance budget across 84 tests while maintaining strict control over false positives—critical for identifying genuinely significant effects amid multiple comparisons.

Key Findings

Holm Rejections: 36 of 84 tests rejected (42.9%); the remaining 48 (57.1%) are retained under FWER control
Threshold Range: from α/84 ≈ 0.0006 for the smallest p-value up to α = 0.05 for the largest, following α/(m − i + 1); the resulting staircase is strictest for the earliest tests
Marginal Test: Writing: associates degree vs Some high school (p=0.0009, rank 36) marks the rejection boundary: it falls below its threshold 0.05/49 ≈ 0.00102, while the next test (p=0.0015) exceeds 0.05/48 ≈ 0.00104, so Holm stops there
Raw p-value Distribution: Mean=0.09, median ≈ 0.005, indicating strong clustering of results in the lower tail
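The step-down rule described above can be sketched in Python as follows. This is an illustrative implementation of the textbook Holm procedure, not the code that produced this report; `holm_stepdown` is a hypothetical name.

```python
def holm_stepdown(pvalues, alpha=0.05):
    """Holm step-down procedure.

    Sorts p-values ascending, compares the i-th smallest to alpha / (m - i + 1),
    and stops at the first one that fails. Returns reject flags in the original
    order plus the 1-based rank of the last rejection (0 if none).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    reject = [False] * m
    last_rank = 0
    for i, idx in enumerate(order, start=1):
        if pvalues[idx] > alpha / (m - i + 1):
            break  # every larger p-value is retained as well
        reject[idx] = True
        last_rank = i
    return reject, last_rank


flags, last = holm_stepdown([0.01, 0.04, 0.03, 0.005])
print(flags, last)  # [True, False, False, True] 2
```

In the complete results table the same rule stops after rank 36: the marginal test's p = 0.0009 is below 0.05/49 ≈ 0.00102, while the next p-value, 0.0015, exceeds 0.05/48 ≈ 0.00104.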

Interpretation

Holm's sequential rejection strategy is more powerful than Bonferroni (24% power gain) because it relaxes thresholds for weaker tests while maintaining FWER at α=0.05. The 36 rejections represent findings robust to multiple comparison correction. The marginal test at

Visualization

Method Comparison: Adjusted P-values

Raw vs adjusted p-values across all correction methods

Interpretation

Purpose

This section compares how four multiple-comparison correction methods adjust raw p-values to control error rates across 84 simultaneous hypothesis tests. It demonstrates the trade-off between statistical conservatism (fewer false positives) and statistical power (ability to detect true effects). Understanding these differences is critical for determining which findings remain credible after correcting for multiple testing.

Key Findings

Holm vs. Bonferroni: Holm rejected 36 tests versus Bonferroni's 29—a 24.1% power gain while maintaining identical FWER control, demonstrating Holm's superiority as a step-down procedure.
BH (FDR) Method: Rejected 56 tests, the most permissive approach, because it controls False Discovery Rate rather than Family-Wise Error Rate, allowing more discoveries at the cost of accepting some false positives.
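Hochberg Performance: Matched Holm's 36 rejections. Hochberg's step-up procedure is never less powerful than Holm but requires stronger assumptions (independence or positive dependence); identical counts here mean the step-up pass found no additional rejections past Holm's boundary, not that the test statistics are positively dependent.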
Uncorrected FWER: At 0.987, the probability of at least one false positive without correction is nearly certain across 84 tests.
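Adjusted p-values for all four methods follow from the sorted raw p-values. The sketch below uses the standard textbook formulas (Bonferroni m·p; Holm as a running maximum of (m − i + 1)·p_(i); Hochberg and BH as running minima from the largest p-value, with factors (m − i + 1) and m/i). `adjust_pvalues` is a hypothetical helper, not the report's pipeline; in practice `statsmodels.stats.multitest.multipletests` offers the same corrections through its `method` argument (`'bonferroni'`, `'holm'`, `'simes-hochberg'`, `'fdr_bh'`).

```python
def adjust_pvalues(pvalues, method):
    """Adjusted p-values for 'bonferroni', 'holm', 'hochberg' or 'BH'.

    A test is rejected at level alpha when its adjusted p-value is <= alpha.
    """
    m = len(pvalues)
    if method == "bonferroni":
        return [min(1.0, m * p) for p in pvalues]

    order = sorted(range(m), key=lambda k: pvalues[k])  # indices by ascending p
    adjusted = [0.0] * m

    if method == "holm":
        # Step-down: running maximum of (m - i + 1) * p_(i), smallest p first.
        running = 0.0
        for i, idx in enumerate(order, start=1):
            running = max(running, (m - i + 1) * pvalues[idx])
            adjusted[idx] = min(1.0, running)
        return adjusted

    if method in ("hochberg", "BH"):
        # Step-up: running minimum starting from the largest p-value.
        running = 1.0
        for i in range(m, 0, -1):
            idx = order[i - 1]
            factor = (m - i + 1) if method == "hochberg" else m / i
            running = min(running, factor * pvalues[idx])
            adjusted[idx] = running
        return adjusted

    raise ValueError(f"unknown method: {method!r}")


p = [0.01, 0.04, 0.03, 0.005]
for method in ("bonferroni", "holm", "hochberg", "BH"):
    print(method, [round(x, 4) for x in adjust_pvalues(p, method)])
```

The step-up running minimum also explains a pattern in the complete results table: from rank 57 onward every hochberg_adjusted_p equals 0.882, the largest raw p-value.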

Interpretation

The analysis reveals a clear hierarchy: Bonferroni is most conservative, Holm and

Visualization

Decision Matrix

Which tests survive under which correction methods

Interpretation

Purpose

This section visualizes which of the 84 hypothesis tests remain statistically significant under four different multiple comparison correction methods. It reveals the power-conservativeness trade-off: the FWER-controlling methods (Bonferroni, Holm, Hochberg) reject fewer tests but provide stronger Type I error control, while BH detects more signals by controlling a different error rate (FDR). Understanding where the methods agree separates robust findings from marginal discoveries.

Key Findings

Bonferroni Rejections: 29 tests (35%) – most conservative FWER control; highest false negative risk
Holm Rejections: 36 tests (43%) – 24% power gain over Bonferroni while maintaining FWER control
BH Rejections: 56 tests (67%) – most liberal; controls FDR rather than FWER, rejecting 93% more tests than Bonferroni
Hochberg Rejections: 36 tests (43%) – matches Holm; a step-up FWER procedure that also improves on Bonferroni
Perfect Agreement: all four methods reject the 29 tests with the smallest p-values and retain the 28 with the largest; disagreement is confined to the 27 tests in between (ranks 30 to 56), indicating clear signal/noise separation at the extremes
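Counting agreement across methods only needs each method's adjusted p-values and the rule "reject when adjusted p ≤ α". In this hypothetical sketch (`decision_matrix` is an illustrative name) the inputs are four rows copied from the complete results table, chosen to show one unanimous rejection, two disputed tests, and one unanimous retention.

```python
def decision_matrix(adjusted, alpha=0.05):
    """Turn {method: adjusted p-values} into per-test decisions and agreement counts."""
    methods = list(adjusted)
    m = len(adjusted[methods[0]])
    rows = [{meth: adjusted[meth][k] <= alpha for meth in methods} for k in range(m)]
    all_reject = sum(all(row.values()) for row in rows)
    none_reject = sum(not any(row.values()) for row in rows)
    return rows, {
        "all_reject": all_reject,
        "none_reject": none_reject,
        "disputed": m - all_reject - none_reject,
    }


# Adjusted p-values copied from four rows of the complete results table:
# "Math: high school vs Some college", "Math: high school vs masters degree",
# "Writing: Group B vs Group D", "Writing: associates degree vs bachelors degree".
adjusted = {
    "holm":       [0.0224, 0.033, 0.072, 0.9828],
    "bonferroni": [0.0336, 0.0504, 0.126, 1.0],
    "BH":         [0.0012, 0.0016, 0.0033, 0.0517],
    "hochberg":   [0.0224, 0.0324, 0.0705, 0.882],
}
rows, agreement = decision_matrix(adjusted)
print(agreement)  # {'all_reject': 1, 'none_reject': 1, 'disputed': 2}
```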

Interpretation

The decision matrix demonstrates that 36 tests form a robust consensus across FWER-controlling methods

Visualization

Method Comparison: Rejection Counts

Number of rejections per correction method

Interpretation

Purpose

This section compares the statistical power and rejection rates across four multiple comparison correction methods applied to the same 84 hypothesis tests. It demonstrates the trade-off between controlling false positives (Type I error) and detecting true effects (statistical power), which is central to choosing an appropriate correction strategy for the analysis objective.

Key Findings

Benjamini-Hochberg (BH) Rejections: 56 tests (66.7%) — highest rejection rate with 1.93× power relative to Bonferroni, controls False Discovery Rate rather than Family-Wise Error Rate
Holm & Hochberg Rejections: 36 tests each (42.9%) — equivalent rejection counts despite different algorithmic approaches; Holm uses step-down logic while Hochberg uses step-up
Bonferroni Rejections: 29 tests (34.5%) — most conservative method with lowest power; provides strongest control but sacrifices detection ability
Power Differential: 27-test gap between BH (most powerful) and Bonferroni (least powerful) reflects fundamental difference between FDR and FWER control philosophies
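The shares and ratios quoted here can be reproduced from the four rejection counts alone. "Power" in this report appears to be measured as the number of rejections relative to Bonferroni (36/29 ≈ 1.241, 56/29 ≈ 1.931); the sketch below assumes that definition, and `rejection_summary` is a hypothetical helper.

```python
def rejection_summary(counts, m, baseline="bonferroni"):
    """Rejection share of all m tests and rejections relative to a baseline method."""
    base = counts[baseline]
    return {
        method: {"share": r / m, "relative_to_baseline": r / base}
        for method, r in counts.items()
    }


counts = {"holm": 36, "bonferroni": 29, "BH": 56, "hochberg": 36}
summary = rejection_summary(counts, m=84)
for method, s in summary.items():
    print(f"{method:<10} {counts[method]:>3} {s['share']:.1%} {s['relative_to_baseline']:.3f}")
```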

Interpretation

The 93% increase in rejections from Bonferroni (29) to BH (56) illustrates why method selection depends on study context.

Visualization

Effect Size Analysis

Practical significance alongside statistical significance

Interpretation

Purpose

This section validates whether statistically significant findings (after Holm-Bonferroni correction) also represent meaningful, real-world differences. Statistical significance alone can be misleading with large sample sizes; effect size analysis ensures that rejected hypotheses reflect substantive rather than trivial differences. This bridges the gap between p-values and practical importance.

Key Findings

Practical Significance Alignment: 36 of 84 tests (43%) are both statistically significant (Holm-corrected) AND practically significant (Cohen's d ≥ 0.20), indicating robust, meaningful effects.
Mean Effect Size: 0.35, between Cohen's small (0.2) and medium (0.5) benchmarks, with median 0.34 and range 0.02–0.94, showing considerable variation in effect magnitudes across comparisons.
Confidence Interval Coverage: 69% of tests flagged as "Practically significant" have confidence intervals excluding zero, strengthening evidence of real differences.
Small Effect Concern: 26 tests show effects below the 0.20 threshold. None of them is rejected by Holm or by any other method, so no statistically significant finding in this analysis is practically negligible.
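The alignment between Holm decisions and the d ≥ 0.2 threshold is a two-by-two tabulation. The sketch below is a hypothetical helper (`effect_size_crosstab`) run on five rows copied from the complete results table; it treats |d| ≥ 0.2 as "practically significant", matching the report's min_effect_size.

```python
def effect_size_crosstab(tests, min_effect=0.2):
    """Count tests by (Holm rejected?, |Cohen's d| >= min_effect?)."""
    table = {(r, big): 0 for r in (True, False) for big in (True, False)}
    for d, holm_reject in tests:
        table[(holm_reject, abs(d) >= min_effect)] += 1
    return table


# (cohens_d, holm_reject) pairs copied from five rows of the complete results table.
sample = [
    (0.7823, True),   # Math: free/reduced vs standard
    (0.3347, True),   # Writing: associates degree vs Some high school (marginal test)
    (0.3049, False),  # Writing: Group B vs Group D
    (0.2105, False),  # Reading: Group D vs Group E
    (0.1854, False),  # Reading: Group B vs Group D
]
counts = effect_size_crosstab(sample)
print(counts[(True, True)], counts[(False, True)], counts[(False, False)], counts[(True, False)])
# 2 2 1 0
```

Applied to all 84 rows of that table, the same tabulation gives 36 tests that are Holm-rejected with d ≥ 0.2, none that are Holm-rejected with d < 0.2, 22 that reach d ≥ 0.2 without surviving Holm, and 26 that are neither.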

Interpretation

The analysis reveals a healthy concordance between statistical and practical significance for the majority of findings. The 36 practically significant results represent genuine, meaningful differences in student outcomes across

Visualization

FWER Accumulation Curve

Why correction is needed: FWER accumulation without correction

Interpretation

Purpose

This section quantifies the multiple testing problem: conducting 84 independent hypothesis tests at α=0.05 without correction inflates the probability of at least one false positive to 98.7%. This demonstrates why statistical correction methods are essential and motivates the application of Holm-Bonferroni and alternative procedures in the overall analysis.

Key Findings

Uncorrected FWER: 98.7% — nearly certain to observe at least one false positive by chance alone
Mathematical Driver: FWER = 1 − (0.95)^84 = 0.987, showing exponential accumulation of error across tests
Critical Threshold: Even 5 tests yield 23% FWER; by 80+ tests, false positive risk approaches certainty
Correction Necessity: Holm-Bonferroni caps FWER at the nominal 5% level through adjusted p-value thresholds
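The accumulation formula is easy to check numerically. This sketch assumes independent tests with every null hypothesis true, the same assumption behind the 98.7% figure; `uncorrected_fwer` is a hypothetical helper.

```python
def uncorrected_fwer(m, alpha=0.05):
    """P(at least one false positive) across m independent tests whose nulls are all true."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 10, 20, 84):
    print(m, round(uncorrected_fwer(m), 3))

# Testing each hypothesis at alpha / m (the Bonferroni level) keeps the same quantity below alpha.
print(round(uncorrected_fwer(84, alpha=0.05 / 84), 4))
```

The last line shows that testing each of the 84 hypotheses at 0.05/84 brings the probability down to about 0.049, which illustrates why Bonferroni-type thresholds keep FWER at or below 5%.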

Interpretation

Without correction, the 84 comparisons across Math, Reading, and Writing assessments would carry a 98.7% probability of at least one spurious significant result, rendering raw p-values unreliable for decision-making. The FWER curve illustrates that error accumulation is non-linear: the first 10 tests already raise FWER to about 40%, and by test

Data Table

Complete Results Table

Complete numerical results with all p-values and decisions

test_label	raw_pvalue	cohens_d	n_per_group	family	statistical_test	holm_adjusted_p	holm_reject	bonferroni_adjusted_p	bonferroni_reject	BH_adjusted_p	BH_reject	hochberg_adjusted_p	hochberg_reject
Math: free/reduced vs standard	0	0.7823	355	Math by Lunch Type	welch_t_test	0	True	0	True	0	True	0	True
Reading: free/reduced vs standard	0	0.4924	355	Reading by Lunch Type	welch_t_test	0	True	0	True	0	True	0	True
Writing: free/reduced vs standard	0	0.5293	355	Writing by Lunch Type	welch_t_test	0	True	0	True	0	True	0	True
Reading: completed vs none	0	0.5192	358	Reading by Test Prep	welch_t_test	0	True	0	True	0	True	0	True
Writing: completed vs none	0	0.6866	358	Writing by Test Prep	welch_t_test	0	True	0	True	0	True	0	True
Reading: female vs male	0	0.5037	482	Reading by Gender	welch_t_test	0	True	0	True	0	True	0	True
Writing: female vs male	0	0.6316	482	Writing by Gender	welch_t_test	0	True	0	True	0	True	0	True
Writing: bachelors degree vs high school	2e-10	0.7629	118	Writing by Parental Education	welch_t_test	1.46e-08	True	1.68e-08	True	1.4e-09	True	1.46e-08	True
Writing: high school vs masters degree	9e-10	0.9446	59	Writing by Parental Education	welch_t_test	6.48e-08	True	7.56e-08	True	5.815e-09	True	6.48e-08	True
Math: Group C vs Group E	1.9e-09	0.6212	140	Math by Race/Ethnicity	welch_t_test	1.349e-07	True	1.596e-07	True	1.14e-08	True	1.349e-07	True
Math: Group B vs Group E	5e-09	0.6691	140	Math by Race/Ethnicity	welch_t_test	3.5e-07	True	4.2e-07	True	2.8e-08	True	3.5e-07	True
Math: Group A vs Group E	1.08e-08	0.8048	89	Math by Race/Ethnicity	welch_t_test	7.452e-07	True	9.072e-07	True	5.67e-08	True	7.452e-07	True
Math: completed vs none	1.54e-08	0.3763	358	Math by Test Prep	welch_t_test	1.047e-06	True	1.294e-06	True	7.609e-08	True	1.047e-06	True
Math: female vs male	9.12e-08	0.3407	482	Math by Gender	welch_t_test	6.11e-06	True	7.661e-06	True	4.256e-07	True	6.11e-06	True
Writing: associates degree vs high school	1.465e-07	0.5242	196	Writing by Parental Education	welch_t_test	9.669e-06	True	0	True	6.477e-07	True	9.669e-06	True
Reading: high school vs masters degree	6.258e-07	0.7593	59	Reading by Parental Education	welch_t_test	0	True	0.0001	True	2.628e-06	True	0	True
Reading: bachelors degree vs high school	8.804e-07	0.5846	118	Reading by Parental Education	welch_t_test	0.0001	True	0.0001	True	3.522e-06	True	0.0001	True
Writing: masters degree vs Some high school	4.275e-06	0.7067	59	Writing by Parental Education	welch_t_test	0.0003	True	0.0004	True	0	True	0.0003	True
Writing: bachelors degree vs Some high school	4.628e-06	0.5535	118	Writing by Parental Education	welch_t_test	0.0003	True	0.0004	True	0	True	0.0003	True
Reading: associates degree vs high school	7.442e-06	0.4448	196	Reading by Parental Education	welch_t_test	0.0005	True	0.0006	True	0	True	0.0005	True
Writing: high school vs Some college	9.275e-06	0.4381	196	Writing by Parental Education	welch_t_test	0.0006	True	0.0008	True	0	True	0.0006	True
Math: Group D vs Group E	0	0.4483	140	Math by Race/Ethnicity	welch_t_test	0	True	0	True	0	True	0	True
Math: bachelors degree vs high school	0	0.4936	118	Math by Parental Education	welch_t_test	0	True	0	True	0	True	0	True
Writing: Group A vs Group E	0	0.5726	89	Writing by Race/Ethnicity	welch_t_test	0	True	0	True	0	True	0	True
Writing: Group A vs Group D	0	0.5099	89	Writing by Race/Ethnicity	welch_t_test	0	True	0	True	0	True	0	True
Reading: Group A vs Group E	0.0001	0.5519	89	Reading by Race/Ethnicity	welch_t_test	0.0059	True	0.0084	True	0.0003	True	0.0058	True
Math: associates degree vs high school	0.0001	0.387	196	Math by Parental Education	welch_t_test	0.0059	True	0.0084	True	0.0003	True	0.0058	True
Reading: masters degree vs Some high school	0.0002	0.5594	59	Reading by Parental Education	welch_t_test	0.0114	True	0.0168	True	0.0006	True	0.0114	True
Math: high school vs Some college	0.0004	0.3461	196	Math by Parental Education	welch_t_test	0.0224	True	0.0336	True	0.0012	True	0.0224	True
Math: high school vs masters degree	0.0006	0.5182	59	Math by Parental Education	welch_t_test	0.033	True	0.0504	False	0.0016	True	0.0324	True
Reading: high school vs Some college	0.0006	0.3375	196	Reading by Parental Education	welch_t_test	0.033	True	0.0504	False	0.0016	True	0.0324	True
Reading: bachelors degree vs Some high school	0.0008	0.4036	118	Reading by Parental Education	welch_t_test	0.0424	True	0.0672	False	0.002	True	0.0408	True
Reading: Group B vs Group E	0.0008	0.3771	140	Reading by Race/Ethnicity	welch_t_test	0.0424	True	0.0672	False	0.002	True	0.0408	True
Writing: Group B vs Group E	0.0008	0.3768	140	Writing by Race/Ethnicity	welch_t_test	0.0424	True	0.0672	False	0.002	True	0.0408	True
Math: Group A vs Group D	0.0009	0.4106	89	Math by Race/Ethnicity	welch_t_test	0.045	True	0.0756	False	0.0021	True	0.0441	True
Writing: associates degree vs Some high school	0.0009	0.3347	179	Writing by Parental Education	welch_t_test	0.045	True	0.0756	False	0.0021	True	0.0441	True
Writing: Group B vs Group D	0.0015	0.3049	190	Writing by Race/Ethnicity	welch_t_test	0.072	False	0.126	False	0.0033	True	0.0705	False
Math: bachelors degree vs Some high school	0.0015	0.3791	118	Math by Parental Education	welch_t_test	0.072	False	0.126	False	0.0033	True	0.0705	False
Writing: masters degree vs Some college	0.0017	0.4633	59	Writing by Parental Education	welch_t_test	0.0782	False	0.1428	False	0.0037	True	0.0782	False
Reading: Group A vs Group D	0.0025	0.3738	89	Reading by Race/Ethnicity	welch_t_test	0.1125	False	0.21	False	0.0052	True	0.1125	False
Reading: masters degree vs Some college	0.0042	0.4223	59	Reading by Parental Education	welch_t_test	0.1848	False	0.3528	False	0.0086	True	0.1848	False
Writing: Group A vs Group C	0.0046	0.3415	89	Writing by Race/Ethnicity	welch_t_test	0.1978	False	0.3864	False	0.0092	True	0.1978	False
Math: Group B vs Group D	0.0049	0.2695	190	Math by Race/Ethnicity	welch_t_test	0.2058	False	0.4116	False	0.0095	True	0.205	False
Math: associates degree vs Some high school	0.005	0.2833	179	Math by Parental Education	welch_t_test	0.2058	False	0.42	False	0.0095	True	0.205	False
Writing: associates degree vs masters degree	0.0058	0.4074	59	Writing by Parental Education	welch_t_test	0.232	False	0.4872	False	0.0108	True	0.232	False
Reading: associates degree vs Some high school	0.0068	0.2731	179	Reading by Parental Education	welch_t_test	0.2652	False	0.5712	False	0.0123	True	0.2622	False
Reading: Group C vs Group E	0.0069	0.2751	140	Reading by Race/Ethnicity	welch_t_test	0.2652	False	0.5796	False	0.0123	True	0.2622	False
Writing: bachelors degree vs Some college	0.0077	0.3044	118	Writing by Parental Education	welch_t_test	0.2849	False	0.6468	False	0.0135	True	0.2849	False
Math: masters degree vs Some high school	0.0087	0.397	59	Math by Parental Education	welch_t_test	0.3132	False	0.7308	False	0.0149	True	0.3132	False
Writing: Some college vs Some high school	0.0104	0.2577	179	Writing by Parental Education	welch_t_test	0.364	False	0.8736	False	0.0171	True	0.3536	False
Reading: Group A vs Group C	0.0104	0.3087	89	Reading by Race/Ethnicity	welch_t_test	0.364	False	0.8736	False	0.0171	True	0.3536	False
Math: Group C vs Group D	0.0159	0.2017	262	Math by Race/Ethnicity	welch_t_test	0.5247	False	1	False	0.0257	True	0.5216	False
Math: Some college vs Some high school	0.0163	0.2413	179	Math by Parental Education	welch_t_test	0.5247	False	1	False	0.0258	True	0.5216	False
Writing: Group C vs Group E	0.0192	0.2383	140	Writing by Race/Ethnicity	welch_t_test	0.5952	False	1	False	0.0299	True	0.5952	False
Reading: bachelors degree vs Some college	0.0281	0.2504	118	Reading by Parental Education	welch_t_test	0.843	False	1	False	0.0429	True	0.843	False
Reading: associates degree vs masters degree	0.0293	0.3209	59	Reading by Parental Education	welch_t_test	0.8497	False	1	False	0.044	True	0.8497	False
Writing: associates degree vs bachelors degree	0.0351	0.2411	118	Writing by Parental Education	welch_t_test	0.9828	False	1	False	0.0517	False	0.882	False
Reading: Group D vs Group E	0.045	0.2105	140	Reading by Race/Ethnicity	welch_t_test	1	False	1	False	0.0652	False	0.882	False
Reading: Group B vs Group D	0.0524	0.1854	190	Reading by Race/Ethnicity	welch_t_test	1	False	1	False	0.0746	False	0.882	False
Writing: Group C vs Group D	0.0593	0.1576	262	Writing by Race/Ethnicity	welch_t_test	1	False	1	False	0.083	False	0.882	False
Reading: Some college vs Some high school	0.0873	0.1715	179	Reading by Parental Education	welch_t_test	1	False	1	False	0.1202	False	0.882	False
Math: Group A vs Group C	0.1104	0.1918	89	Math by Race/Ethnicity	welch_t_test	1	False	1	False	0.148	False	0.882	False
Writing: Group B vs Group C	0.111	0.1463	190	Writing by Race/Ethnicity	welch_t_test	1	False	1	False	0.148	False	0.882	False
Writing: high school vs Some high school	0.1141	0.1638	179	Writing by Parental Education	welch_t_test	1	False	1	False	0.1498	False	0.882	False
Writing: Group A vs Group B	0.1448	0.1878	89	Writing by Race/Ethnicity	welch_t_test	1	False	1	False	0.1843	False	0.882	False
Reading: high school vs Some high school	0.1448	0.1511	179	Reading by Parental Education	welch_t_test	1	False	1	False	0.1843	False	0.882	False
Math: bachelors degree vs Some college	0.1715	0.1556	118	Math by Parental Education	welch_t_test	1	False	1	False	0.2148	False	0.882	False
Reading: Group A vs Group B	0.1739	0.1751	89	Reading by Race/Ethnicity	welch_t_test	1	False	1	False	0.2148	False	0.882	False
Reading: Group B vs Group C	0.1867	0.1212	190	Reading by Race/Ethnicity	welch_t_test	1	False	1	False	0.2273	False	0.882	False
Reading: associates degree vs bachelors degree	0.1952	0.1479	118	Reading by Parental Education	welch_t_test	1	False	1	False	0.2342	False	0.882	False
Math: masters degree vs Some college	0.2176	0.1806	59	Math by Parental Education	welch_t_test	1	False	1	False	0.2574	False	0.882	False
Reading: associates degree vs Some college	0.2666	0.1051	222	Reading by Parental Education	welch_t_test	1	False	1	False	0.311	False	0.882	False
Reading: bachelors degree vs masters degree	0.2933	0.1681	59	Reading by Parental Education	welch_t_test	1	False	1	False	0.3375	False	0.882	False
Writing: bachelors degree vs masters degree	0.3188	0.1594	59	Writing by Parental Education	welch_t_test	1	False	1	False	0.3619	False	0.882	False
Math: Group A vs Group B	0.3503	0.1202	89	Math by Race/Ethnicity	welch_t_test	1	False	1	False	0.3923	False	0.882	False
Math: associates degree vs bachelors degree	0.3802	0.1001	118	Math by Parental Education	welch_t_test	1	False	1	False	0.4202	False	0.882	False
Math: high school vs Some high school	0.3881	0.0893	179	Math by Parental Education	welch_t_test	1	False	1	False	0.4234	False	0.882	False
Math: associates degree vs masters degree	0.401	0.1232	59	Math by Parental Education	welch_t_test	1	False	1	False	0.4318	False	0.882	False
Writing: Group D vs Group E	0.4104	0.0863	140	Writing by Race/Ethnicity	welch_t_test	1	False	1	False	0.4364	False	0.882	False
Reading: Group C vs Group D	0.4258	0.0665	262	Reading by Race/Ethnicity	welch_t_test	1	False	1	False	0.4471	False	0.882	False
Writing: associates degree vs Some college	0.4467	0.072	222	Writing by Parental Education	welch_t_test	1	False	1	False	0.4632	False	0.882	False
Math: Group B vs Group C	0.4648	0.067	190	Math by Race/Ethnicity	welch_t_test	1	False	1	False	0.4761	False	0.882	False
Math: associates degree vs Some college	0.5876	0.0513	222	Math by Parental Education	welch_t_test	1	False	1	False	0.5947	False	0.882	False
Math: bachelors degree vs masters degree	0.882	0.0237	59	Math by Parental Education	welch_t_test	1	False	1	False	0.882	False	0.882	False

Interpretation

Purpose

This section presents the complete numerical results from all 84 hypothesis tests with raw and adjusted p-values across four correction methods. It enables users to identify which findings remain statistically significant after controlling for multiple comparisons—the core objective of the analysis. By displaying all rejection decisions side-by-side, it facilitates comparison of method stringency and power.

Key Findings

Holm Rejections: 36 of 84 – Conservative step-down procedure rejects 43% of all tests (36 of the 58 with raw p < 0.05) while controlling the family-wise error rate
BH Rejections: 56 of 84 – False discovery rate control yields 93% more rejections than Bonferroni, reflecting its higher power
Effect Size Alignment: 36 tests – Every Holm rejection also has a practically significant effect (Cohen's d ≥ 0.2), indicating statistical and practical significance align
Uncorrected FWER: 0.987 – Without correction, 98.7% probability of at least one false positive across 84 tests, justifying the correction approach
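Every row in the complete results table is labelled welch_t_test. The sketch below computes Welch's t statistic and Welch–Satterthwaite degrees of freedom with the standard library, plus a pooled-SD Cohen's d. The report does not state which standard deviation its cohens_d column uses, so the pooled definition is an assumption, and the sample data are made up for illustration. Turning t and df into a p-value needs a t-distribution CDF, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def cohens_d_pooled(a, b):
    """Cohen's d using the pooled standard deviation (an assumed definition)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


a = [1, 2, 3, 4, 5]   # illustrative scores, not data from this report
b = [2, 4, 6, 8, 10]
t, df = welch_t(a, b)
print(round(t, 3), round(df, 3), round(cohens_d_pooled(a, b), 3))
```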

Interpretation

The analysis reveals a clear hierarchy: Bonferroni (most conservative, 29 rejections) < Holm/Hochberg (moderate, 36 rejections) < BH (least conservative, 56

Data Table

Methodology Summary

Technical details and method comparison

method	rejections	controls	assumptions	power_rank	power_relative_to_bonferroni
holm	36	FWER	General (weak assumptions)	3	1.241
bonferroni	29	FWER	General (weak assumptions)	4	1
BH	56	FDR	Independence or positive dependence	1	1.931
hochberg	36	FWER	Independence or positive dependence	2	1.241

Interpretation

Purpose

This section compares four multiple comparison correction methods to demonstrate why Holm-Bonferroni was selected as the primary approach. It shows the trade-off between statistical power (ability to detect true effects) and error control (preventing false positives) across methods with different assumptions and objectives. Understanding these differences is critical for interpreting which of the 36 Holm-rejected hypotheses represent genuine findings versus false discoveries.

Key Findings

Bonferroni (29 rejections): Most conservative FWER method; 7 fewer rejections than Holm (about 19% fewer) and, like Holm, requires no dependence assumptions
Holm (36 rejections): 24% more powerful than Bonferroni with identical FWER control; valid under arbitrary dependence, so it needs no assumptions beyond Bonferroni's
Hochberg (36 rejections): Matches Holm's power but requires stronger assumptions; ranks 2nd in power
Benjamini-Hochberg (56 rejections): 93% more powerful than Bonferroni but controls FDR (looser standard) rather than FWER; allows ~5% false discovery rate among rejected tests
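Hochberg and Holm compare the same sorted p-values against the same thresholds α/(m − i + 1) but scan in opposite directions, which is why Hochberg can never reject fewer tests. This hypothetical sketch counts rejections for each on a two-test example where the difference shows up.

```python
def holm_rejections(p, alpha=0.05):
    """Step-down: stop at the first sorted p-value exceeding alpha / (m - i + 1)."""
    m = len(p)
    count = 0
    for i, pv in enumerate(sorted(p), start=1):
        if pv > alpha / (m - i + 1):
            break
        count += 1
    return count


def hochberg_rejections(p, alpha=0.05):
    """Step-up: find the largest i with p_(i) <= alpha / (m - i + 1); reject ranks 1..i."""
    m = len(p)
    s = sorted(p)
    for i in range(m, 0, -1):
        if s[i - 1] <= alpha / (m - i + 1):
            return i
    return 0


p = [0.04, 0.045]
print(holm_rejections(p), hochberg_rejections(p))  # 0 2
```

Holm stops immediately because 0.04 exceeds 0.05/2, whereas Hochberg starts from the largest p-value, finds 0.045 ≤ 0.05, and rejects both. In this report the two procedures agree at 36 rejections, so the extra step-up power did not change any decision.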

Interpretation

The 7-test gap between Holm (36) and Bonferroni (29) represents the power gain from Holm's step-down thresholds, which loosen after each rejection without requiring any assumption about the dependence structure.