Analysis overview and configuration
| Parameter | Value |
|---|---|
| significance_level | 0.05 |
| primary_method | holm |
| comparison_methods | bonferroni,BH,hochberg |
| min_effect_size | 0.2 |
This analysis applies multiple comparison correction methods to 84 hypothesis tests comparing student performance across demographic groups (lunch type, test prep, gender, parental education, race/ethnicity). The objective is to control the family-wise error rate (FWER) and identify which differences remain statistically significant after accounting for the inflated Type I error risk inherent in conducting 84 simultaneous tests.
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 84 |
| Final Rows | 84 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data preprocessing pipeline for the 84 hypothesis tests undergoing multiple comparison correction. All 84 rows were retained (100%), so no tests were excluded during cleaning.
Full retention matters because every correction method applied here (Holm, Bonferroni, BH, Hochberg) depends on the complete family of tests; removing even one test would alter the step-down thresholds and could change rejection decisions. Because no preprocessing transformations were applied, the raw p-values and effect sizes feed into the corrections unchanged.
No train/test split is applicable here since the objective is statistical inference across a fixed test family, not predictive generalization. The analysis assumes all 84 tests
| finding | value | interpretation |
|---|---|---|
| Total Tests | 84 | Number of hypothesis tests corrected |
| Holm Rejections | 36 | Tests significant after Holm correction |
| Bonferroni Rejections | 29 | Tests significant after Bonferroni (baseline) |
| BH Rejections | 56 | Tests significant under FDR control |
| Uncorrected FWER | 98.7% | Probability of at least one false positive without correction (assuming independent tests) |
| Holm Power Gain vs Bonferroni | 24.1% | Additional rejections under Holm relative to Bonferroni (36 vs 29) |
| Marginal Test | Writing: associates degree vs Some high school | Last test rejected by Holm procedure |
This analysis applied four statistical correction methods to 84 hypothesis tests comparing student performance across demographic groups (lunch type, test prep, gender, race/ethnicity, parental education). The objective was to identify which findings remain statistically significant after controlling for multiple testing, which would otherwise produce a 98.7% probability of at least one false positive.
The analysis successfully
Holm step-down sequential rejection procedure with increasing thresholds
The Holm step-down procedure controls the family-wise error rate (FWER) by testing hypotheses in order of increasing p-value against progressively more lenient thresholds: the k-th smallest of m p-values is compared with α/(m − k + 1), and testing stops at the first p-value that exceeds its threshold. This section shows how Holm allocates the significance budget across 84 tests while maintaining strict control over false positives, which is critical for identifying genuinely significant effects amid multiple comparisons.
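The step-down logic can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code behind this report: the function name `holm_reject` and the five example p-values are hypothetical.

```python
def holm_reject(pvalues, alpha=0.05):
    """Return a list of booleans: True where the Holm procedure rejects."""
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        threshold = alpha / (m - rank)  # alpha/m, alpha/(m-1), ..., alpha/1
        if pvalues[idx] <= threshold:
            reject[idx] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject


# Hypothetical p-values, not taken from the results table.
pvals = [0.001, 0.04, 0.012, 0.03, 0.2]
print(holm_reject(pvals))
```

With these inputs Holm rejects two hypotheses (p = 0.001 and p = 0.012), whereas a flat Bonferroni threshold of 0.05/5 = 0.01 would reject only the first.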
Holm's sequential rejection strategy is always at least as powerful as Bonferroni because it relaxes the threshold after each rejection while still holding FWER at α=0.05; here it yields 24% more rejections (36 vs. 29). The 36 rejections represent findings robust to multiple comparison correction. The marginal test at
Raw vs adjusted p-values across all correction methods
This section compares how four multiple-comparison correction methods adjust raw p-values to control error rates across 84 simultaneous hypothesis tests. It demonstrates the trade-off between statistical conservatism (fewer false positives) and statistical power (ability to detect true effects). Understanding these differences is critical for determining which findings remain credible after correcting for multiple testing.
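As a rough sketch of how the four sets of adjusted p-values relate, the textbook definitions can be written in plain Python. The function names and example p-values below are hypothetical, and the report's own implementation may differ in details such as tie handling.

```python
def bonferroni(p):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p)
    return [min(1.0, m * x) for x in p]


def holm(p):
    """Step-down: running maximum of (m - rank) * p over ascending p-values."""
    m = len(p)
    adj, running = [0.0] * m, 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda j: p[j])):
        running = max(running, (m - rank) * p[i])
        adj[i] = min(1.0, running)
    return adj


def hochberg(p):
    """Step-up: running minimum of (k + 1) * p over descending p-values."""
    m = len(p)
    adj, running = [0.0] * m, 1.0
    for k, i in enumerate(sorted(range(m), key=lambda j: p[j], reverse=True)):
        running = min(running, (k + 1) * p[i])
        adj[i] = running
    return adj


def benjamini_hochberg(p):
    """FDR step-up: running minimum of p * m / rank over descending p-values."""
    m = len(p)
    adj, running = [0.0] * m, 1.0
    for k, i in enumerate(sorted(range(m), key=lambda j: p[j], reverse=True)):
        running = min(running, p[i] * m / (m - k))  # m - k is the ascending rank
        adj[i] = running
    return adj


# Hypothetical p-values, not taken from the results table.
pvals = [0.001, 0.04, 0.012, 0.03, 0.2]
for name, fn in [("bonferroni", bonferroni), ("holm", holm),
                 ("hochberg", hochberg), ("BH", benjamini_hochberg)]:
    print(f"{name:<10}", [round(x, 4) for x in fn(pvals)])
```

For any input, each adjusted value satisfies BH ≤ Hochberg ≤ Holm ≤ Bonferroni, which is the ordering of conservatism discussed in this section.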
The analysis reveals a clear hierarchy: Bonferroni is most conservative, Holm and
Which tests survive under which correction methods
This section visualizes which of the 84 hypothesis tests remain statistically significant under four different multiple comparison correction methods. It reveals the power-conservativeness tradeoff: the FWER-controlling methods (Bonferroni, Holm, Hochberg) reject fewer tests but provide stronger Type I error control, while BH detects more signals by controlling the false discovery rate (FDR) instead. Understanding method agreement separates robust findings from marginal discoveries.
The decision matrix demonstrates that 36 tests form a robust consensus across FWER-controlling methods
Number of rejections per correction method
This section compares the statistical power and rejection rates across four multiple comparison correction methods applied to the same 84 hypothesis tests. It demonstrates the trade-off between controlling false positives (Type I error) and detecting true effects (statistical power), which is central to choosing an appropriate correction strategy for the analysis objective.
The 93% increase in rejections from Bonferroni (29) to BH (56) illustrates why method selection depends on study context.
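The relative figures follow directly from the rejection counts reported in this analysis. A short sketch of the arithmetic (the variable names are illustrative):

```python
# Rejection counts as reported in this analysis.
rejections = {"bonferroni": 29, "holm": 36, "hochberg": 36, "BH": 56}
total_tests = 84
baseline = rejections["bonferroni"]

for method, count in rejections.items():
    ratio = count / baseline      # rejections relative to Bonferroni
    share = count / total_tests   # fraction of the 84 tests rejected
    print(f"{method:<10} {count:>2} rejections  {ratio:.3f}x Bonferroni  {share:.1%} of tests")
```

The ratios 1.241 (Holm, Hochberg) and 1.931 (BH) correspond to the 24.1% and 93% increases quoted in the text.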
Practical significance alongside statistical significance
This section validates whether statistically significant findings (after Holm-Bonferroni correction) also represent meaningful, real-world differences. Statistical significance alone can be misleading with large sample sizes; effect size analysis ensures that rejected hypotheses reflect substantive rather than trivial differences. This bridges the gap between p-values and practical importance.
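A minimal sketch of the practical-significance check, assuming Cohen's d is computed with a pooled standard deviation (the report does not state which variant was used) and using the configured `min_effect_size` of 0.2. The helper names are hypothetical; the three example rows are copied from the results table.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def is_practically_significant(d, holm_reject, min_effect_size=0.2):
    """True when a test is rejected by Holm AND its |d| meets the threshold."""
    return holm_reject and abs(d) >= min_effect_size


# (test_label, cohens_d, holm_reject) for three rows of the results table.
rows = [
    ("Math: free/reduced vs standard", 0.7823, True),
    ("Math: Group C vs Group D", 0.2017, False),
    ("Math: bachelors degree vs masters degree", 0.0237, False),
]
for label, d, rejected in rows:
    print(f"{label:<42} d={d:.4f}  practical={is_practically_significant(d, rejected)}")
```

The second row shows why both conditions are checked: its effect size clears 0.2, but it does not survive Holm correction.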
The analysis reveals a healthy concordance between statistical and practical significance for the majority of findings. The 36 practically significant results represent genuine, meaningful differences in student outcomes across
Why correction is needed: FWER accumulation without correction
This section quantifies the multiple testing problem: if 84 independent hypothesis tests are each run at α=0.05 without correction, the probability of at least one false positive rises to 1 − (1 − 0.05)^84 ≈ 98.7%. This demonstrates why statistical correction methods are essential and motivates the application of Holm-Bonferroni and alternative procedures in the overall analysis.
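The 98.7% figure can be reproduced with a one-line formula, assuming the tests are independent. The function name `uncorrected_fwer` is illustrative.

```python
def uncorrected_fwer(m, alpha=0.05):
    """Probability of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 10, 84):
    print(f"m={m:>2}: FWER = {uncorrected_fwer(m):.3f}")
```

The growth is non-linear: the rate is already about 40% at 10 tests and approaches 1 well before the 84th.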
Without correction, the 84 comparisons across Math, Reading, and Writing assessments would produce a 98.7% probability of spurious significance—rendering raw p-values unreliable for decision-making. The FWER curve illustrates that error accumulation is non-linear: the first 10 tests contribute modest inflation, but by test
Complete numerical results with all p-values and decisions
| test_label | raw_pvalue | cohens_d | n_per_group | family | statistical_test | holm_adjusted_p | holm_reject | bonferroni_adjusted_p | bonferroni_reject | BH_adjusted_p | BH_reject | hochberg_adjusted_p | hochberg_reject |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Math: free/reduced vs standard | 0 | 0.7823 | 355 | Math by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Reading: free/reduced vs standard | 0 | 0.4924 | 355 | Reading by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: free/reduced vs standard | 0 | 0.5293 | 355 | Writing by Lunch Type | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Reading: completed vs none | 0 | 0.5192 | 358 | Reading by Test Prep | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: completed vs none | 0 | 0.6866 | 358 | Writing by Test Prep | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Reading: female vs male | 0 | 0.5037 | 482 | Reading by Gender | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: female vs male | 0 | 0.6316 | 482 | Writing by Gender | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: bachelors degree vs high school | 2.00e-10 | 0.7629 | 118 | Writing by Parental Education | welch_t_test | 1.46e-08 | True | 1.68e-08 | True | 1.40e-09 | True | 1.46e-08 | True |
| Writing: high school vs masters degree | 9.00e-10 | 0.9446 | 59 | Writing by Parental Education | welch_t_test | 6.48e-08 | True | 7.56e-08 | True | 5.82e-09 | True | 6.48e-08 | True |
| Math: Group C vs Group E | 1.90e-09 | 0.6212 | 140 | Math by Race/Ethnicity | welch_t_test | 1.35e-07 | True | 1.60e-07 | True | 1.14e-08 | True | 1.35e-07 | True |
| Math: Group B vs Group E | 5.00e-09 | 0.6691 | 140 | Math by Race/Ethnicity | welch_t_test | 3.50e-07 | True | 4.20e-07 | True | 2.80e-08 | True | 3.50e-07 | True |
| Math: Group A vs Group E | 1.08e-08 | 0.8048 | 89 | Math by Race/Ethnicity | welch_t_test | 7.45e-07 | True | 9.07e-07 | True | 5.67e-08 | True | 7.45e-07 | True |
| Math: completed vs none | 1.54e-08 | 0.3763 | 358 | Math by Test Prep | welch_t_test | 1.05e-06 | True | 1.29e-06 | True | 7.61e-08 | True | 1.05e-06 | True |
| Math: female vs male | 9.12e-08 | 0.3407 | 482 | Math by Gender | welch_t_test | 6.11e-06 | True | 7.66e-06 | True | 4.26e-07 | True | 6.11e-06 | True |
| Writing: associates degree vs high school | 1.46e-07 | 0.5242 | 196 | Writing by Parental Education | welch_t_test | 9.67e-06 | True | 0 | True | 6.48e-07 | True | 9.67e-06 | True |
| Reading: high school vs masters degree | 6.26e-07 | 0.7593 | 59 | Reading by Parental Education | welch_t_test | 0 | True | 1.00e-04 | True | 2.63e-06 | True | 0 | True |
| Reading: bachelors degree vs high school | 8.80e-07 | 0.5846 | 118 | Reading by Parental Education | welch_t_test | 1.00e-04 | True | 1.00e-04 | True | 3.52e-06 | True | 1.00e-04 | True |
| Writing: masters degree vs Some high school | 4.28e-06 | 0.7067 | 59 | Writing by Parental Education | welch_t_test | 3.00e-04 | True | 4.00e-04 | True | 0 | True | 3.00e-04 | True |
| Writing: bachelors degree vs Some high school | 4.63e-06 | 0.5535 | 118 | Writing by Parental Education | welch_t_test | 3.00e-04 | True | 4.00e-04 | True | 0 | True | 3.00e-04 | True |
| Reading: associates degree vs high school | 7.44e-06 | 0.4448 | 196 | Reading by Parental Education | welch_t_test | 5.00e-04 | True | 6.00e-04 | True | 0 | True | 5.00e-04 | True |
| Writing: high school vs Some college | 9.28e-06 | 0.4381 | 196 | Writing by Parental Education | welch_t_test | 6.00e-04 | True | 8.00e-04 | True | 0 | True | 6.00e-04 | True |
| Math: Group D vs Group E | 0 | 0.4483 | 140 | Math by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Math: bachelors degree vs high school | 0 | 0.4936 | 118 | Math by Parental Education | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: Group A vs Group E | 0 | 0.5726 | 89 | Writing by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Writing: Group A vs Group D | 0 | 0.5099 | 89 | Writing by Race/Ethnicity | welch_t_test | 0 | True | 0 | True | 0 | True | 0 | True |
| Reading: Group A vs Group E | 1.00e-04 | 0.5519 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.0059 | True | 0.0084 | True | 3.00e-04 | True | 0.0058 | True |
| Math: associates degree vs high school | 1.00e-04 | 0.387 | 196 | Math by Parental Education | welch_t_test | 0.0059 | True | 0.0084 | True | 3.00e-04 | True | 0.0058 | True |
| Reading: masters degree vs Some high school | 2.00e-04 | 0.5594 | 59 | Reading by Parental Education | welch_t_test | 0.0114 | True | 0.0168 | True | 6.00e-04 | True | 0.0114 | True |
| Math: high school vs Some college | 4.00e-04 | 0.3461 | 196 | Math by Parental Education | welch_t_test | 0.0224 | True | 0.0336 | True | 0.0012 | True | 0.0224 | True |
| Math: high school vs masters degree | 6.00e-04 | 0.5182 | 59 | Math by Parental Education | welch_t_test | 0.033 | True | 0.0504 | False | 0.0016 | True | 0.0324 | True |
| Reading: high school vs Some college | 6.00e-04 | 0.3375 | 196 | Reading by Parental Education | welch_t_test | 0.033 | True | 0.0504 | False | 0.0016 | True | 0.0324 | True |
| Reading: bachelors degree vs Some high school | 8.00e-04 | 0.4036 | 118 | Reading by Parental Education | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True |
| Reading: Group B vs Group E | 8.00e-04 | 0.3771 | 140 | Reading by Race/Ethnicity | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True |
| Writing: Group B vs Group E | 8.00e-04 | 0.3768 | 140 | Writing by Race/Ethnicity | welch_t_test | 0.0424 | True | 0.0672 | False | 0.002 | True | 0.0408 | True |
| Math: Group A vs Group D | 9.00e-04 | 0.4106 | 89 | Math by Race/Ethnicity | welch_t_test | 0.045 | True | 0.0756 | False | 0.0021 | True | 0.0441 | True |
| Writing: associates degree vs Some high school | 9.00e-04 | 0.3347 | 179 | Writing by Parental Education | welch_t_test | 0.045 | True | 0.0756 | False | 0.0021 | True | 0.0441 | True |
| Writing: Group B vs Group D | 0.0015 | 0.3049 | 190 | Writing by Race/Ethnicity | welch_t_test | 0.072 | False | 0.126 | False | 0.0033 | True | 0.0705 | False |
| Math: bachelors degree vs Some high school | 0.0015 | 0.3791 | 118 | Math by Parental Education | welch_t_test | 0.072 | False | 0.126 | False | 0.0033 | True | 0.0705 | False |
| Writing: masters degree vs Some college | 0.0017 | 0.4633 | 59 | Writing by Parental Education | welch_t_test | 0.0782 | False | 0.1428 | False | 0.0037 | True | 0.0782 | False |
| Reading: Group A vs Group D | 0.0025 | 0.3738 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.1125 | False | 0.21 | False | 0.0052 | True | 0.1125 | False |
| Reading: masters degree vs Some college | 0.0042 | 0.4223 | 59 | Reading by Parental Education | welch_t_test | 0.1848 | False | 0.3528 | False | 0.0086 | True | 0.1848 | False |
| Writing: Group A vs Group C | 0.0046 | 0.3415 | 89 | Writing by Race/Ethnicity | welch_t_test | 0.1978 | False | 0.3864 | False | 0.0092 | True | 0.1978 | False |
| Math: Group B vs Group D | 0.0049 | 0.2695 | 190 | Math by Race/Ethnicity | welch_t_test | 0.2058 | False | 0.4116 | False | 0.0095 | True | 0.205 | False |
| Math: associates degree vs Some high school | 0.005 | 0.2833 | 179 | Math by Parental Education | welch_t_test | 0.2058 | False | 0.42 | False | 0.0095 | True | 0.205 | False |
| Writing: associates degree vs masters degree | 0.0058 | 0.4074 | 59 | Writing by Parental Education | welch_t_test | 0.232 | False | 0.4872 | False | 0.0108 | True | 0.232 | False |
| Reading: associates degree vs Some high school | 0.0068 | 0.2731 | 179 | Reading by Parental Education | welch_t_test | 0.2652 | False | 0.5712 | False | 0.0123 | True | 0.2622 | False |
| Reading: Group C vs Group E | 0.0069 | 0.2751 | 140 | Reading by Race/Ethnicity | welch_t_test | 0.2652 | False | 0.5796 | False | 0.0123 | True | 0.2622 | False |
| Writing: bachelors degree vs Some college | 0.0077 | 0.3044 | 118 | Writing by Parental Education | welch_t_test | 0.2849 | False | 0.6468 | False | 0.0135 | True | 0.2849 | False |
| Math: masters degree vs Some high school | 0.0087 | 0.397 | 59 | Math by Parental Education | welch_t_test | 0.3132 | False | 0.7308 | False | 0.0149 | True | 0.3132 | False |
| Writing: Some college vs Some high school | 0.0104 | 0.2577 | 179 | Writing by Parental Education | welch_t_test | 0.364 | False | 0.8736 | False | 0.0171 | True | 0.3536 | False |
| Reading: Group A vs Group C | 0.0104 | 0.3087 | 89 | Reading by Race/Ethnicity | welch_t_test | 0.364 | False | 0.8736 | False | 0.0171 | True | 0.3536 | False |
| Math: Group C vs Group D | 0.0159 | 0.2017 | 262 | Math by Race/Ethnicity | welch_t_test | 0.5247 | False | 1 | False | 0.0257 | True | 0.5216 | False |
| Math: Some college vs Some high school | 0.0163 | 0.2413 | 179 | Math by Parental Education | welch_t_test | 0.5247 | False | 1 | False | 0.0258 | True | 0.5216 | False |
| Writing: Group C vs Group E | 0.0192 | 0.2383 | 140 | Writing by Race/Ethnicity | welch_t_test | 0.5952 | False | 1 | False | 0.0299 | True | 0.5952 | False |
| Reading: bachelors degree vs Some college | 0.0281 | 0.2504 | 118 | Reading by Parental Education | welch_t_test | 0.843 | False | 1 | False | 0.0429 | True | 0.843 | False |
| Reading: associates degree vs masters degree | 0.0293 | 0.3209 | 59 | Reading by Parental Education | welch_t_test | 0.8497 | False | 1 | False | 0.044 | True | 0.8497 | False |
| Writing: associates degree vs bachelors degree | 0.0351 | 0.2411 | 118 | Writing by Parental Education | welch_t_test | 0.9828 | False | 1 | False | 0.0517 | False | 0.882 | False |
| Reading: Group D vs Group E | 0.045 | 0.2105 | 140 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.0652 | False | 0.882 | False |
| Reading: Group B vs Group D | 0.0524 | 0.1854 | 190 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.0746 | False | 0.882 | False |
| Writing: Group C vs Group D | 0.0593 | 0.1576 | 262 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.083 | False | 0.882 | False |
| Reading: Some college vs Some high school | 0.0873 | 0.1715 | 179 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1202 | False | 0.882 | False |
| Math: Group A vs Group C | 0.1104 | 0.1918 | 89 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.148 | False | 0.882 | False |
| Writing: Group B vs Group C | 0.111 | 0.1463 | 190 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.148 | False | 0.882 | False |
| Writing: high school vs Some high school | 0.1141 | 0.1638 | 179 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1498 | False | 0.882 | False |
| Writing: Group A vs Group B | 0.1448 | 0.1878 | 89 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.1843 | False | 0.882 | False |
| Reading: high school vs Some high school | 0.1448 | 0.1511 | 179 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.1843 | False | 0.882 | False |
| Math: bachelors degree vs Some college | 0.1715 | 0.1556 | 118 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2148 | False | 0.882 | False |
| Reading: Group A vs Group B | 0.1739 | 0.1751 | 89 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.2148 | False | 0.882 | False |
| Reading: Group B vs Group C | 0.1867 | 0.1212 | 190 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.2273 | False | 0.882 | False |
| Reading: associates degree vs bachelors degree | 0.1952 | 0.1479 | 118 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2342 | False | 0.882 | False |
| Math: masters degree vs Some college | 0.2176 | 0.1806 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.2574 | False | 0.882 | False |
| Reading: associates degree vs Some college | 0.2666 | 0.1051 | 222 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.311 | False | 0.882 | False |
| Reading: bachelors degree vs masters degree | 0.2933 | 0.1681 | 59 | Reading by Parental Education | welch_t_test | 1 | False | 1 | False | 0.3375 | False | 0.882 | False |
| Writing: bachelors degree vs masters degree | 0.3188 | 0.1594 | 59 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.3619 | False | 0.882 | False |
| Math: Group A vs Group B | 0.3503 | 0.1202 | 89 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.3923 | False | 0.882 | False |
| Math: associates degree vs bachelors degree | 0.3802 | 0.1001 | 118 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4202 | False | 0.882 | False |
| Math: high school vs Some high school | 0.3881 | 0.0893 | 179 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4234 | False | 0.882 | False |
| Math: associates degree vs masters degree | 0.401 | 0.1232 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4318 | False | 0.882 | False |
| Writing: Group D vs Group E | 0.4104 | 0.0863 | 140 | Writing by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4364 | False | 0.882 | False |
| Reading: Group C vs Group D | 0.4258 | 0.0665 | 262 | Reading by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4471 | False | 0.882 | False |
| Writing: associates degree vs Some college | 0.4467 | 0.072 | 222 | Writing by Parental Education | welch_t_test | 1 | False | 1 | False | 0.4632 | False | 0.882 | False |
| Math: Group B vs Group C | 0.4648 | 0.067 | 190 | Math by Race/Ethnicity | welch_t_test | 1 | False | 1 | False | 0.4761 | False | 0.882 | False |
| Math: associates degree vs Some college | 0.5876 | 0.0513 | 222 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.5947 | False | 0.882 | False |
| Math: bachelors degree vs masters degree | 0.882 | 0.0237 | 59 | Math by Parental Education | welch_t_test | 1 | False | 1 | False | 0.882 | False | 0.882 | False |
This section presents the complete numerical results from all 84 hypothesis tests, with raw and adjusted p-values across four correction methods. It lets readers identify which findings remain statistically significant after controlling for multiple comparisons, the core objective of the analysis. Displaying all rejection decisions side by side makes it easy to compare method stringency and power. Entries shown as 0 are p-values that round to zero at the table's display precision, not exact zeros.
The analysis reveals a clear hierarchy: Bonferroni (most conservative, 29 rejections) < Holm/Hochberg (moderate, 36 rejections) < BH (least conservative, 56
Technical details and method comparison
| method | rejections | controls | assumptions | power_rank | power_relative_to_bonferroni |
|---|---|---|---|---|---|
| holm | 36 | FWER | General (weak assumptions) | 3 | 1.241 |
| bonferroni | 29 | FWER | General (weak assumptions) | 4 | 1 |
| BH | 56 | FDR | Independence or positive dependence | 1 | 1.931 |
| hochberg | 36 | FWER | Independence or positive dependence | 2 | 1.241 |
This section compares four multiple comparison correction methods to demonstrate why Holm-Bonferroni was selected as the primary approach. It shows the trade-off between statistical power (ability to detect true effects) and error control (preventing false positives) across methods with different assumptions and objectives. Understanding these differences is critical for interpreting which of the 36 Holm-rejected hypotheses represent genuine findings versus false discoveries.
The 7-test gap between Holm (36) and Bonferroni (29) reflects the power gained from Holm's step-down thresholds, which relax after each rejection; Holm requires no assumption about the dependence structure among tests.