Binary classification using GLM with binomial family. Fixed 0.5 threshold, manual ROC/AUC calculation, Hosmer-Lemeshow test, and comprehensive residual diagnostics.
Coefficients, standard errors, z-values, p-values, odds ratios. Manual AUC calculation using trapezoidal rule. Pseudo R² (1 - deviance/null deviance).
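As a rough illustration of how the reported columns relate to each other, the sketch below derives a z-value, a two-sided p-value, and an odds ratio from one coefficient and its standard error, and computes the deviance-based pseudo R². The function names and the input numbers are hypothetical and chosen only for illustration; they are not taken from the tool.

```python
import math

def wald_summary(coef, se):
    """z-value, two-sided normal p-value, and odds ratio for one coefficient."""
    z = coef / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {"z": z, "p": p, "odds_ratio": math.exp(coef)}

def pseudo_r2(residual_deviance, null_deviance):
    """Deviance-based pseudo R-squared: 1 - deviance / null deviance."""
    return 1.0 - residual_deviance / null_deviance

# Hypothetical inputs, for illustration only
row = wald_summary(coef=0.693, se=0.25)
r2 = pseudo_r2(residual_deviance=80.0, null_deviance=100.0)
```

An odds ratio is simply the exponentiated coefficient, so a coefficient near 0.693 corresponds to roughly a doubling of the odds per unit increase in the feature.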
Hosmer-Lemeshow goodness-of-fit test with decile grouping. Residual diagnostics: deviance, Pearson, and standardized residuals, plus leverage and Cook's distance.
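For readers who want to see what the residual types measure, here is a minimal sketch of the textbook formulas for a 0/1 outcome. It assumes fitted probabilities strictly between 0 and 1 and takes the leverage value as an input rather than computing the hat matrix; the function names are illustrative, not the tool's.

```python
import math

def pearson_residual(y, p):
    """Raw error scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def deviance_residual(y, p):
    """Signed square root of the observation's deviance contribution."""
    loglik = math.log(p) if y == 1 else math.log(1.0 - p)
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * loglik)

def standardize(residual, leverage):
    """Scale a residual by sqrt(1 - h), where h is the observation's leverage."""
    return residual / math.sqrt(1.0 - leverage)

# Hypothetical observation: outcome 1 predicted with probability 0.8
rp = pearson_residual(1, 0.8)
rd = deviance_residual(1, 0.8)
rs = standardize(rp, leverage=0.19)
```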
Manual ROC curve construction with TPR/FPR calculation. Calibration plot with 10 probability bins. Fixed 0.5 classification threshold.
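The data behind a 10-bin calibration plot can be sketched as follows. This version assumes equal-width probability bins, which is one common choice; the source does not say whether the tool bins by width or by quantile, and the function name is hypothetical.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Mean predicted probability vs. observed event rate per probability bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Equal-width bins (an assumption); p == 1.0 goes into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins instead of dividing by zero
        n = len(members)
        rows.append({
            "bin": i,
            "n": n,
            "mean_predicted": sum(p for p, _ in members) / n,
            "observed_rate": sum(y for _, y in members) / n,
        })
    return rows

# Tiny hypothetical example
rows = calibration_bins([0, 0, 1, 1], [0.05, 0.15, 0.15, 0.95])
```

Plotting `mean_predicted` against `observed_rate` gives the calibration curve; points close to the diagonal indicate well-calibrated probabilities.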
Provide features array and binary target column. Target automatically converted to factor if needed. No train/test split - all metrics on full dataset.
Algorithm uses GLM with binomial family and logit link. It calculates the confusion matrix at the 0.5 threshold, manually computes the ROC curve and AUC, performs the Hosmer-Lemeshow test, and generates comprehensive residual diagnostics.
From preprocessing to calibrated probabilities
Convert target to factor, build formula string, fit GLM with binomial family and logit link. No regularization or parameter tuning.
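To make the fitting step concrete, here is a deliberately small sketch of unregularized logistic regression with an intercept and a single feature, fitted by Newton's method (for the logit link this coincides with the iteratively reweighted least squares that GLM software typically uses). It is an illustration under those assumptions, not the tool's implementation; the function name and data are hypothetical, and the sketch does not guard against perfectly separable data.

```python
import math

def fit_logistic_1d(x, y, max_iter=25, tol=1e-10):
    """Fit y ~ intercept + x by Newton's method; return (coefs, std_errors)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)        # binomial variance weight
            g0 += yi - p             # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Standard errors from the inverse information matrix of the final iteration
    se = (math.sqrt(h11 / det), math.sqrt(h00 / det))
    return (b0, b1), se

# Hypothetical data: 1 of 4 positives when x = 0, 3 of 4 positives when x = 1
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 0]
coefs, std_errors = fit_logistic_1d(x, y)
```

With a single binary feature the fit reproduces the observed group rates, so the intercept equals the log-odds of 0.25 and the slope equals the log odds ratio between the two groups.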
Manual ROC construction: sort by probabilities, calculate TPR/FPR at each threshold. AUC via trapezoidal rule. Confusion matrix at 0.5 threshold.
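The manual ROC and AUC computation described above can be sketched as below: sort by predicted probability, accumulate true and false positives as the threshold drops, and integrate with the trapezoidal rule. Function names are illustrative, and the handling of tied probabilities (one point per distinct value) is an assumption about a detail the source does not specify.

```python
def roc_curve_manual(y_true, y_prob):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both classes present")
    # Highest predicted probability first
    pairs = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (p, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation sharing this probability
        if i == len(pairs) - 1 or pairs[i + 1][0] != p:
            tpr.append(tp / pos)
            fpr.append(fp / neg)
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
               for i in range(1, len(fpr)))

# Tiny hypothetical example
fpr, tpr = roc_curve_manual([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = auc_trapezoid(fpr, tpr)
```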
Hosmer-Lemeshow test with probability deciles. Calculate deviance, Pearson, standardized residuals, leverage, Cook's distance. Calibration plot with 10 bins.
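A minimal sketch of the Hosmer-Lemeshow statistic follows. It assumes groups are formed by rank into equal-sized chunks of the sorted probabilities (ties are not treated specially) and uses the closed-form chi-square tail that holds for even degrees of freedom, so the default of 10 groups gives 8 degrees of freedom. The function name and data are hypothetical.

```python
import math

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, df, p_value) for rank-based probability groups."""
    n = len(y_true)
    if groups < 4 or groups % 2 or n < groups:
        raise ValueError("sketch needs an even number of groups >= 4 and n >= groups")
    # Stable sort by predicted probability only
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    df = groups - 2
    # Chi-square survival function, closed form for even df
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i)
                                    for i in range(df // 2))
    return stat, df, p_value

# Hypothetical data: half of the cases occur at a predicted probability of 0.5
probs = [0.5] * 100
labels = [i % 2 for i in range(100)]
stat, df, p_value = hosmer_lemeshow(labels, probs)
```

A small p-value points to a mismatch between predicted and observed event counts; a large one means the test found no evidence of lack of fit, which is weaker than proof of a good model.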
Standard GLM logistic regression providing interpretable coefficients, odds ratios, and comprehensive diagnostic metrics for binary classification.
Manual implementation of the ROC curve and AUC calculation keeps the computation transparent. The Hosmer-Lemeshow test checks for lack of fit across probability deciles. Multiple residual types, together with leverage and Cook's distance, help identify poorly fit and influential observations. The fixed 0.5 threshold keeps interpretation simple.
Note: No regularization, cross-validation, or train/test split. All metrics are calculated on the full dataset, so they describe in-sample fit. Threshold fixed at 0.5. Precision, recall, and F1 are set to 0 when they would otherwise be NaN (zero denominator).
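The zero-denominator handling can be sketched as below. The sketch assumes a case is classified as positive when its probability is strictly greater than the threshold; the source only says the threshold is 0.5, so the direction of the comparison at exactly 0.5 is an assumption, as are the function names.

```python
def safe_div(num, den):
    """Return 0.0 instead of NaN when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Confusion matrix and summary metrics at a fixed threshold."""
    # Strict '>' at the threshold is an assumption, see the note above
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

# Hypothetical examples: a mixed case and one with no predicted positives
mixed = classification_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
empty = classification_metrics([0, 0], [0.1, 0.2])
```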
Get interpretable scores with calibrated probabilities
Read the article: Logistic Classification