CLASSIFICATION

Logistic Classification

Binary classification using GLM with binomial family. Fixed 0.5 threshold, manual ROC/AUC calculation, Hosmer-Lemeshow test, and comprehensive residual diagnostics.

What Makes This Interpretable

Comprehensive Metrics

Coefficients, standard errors, z-values, p-values, odds ratios. Manual AUC calculation using trapezoidal rule. Pseudo R² (1 - deviance/null deviance).

Model Diagnostics

Hosmer-Lemeshow goodness-of-fit test with decile grouping. Four residual types: deviance, Pearson, standardized, plus leverage and Cook's distance.

ROC & Calibration

Manual ROC curve construction with TPR/FPR calculation. Calibration plot with 10 probability bins. Fixed 0.5 classification threshold.

What You Need to Provide

Binary classification dataset

Provide features array and binary target column. Target automatically converted to factor if needed. No train/test split - all metrics on full dataset.

Algorithm uses GLM with binomial family and logit link. Calculates confusion matrix at 0.5 threshold, manually computes ROC curve and AUC, performs Hosmer-Lemeshow test, generates comprehensive residual diagnostics.

Tabular Schema / features + binary target

Quick Specs

ModelGLM binomial/logit
ThresholdFixed at 0.5
MetricsAUC, F1, accuracy, precision
DiagnosticsH-L test, residuals

How We Model

From preprocessing to calibrated probabilities

1

Model Fitting

Convert target to factor, build formula string, fit GLM with binomial family and logit link. No regularization or parameter tuning.

2

ROC & Metrics

Manual ROC construction: sort by probabilities, calculate TPR/FPR at each threshold. AUC via trapezoidal rule. Confusion matrix at 0.5 threshold.

3

Diagnostics

Hosmer-Lemeshow test with probability deciles. Calculate deviance, Pearson, standardized residuals, leverage, Cook's distance. Calibration plot with 10 bins.

Why This Analysis Matters

Standard GLM logistic regression providing interpretable coefficients, odds ratios, and comprehensive diagnostic metrics for binary classification.

Manual implementation of ROC curve and AUC calculation ensures transparency. Hosmer-Lemeshow test validates model fit. Multiple residual types help identify influential observations. Fixed 0.5 threshold keeps interpretation simple.

Note: No regularization, cross-validation, or train/test split. All metrics calculated on full dataset. Threshold fixed at 0.5. NaN values in precision/recall/F1 handled by setting to 0.

Ready to Classify?

Get interpretable scores with calibrated probabilities

Read the article: Logistic Classification