Today
- Model selection and regularization
- Nonlinear models
Mallows’ Cp, AIC, and BIC
- Stepwise selection (adding or dropping one variable at a time) needs a single score to compare models of different sizes.
- Mallows’ Cp (regression-only):
- Built for Gaussian linear regression (OLS) with RSS and an estimate of $\sigma^2$.
- Not naturally defined outside this framework, so it is rarely used beyond OLS.
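As a hypothetical sketch of the definition above (simulated data, intercept always included, $\hat\sigma^2$ estimated from the full model), Cp for a submodel can be computed directly from its RSS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # only feature 0 matters

def rss(X_sub, y):
    """Residual sum of squares of OLS with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# sigma^2 is estimated from the full model (all 3 features + intercept)
p_full = X.shape[1] + 1
sigma2_hat = rss(X, y) / (n - p_full)

def mallows_cp(X_sub, y):
    p = X_sub.shape[1] + 1  # parameters in the submodel, incl. intercept
    return rss(X_sub, y) / sigma2_hat - (n - 2 * p)

print(mallows_cp(X[:, [0]], y))  # correct submodel: Cp should be roughly p = 2
print(mallows_cp(X[:, [1]], y))  # wrong submodel: Cp much larger
```

Note that the calculation leans on RSS and $\hat\sigma^2$ throughout, which is exactly why Cp does not transfer to non-Gaussian models.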
- AIC (general):
- Works for any maximum-likelihood model
- Formula: AIC = -2 log L + 2d
- Tends to pick more complex models; often used when the goal is prediction
- Used beyond regression: GLMs (logistic/Poisson), survival, time series, mixture models, HMMs, etc.
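For Gaussian linear regression the maximized log-likelihood has a closed form, $\log L = -\tfrac{n}{2}\left(\log(2\pi) + \log(\mathrm{RSS}/n) + 1\right)$, so AIC can be computed by hand. A minimal sketch with simulated data (names and data are illustrative; $d$ counts the coefficients plus $\sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def gaussian_aic(X_sub, y):
    """AIC = -2 log L + 2d for OLS with Gaussian errors.

    The maximized log-likelihood is
    log L = -(n/2) * (log(2*pi) + log(RSS/n) + 1),
    and d counts the coefficients (incl. intercept) plus sigma^2.
    """
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    rss = float(r @ r)
    m = len(y)
    loglik = -0.5 * m * (np.log(2 * np.pi) + np.log(rss / m) + 1)
    d = A.shape[1] + 1
    return -2 * loglik + 2 * d

print(gaussian_aic(X[:, [0, 1]], y))  # true model: lower AIC
print(gaussian_aic(X[:, [0]], y))     # underfit model: higher AIC
```

Because AIC only needs a log-likelihood, the same formula applies unchanged to logistic regression, Poisson GLMs, HMMs, etc.; only the likelihood computation changes.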
- BIC (general):
- Works for any maximum-likelihood model
- Formula: BIC = -2 log L + d log n
- Penalizes complexity more strongly as n grows; often used when the goal is a simpler, "true" model
- Used beyond regression: mixture-model clustering, latent class, time series order selection, HMMs, etc.
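The key contrast with AIC is the penalty term: BIC charges d log n per fit, while AIC charges a constant 2d. A quick illustration (d = 4 is an arbitrary model size chosen for the example):

```python
import numpy as np

d = 4  # illustrative number of parameters
# AIC's penalty is constant in n; BIC's grows like log(n).
# Once log(n) > 2 (i.e., n >= 8), BIC penalizes each parameter harder.
for n in [5, 8, 100, 10_000]:
    print(f"n={n:>6}  AIC penalty={2 * d}  BIC penalty={d * np.log(n):.1f}")
```

This is why BIC tends toward smaller models on large datasets: the evidence needed to justify an extra parameter keeps rising with n.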
- Key takeaway:
- Cp depends on RSS/$\sigma^2$ → regression-specific
- AIC/BIC depend on likelihood → broadly applicable
| Criterion | General form | Fit term | Complexity penalty | Depends on likelihood? | Works beyond linear regression? |
|---|---|---|---|---|---|
| Mallows’ Cp | RSS / $\hat\sigma^2$ − (n − 2d) | Residual sum of squares (RSS) | 2d | No | No |
| AIC | −2 log L + 2d | −2 log-likelihood | 2d | Yes | Yes |
| BIC | −2 log L + d log n | −2 log-likelihood | d log n | Yes | Yes |
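Tying this back to stepwise selection: any of these criteria can serve as the single score that forward selection greedily optimizes. A hypothetical sketch using BIC on simulated data (all names and data are illustrative; the loop stops when no addition lowers BIC):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)  # features 0, 2 matter

def gaussian_bic(cols):
    """BIC = -2 log L + d log n for the OLS model using the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    rss = float(r @ r)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    d = A.shape[1] + 1  # coefficients + sigma^2
    return -2 * loglik + d * np.log(n)

# Forward stepwise: start empty, greedily add the best variable each round.
selected, remaining = [], list(range(p))
best = gaussian_bic(selected)
while remaining:
    score, j = min((gaussian_bic(selected + [j]), j) for j in remaining)
    if score >= best:
        break  # no addition improves BIC -> stop
    best = score
    selected.append(j)
    remaining.remove(j)

print(sorted(selected))  # with signals this strong, typically includes 0 and 2
```

Swapping `gaussian_bic` for an AIC variant would tend to stop later, keeping more variables, which matches the prediction-vs-parsimony contrast above.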
FAQs and announcements
- HW2 due date was extended to 2/11/26