Today
- Model selection and regularization
- Nonlinear models
Mallows’ Cp, AIC, and BIC
- Stepwise selection (adding or dropping one variable at a time) needs a single score to compare models of different sizes.
- Mallows’ Cp (regression-only):
- Built for Gaussian linear regression (OLS) with RSS and an estimate of $\sigma^2$.
- Not naturally defined outside this framework, so it is rarely used beyond OLS.
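As a hypothetical sketch of the definition above (simulated data, intercept always included, $\hat\sigma^2$ estimated from the full model), Cp for a submodel can be computed directly from its RSS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)  # only feature 0 matters

def rss(X_sub, y):
    """Residual sum of squares of OLS with an intercept."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# sigma^2 is estimated from the full model (all 3 features + intercept)
p_full = X.shape[1] + 1
sigma2_hat = rss(X, y) / (n - p_full)

def mallows_cp(X_sub, y):
    p = X_sub.shape[1] + 1  # parameters in the submodel, incl. intercept
    return rss(X_sub, y) / sigma2_hat - (n - 2 * p)

print(mallows_cp(X[:, [0]], y))  # correct submodel: Cp should be roughly p = 2
print(mallows_cp(X[:, [1]], y))  # wrong submodel: Cp much larger
```

Note that the calculation leans on RSS and $\hat\sigma^2$ throughout, which is exactly why Cp does not transfer to non-Gaussian models.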
- AIC (general):
- Works for any maximum-likelihood model
- Formula: AIC = -2 log L + 2d
- Tends to pick more complex models; often used when the goal is prediction
- Used beyond regression: GLMs (logistic/Poisson), survival, time series, mixture models, HMMs, etc.
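For Gaussian linear regression the maximized log-likelihood has a closed form, $\log L = -\tfrac{n}{2}\left(\log(2\pi) + \log(\mathrm{RSS}/n) + 1\right)$, so AIC can be computed by hand. A minimal sketch with simulated data (names and data are illustrative; $d$ counts the coefficients plus $\sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def gaussian_aic(X_sub, y):
    """AIC = -2 log L + 2d for OLS with Gaussian errors.

    The maximized log-likelihood is
    log L = -(n/2) * (log(2*pi) + log(RSS/n) + 1),
    and d counts the coefficients (incl. intercept) plus sigma^2.
    """
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    rss = float(r @ r)
    m = len(y)
    loglik = -0.5 * m * (np.log(2 * np.pi) + np.log(rss / m) + 1)
    d = A.shape[1] + 1
    return -2 * loglik + 2 * d

print(gaussian_aic(X[:, [0, 1]], y))  # true model: lower AIC
print(gaussian_aic(X[:, [0]], y))     # underfit model: higher AIC
```

Because AIC only needs a log-likelihood, the same formula applies unchanged to logistic regression, Poisson GLMs, HMMs, etc.; only the likelihood computation changes.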
- BIC (general):
- Works for any maximum-likelihood model
- Formula: BIC = -2 log L + d log n
- Penalizes complexity more strongly as n grows; often used when the goal is a simpler, "true" model
- Used beyond regression: mixture-model clustering, latent class, time series order selection, HMMs, etc.
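The key contrast with AIC is the penalty term: BIC charges d log n per fit, while AIC charges a constant 2d. A quick illustration (d = 4 is an arbitrary model size chosen for the example):

```python
import numpy as np

d = 4  # illustrative number of parameters
# AIC's penalty is constant in n; BIC's grows like log(n).
# Once log(n) > 2 (i.e., n >= 8), BIC penalizes each parameter harder.
for n in [5, 8, 100, 10_000]:
    print(f"n={n:>6}  AIC penalty={2 * d}  BIC penalty={d * np.log(n):.1f}")
```

This is why BIC tends toward smaller models on large datasets: the evidence needed to justify an extra parameter keeps rising with n.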
- Key takeaway:
- Cp depends on RSS/$\sigma^2$ → regression-specific
- AIC/BIC depend on likelihood → broadly applicable
| Criterion | General form | Fit term | Complexity penalty | Depends on likelihood? | Works beyond linear regression? |
|---|---|---|---|---|---|
| Mallows’ Cp | RSS / $\hat\sigma^2$ − (n − 2d) | Residual sum of squares (RSS) | 2d | No | No |
| AIC | −2 log L + 2d | −2 log-likelihood | 2d | Yes | Yes |
| BIC | −2 log L + d log n | −2 log-likelihood | d log n | Yes | Yes |
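Tying this back to stepwise selection: any of these criteria can serve as the single score that forward selection greedily optimizes. A hypothetical sketch using BIC on simulated data (all names and data are illustrative; the loop stops when no addition lowers BIC):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)  # features 0, 2 matter

def gaussian_bic(cols):
    """BIC = -2 log L + d log n for the OLS model using the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    rss = float(r @ r)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    d = A.shape[1] + 1  # coefficients + sigma^2
    return -2 * loglik + d * np.log(n)

# Forward stepwise: start empty, greedily add the best variable each round.
selected, remaining = [], list(range(p))
best = gaussian_bic(selected)
while remaining:
    score, j = min((gaussian_bic(selected + [j]), j) for j in remaining)
    if score >= best:
        break  # no addition improves BIC -> stop
    best = score
    selected.append(j)
    remaining.remove(j)

print(sorted(selected))  # with signals this strong, typically includes 0 and 2
```

Swapping `gaussian_bic` for an AIC variant would tend to stop later, keeping more variables, which matches the prediction-vs-parsimony contrast above.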
FAQs and announcements
- HW2 due date was extended to 2/11/26