Today

  • Model selection and regularization

  • Nonlinear Models

Mallows’ Cp, AIC, and BIC

  • Stepwise selection (adding or dropping one variable at a time) needs a single score to compare models of different sizes.
  • Mallows’ Cp (regression-only):
    • Built for Gaussian linear regression (OLS) with RSS and an estimate of $\sigma^2$.
    • Not naturally defined outside this framework → rarely used beyond OLS
  • AIC (general):
    • Works for any maximum-likelihood model
    • Formula: $\mathrm{AIC} = -2\log L + 2d$, where $L$ is the maximized likelihood and $d$ is the number of parameters
    • Tends to pick more complex models; often used when the goal is prediction
    • Used beyond regression: GLMs (logistic/Poisson), survival, time series, mixture models, HMMs, etc.
  • BIC (general):
    • Works for any maximum-likelihood model
    • Formula: $\mathrm{BIC} = -2\log L + d\log n$
    • Penalizes complexity more strongly as $n$ grows; often used when the goal is a simpler/"true" model
    • Used beyond regression: mixture-model clustering, latent class, time series order selection, HMMs, etc.
  • Key takeaway:
    • Cp depends on RSS/$\sigma^2$ → regression-specific
    • AIC/BIC depend on likelihood → broadly applicable
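As a concrete illustration of the formulas above, here is a minimal numpy sketch (not from the lecture) that computes AIC and BIC for an OLS fit via the Gaussian profile log-likelihood. Constants common to all models are dropped, so only differences between models are meaningful; the function `gaussian_ic` and the toy data are illustrative assumptions.

```python
import numpy as np

def gaussian_ic(y, X):
    """AIC and BIC for an OLS fit, up to an additive constant.

    For a Gaussian linear model with the MLE of sigma^2 plugged in,
    -2 log L = n log(RSS / n) + const, so:
      AIC = n log(RSS/n) + 2d,   BIC = n log(RSS/n) + d log n.
    """
    n, d = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    rss = np.sum((y - X @ beta) ** 2)
    neg2loglik = n * np.log(rss / n)              # -2 log L up to a constant
    return neg2loglik + 2 * d, neg2loglik + d * np.log(n)

# Toy comparison: only the first 2 columns (intercept + 1 predictor) matter.
rng = np.random.default_rng(0)
n = 200
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X_full[:, :2] @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

for k in (2, 5):  # true 2-parameter model vs. full 5-parameter model
    aic, bic = gaussian_ic(y, X_full[:, :k])
    print(f"d={k}: AIC={aic:.1f}, BIC={bic:.1f}")
```

Note that for the same fitted model, the BIC penalty $d\log n$ exceeds the AIC penalty $2d$ whenever $n > e^2 \approx 7.4$, which is why BIC tends to prefer smaller models.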
| Criterion   | General form                          | Fit term                      | Complexity penalty | Depends on likelihood? | Works beyond linear regression? |
|-------------|---------------------------------------|-------------------------------|--------------------|------------------------|---------------------------------|
| Mallows' Cp | $\mathrm{RSS}/\hat\sigma^2 - (n - 2p)$ | Residual sum of squares (RSS) | $2p$               | No                     | No                              |
| AIC         | $-2\log L + 2d$                       | $-2\log L$ (log-likelihood)   | $2d$               | Yes                    | Yes                             |
| BIC         | $-2\log L + d\log n$                  | $-2\log L$ (log-likelihood)   | $d\log n$          | Yes                    | Yes                             |
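To connect the scores back to stepwise selection, here is a hedged sketch of greedy forward selection scored by AIC, assuming a Gaussian linear model. The helper names (`aic_ols`, `forward_stepwise`) and the simulated data are illustrative, not from the lecture.

```python
import numpy as np

def aic_ols(y, X):
    # Gaussian AIC up to an additive constant: n log(RSS/n) + 2d
    n, d = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * d

def forward_stepwise(y, X):
    """Greedy forward selection: at each step, add the column that most
    improves AIC; stop when no addition helps. Column 0 (intercept) is
    always kept."""
    selected = [0]
    remaining = list(range(1, X.shape[1]))
    best = aic_ols(y, X[:, selected])
    improved = True
    while improved and remaining:
        improved = False
        score, j = min((aic_ols(y, X[:, selected + [k]]), k)
                       for k in remaining)
        if score < best:             # only add if AIC strictly improves
            best, improved = score, True
            selected.append(j)
            remaining.remove(j)
    return sorted(selected), best

# Toy data: columns 1 and 2 carry signal, columns 3-5 are pure noise.
rng = np.random.default_rng(1)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
y = 2.0 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

sel, best_aic = forward_stepwise(y, X)
print("selected columns:", sel)
```

Swapping `aic_ols` for a BIC version (penalty $d\log n$ instead of $2d$) typically yields the same or a smaller selected set, in line with the stronger penalty noted above.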

FAQs and announcements

  • HW2 due date was extended to 2/11/26