Biostat 212a Homework 1
Due Jan 25, 2026 @ 11:59PM
1 Filling gaps in lecture notes (10% pts)
Consider the regression model \[ Y = f(X) + \epsilon, \] where \(\operatorname{E}(\epsilon) = 0\).
1.1 Optimal regression function
Show that the choice \[ f_{\text{opt}}(X) = \operatorname{E}(Y | X) \] minimizes the mean squared prediction error \[ \operatorname{E}\{[Y - f(X)]^2\}, \] where the expectation averages over variation in both \(X\) and \(Y\). (Hint: condition on \(X\).)
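One possible first step (offered as a hint, not a required approach): condition on \(X\), then add and subtract \(\operatorname{E}(Y \mid X)\) inside the square,

\[
\operatorname{E}\{[Y - f(X)]^2 \mid X\}
  = \operatorname{E}\{[Y - \operatorname{E}(Y \mid X)]^2 \mid X\}
  + [\operatorname{E}(Y \mid X) - f(X)]^2.
\]

The cross term vanishes because \(\operatorname{E}\{Y - \operatorname{E}(Y \mid X) \mid X\} = 0\); it remains to argue which choice of \(f\) minimizes the second term.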
1.2 Bias-variance trade-off
Given an estimate \(\hat f\) of \(f\), show that the test error at a test point \(x_0\) can be decomposed as \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}, \] where the expectation averages over the variability in \(y_0\) and \(\hat f\).
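A sketch of one standard route (a hint under the usual assumption that \(\epsilon\) is independent of \(\hat f\)): write \(y_0 = f(x_0) + \epsilon\), expand, and use \(\operatorname{E}(\epsilon) = 0\) to kill the cross term,

\[
\operatorname{E}\{[y_0 - \hat f(x_0)]^2\}
  = \operatorname{Var}(\epsilon)
  + \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\};
\]

then, in the second term, add and subtract \(\operatorname{E}[\hat f(x_0)]\) to obtain

\[
\operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\}
  = [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\hat f(x_0)).
\]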
2 ISL Exercise 2.4.3 (10% pts)
3 ISL Exercise 2.4.4 (10% pts)
4 ISL Exercise 2.4.10 (30% pts)
You can read the Boston data set directly from the url https://raw.githubusercontent.com/ucla-biostat-212a/2026winter/master/slides/data/Boston.csv. Documentation of the Boston data set is available here.
library(tidyverse)
Boston <- read_csv("https://raw.githubusercontent.com/ucla-biostat-212a/2026winter/master/slides/data/Boston.csv", col_select = -1) %>%
  print(width = Inf)
# A tibble: 506 × 13
      crim    zn indus  chas   nox    rm   age   dis   rad   tax ptratio lstat  medv
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
 1 0.00632  18    2.31     0 0.538  6.58  65.2  4.09     1   296    15.3  4.98  24
 2 0.0273    0    7.07     0 0.469  6.42  78.9  4.97     2   242    17.8  9.14  21.6
 3 0.0273    0    7.07     0 0.469  7.18  61.1  4.97     2   242    17.8  4.03  34.7
 4 0.0324    0    2.18     0 0.458  7.00  45.8  6.06     3   222    18.7  2.94  33.4
 5 0.0690    0    2.18     0 0.458  7.15  54.2  6.06     3   222    18.7  5.33  36.2
 6 0.0298    0    2.18     0 0.458  6.43  58.7  6.06     3   222    18.7  5.21  28.7
 7 0.0883   12.5  7.87     0 0.524  6.01  66.6  5.56     5   311    15.2 12.4   22.9
 8 0.145    12.5  7.87     0 0.524  6.17  96.1  5.95     5   311    15.2 19.2   27.1
 9 0.211    12.5  7.87     0 0.524  5.63 100    6.08     5   311    15.2 29.9   16.5
10 0.170    12.5  7.87     0 0.524  6.00  85.9  6.59     5   311    15.2 17.1   18.9
# ℹ 496 more rows
5 ISL Exercise 3.7.3 (20% pts)
6 ISL Exercise 3.7.15 (20% pts)
7 Bonus question (Extra credits)
For multiple linear regression, show that \(R^2\) is equal to the square of the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is, \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2. \]
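One possible line of attack (a hint, assuming the model includes an intercept so the OLS residuals \(\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}\) sum to zero and are orthogonal to the fitted values; \(\operatorname{Var}\) and \(\operatorname{Cov}\) below denote their sample versions): write \(\mathbf{y} = \hat{\mathbf{y}} + \mathbf{e}\), so that

\[
\operatorname{Cov}(\mathbf{y}, \hat{\mathbf{y}})
  = \operatorname{Cov}(\hat{\mathbf{y}} + \mathbf{e}, \hat{\mathbf{y}})
  = \operatorname{Var}(\hat{\mathbf{y}}),
\qquad
[\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2
  = \frac{\operatorname{Var}(\hat{\mathbf{y}})}{\operatorname{Var}(\mathbf{y})}
  = \frac{\text{TSS} - \text{RSS}}{\text{TSS}}.
\]

The last equality uses the decomposition \(\text{TSS} = \text{RSS} + \sum_i (\hat y_i - \bar y)^2\), which itself follows from the same orthogonality.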