Resampling Methods (ISL 5)

Biostat 212A

Author

Dr. Jin Zhou @ UCLA

Published

February 3, 2026

Credit: This note heavily uses material from the books An Introduction to Statistical Learning: with Applications in R (ISL2) and Elements of Statistical Learning: Data Mining, Inference, and Prediction (ESL2).

Display system information for reproducibility.

sessionInfo()

R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.7.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.5.2    fastmap_1.2.0     cli_3.6.5        
 [5] tools_4.5.2       htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10      
 [9] rmarkdown_2.29    knitr_1.50        jsonlite_2.0.0    xfun_0.56        
[13] digest_0.6.37     rlang_1.1.6       evaluate_1.0.5

import IPython
print(IPython.sys_info())

{'commit_hash': '5ed988a91',
 'commit_source': 'installation',
 'default_encoding': 'utf-8',
 'ipython_path': '/Users/jinjinzhou/.virtualenvs/r-tensorflow/lib/python3.10/site-packages/IPython',
 'ipython_version': '8.33.0',
 'os_name': 'posix',
 'platform': 'macOS-15.7.3-arm64-arm-64bit',
 'sys_executable': '/Users/jinjinzhou/.virtualenvs/r-tensorflow/bin/python',
 'sys_platform': 'darwin',
 'sys_version': '3.10.16 (main, Mar  3 2025, 20:01:33) [Clang 16.0.0 '
                '(clang-1600.0.26.6)]'}

using InteractiveUtils

versioninfo()

1 Overview

In the section we discuss two resampling methods: cross-validation and the bootstrap.
These methods refit a model of interest to samples formed from the training set, in order to obtain additional information about the fitted model.
For example, they provide estimates of test-set prediction error, and the standard deviation and bias of our parameter estimates.

2 Training error vs test error

Test error is the average error that results from using a learning method to predict the response on a new observation, one that was not used in training the method.
Training error can be easily calculated by applying the statistical learning method to the observations used in its training.
But the training error rate often is quite different from the test error rate, and in particular the former can dramatically underestimate the latter.
HW3 Bonus question rigorously justifies that the training error is an under-estimate of the test error.

Best solution: a large designated test set. Often not available.
Some methods make a mathematical adjustment to the training error rate in order to estimate the test error rate. These include the \(C_p\) statistic, AIC and BIC, which are discussed in Chapter 6.
Here we instead consider a class of methods that estimate the test error by holding out a subset of the training observations from the fitting process, and then applying the learning method to those held out observations.

3 Validation-set approach

Here we randomly divide the available set of samples into two parts: a training set and a validation or hold-out set.
The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set.
The resulting validation-set error provides an estimate of the test error. This is typically assessed using MSE in the case of a quantitative response and misclassification rate in the case of a qualitative (discrete) response.

Figure 1: A random splitting into two halves: left part is training set, right part is validation set.

Figure 2: The validation set approach was used on the Auto data set in order to estimate the test error that results from predicting mpg using polynomial functions of horsepower. Left: Validation error estimates for a single split into training and validation data sets. Right: The validation method was repeated ten times, each time using a different random split of the observations into a training set and a validation set. This illustrates the variability in the estimated test MSE that results from this approach.

Drawbacks of validation set approach
- The validation estimate of the test error can be highly variable, depending on precisely which observations are included in the training set and which observations are included in the validation set.
- In the validation approach, only a subset of the observations - those that are included in the training set rather than in the validation set - are used to fit the model.
- This suggests that the validation set error may tend to overestimate the test error for the model fit on the entire data set. (Why?)

4 \(K\)-fold cross-validation

Widely used approach for estimating test error.
Estimates can be used to select best model, and to give an idea of the test error of the final chosen model.
Idea is to randomly divide the data into \(K\) equal-sized parts. We leave out part \(k\), fit the model to the other \(K-1\) parts (combined), and then obtain predictions for the left-out \(k\)th part.
This is done in turn for each part \(k = 1, 2, \ldots, K\), and then the results are combined.

Figure 3: A schematic display of 5-fold CV. A set of \(n\) observations is randomly split into five non-overlapping groups. Each of these fifths acts as a validation set (shown in beige), and the remainder as a training set (shown in blue). The test error is estimated by averaging the five resulting MSE estimates.

Let the \(K\) parts be \(C_1, C_2, \ldots, C_K\), where \(C_k\) denotes the indices of the observations in part \(k\). There are \(n_k\) observations in part \(k\). If \(N\) is a multiple of \(K\), then \(n_k = n / K\).
Compute \[ \text{CV}_{(K)} = \sum_{k=1}^K \frac{n_k}{n} \text{MSE}_k, \] where \[ \text{MSE}_k = \frac{1}{n_k} \sum_{i \in C_k} (y_i - \hat y_i)^2, \] and \(\hat y_i\) is the fit for observation \(i\), obtained from the data with part \(k\) removed.

4.1 LOOCV

The special case \(K=n\) yields \(n\)-fold or leave-one out cross-validation (LOOCV).
With least-squares linear or polynomial regression, an amazing shortcut makes the cost of LOOCV the same as that of a single model fit! \[ \text{CV}_{(n)} = \frac{1}{n} \sum_{i=1}^n \left( \frac{y_i - \hat y_i}{1 - h_i} \right)^2, \] where \(\hat y_i\) is the \(i\)th fitted value from the original least squares fit, and \(h_i\) is the leverage (diagonal of the “hat” matrix, i.e., \(\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{–1}\mathbf{X}^T\)). This is like the ordinary MSE, except the \(i\)th residual is divided by \(1 - h_i\).
LOOCV sometimes useful, but typically doesn’t shake up the data enough. The estimates from each fold are highly correlated and hence their average can have high variance.
A better choice is \(K = 5\) or 10.

Figure 4: Cross-validation was used on the Auto data set in order to estimate the test error that results from predicting mpg using polynomial functions of horsepower. Left: The LOOCV error curve. Right: 10-fold CV was run nine separate times, each with a different random split of the data into ten parts. The figure shows the nine slightly different CV error curves.

Figure 5: Blue: true test MSE. Black dashed line: LOOCV estimate of test MSE. Orange: 10-fold CV estimate of test MSE. The crosses indicate the minimum of each of the MSE curves. Left data. Middle data. Right data.

5 Bias-variance tradeoff for cross-validation

Since each training set is only \((K - 1) / K\) as big as the original training set, the estimates of prediction error will typically be biased upward. (Why?)
This bias is minimized when \(K = n\) (LOOCV), but this estimate has high variance, as noted earlier.
\(K = 5\) or \(10\) provides a good compromise for this bias-variance trade-off.

6 Cross-validation for classification problems

We divide the data into \(K\) roughly equal-sized parts \(C_1, C_2, \ldots, C_K\). \(C_k\) denotes the indices of the observations in part \(k\). There are \(n_k\) observations in part \(k\). If \(n\) is a multiple of \(K\), then \(n_k = n / K\).
Compute \[ \text{CV}_k = \sum_{k=1}^K \frac{n_k}{n} \text{Err}_k, \] where \[ \text{Err}_k = \frac{1}{n_k} \sum_{i \in C_k} I(y_i \ne \hat y_i). \]
The estimated standard deviation of \(\text{CV}_k\) is \[ \hat{\text{SE}}(\text{CV}_k) = \sqrt{\frac{1}{K} \sum_{k=1}^K \frac{(\text{Err}_k - \bar{\text{Err}_k})^2}{K-1}}. \] This is a useful estimate, but strictly speaking, not quite valid.

7 Bootstrap

The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or learning method.
For example, it can provide an estimate of the standard error of a coefficient, or a confidence interval for that coefficient.
A simple example.
- Suppose that we wish to invest a fixed sum of money in two financial assets that yield returns of \(X\) and \(Y\), respectively, where \(X\) and \(Y\) are random quantities.
- We will invest a fraction \(\alpha\) of our money in \(X\), and will invest the remaining \(1 - \alpha\) in \(Y\).
- We wish to choose \(\alpha\) to minimize the total risk, or variance, of our investment. In other words, we want to minimize \(\operatorname{Var}(\alpha X + (1 - \alpha) Y)\).
- One can show that the value that minimizes the risk is given by \[ \alpha = \frac{\sigma_{Y}^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2 \sigma_{XY}}, \] where \(\sigma_X^2 = \operatorname{Var}(X)\), \(\sigma_Y^2 = \operatorname{Var}(Y)\), and \(\sigma_{XY} = \operatorname{Cov}(X, Y)\).
- But the values of \(\sigma_X^2\), \(\sigma_Y^2\), and \(\sigma_{XY}\) are unknown.
- We can compute estimates for these quantities \(\hat{\sigma}_X^2\), \(\hat{\sigma}_Y^2\), and \(\hat{\sigma}_{XY}\), using a data set that contains measurements for \(X\) and \(Y\).
- We can then estimate the value of \(\alpha\) that minimizes the variance of our investment using \[ \hat{\alpha} = \frac{\hat{\sigma}_{Y}^2 - \hat{\sigma}_{XY}}{\hat{\sigma}_X^2 + \hat{\sigma}_Y^2 - 2 \hat{\sigma}_{XY}}. \]

Figure 6: Each panel displays 100 simulated returns for investments \(X\) and \(Y\). From left to right and top to bottom, the resulting estimates for \(\alpha\) are 0.576, 0.532, 0.657, and 0.651.

An unrealistic method:
- To estimate the standard deviation of \(\hat{\alpha}\), we repeated the process of simulating 100 paired observations of \(X\) and \(Y\), and estimating \(\alpha\) 1,000 times.
- We thereby obtained 1,000 estimates for \(\alpha\), which we can call \(\hat{\alpha}_1, \hat{\alpha}_2, \ldots, \hat{\alpha}_{1000}\).
- For these simulations the parameters were set to \(\sigma_{X}^2 = 1\), \(\sigma_Y^2 = 1.25\), and \(\sigma_{XY} = 0.5\), and so we know that the true value of \(\alpha\) is 0.6.
- The mean over 1,000 estimates for \(\alpha\) is \[ \bar{\alpha} = \frac{1}{1000} \sum_{r=1}^{1000} \hat{\alpha}_r = 0.5996, \] very close to \(\alpha = 0.6\), and the standard deviation of the estimates is \[ \sqrt{\frac{1}{1000-1} \sum_{r=1}^{1000} (\hat{\alpha}_r - \bar{\alpha})^2} = 0.083. \] This gives us a very good idea of the accuracy of \(\hat{\alpha}\): \(\text{SE}(\hat{\alpha}) \approx 0.083\). So roughly speaking, for a random sample from the population, we would expect \(\hat{\alpha}\) to differ from \(\alpha\) by approximately 0.08, on average.

Figure 7: A graphical illustration of the bootstrap approach on a small sample containing \(n = 3\) observations. Each bootstrap data set contains \(n\) observations, sampled with replacement from the original data set. Each bootstrap data set is used to obtain an estimate of \(\alpha\).

Bootstrap method.
- The procedure outlined above cannot be applied, because for real data we cannot generate new samples from the original population.
- However, the bootstrap approach allows us to use a computer to mimic the process of obtaining new data sets, so that we can estimate the variability of our estimate without generating additional samples.
- Rather than repeatedly obtaining independent data sets from the population, we instead obtain distinct data sets by repeatedly sampling observations from the original data set with replacement.
- Each of these “bootstrap data sets” is created by sampling with replacement, and is the same size as our original dataset. As a result some observations may appear more than once in a given bootstrap data set and some not at all.
- Denoting the first bootstrap data set by \(Z^{*1}\), we use \(Z^{*1}\) to produce a new bootstrap estimate for \(\alpha\), which we call \(\hat{\alpha}^{*1}\).
- This procedure is repeated \(B\) times for some large value of \(B\) (say 100 or 1000), in order to produce \(B\) different bootstrap data sets \(Z^{*1}, Z^{*2}, \ldots, Z^{*B}\), and \(B\) corresponding \(\alpha\) estimates, \(\hat{\alpha}^{*1}, \hat{\alpha}^{*2}, \ldots, \hat{\alpha}^{*B}\).
- We estimate the standard error of these bootstrap estimates using the formula \[ \operatorname{SE}_B(\hat{\alpha}) = \sqrt{\frac{1}{B-1} \sum_{r=1}^B (\hat{\alpha}^{*r} - \bar{\hat{\alpha}}^{*})^2}. \]
- This serves as an estimate of the standard error of \(\hat{\alpha}\) estimated from the original data set. For this example, \(\operatorname{SE}_B(\hat{\alpha}) = 0.087\).

Figure 8: Left: A histogram of the estimates of \(\alpha\) obtained by generating 1,000 simulated data sets from the true population. Center: A histogram of the estimates of \(\alpha\) obtained from 1,000 bootstrap samples from a single data set. Right: The estimates of \(\alpha\) displayed in the left and center panels are shown as boxplots. In each panel, the pink line indicates the true value of \(\alpha\).

8 Bootstrap in general

In more complex data situations, figuring out the appropriate way to generate bootstrap samples can require some thought.

For example, if the data is a time series (e.g., stock prices), we can’t simply sample the observations with replacement (why not?). We can instead create blocks of consecutive observations, and sample those with replacements. Then we paste together sampled blocks to obtain a bootstrap dataset.
Other uses of the bootstrap.
- Primarily used to obtain standard errors of an estimate.
- Also provides approximate confidence intervals for a population parameter. For example, looking at the histogram in the middle panel of histogram, the 5% and 95% quantiles of the 1000 values is (0.43, 0.72).
- This is called a Bootstrap Percentile confidence interval. It is the simplest method (among many approaches) for obtaining a confidence interval from the bootstrap.
Can the bootstrap estimate prediction error?
- In cross-validation, each of the \(K\) validation folds is distinct from the other \(K-1\) folds used for training: there is no overlap. This is crucial for its success.
- To estimate prediction error using the bootstrap, we could think about using each bootstrap dataset as our training sample, and the original sample as our validation sample.
- But each bootstrap sample has significant overlap with the original data. About two-thirds of the original data points appear in each bootstrap sample.
- This will cause the bootstrap to seriously underestimate the true prediction error.
- The other way around - with original sample = training sample, bootstrap dataset = validation sample - is worse!
- Can partly fix this problem by only using predictions for those observations that did not (by chance) occur in the current bootstrap sample.
- But the method gets complicated, and in the end, cross-validation provides a simpler, more attractive approach for estimating prediction error.

9 Lab

9.1 The validation set approach

Goal: use of the validation set approach to estimate the test error rates that from fitting linear models on the Auto data set.
We begin by using the set.seed() function in order to set a seed for R’s random number generator, so that the reader of this book will obtain the same results as those shown here.
We later (during the weekend) will talk about pipelines that we do all of these automatically.

library(ISLR2)
library(tidyverse)
set.seed(1)
# sample() function is used to randomly sample the integers from 1 to 392, without replacement
train <- sample(392, 196)
(train <- sort(train))

  [1]   1   9  13  14  15  16  17  19  20  22  24  25  29  30  31  33  36  37
 [19]  39  40  41  42  43  44  45  48  49  50  51  53  61  64  65  70  72  75
 [37]  77  78  79  80  84  85  86  88  89  92  93  98 102 103 104 105 106 107
 [55] 108 110 111 116 117 118 121 122 124 126 127 129 130 133 135 138 140 141
 [73] 143 145 149 152 157 159 160 163 164 165 167 170 172 174 176 181 185 187
 [91] 193 194 195 198 201 204 206 207 212 213 214 217 218 219 220 221 223 224
[109] 225 228 229 230 233 234 237 238 239 240 242 244 246 247 248 252 255 258
[127] 263 270 271 272 277 279 280 281 282 284 285 287 289 290 294 295 296 298
[145] 299 304 306 307 309 310 313 315 318 319 323 324 325 326 328 329 330 331
[163] 337 340 341 342 343 344 347 348 349 350 355 358 362 363 364 366 367 368
[181] 369 372 373 374 375 376 378 382 383 384 386 388 389 390 391 392

Use the subset option in lm() to fit a linear regression using only the observations corresponding to the training set.

ggplot(Auto, aes(x = horsepower, y = mpg)) + 
  geom_point() +
  geom_smooth()

lm.fit <- lm(mpg ~ horsepower, data = Auto, subset = train)

Use the predict() function Compute the mean squared error of the model on the validation set.

mean((Auto$mpg[-train] - predict(lm.fit, Auto[-train, ]))^2)

[1] 23.26601

#Auto %>% select(mpg, horsepower) %>% slice(-train) %>% 
#  mutate(pred = predict(lm.fit, Auto[-train,])) # %>% 
#  ggplot(aes(x = mpg, y = pred)) + 
#  geom_point() + 
#  geom_smooth(method = "lm") + 
  #geom_abline(intercept = 0, slope = 1, color = "red") + 
#  labs(title = "Validation set approach", x = "Observed mpg", y = "Predicted mpg")

Therefore, the estimated test MSE for the linear regression fit is 23.27.

We can use the poly() function to estimate the test error for the quadratic and cubic regressions.

lm.fit2 <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
mean((Auto$mpg - predict(lm.fit2, Auto))[-train]^2)

[1] 18.71646

#Auto %>% select(mpg, horsepower) %>% slice(-train) %>% 
#  mutate(pred = predict(lm.fit2, Auto[-train,]))

lm.fit3 <- lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
mean((Auto$mpg - predict(lm.fit3, Auto))[-train]^2)

[1] 18.79401

The estimated test MSE for the quadratic and cubic regressions are 18.716 and 18.794, respectively.
These results are not surprising, given that the relationship between mpg and horsepower appears to be fairly linear.
If we choose a different training set instead, then we will obtain somewhat different errors on the validation set.

set.seed(2)
train <- sample(392, 196)
lm.fit <- lm(mpg ~ horsepower, data = Auto, subset = train)
mean((Auto$mpg[-train] - predict(lm.fit, Auto[-train,]))^2)

[1] 25.72651

lm.fit2 <- lm(mpg ~ poly(horsepower, 2), data = Auto, subset = train)
mean((Auto$mpg - predict(lm.fit2, Auto))[-train]^2)

[1] 20.43036

lm.fit3 <- lm(mpg ~ poly(horsepower, 3), data = Auto, subset = train)
mean((Auto$mpg - predict(lm.fit3, Auto))[-train]^2)

[1] 20.38533

These results are consistent with our previous findings:
- a model that predicts mpg using a quadratic function of horsepower performs better than a model that involves only a linear function of horsepower,
- there is little evidence in favor of a model that uses a cubic function of horsepower.

9.2 Leave-One-Out Cross-Validation

The LOOCV estimate can be automatically computed for any generalized linear model using the glm() and cv.glm() functions.
In the lab, we use the glm() function to perform logistic regression by passing in the family="binomial" argument.
If we leave it blank, it will perform linear regression.
We use the cv.glm() function in boot library to perform LOOCV for the previous example

library(boot)
glm.fit <- glm(mpg ~ horsepower, data = Auto)
cv.err <- cv.glm(Auto, glm.fit)
cv.err$delta

[1] 24.23151 24.23114

The cv.glm() function produces a list with several components. The delta component contains LOOCV statistic given in (5.1).
We can repeat this procedure for increasingly complex polynomial fits.

cv.error <- rep(0, 10)
for (i in 1:10) {
 glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
 cv.error[i] <- cv.glm(Auto, glm.fit)$delta[1]
}
cv.error

 [1] 24.23151 19.24821 19.33498 19.42443 19.03321 18.97864 18.83305 18.96115
 [9] 19.06863 19.49093

plot(cv.error, type = "b")

9.3 k-Fold Cross-Validation

The cv.glm() function can also be used to perform k-fold CV.
Below, we use k=10 to perform 10-fold CV.

set.seed(17)
cv.error.10 <- rep(0, 10)
for (i in 1:10) {
 glm.fit <- glm(mpg ~ poly(horsepower, i), data = Auto)
 cv.error.10[i] <- cv.glm(Auto, glm.fit, K = 10)$delta[1]
}
cv.error.10

 [1] 24.27207 19.26909 19.34805 19.29496 19.03198 18.89781 19.12061 19.14666
 [9] 18.87013 20.95520

plot(cv.error.10, type = "b")

We see that the computation time is much shorter than for LOOCV.
Due to the availability of the formula (5.2) for LOOCV; however, unfortunately the cv.glm() function does not make use of this formula.
Two numbers associated with delta are the same when LOOCV is performed. When we instead perform k-fold CV, then the two numbers associated with delta differ slightly. The first is the standard k-fold CV estimate, as in equation (5.3). The second is a bias-corrected version.

9.4 The Bootstrap

Performing a bootstrap analysis in R entails only two steps:
- First, we must create a function that computes the statistic of interest.
- Second, we use the boot() function to perform the bootstrap by repeatedly sampling observations from the data set with replacement.
The Portfolio data set in the ISLR2 package is simulated data of 100 pairs of returns, https://rdrr.io/cran/ISLR/man/Portfolio.html
Remember we would like to estimate the optimal fraction in each asset to invest in order to minimize the risk, the standard error of the combined portfolio.

alpha.fn <- function(data, index) {
  X <- data$X[index]
  Y <- data$Y[index] 
  alpha = (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y)) 
  return(alpha)
}

median.fn <- function(data, index) {
  X <- data$medv[index]
  medhat = median(X) 
  return(medhat)
}

For instance, the following command tells R to use the alpha.fn() function to compute the bootstrap estimate for alpha using all 100 bootstrap samples.

alpha.fn(Portfolio, 1:100)

[1] 0.5758321

Now we use sample() function to randomly select 100 observations from the range 1 to 100, with replacement. This is equivalent to constructing a new bootstrap data set.

set.seed(7)
alpha.fn(Portfolio, sample(100, 100, replace = T))

[1] 0.5385326

(sort(sample(100, 100, replace = T)))

  [1]   1   1   2   2   4   8  10  10  10  11  13  14  14  18  18  21  21  21
 [19]  23  25  25  28  28  28  31  32  34  34  34  35  35  37  37  38  39  40
 [37]  43  43  43  43  43  44  45  50  50  52  54  54  57  58  58  60  61  61
 [55]  65  65  66  68  70  71  72  72  73  75  75  76  77  78  79  79  79  79
 [73]  81  81  82  82  83  86  86  87  88  88  88  88  90  92  92  93  94  94
 [91]  95  95  95  95  95  96  98  99  99 100

A bootstrap analysis: perform this command many times, record all of the corresponding estimates for \(\alpha\), and computing the resulting standard deviation.

boot(Portfolio, alpha.fn, R = 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Portfolio, statistic = alpha.fn, R = 1000)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.5758321 0.0009893252  0.08951762

The final output shows that using the original data, \(\hat \alpha = 0.5758\), and that the bootstrap estimate for SE(\(\hat \alpha\)) is 0.0897.
Essentially you can code a bootsrap analysis yourself to compute the standard error of \(\hat \alpha\)

B = 1000
boot.alpha2 <- replicate(B, alpha.fn(Portfolio, sample(100, 100, replace = T)))
sqrt(sum((boot.alpha2 - mean(boot.alpha2))^2)/(B - 1))

[1] 0.08894791

9.5 Estimating the Accuracy of a Linear Regression Model

Let’s use the bootstrap approach in order to assess the variability of the estimates for \(\beta_0\) and \(\beta_1\), the intercept and slope terms for the linear regression model that uses horsepower to predict mpg in the Auto data set.

boot.fn <- function(data, index) {
  return(coef(lm(mpg ~ horsepower, data = data, subset = index)))
}
boot.fn(Auto, 1:392)

(Intercept)  horsepower 
 39.9358610  -0.1578447

The boot.fn() function can also be used in order to create bootstrap estimates for the intercept and slope terms by randomly sampling from among the observations with replacement.

set.seed(1)
boot.fn(Auto, sample(392, 392, replace = T))

(Intercept)  horsepower 
 40.3404517  -0.1634868

We can use the boot() function to compute the standard errors of 1,000 bootstrap estimates for the intercept and slope terms.

boot(Auto, boot.fn, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
      original        bias    std. error
t1* 39.9358610  0.0549915227 0.841925746
t2* -0.1578447 -0.0006210818 0.007348956

Compare with the standard errors obtained using the summary() function from linear regression:

summary(lm(mpg ~ horsepower, data = Auto))$coef

              Estimate  Std. Error   t value      Pr(>|t|)
(Intercept) 39.9358610 0.717498656  55.65984 1.220362e-187
horsepower  -0.1578447 0.006445501 -24.48914  7.031989e-81

The bootstrap standard errors for the intercept and slope are 0.842 and 0.007, respectively.
The standard errors for the intercept and slope terms obtained from linear regression are 0.717 and 0.006, respectively.
The bootstrap approach does not rely on any of the following assumptions, and so it is likely giving a more accurate estimate of the standard errors.
- The estimates of \(\sigma^2\) rely on the assumption that the true relationship between mpg and horsepower is linear.
- Standard formulas assume that the \(x_i\) are fixed, and all the variability comes from the variation in the errors \(\epsilon_i\)
- The bootstrap approach is more general, and hence is applicable in a wider range of settings.
Quadratic example:

boot.fn2 <- function(data, index) {
  return(coef(lm(mpg ~ horsepower + I(horsepower^2), data = data, subset = index)))
}
set.seed(1)
boot(Auto, boot.fn2, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn2, R = 1000)


Bootstrap Statistics :
        original        bias     std. error
t1* 56.900099702  3.511640e-02 2.0300222526
t2* -0.466189630 -7.080834e-04 0.0324241984
t3*  0.001230536  2.840324e-06 0.0001172164

summary(lm(mpg ~ horsepower + I(horsepower^2), data = Auto))$coef

                    Estimate   Std. Error   t value      Pr(>|t|)
(Intercept)     56.900099702 1.8004268063  31.60367 1.740911e-109
horsepower      -0.466189630 0.0311246171 -14.97816  2.289429e-40
I(horsepower^2)  0.001230536 0.0001220759  10.08009  2.196340e-21