Consider two curves, \(\widehat{g}_{1}\) and \(\widehat{g}_{2}\), defined by \[\begin{align*} \widehat{g}_{1} & =\arg\min_{g}\left(\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2}+\lambda\int\left[g^{(3)}(x)\right]^{2}dx\right)\\ \widehat{g}_{2} & =\arg\min_{g}\left(\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2}+\lambda\int\left[g^{(4)}(x)\right]^{2}dx\right) \end{align*}\] where \(g^{(m)}\) represents the \(m\) th derivative of \(g\).
(a) \(\mathbf{Solution.}\qquad\) The penalty in \(\hat{g}_{2}\) involves a higher-order derivative: as \(\lambda \rightarrow \infty\), \(\hat{g}_{2}\) is forced toward functions with \(g^{(4)}=0\) (cubics), while \(\hat{g}_{1}\) is forced toward functions with \(g^{(3)}=0\) (quadratics). Since a cubic is more flexible than a quadratic, we expect \(\hat{g}_{2}\) to have the smaller training error.
(b) \(\mathbf{Solution.}\qquad\) As \(\lambda \rightarrow \infty\), we expect \(\hat{g}_{1}\) to have the smaller test error: \(\hat{g}_{2}\) is the more flexible model and is therefore more likely to overfit the training data (though the outcome depends on the true underlying function).
(c) \(\mathbf{Solution.}\qquad\) For \(\lambda=0\), the penalty term vanishes from both criteria, so \(\hat{g}_{1}\) and \(\hat{g}_{2}\) solve the same unpenalized least-squares problem and coincide; hence their training errors are equal, and so are their test errors.
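Concretely, with \(\lambda=0\) and distinct \(x_{i}\), the unpenalized criterion is minimized by any curve that interpolates the data (a standard observation, not part of the original problem statement): \[\begin{align*} \widehat{g}_{1}\left(x_{i}\right)=\widehat{g}_{2}\left(x_{i}\right)=y_{i}\quad\Longrightarrow\quad\sum_{i=1}^{n}\left(y_{i}-\widehat{g}_{j}\left(x_{i}\right)\right)^{2}=0,\quad j=1,2, \end{align*}\] so both curves attain zero training error, and because the two fits coincide, their test errors are equal as well.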
Consider the Ozone data set.
(a) \(\mathbf{Solution.}\qquad\) Fitting a linear model to the training data, with the cube root of ozone concentration as the response and radiation, temperature, and wind speed as the predictors, gives the summary below. All three predictor coefficients are significant at the 1% level, and the adjusted R-squared of 0.6876 indicates a reasonable fit.
# QUESTION 1a ------------------------------------------------------------------
ozone = read.delim("./Data/ozone_data.txt", header = TRUE)
set.seed(503)
train = sample(nrow(ozone), floor(0.7 * nrow(ozone)))
## Cube root of ozone concentration (stored for reference; the model
## formulas below apply the transform inline)
ozone$cuberoot = (ozone$ozone)^(1 / 3)
ozone_lm = lm(ozone^(1/3) ~ radiation + temperature + wind,
              data = ozone[train, ])
summary(ozone_lm)
##
## Call:
## lm(formula = ozone^(1/3) ~ radiation + temperature + wind, data = ozone[train,
## ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.14281 -0.35754 -0.02483 0.34558 1.51792
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6901796 0.6778639 -1.018 0.311961
## radiation 0.0022555 0.0007117 3.169 0.002235 **
## temperature 0.0540850 0.0076927 7.031 9.14e-10 ***
## wind -0.0669379 0.0177700 -3.767 0.000332 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5222 on 73 degrees of freedom
## Multiple R-squared: 0.6999, Adjusted R-squared: 0.6876
## F-statistic: 56.75 on 3 and 73 DF, p-value: < 2.2e-16
(b) \(\mathbf{Solution.}\qquad\) Next we fit a GAM to the training data using a smoothing spline for each predictor. First we need to choose the degrees of freedom for each spline: we fit models over a grid of degrees-of-freedom values and choose the combination that yields the lowest AIC.
# QUESTION 1b ---------------
library(gam)
# Grids of candidate degrees of freedom:
# a = radiation, b = temperature, c = wind
a = seq(from = 2, to = 5, by = 0.5)
b = seq(from = 2, to = 21, by = 0.5)
c = seq(from = 2, to = 5, by = 0.5)
# Enumerate every (a, b, c) combination (a varies slowest, c fastest)
cA = rep(a, each = length(b) * length(c))
cB = rep(rep(b, each = length(c)), length(a))
cC = rep(c, length(a) * length(b))
# Fit a GAM with the given smoothing-spline degrees of freedom and
# return them together with the fitted model's AIC
gamfun = function(a, b, c) {
  ozone_gam = gam(ozone^(1/3) ~ s(radiation, a) + s(temperature, b) +
                    s(wind, c), data = ozone[train, ])
  c(a, b, c, ozone_gam$aic)
}
# Fit a GAM for each (a, b, c) combination; row 4 of the result holds the AIC
lp = mapply(gamfun, cA, cB, cC)
j_min = which.min(lp[4,])
lp[,j_min]
## [1] 2.50 19.50 4.00 105.35
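As an aside, the same set of combinations can be built more transparently with base R's `expand.grid()` (a sketch; the row order differs from the `rep()`-based vectors above, but the set of combinations is identical):

```r
# Build every (a, b, c) degrees-of-freedom combination as rows of a data frame
a = seq(from = 2, to = 5, by = 0.5)
b = seq(from = 2, to = 21, by = 0.5)
cc = seq(from = 2, to = 5, by = 0.5)  # renamed to avoid masking base::c
grid = expand.grid(a = a, b = b, c = cc)
nrow(grid)  # length(a) * length(b) * length(cc) = 1911
```

One could then evaluate `gamfun` row-wise, e.g. with `mapply(gamfun, grid$a, grid$b, grid$c)`.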
We find that the best degrees of freedom for radiation, temperature, and wind speed are 2.5, 19.5, and 4 respectively, with an AIC of 105.35. The 19.5 degrees of freedom estimated for temperature seems quite high, so we reduce it to 5 to obtain a smoother curve. We refit the GAM with these values and examine its summary and component plots below.
ozone_gam = gam(ozone^(1/3) ~ s(radiation, 2.5) + s(temperature, 5) +
                  s(wind, 4), data = ozone[train, ])
summary(ozone_gam)
##
## Call: gam(formula = ozone^(1/3) ~ s(radiation, 2.5) + s(temperature,
## 5) + s(wind, 4), data = ozone[train, ])
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.17027 -0.31950 -0.01885 0.26702 1.19098
##
## (Dispersion Parameter for gaussian family taken to be 0.2164)
##
## Null Deviance: 66.3461 on 76 degrees of freedom
## Residual Deviance: 13.9554 on 64.5 degrees of freedom
## AIC: 114.0055
##
## Number of Local Scoring Iterations: 2
##
## Anova for Parametric Effects
## Df Sum Sq Mean Sq F value Pr(>F)
## s(radiation, 2.5) 1.0 14.0823 14.0823 65.086 2.328e-11 ***
## s(temperature, 5) 1.0 23.5501 23.5501 108.845 1.763e-15 ***
## s(wind, 4) 1.0 3.3259 3.3259 15.372 0.0002164 ***
## Residuals 64.5 13.9554 0.2164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
## Npar Df Npar F Pr(F)
## (Intercept)
## s(radiation, 2.5) 1.5 3.6291 0.044277 *
## s(temperature, 5) 4.0 2.7653 0.034726 *
## s(wind, 4) 3.0 4.7523 0.004684 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
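The component plots discussed next were presumably produced along these lines (a sketch of the standard `gam` plotting call; the original plotting code is not preserved in the text):

```r
# Plot each fitted smooth term, with pointwise standard-error bands,
# in a single row of three panels (assumes ozone_gam fitted above)
par(mfrow = c(1, 3))
plot(ozone_gam, se = TRUE)
```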
We notice that the fitted smooths for all three predictors appear nonlinear.
(c) \(\mathbf{Solution.}\qquad\) We compute the test errors for the linear model and GAM.
# QUESTION 1c ---------------
test_pred_lm = predict(ozone_lm, ozone[-train, ])
test_err_lm = mean((test_pred_lm - ozone[-train, c("ozone")]^(1/3))^2)
test_pred_gam = predict(ozone_gam, ozone[-train, ])
test_err_gam = mean((test_pred_gam - ozone[-train, c("ozone")]^(1/3))^2)
## Test errors
c(test_err_lm, test_err_gam)
## [1] 0.2393778 0.1992589
We find that the test error of the linear model (0.239) is larger than that of the GAM (0.199); the extra flexibility of the GAM pays off on held-out data.
(d) \(\mathbf{Solution.}\qquad\) We show scatter plots of each predictor against the response below. The relationship between radiation and the response may be nonlinear: the response appears to increase with radiation up to roughly 200 and then begins to decrease.
# QUESTION 1d ---------------
par(mfrow = c(1, 3))
predictors = cbind(ozone[train, "radiation"], ozone[train, "temperature"],
                   ozone[train, "wind"])
pred_names = c("Radiation", "Temperature", "Wind")
for (i in 1:3) {
  plot(predictors[, i], ozone[train, "ozone"]^(1/3), col = "steelblue",
       xlab = pred_names[i], ylab = "Cube root of ozone concentration")
  if (i == 2) {
    title("Scatter Plot of Predictors vs Response")
  }
}