Question 1

Consider two curves, \(\widehat{g}_{1}\) and \(\widehat{g}_{2}\), defined by \[\begin{align*} \widehat{g}_{1} & =\arg\min_{g}\left(\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2}+\lambda\int\left[g^{(3)}(x)\right]^{2}dx\right)\\ \widehat{g}_{2} & =\arg\min_{g}\left(\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2}+\lambda\int\left[g^{(4)}(x)\right]^{2}dx\right) \end{align*}\] where \(g^{(m)}\) represents the \(m\) th derivative of \(g\).

  1. As \(\lambda \rightarrow \infty,\) will \(\widehat{g}_{1}\) or \(\widehat{g}_{2}\) have the smaller training error?
  2. As \(\lambda \rightarrow \infty,\) will \(\widehat{g}_{1}\) or \(\widehat{g}_{2}\) have the smaller test error?
  3. For \(\lambda=0,\) will \(\widehat{g}_{1}\) or \(\widehat{g}_{2}\) have the smaller training and test errors?

(a) \(\mathbf{Solution.}\qquad\) Since the penalty in \(\hat{g}_{2}\) involves a higher-order derivative, it vanishes on a larger class of functions: \(g^{(4)}\equiv 0\) admits all cubic polynomials, while \(g^{(3)}\equiv 0\) admits only quadratics. This makes \(\hat{g}_{2}\) the more flexible model, so as \(\lambda \rightarrow \infty\) we expect \(\hat{g}_{2}\) to have the smaller training error.
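Concretely, as \(\lambda \rightarrow \infty\) the penalty term dominates, driving each fit into the null space of its penalty: \[\begin{align*} \widehat{g}_{1} & \rightarrow \arg\min_{g:\,g^{(3)}\equiv 0}\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2} & \text{(least-squares quadratic)}\\ \widehat{g}_{2} & \rightarrow \arg\min_{g:\,g^{(4)}\equiv 0}\sum_{i=1}^{n}\left(y_{i}-g\left(x_{i}\right)\right)^{2} & \text{(least-squares cubic)} \end{align*}\] Since the cubics contain the quadratics, the limiting \(\widehat{g}_{2}\) attains a training RSS at least as small as that of \(\widehat{g}_{1}\).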

   

(b) \(\mathbf{Solution.}\qquad\) As \(\lambda \rightarrow \infty,\) we expect \(\hat{g}_{1}\) will have the smaller test error, because \(\hat{g}_{2}\) is a more flexible model and is likely to overfit the data.

   

(c) \(\mathbf{Solution.}\qquad\) For \(\lambda=0,\) the penalty term vanishes and both problems reduce to minimizing the residual sum of squares alone. Any function that interpolates the data (assuming the \(x_i\) are distinct) achieves zero training error for either fit, so \(\hat{g}_{1}\) and \(\hat{g}_{2}\) have the same training error, and there is no reason to expect their test errors to differ.

Question 2

Consider the Ozone data set.

  1. Split the data into a training set \((70 \%)\) and a test set \((30 \%) .\) Fit a linear model on the training data, using the cube root of ozone concentration as the response and temperature, wind speed, and radiation as the predictors. Report the results.
  2. Fit a GAM on the training data, using the cube root of ozone concentration as the response and temperature, wind speed, and radiation as the predictors. Plot the results, and explain your findings.
  3. Evaluate both models on the test set, and explain the results.
  4. For which variables, if any, is there evidence of a non-linear relationship with the response?

(a) \(\mathbf{Solution.}\qquad\) We fit a linear model to the training data, using the cube root of ozone concentration as the response and temperature, wind speed, and radiation as the predictors; the model summary is shown below. The regression coefficients of all three predictors are significant at the 1% level, and the adjusted R-squared of 0.6876 indicates a reasonable fit.
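The split and fit could be reproduced along these lines (a sketch: the data source and random seed are assumptions, not reported in the original):

```r
# Assumes the ozone data frame (e.g. from the ElemStatLearn package)
# with columns ozone, radiation, temperature, and wind.
library(ElemStatLearn)  # assumption: source of the ozone data
set.seed(1)             # assumption: the actual seed is not reported

n     <- nrow(ozone)
train <- sample(n, round(0.7 * n))  # 70% training / 30% test split

fit.lm <- lm(ozone^(1/3) ~ radiation + temperature + wind,
             data = ozone[train, ])
summary(fit.lm)
```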

## 
## Call:
## lm(formula = ozone^(1/3) ~ radiation + temperature + wind, data = ozone[train, 
##     ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.14281 -0.35754 -0.02483  0.34558  1.51792 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.6901796  0.6778639  -1.018 0.311961    
## radiation    0.0022555  0.0007117   3.169 0.002235 ** 
## temperature  0.0540850  0.0076927   7.031 9.14e-10 ***
## wind        -0.0669379  0.0177700  -3.767 0.000332 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5222 on 73 degrees of freedom
## Multiple R-squared:  0.6999, Adjusted R-squared:  0.6876 
## F-statistic: 56.75 on 3 and 73 DF,  p-value: < 2.2e-16

   

(b) \(\mathbf{Solution.}\qquad\) Next we fit a GAM on the training data using smoothing splines on each of the predictors. First, we need to decide on the degrees of freedom for each spline. We fit a range of degrees of freedom and choose the combination which provides the model with lowest AIC.
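The search could be sketched as below (the candidate grid of degrees of freedom is an assumption; the half-integer optima reported next suggest a step of 0.5, though a coarser grid would be cheaper):

```r
library(gam)

dfs  <- seq(2, 20, by = 0.5)  # assumed grid of candidate degrees of freedom
grid <- expand.grid(df.rad = dfs, df.temp = dfs, df.wind = dfs)

# Fit a GAM for each combination and record its AIC.
# Note: this full grid is large; a coarser grid keeps the search fast.
aics <- apply(grid, 1, function(d) {
  fit <- gam(ozone^(1/3) ~ s(radiation, d[1]) + s(temperature, d[2]) +
               s(wind, d[3]), data = ozone[train, ])
  AIC(fit)
})

best <- grid[which.min(aics), ]
c(unlist(best), min(aics))  # best df per predictor, and the minimum AIC
```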

## [1]   2.50  19.50   4.00 105.35

We find that the best degrees of freedom for radiation, temperature, and wind speed are 2.5, 19.5, and 4 respectively, with an AIC of 105.35. The 19.5 degrees of freedom estimated for temperature seems quite high, so we reduce it to 5 in order to obtain a smoother curve. We plot the fitted smooths and summarize the resulting GAM below.
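The final fit and its plot could be produced along these lines (a sketch, matching the call in the summary below):

```r
library(gam)

# GAM with the adjusted degrees of freedom (temperature reduced to 5).
fit.gam <- gam(ozone^(1/3) ~ s(radiation, 2.5) + s(temperature, 5) +
                 s(wind, 4), data = ozone[train, ])

# One panel per smooth term, with standard-error bands.
par(mfrow = c(1, 3))
plot(fit.gam, se = TRUE)

summary(fit.gam)
```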

## 
## Call: gam(formula = ozone^(1/3) ~ s(radiation, 2.5) + s(temperature, 
##     5) + s(wind, 4), data = ozone[train, ])
## Deviance Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.17027 -0.31950 -0.01885  0.26702  1.19098 
## 
## (Dispersion Parameter for gaussian family taken to be 0.2164)
## 
##     Null Deviance: 66.3461 on 76 degrees of freedom
## Residual Deviance: 13.9554 on 64.5 degrees of freedom
## AIC: 114.0055 
## 
## Number of Local Scoring Iterations: 2 
## 
## Anova for Parametric Effects
##                     Df  Sum Sq Mean Sq F value    Pr(>F)    
## s(radiation, 2.5)  1.0 14.0823 14.0823  65.086 2.328e-11 ***
## s(temperature, 5)  1.0 23.5501 23.5501 108.845 1.763e-15 ***
## s(wind, 4)         1.0  3.3259  3.3259  15.372 0.0002164 ***
## Residuals         64.5 13.9554  0.2164                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Anova for Nonparametric Effects
##                   Npar Df Npar F    Pr(F)   
## (Intercept)                                 
## s(radiation, 2.5)     1.5 3.6291 0.044277 * 
## s(temperature, 5)     4.0 2.7653 0.034726 * 
## s(wind, 4)            3.0 4.7523 0.004684 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The fitted splines for each predictor appear nonlinear, and the ANOVA for nonparametric effects supports this: the nonlinear contributions are significant for radiation (p ≈ 0.044), temperature (p ≈ 0.035), and wind (p ≈ 0.005).

   

(c) \(\mathbf{Solution.}\qquad\) We compute the test errors for the linear model and GAM.
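The comparison could be computed as below (a sketch: `fit.lm`, `fit.gam`, and the `train` index are the objects from parts (a) and (b), and the error is taken to be test-set mean squared error):

```r
test   <- ozone[-train, ]       # held-out 30%
y.test <- test$ozone^(1/3)

# Test MSE for a fitted model.
mse <- function(fit) mean((y.test - predict(fit, newdata = test))^2)

c(lm = mse(fit.lm), gam = mse(fit.gam))
```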

## [1] 0.2393778 0.1992589

We find that the test error of the linear model (0.239) is larger than that of the GAM (0.199), suggesting that the smooth terms capture structure in the data that the linear model misses.

   

(d) \(\mathbf{Solution.}\qquad\) We show scatter plots of each predictor against the response below. Radiation appears to have a nonlinear relationship with the response, which seems to increase with radiation until about 200 and then begins to decrease. This agrees with the ANOVA for nonparametric effects in part (b), which finds significant nonlinear effects for all three predictors, with wind showing the strongest evidence (p ≈ 0.005).
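The scatter plots could be drawn along these lines (a sketch, assuming the `ozone` data frame from part (a)):

```r
# One panel per predictor, response on the cube-root scale used throughout.
y <- ozone$ozone^(1/3)
par(mfrow = c(1, 3))
for (v in c("radiation", "temperature", "wind")) {
  plot(ozone[[v]], y, xlab = v, ylab = "cube root of ozone")
}
```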