Problem Set 2

Question 1

Use Stata to estimate the following national totals for residential energy consumption:

Electricity usage in kilowatt hours
Natural gas usage, in hundreds of cubic feet
Propane usage, in gallons
Fuel oil or kerosene usage, in gallons

In your analysis, be sure to properly weight the individual observations. Use the replicate weights to compute standard errors. At the end of your .do file, write the estimates and standard errors to a delimited file recs2015_usage.csv.

In your .Rmd read recs2015_usage.csv and produce a nicely formatted table with estimates and 95% confidence intervals.

Solution. The table below reports the Stata estimates of the national totals for the four fuels listed above, with standard errors computed from the replicate weights.

**National totals for residential energy consumption, 2015** (all values in billions)
Fuel	Total	BRR Std. Err.	95% Low	95% Upper
Electricity (kWh)	1267.235	13.697	1240.043	1294.427
Natural gas (hundreds of cubic feet)	39.629	1.030	37.584	41.674
Propane (gallons)	3.952	0.492	2.976	4.928
Fuel oil/kerosene (gallons)	3.381	0.275	2.835	3.927
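
To make the replicate-weight standard errors concrete, here is a minimal Python sketch of the computation a BRR estimator performs for a weighted total. It is not the Stata code behind the table: the usage values and weights are made-up toy numbers, the function names are ours, and the Fay coefficient of 0.5 is an assumption about how the RECS replicate weights were built.

```python
import math

def weighted_total(values, weights):
    """Survey-weighted total: sum of weight * value."""
    return sum(w * y for w, y in zip(weights, values))

def fay_brr_se(values, final_weights, replicate_weights, fay=0.5):
    """Standard error of a weighted total from Fay-adjusted BRR replicates."""
    theta = weighted_total(values, final_weights)
    replicates = [weighted_total(values, rw) for rw in replicate_weights]
    r = len(replicates)
    # Squared deviations of the replicate totals around the full-sample total,
    # scaled by R * (1 - fay)^2 (fay = 0.5 is an assumption, see lead-in).
    variance = sum((t - theta) ** 2 for t in replicates) / (r * (1 - fay) ** 2)
    return math.sqrt(variance)

# Toy data (hypothetical): three households and four replicate weight sets.
usage = [10, 20, 30]
final_w = [2, 2, 2]
rep_w = [[3, 1, 2], [1, 3, 2], [2, 3, 1], [2, 1, 3]]

total = weighted_total(usage, final_w)
se = fay_brr_se(usage, final_w, rep_w)
print(total, se)
```

The point estimate uses only the final weight. The replicate weights enter only through the spread of the replicate totals around it.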

Question 2/3

For this question you should use the 2005-2006 NHANES ORAL Health data available here and the demographic data available here. Your analyses for this question should be done in Stata, though you may create plots and format tables using R within Rmarkdown.

For parts (b-d), you can ignore the survey aspect of the data and analyze it as if the data were a simple random sample.

a. Determine how to read both data sets into Stata and merge them together by the participant id SEQN.
b. Use logistic regression to estimate the relationship between age (in months) and the probability that an individual has a primary rather than a missing or permanent upper right 2nd bicuspid. You can recode permanent root fragments as permanent and drop individuals for whom this tooth was not assessed. Use the fitted model to estimate the ages at which 25, 50, and 75% of individuals lose their primary upper right 2nd bicuspid. Round these to the nearest month. Choose a range of representative age values with one year increments by taking the floor (in years) of the 25%-ile and the ceiling (in years) of the 75%-ile.
c. In the regression above, control for demographics in the following way:

Add gender to the model and retain it if it improves the BIC.
Create indicators for each race/ethnicity category using the largest as the reference and collapsing ‘Other Hispanic’ and ‘Other’. In order of group size in the sample, add each category retaining those that improve BIC.
Add poverty income ratio to the model and retain it if it improves BIC.

In your pdf document, include a nicely formatted regression table for the final model and an explanation of the model fitting process.

d. Use the margins command to compute:

Adjusted predictions at the mean (for other values) at each of the representative ages determined in part b.
The marginal effects at the mean of any retained categorical variables at the same representative ages.
The average marginal effect of any retained categorical variables at the representative ages.

e. Refit your final model from part c using svy and comment on the differences. Include a nicely formatted regression table and cite evidence to justify your comments.

You should use the following command to set up the survey weights:

svyset sdmvpsu [pweight=wtmec2yr], strata(sdmvstra) vce(linearized)

Solution.

Part a.

First we merge the two data sets by participant id (SEQN). The merge drops demographic records that have no match in the oral health data, taking us from 10348 observations in the demographic data to the 8305 observations in the oral health data.

Part b.

Using logistic regression, we estimate the relationship between age (in months) and the probability that an individual has lost a primary upper right 2nd bicuspid. Using the fitted model we estimate the ages at which 25, 50, and 75% of individuals lose their primary upper right 2nd bicuspid.

Below are the results of our logistic regression.

**Logistic regression of loss of the primary upper right 2nd bicuspid (notPrimary) on age in months**
notPrimary	Coef.	Std. Err.	95% Low	95% Upper
ageMonths	0.07	0.00	0.06	0.07
constant	-8.36	0.32	-8.99	-7.73

On the next page we also plot our fitted model.

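Below are the ages, in months, at which the fitted model predicts that 25, 50, and 75% of individuals have lost the primary tooth. The last two columns give the floor (in years) of the 25% age and the ceiling (in years) of the 75% age, which bound the range of representative ages.

**Representative Ages at 25, 50, 75% levels according to our fitted model**
Age (Months) at 25%	Age (Months) at 50%	Age (Months) at 75%	Floor of 25% Age (Years)	Ceiling of 75% Age (Years)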
104	120	136	8	12
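
The ages in this table come from inverting the fitted logit: the age at which the predicted probability equals p is (log(p / (1 - p)) - constant) / slope. The Python sketch below applies that formula to the coefficients as printed in the regression table (rounded to two decimals). With those rounded inputs it gives 104, 119, and 135 months, within one month of the reported values, which were presumably computed from the unrounded Stata estimates. The function name is ours.

```python
import math

def age_at_probability(p, intercept, slope):
    """Age (months) at which the fitted logit reaches probability p."""
    return (math.log(p / (1 - p)) - intercept) / slope

# Coefficients as printed (rounded) in the regression table.
intercept, slope = -8.36, 0.07

ages = [round(age_at_probability(p, intercept, slope)) for p in (0.25, 0.50, 0.75)]
print(ages)

# Representative ages in years, from the reported 104 and 136 months:
# floor of the 25% age and ceiling of the 75% age, in one-year steps.
years = list(range(math.floor(104 / 12), math.ceil(136 / 12) + 1))
print(years)
```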

Part c.

Continuing from the regression in part b, we now control for demographics: gender, race/ethnicity, and poverty income ratio.

**BIC for logistic regressions with candidate demographic variables**
Model	Age	Age/Gender	Age/Mex	Age/Black	Age/Black/Other	Age/Black/InPovRatio
BIC	1533.41	1542.05	1542.28	1529.28	1536.1	1462.89

We do not retain gender because adding it raises BIC from 1533.41 to 1542.05.
We do not retain the Mexican American or Other race/ethnicity indicators, which also raise BIC, but we do retain Non-Hispanic Black, which lowers BIC to 1529.28.
We retain poverty income ratio because it lowers BIC further, to 1462.89. The final model therefore contains age, the Non-Hispanic Black indicator, and poverty income ratio.
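
The retention rule can be written as a short loop: a candidate is kept only if its model has a lower BIC than the best model so far. The Python sketch below replays the decisions using the BIC values from the table. The function name is ours, and the sketch only compares the reported numbers; it does not refit any model.

```python
def select_by_bic(base_bic, candidates):
    """Keep each candidate whose model BIC is below the best BIC so far."""
    best, retained = base_bic, []
    for name, bic in candidates:
        if bic < best:
            retained.append(name)
            best = bic
    return retained, best

# BIC values from the table; each candidate model already includes
# the terms retained before it (e.g. Age/Black/Other includes Black).
candidates = [
    ("Gender", 1542.05),
    ("Mexican", 1542.28),
    ("Black", 1529.28),
    ("Other", 1536.10),
    ("InPovRatio", 1462.89),
]
retained, final_bic = select_by_bic(1533.41, candidates)
print(retained, final_bic)
```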

Part d.

Now we use the margins command to compute the following:

Adjusted predictions at the mean (for other values) at each of the representative ages (8 - 12 years old).

**Adjusted predictions at the mean**
Age (Years)	Margin	Std. Err.	95% Low	95% Upper
8	0.158	0.013	0.133	0.184
9	0.303	0.016	0.271	0.335
10	0.500	0.017	0.468	0.533
11	0.698	0.015	0.669	0.727
12	0.842	0.011	0.820	0.864

Below we show a plot of the adjusted predictions which demonstrates that the representative ages are evenly spaced out.

The marginal effects at the mean for the Non-Hispanic Black indicator, the only retained categorical variable, at the representative ages (8 - 12 years old).

**Marginal Effects at the Mean**
Age (Years)	MEM	Std. Err.	95% Low	95% Upper
8	0.0617	0.0186	0.0253	0.0981
9	0.1013	0.0303	0.0419	0.1607
10	0.1237	0.0372	0.0508	0.1965
11	0.1058	0.0320	0.0430	0.1686
12	0.0665	0.0203	0.0267	0.1063

The average marginal effect of the Non-Hispanic Black indicator at the representative ages (8 - 12 years old).

**Average Marginal Effects at Representative Ages**
Age (Years)	AME	Std. Err.	95% Low	95% Upper
8	0.0623	0.0189	0.0253	0.0994
9	0.1001	0.0295	0.0422	0.1579
10	0.1209	0.0355	0.0513	0.1904
11	0.1045	0.0313	0.0431	0.1659
12	0.0671	0.0207	0.0265	0.1077
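
The two tables differ only in where the effect is evaluated. The marginal effect at the mean plugs the covariate means into the model once, while the average marginal effect computes the effect for every individual and then averages. The Python sketch below illustrates this with the unweighted final-model coefficients reported in Part e and four made-up individuals (not NHANES data). It uses the derivative form b * p * (1 - p), which treats the indicator as continuous; that is an assumption on our part, and with factor-variable notation Stata's margins would instead report a discrete change from 0 to 1.

```python
import math

def inv_logit(x):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients of the unweighted final model, as reported in Part e.
B_AGE, B_BLACK, B_PIR, B_CONS = 0.0714, 0.4950, -0.1191, -8.4603

def slope_wrt_black(age_months, black, pir):
    """Derivative of the predicted probability with respect to black."""
    p = inv_logit(B_CONS + B_AGE * age_months + B_BLACK * black + B_PIR * pir)
    return B_BLACK * p * (1.0 - p)

# Hypothetical covariate values for four individuals (toy data).
black = [0, 0, 1, 0]
pir = [0.5, 1.5, 2.5, 4.0]
age = 120  # representative age of 10 years, in months

# MEM: evaluate the derivative once, at the covariate means.
mean_black = sum(black) / len(black)
mean_pir = sum(pir) / len(pir)
mem = slope_wrt_black(age, mean_black, mean_pir)

# AME: evaluate the derivative per individual, then average.
ame = sum(slope_wrt_black(age, b, x) for b, x in zip(black, pir)) / len(black)
print(round(mem, 4), round(ame, 4))
```

Because p * (1 - p) peaks at p = 0.5, the two quantities are closest when most individuals have predicted probabilities near one half, which is why the MEM and AME tables agree so closely at age 10.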

Part e.

We refit the final model from part c using the svy prefix and compare the survey-weighted fit with the unweighted fit.

**Logit Regression Before Survey Weighting**
notPrimary	Coef.	Std. Err.	p-value	95% Low	95% Upper
ageMonths	0.0714	0.0027	0.0000	0.0661	0.0767
black	0.4950	0.1489	0.0009	0.2031	0.7869
inPovRatio	-0.1191	0.0454	0.0087	-0.2080	-0.0301
constant	-8.4603	0.3510	0.0000	-9.1483	-7.7723

**Logit Regression with Survey Weighting**
notPrimary	Coef.	Std. Err.	p-value	95% Low	95% Upper
ageMonths	0.0619	0.0072	0.0000	0.0465	0.0774
black	0.5435	0.1462	0.0021	0.2319	0.8551
inPovRatio	-0.0812	0.0522	0.1407	-0.1924	0.0301
constant	-7.5160	0.8616	0.0000	-9.3524	-5.6796

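The regression coefficients change somewhat under survey weighting: the age coefficient falls from 0.0714 to 0.0619 and the black coefficient rises from 0.4950 to 0.5435. The main difference is that poverty income ratio is no longer significant at the 5% level, as shown by its p-value of 0.1407 and a 95% confidence interval that now includes zero. Age and black ethnicity remain highly significant. The standard error for age increases from 0.0027 to 0.0072, while the standard error for black is essentially unchanged (0.1489 versus 0.1462).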
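
As supporting evidence, the Python sketch below takes the numbers from the two regression tables and computes, for each term, the ratio of the survey standard error to the unweighted one and the critical value implied by each confidence interval (half-width divided by standard error). The unweighted intervals imply a multiplier of about 1.96, the normal critical value. The survey intervals imply about 2.13, which would be consistent with a t reference distribution with 15 design degrees of freedom; that interpretation is our inference and is not stated in the Stata output shown here.

```python
# (coef, std. err., 95% low, 95% upper) copied from the two regression tables.
unweighted = {
    "ageMonths": (0.0714, 0.0027, 0.0661, 0.0767),
    "black": (0.4950, 0.1489, 0.2031, 0.7869),
    "inPovRatio": (-0.1191, 0.0454, -0.2080, -0.0301),
    "constant": (-8.4603, 0.3510, -9.1483, -7.7723),
}
weighted = {
    "ageMonths": (0.0619, 0.0072, 0.0465, 0.0774),
    "black": (0.5435, 0.1462, 0.2319, 0.8551),
    "inPovRatio": (-0.0812, 0.0522, -0.1924, 0.0301),
    "constant": (-7.5160, 0.8616, -9.3524, -5.6796),
}

def implied_multiplier(row):
    """Critical value implied by a symmetric interval: half-width / SE."""
    _, se, low, high = row
    return (high - low) / (2 * se)

se_ratio = {k: weighted[k][1] / unweighted[k][1] for k in unweighted}
mult_unweighted = {k: implied_multiplier(v) for k, v in unweighted.items()}
mult_weighted = {k: implied_multiplier(v) for k, v in weighted.items()}

for k in unweighted:
    print(k, round(se_ratio[k], 2), round(mult_unweighted[k], 2), round(mult_weighted[k], 2))
```

The wider survey intervals therefore come from two sources: larger standard errors for some terms (age and the constant in particular) and a larger critical value.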