Take-home midterm (II)
Deadline 2006/12/28
Problem 1: Grade point average
Data sets
CH01PR19.txt
(first column y: a student's GPA at the end of the freshman year)
(second column x1: the student's entrance test score)
CH08PR16.txt
(x2 = 1 if the student had indicated a major field of concentration
at the time of application; x2 = 0 if the major field was undecided)
1. Fit a simple linear regression of y on x1 (model 1). Use "summary"
and "anova" to conclude whether or not there is a linear association
between the entrance test score and a freshman's GPA. Is this a good
model? Explain your reasoning.
2. Decide whether or not the model can be improved by adding the
variable x2.
(a) Explain the meaning of each regression coefficient in model (2),
which contains both variables x1 and x2.
(b) Obtain the estimated regression function.
(c) Test whether x2 can be dropped from the regression model; use a
significance level of 0.01. (State your hypotheses, decision rule, and
conclusion.)
(d) Obtain the residuals for regression model (2) and plot them
against x1 and x2. Is there any evidence in your plots suggesting that
it would be helpful to include an interaction term in the model?
3. Fit another regression model (model 3) which contains the variables
x1, x2, and x1*x2 (the interaction between the entrance test score and
whether or not a major was pre-decided).
(a) Obtain the estimated regression function.
(b) Test whether or not the interaction term can be dropped from the
model at a significance level of 0.01. (State your hypotheses,
decision rule, and conclusion.)
(c) Interpret the meaning of this regression model.
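The three models in Problem 1 and the two "can this term be dropped?"
tests could be set up roughly as follows. This is only a sketch under
stated assumptions: the data are simulated stand-ins (not the values in
CH01PR19.txt or CH08PR16.txt), the object names are made up for
illustration, and the commented read.table() call assumes a file layout
that should be checked against the actual files.

```r
# Simulated stand-in data (hypothetical values, NOT the course data sets).
# With the real file, something like
#   gpa <- read.table("CH01PR19.txt", col.names = c("y", "x1"))
# might be used instead; how x2 is stored in CH08PR16.txt is an
# assumption to verify before merging it in.
set.seed(1)
n  <- 120
x1 <- round(runif(n, 14, 35))              # entrance test score
x2 <- rbinom(n, 1, 0.5)                    # 1 = major declared, 0 = undecided
y  <- 2 + 0.04 * x1 + rnorm(n, sd = 0.6)   # freshman GPA
gpa <- data.frame(y, x1, x2)

model1 <- lm(y ~ x1, data = gpa)           # model 1: y on x1
model2 <- lm(y ~ x1 + x2, data = gpa)      # model 2: adds x2
model3 <- lm(y ~ x1 * x2, data = gpa)      # model 3: x1, x2 and x1:x2

print(summary(model1))                     # coefficient t tests and R^2
print(anova(model1))                       # F test for a linear association

# Partial F tests comparing a reduced model with a full model.
drop_x2  <- anova(model1, model2)          # H0: coefficient of x2 is 0
drop_int <- anova(model2, model3)          # H0: interaction coefficient is 0
alpha <- 0.01
p_x2  <- drop_x2[2, "Pr(>F)"]
p_int <- drop_int[2, "Pr(>F)"]
# Decision rule: reject H0 (keep the term) when the p-value is below alpha.
cat("drop x2?", p_x2 >= alpha, " drop interaction?", p_int >= alpha, "\n")

# Residuals of model 2 plotted against each predictor.
e2 <- resid(model2)
plot(gpa$x1, e2, xlab = "x1", ylab = "residual (model 2)")
plot(gpa$x2, e2, xlab = "x2", ylab = "residual (model 2)")
```

Comparing two nested fits with anova() gives the same F statistic as the
full-versus-reduced model calculation done by hand, so it can serve as a
check on a manual answer.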
Problem 2: Patient Satisfaction
Data sets
CH06PR15.txt
y: Patient satisfaction
x1: Patient's age (in years)
x2: Severity of illness (an index)
x3: Anxiety level (an index)
Please see "Practical Regression and Anova using R" by Julian Faraway,
available at http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf.
Chapter 10 (Variable selection) describes R functions that you may
find useful.
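As a rough orientation, the kind of R functions used for variable
selection can be sketched as below. The data are simulated stand-ins
(not CH06PR15.txt), the object names are hypothetical, and the leaps
package is assumed to be installed (the call is skipped otherwise).
Note that step() selects by AIC rather than by F limits, so it is not
a substitute for the F-limit procedures; add1() and drop1() with
test = "F" report the partial F values needed for one manual step.

```r
# Simulated stand-in data (hypothetical values, NOT CH06PR15.txt).
set.seed(2)
n  <- 46
x1 <- round(runif(n, 22, 55))              # age
x2 <- 30 + 0.4 * x1 + rnorm(n, sd = 3)     # severity index
x3 <- 1 + 0.03 * x2 + rnorm(n, sd = 0.2)   # anxiety index
y  <- 160 - 1.1 * x1 - 0.4 * x2 - 13 * x3 + rnorm(n, sd = 10)
pat <- data.frame(y, x1, x2, x3)

full <- lm(y ~ x1 + x2 + x3, data = pat)
null <- lm(y ~ 1, data = pat)

# All-possible-subsets regression with Mallows' C_p (needs package 'leaps').
if (requireNamespace("leaps", quietly = TRUE)) {
  aps <- leaps::leaps(x = as.matrix(pat[, c("x1", "x2", "x3")]),
                      y = pat$y, method = "Cp")
  print(cbind(aps$which, size = aps$size, Cp = round(aps$Cp, 2)))
}

# One forward step: partial F for adding each candidate to the null model.
fwd <- add1(null, scope = ~ x1 + x2 + x3, test = "F")
print(fwd)

# One backward step: partial F for deleting each term from the full model.
bwd <- drop1(full, test = "F")
print(bwd)

# Automated stepwise search; this uses AIC, not F limits.
sel <- step(null, scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "both", trace = 0)
print(formula(sel))
```

For an F-limit procedure, the "F value" column of add1() or drop1()
would be compared with the limit at each step, and the model refitted
before the next step.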
(P.S. The outline for my other course covers two semesters, so it is
reasonable. Thank you for reminding me to correct my web page.)
1. Examine the data. Are there any noteworthy features?
2. Obtain the scatter plot and correlation matrix. Are there pairwise
linear associations among the predictor variables? Interpret them.
3. Fit the first-order linear regression model (model 1) for these
three predictor variables to the data. Use information from "summary"
and "anova" to state your findings about the regression model.
4. Obtain the residuals of this regression model and carry out your
residual diagnostics.
5. Obtain the analysis of variance table that decomposes the regression
sum of squares into extra sums of squares associated with X2; with X1
given X2; and with X3 given X2 and X1.
6. Test whether X3 can be dropped from the regression model given that
X1 and X2 are retained. Use the F test with level of significance .025.
(State your hypotheses, decision rule, and conclusion.)
7. Test whether both X2 and X3 can be dropped from the regression model
given that X1 is retained. Use the level of significance .025.
(State your hypotheses, decision rule, and conclusion.)
8. Test whether beta_1=-1 and beta_2=0; use the level of significance
0.025, and state the alternatives, full and reduced models, decision
rule, and conclusion.
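Substitute these two values into the regression equation to obtain an
equation that involves only x3.
Use this equation as your reduced model.
Test whether the difference between this model and the original full
model (model 1) is significant.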
9. Calculate R^2_{Y1}, R^2_{Y1|2}, R^2_{Y1|23} and R^2_{Y2},
R^2_{Y2|1}, R^2_{Y2|13}. Explain what each coefficient measures.
10. Obtain the standardized regression model.
11. Fit the first-order linear regression model (model 2) relating
patient satisfaction to patient's age and severity of illness (X1 and
X2). State the fitted regression function. Compare model 2 with model
1. What do you find? Does SSR(X1) equal SSR(X1|X3), or SSR(X2) equal
SSR(X2|X3)? Do these results have anything to do with the correlation
matrix found in question 2?
12. Use all-possible-subsets regression to determine which subset of
predictor variables you would recommend as the best for predicting
patient satisfaction. Use the C_p criterion in R's leaps functions.
For your best subset model, calculate the value of each of the
following criteria:
(1) R^2_{a,p} (2) AIC_p (3) C_p (4) PRESS_p
13. Using the forward selection procedure, find the best subset of
predictors. Use an F limit of 3.0 to add a variable. Show your steps.
14. Using the backward elimination procedure, find the best subset of
predictors. Use an F limit of 2.9 to delete a variable. Show your steps.
15. Use the forward stepwise regression procedure, with F limits of 3.0
and 2.9 to add or delete a variable, respectively, to determine the
subset of variables that you select.
16. Compare the results from 12-15.
17. Obtain the diagonal elements of the hat matrix for your model.
Identify any outlying X observations.
18. Obtain the three variance inflation factors. Do they indicate that
a serious multicollinearity problem exists here?
19. Write a summary of your findings.
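Several of the quantities asked for in Problem 2 (extra sums of
squares, the general linear test in question 8, coefficients of
partial determination, PRESS, hat values, and variance inflation
factors) can be obtained with base R alone. The sketch below uses
simulated stand-in data (not CH06PR15.txt) and hypothetical object
names; the offset() term is one possible way to impose beta_1 = -1 and
beta_2 = 0 in the reduced model, not necessarily the intended one.

```r
# Simulated stand-in data (hypothetical values, NOT CH06PR15.txt).
set.seed(2)
n  <- 46
x1 <- round(runif(n, 22, 55))
x2 <- 30 + 0.4 * x1 + rnorm(n, sd = 3)
x3 <- 1 + 0.03 * x2 + rnorm(n, sd = 0.2)
y  <- 160 - 1.1 * x1 - 0.4 * x2 - 13 * x3 + rnorm(n, sd = 10)
pat <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = pat)    # model 1 (full model)

# Sequential sums of squares in the order X2, X1|X2, X3|X2,X1:
# anova() on a single fit adds terms in the order given in the formula.
print(anova(lm(y ~ x2 + x1 + x3, data = pat)))

# General linear test of H0: beta_1 = -1 and beta_2 = 0.
# Under H0 the model is y = b0 - x1 + b3*x3 + error, so -x1 is an offset.
reduced <- lm(y ~ x3 + offset(-1 * x1), data = pat)
glt <- anova(reduced, fit)                 # F* with 2 numerator df
print(glt)

# Coefficient of partial determination, e.g. R^2_{Y1|2} = SSR(X1|X2)/SSE(X2).
sse <- function(f) deviance(lm(f, data = pat))
r2_y1_given_2 <- (sse(y ~ x2) - sse(y ~ x1 + x2)) / sse(y ~ x2)

# Hat-matrix diagonal; flag leverage above the common 2p/n guideline.
h <- hatvalues(fit)
p <- length(coef(fit))
outlying_x <- which(h > 2 * p / n)

# PRESS from the deleted residuals e_i / (1 - h_ii).
press <- sum((resid(fit) / (1 - h))^2)

# Variance inflation factors: diagonal of the inverse of the
# correlation matrix of the predictors.
vif <- diag(solve(cor(pat[, c("x1", "x2", "x3")])))
print(round(vif, 2))
```

Reordering the terms in the formula passed to lm() changes which extra
sums of squares anova() reports, so the order has to match the
decomposition being asked for.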