Take home midterm (II)
Deadline 2006/12/28

Problem 1 : Grade point average data from

Data sets
    CH01PR19.txt
    (first column y: a student's GPA at the end of the freshman year)
    (second column x1: a student's entrance test score)
    CH08PR16.txt
    (x2 = 1 if the student had indicated a major field of concentration at the time of application;
    x2 = 0 if the major field was undecided)


1. Fit a simple linear regression of y on x1 (model 1). Use "summary" and "anova" to conclude whether or not there is a linear association between the entrance test score and a freshman's GPA. Is this a good model? Explain your reasoning.


2. Please decide whether or not the model can be improved by adding the variable x2.
    (a) Explain the meaning of each regression coefficient in model 2, which contains both variables x1 and x2.
    (b) Obtain the estimated regression function.
    (c) Test whether x2 can be dropped from the regression model; use the significance level of 0.01. (State your hypotheses, decision rule and conclusion.)
    (d) Obtain the residuals for regression model 2. Plot them against x1 and against x2. Do your plots suggest that it would be helpful to include an interaction term in the model?
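
The test in 2(c) is a comparison of nested models, which "anova" can carry out when given two fits. The following is only a sketch: it uses simulated stand-in data (the values, sample size and coefficients are made up, not taken from CH01PR19.txt or CH08PR16.txt), and the object names model1, model2, cmp, f_star and f_crit are chosen here for illustration.

```r
# Simulated stand-in data (hypothetical values, NOT the course data).
set.seed(1)
n  <- 60
x1 <- round(runif(n, 14, 35))            # entrance test score
x2 <- rbinom(n, 1, 0.5)                  # 1 = major declared, 0 = undecided
y  <- 2 + 0.04 * x1 + 0.1 * x2 + rnorm(n, sd = 0.5)

model1 <- lm(y ~ x1)                     # reduced model
model2 <- lm(y ~ x1 + x2)                # full model

# General linear test of H0: beta_2 = 0 against Ha: beta_2 != 0.
cmp    <- anova(model1, model2)
f_star <- cmp[2, "F"]
p_val  <- cmp[2, "Pr(>F)"]

# Decision rule at alpha = 0.01: conclude Ha if f_star > f_crit.
alpha  <- 0.01
f_crit <- qf(1 - alpha, df1 = 1, df2 = df.residual(model2))
reject <- f_star > f_crit
print(cmp)
```

Because only one coefficient is tested, f_star equals the square of the t statistic for x2 reported by summary(model2), so either output can support the same conclusion.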
 
3. Fit another regression model (model 3) which contains variables x1, x2, and x1*x2 (interaction between entrance test score and whether or not a major was pre-decided.)
    (a) Obtain the estimated regression function.
    (b) Test whether or not the interaction term can be dropped from the model at the significance level of 0.01. (State your hypotheses, decision rule and conclusion.)
    (c) Interpret the meaning of this regression model.
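
As a reminder of the form of model 3, written with coefficients beta_0, ..., beta_3 (notation assumed here, not given in the problem), the interaction model implies one regression line for each value of x2:

```latex
\begin{align*}
E\{Y\} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
x_2 = 0:\quad E\{Y\} &= \beta_0 + \beta_1 x_1 \\
x_2 = 1:\quad E\{Y\} &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_1
\end{align*}
```

Under this parameterization beta_2 is the difference in intercepts and beta_3 the difference in slopes between the two groups, so dropping the interaction term corresponds to H0: beta_3 = 0.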



Problem 2: Patient Satisfaction

Data set: CH06PR15.txt
y: Patient satisfaction
x1: Patient's age (in years)
x2: Severity of illness (an index)
x3: Anxiety level (an index)

Please consult "Practical Regression and Anova using R" by Julian Faraway, available at http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf.
See Chapter 10, "Variable selection", for R functions you may use.
(P.S. The outline for my other course covers two semesters, so it is reasonable. Thank you for reminding me to correct my web page.)

1. Examine the data. Are there any noteworthy features?

2. Obtain the scatter plot matrix and the correlation matrix. Are there pairwise linear associations among the predictor variables? Interpret what you see.

3. Fit the first-order linear regression model (model 1) with these three predictor variables to the data.
Use the information from "summary" and "anova" to state your findings about the regression model.

4. Obtain the residuals of this regression model and carry out residual diagnostics.

5. Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with X2; with X1 given X2; and with X3 given X2 and X1.
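
For the decomposition in question 5, note that "anova" applied to a single fit reports sequential sums of squares, so the order of the terms in the formula determines which extra sums of squares appear. A minimal sketch on simulated stand-in data (all values are made up and are not those in CH06PR15.txt; the names dat, fit, tab and ss are illustrative):

```r
# Simulated stand-in data (hypothetical values, NOT CH06PR15.txt).
set.seed(2)
n   <- 46
x1  <- rnorm(n, mean = 40, sd = 9)
x2  <- 30 + 0.3 * x1 + rnorm(n, sd = 4)
x3  <- 1 + 0.02 * x1 + 0.01 * x2 + rnorm(n, sd = 0.2)
y   <- 150 - 1.1 * x1 - 0.4 * x2 - 13 * x3 + rnorm(n, sd = 10)
dat <- data.frame(y, x1, x2, x3)

# Terms entered in the order X2, X1, X3 to match the requested decomposition.
fit <- lm(y ~ x2 + x1 + x3, data = dat)
tab <- anova(fit)

ss <- setNames(tab[["Sum Sq"]], rownames(tab))
ssr_x2          <- ss[["x2"]]            # SSR(X2)
ssr_x1_given_x2 <- ss[["x1"]]            # SSR(X1 | X2)
ssr_x3_given_12 <- ss[["x3"]]            # SSR(X3 | X1, X2)
ssr_total       <- ssr_x2 + ssr_x1_given_x2 + ssr_x3_given_12
print(tab)
```

The three pieces add up to SSR(X1, X2, X3); refitting with the terms in a different order gives a different decomposition of the same total.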

6. Test whether X3 can be dropped from the regression model given that X1 and X2 are retained. Use the F test with level of significance .025.
(State your hypotheses, decision rule and conclusion.)

7. Test whether both X2 and X3 can be dropped from the regression model given that X1 is retained. Use the level of significance .025. (State your hypotheses, decision rule and conclusion.)

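8. Test whether beta_1=-1 and beta_2=0; use the level of significance 0.025. State the alternatives, the full and reduced models, the decision rule and the conclusion.
Hint: substitute these two values into the regression equation to obtain an equation that involves only x3. Treat this equation as your reduced model, and test whether the difference between this model and the original full model (model 1) is significant.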
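
Written out, the substitution described for question 8 gives the following full and reduced models and test statistic. This is a sketch in standard general-linear-test notation (SSE(F), SSE(R) and n are symbols assumed here, with n the number of observations):

```latex
\begin{align*}
\text{Full:}\quad & Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i \\
H_0:\quad & \beta_1 = -1,\ \beta_2 = 0 \qquad H_a:\ \text{not both equalities in } H_0 \text{ hold} \\
\text{Reduced:}\quad & Y_i + X_{i1} = \beta_0 + \beta_3 X_{i3} + \varepsilon_i \\
F^* &= \frac{SSE(R) - SSE(F)}{(n-2) - (n-4)} \div \frac{SSE(F)}{n-4} \\
\text{Decision rule:}\quad & \text{conclude } H_a \text{ if } F^* > F(0.975;\ 2,\ n-4)
\end{align*}
```

In R, one way to fit the reduced model is to regress y + x1 on x3, for example lm(I(y + x1) ~ x3), and take its residual sum of squares as SSE(R).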

9. Calculate R^2_{Y1}, R^2_{Y1|2}, R^2_{Y1|23} and R^2_{Y2}, R^2_{Y2|1}, R^2_{Y2|13}. Explain what each coefficient measures.
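
For reference, the coefficients of (partial) determination in question 9 can be written in terms of extra sums of squares. The definitions below use the usual notation (SSTO for the total sum of squares), which is assumed here:

```latex
\begin{align*}
R^2_{Y1} &= \frac{SSR(X_1)}{SSTO}, &
R^2_{Y1|2} &= \frac{SSR(X_1 \mid X_2)}{SSE(X_2)}, &
R^2_{Y1|23} &= \frac{SSR(X_1 \mid X_2, X_3)}{SSE(X_2, X_3)}, \\
R^2_{Y2} &= \frac{SSR(X_2)}{SSTO}, &
R^2_{Y2|1} &= \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}, &
R^2_{Y2|13} &= \frac{SSR(X_2 \mid X_1, X_3)}{SSE(X_1, X_3)}.
\end{align*}
```

Each partial coefficient is the proportional reduction in the error sum of squares achieved by adding the variable in question to a model that already contains the variables listed after the bar.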

10. Obtain the standardized regression model.

11. Fit the first-order linear regression model (model 2) relating patient satisfaction to patient's age and severity of illness (X1 and X2). State the fitted regression function. Compare model 2 with model 1. What do you find? Does SSR(X1) equal SSR(X1|X3), or SSR(X2) equal SSR(X2|X3)? Does this have anything to do with the correlation matrix found in question 2?

12. Use all-possible-subsets regression to determine which subset of predictor variables you would recommend as the best for predicting patient satisfaction. Use the C_p criterion with R's leaps function. Calculate the value of each of the following criteria for your best subset model: (1) R^2_{a,p} (2) AIC_p (3) C_p (4) PRESS_p.
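
The four criteria in question 12 can be computed from any fitted subset model with base R alone. The sketch below is one possible way to do it, not a required method: the helper subset_criteria is hypothetical (it is not part of the leaps package), the data are simulated stand-ins rather than CH06PR15.txt, and the subset X1, X3 is picked arbitrarily for illustration. The formulas assumed are AIC_p = n ln(SSE_p) - n ln(n) + 2p, C_p = SSE_p / MSE(full) - (n - 2p), and PRESS_p = sum of (e_i / (1 - h_ii))^2.

```r
# Simulated stand-in data (hypothetical values, NOT CH06PR15.txt).
set.seed(3)
n   <- 46
x1  <- rnorm(n, mean = 40, sd = 9)
x2  <- 30 + 0.3 * x1 + rnorm(n, sd = 4)
x3  <- 1 + 0.02 * x1 + 0.01 * x2 + rnorm(n, sd = 0.2)
y   <- 150 - 1.1 * x1 - 0.4 * x2 - 13 * x3 + rnorm(n, sd = 10)
dat <- data.frame(y, x1, x2, x3)

full <- lm(y ~ x1 + x2 + x3, data = dat)  # model with all predictors
sub  <- lm(y ~ x1 + x3, data = dat)       # an arbitrary candidate subset

# Hypothetical helper: criteria for a subset fit relative to the full fit.
subset_criteria <- function(fit, full) {
  n    <- nobs(fit)
  p    <- length(coef(fit))               # parameters, intercept included
  yobs <- model.response(model.frame(fit))
  sse  <- sum(residuals(fit)^2)
  ssto <- sum((yobs - mean(yobs))^2)
  mse_full <- sum(residuals(full)^2) / df.residual(full)
  c(R2_adj = 1 - (n - 1) / (n - p) * sse / ssto,
    AIC    = n * log(sse) - n * log(n) + 2 * p,
    Cp     = sse / mse_full - (n - 2 * p),
    PRESS  = sum((residuals(fit) / (1 - hatvalues(fit)))^2))
}

crit <- subset_criteria(sub, full)
print(crit)
```

PRESS is obtained here from the residuals and leverages of a single fit, which avoids refitting the model n times.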

13. Using the forward selection procedure, find the best subset of predictors. Use an F limit of 3.0 to add a variable. Show your steps.

14. Using the backward elimination procedure, find the best subset of predictors. Use an F limit of 2.9 to delete a variable. Show your steps.


15. Use the forward stepwise regression procedure, with F limits of 3.0 and 2.9 to add or delete a variable, respectively, to determine the subset of variables that you select.

16. Compare the results from 12-15.

17. Obtain the diagonal elements of the hat matrix for your model. Identify any outlying X observations.

18. Obtain the three variance inflation factors. Do they indicate that a serious multicollinearity problem exists here?
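
Questions 17 and 18 can both be handled with base R. The sketch below assumes the common rules of thumb that a leverage h_ii larger than 2p/n marks an outlying X observation and that VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing X_k on the other predictors. The data are simulated stand-ins (not CH06PR15.txt), and the names h, outlying_x, preds and vif are illustrative.

```r
# Simulated stand-in data (hypothetical values, NOT CH06PR15.txt).
set.seed(4)
n   <- 46
x1  <- rnorm(n, mean = 40, sd = 9)
x2  <- 30 + 0.3 * x1 + rnorm(n, sd = 4)
x3  <- 1 + 0.02 * x1 + 0.01 * x2 + rnorm(n, sd = 0.2)
y   <- 150 - 1.1 * x1 - 0.4 * x2 - 13 * x3 + rnorm(n, sd = 10)
dat <- data.frame(y, x1, x2, x3)

full <- lm(y ~ x1 + x2 + x3, data = dat)

# Diagonal elements of the hat matrix, flagged with the 2p/n guideline.
h <- hatvalues(full)
p <- length(coef(full))                   # parameters, intercept included
outlying_x <- which(h > 2 * p / nobs(full))

# VIF for each predictor from the auxiliary regression on the others.
preds <- c("x1", "x2", "x3")
vif <- sapply(preds, function(v) {
  aux <- lm(reformulate(setdiff(preds, v), response = v), data = dat)
  1 / (1 - summary(aux)$r.squared)
})

print(outlying_x)
print(vif)
```

The same variance inflation factors are the diagonal elements of the inverse of the predictors' correlation matrix, which gives a quick cross-check against the matrix from question 2.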

19. Write a summary of your findings.