MIS 301, Statistical Analysis for Business, Project
Boston all.xlsx contains housing data for 506 census tracts of Boston from the 1970 census. There are 13 variables and 506 observations:
1. crim: per capita crime rate by town.
2. zn: proportion of residential land zoned for lots over 25,000 square feet.
3. indus: proportion of non-retail business acres per town.
4. chas: Charles River dummy variable (1, if tract bounds river; 0 otherwise).
5. nox: nitrogen oxides concentration (parts per 10 million).
6. rm: average number of rooms per dwelling.
7. age: proportion of owner-occupied units built prior to 1940.
8. dis: weighted mean of distances to five Boston employment centers.
9. rad: index of accessibility to radial highways.
10. tax: full-value property-tax rate per $10,000.
11. ptratio: pupil-teacher ratio by town.
12. lstat: lower status of the population (percent).
13. medv: median value of owner-occupied homes in $1000s.
Our goal is to explain the house price (medv) using all the available information.
a. We first consider studying the relationship between rm (average number of rooms) and medv (house price). Please make a scatter plot of medv against rm and calculate their sample correlation using Excel. Please comment. (15%).
b. We regress medv (as y) on rm (as x). Please include the Excel output and write down the estimated regression model. (10%).
c. Secondly, we investigate the relationship between tax (property tax) and medv (house price). Please make a scatter plot of medv against tax and calculate their sample correlation using Excel. Please comment. (15%).
d. We regress medv (as y) on tax (as x). Please include the Excel output and write down the estimated regression model. (10%).
e. What is the R² for the model in b? What is the R² for the model in d? Which model fits the house price better? (15%).
f. Now we use all other variables as independent variables and build a linear regression model for house price. Include the Excel output. Which variables are significant and which are not? Use α = 0.05. (15%).
g. What is the p-value associated with the F test for this model? What are the null and alternative hypotheses for the F test? What is your conclusion? Use α = 0.05. (10%).
h. What is the p-value associated with the t test for crim? What are the null and alternative hypotheses for this t test? What is your conclusion? Use α = 0.05. (10%).
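
The project itself is to be done in Excel. For anyone who wants to cross-check the quantities asked for in parts a, b, and e, the following is a minimal sketch in Python, assuming NumPy is installed. The two arrays are made-up illustrative numbers, not values from Boston all.xlsx, and the function names are hypothetical helpers introduced only for this example.

```python
import numpy as np

# Made-up illustrative values, NOT taken from Boston all.xlsx.
x = np.array([4.0, 5.0, 6.0, 7.0, 8.0])       # stands in for rm
y = np.array([12.0, 17.0, 21.0, 30.0, 35.0])  # stands in for medv


def sample_correlation(x, y):
    """Pearson sample correlation r between x and y."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def simple_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    dx = x - x.mean()
    b1 = float((dx * (y - y.mean())).sum() / (dx ** 2).sum())
    b0 = float(y.mean() - b1 * x.mean())
    return b0, b1


def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the simple regression of y on x."""
    b0, b1 = simple_regression(x, y)
    residuals = y - (b0 + b1 * x)
    sse = float((residuals ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse / sst


r = sample_correlation(x, y)
b0, b1 = simple_regression(x, y)
print(f"r = {r:.4f}")
print(f"medv_hat = {b0:.4f} + {b1:.4f} * rm")
print(f"R^2 = {r_squared(x, y):.4f}")
```

In a simple regression with one independent variable, R² equals the square of the sample correlation, so the outputs of parts a and b (and of parts c and d) can be checked against each other.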

