#  Using the Toluca data (CH01TA01.DAT), this sample program shows how to get results similar to
#  Figure 1.11 and (part of) Table 1.2 in NKNW, as well as fitted lines and basic residual diagnostics
#  Prepared by Andy Tsao
#  Refer also to Practical Regression and ANOVA using R
#
# The program is divided into the following steps
# 1. Input data set into R
# 2. Graphical data display: does E(Y) = \beta_0 + \beta_1 X seem acceptable?
# 3. Perform simple linear regression
#     LSE, residual, fitted value
#    
# 4. Checking the GM (Gauss-Markov) and normality assumptions on the errors.
#    Graphical: histograms, box plots, scatter plots, normal probability plot
#    Formal tests are left out for the moment.
#
# 1. Input Data to R
# You need to change the working dir from "File|Change dir" to the dir
# where the data is located. Here the data set = CH01TA01.DAT
# (Save to your local drive as an ASCII file)

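# Named toluca.raw rather than t, so that the base function t() is not masked
toluca.raw <- matrix(scan("CH01TA01.DAT"), ncol=2, byrow=TRUE)

# The data have 2 columns and are read by row

toluca <- data.frame(lotsize=toluca.raw[,1], hours=toluca.raw[,2])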
attach(toluca)  # Use "toluca" as our working dataset

# Numerical Summaries

summary(toluca)
colMeans(toluca)  # mean() on a data frame returns NA with a warning in current R
var(toluca); var(lotsize); var(hours)

# 2. Graphical Displays: Linearity? Outliers or influential points?

par(mfrow=c(2,2))  # Set up the graphical device as a 2 x 2 grid of plots
boxplot(hours,main="hours"); boxplot(lotsize,main="lot sizes")
plot(lotsize,hours, main="lotsize vs. hours")
hist(hours)
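
# A rough visual check of linearity (a sketch, not part of the original program):
# overlay a lowess smooth on the scatter plot. If a straight-line mean function
# is reasonable, the smooth should look roughly linear.
# scatter_with_smooth is a helper name introduced here for illustration only.
scatter_with_smooth <- function(x, y, ...) {
  plot(x, y, ...)
  sm <- lowess(x, y)    # locally weighted smoother from the stats package
  lines(sm, lty = 2)    # dashed curve on top of the points
  invisible(sm)         # return the smoothed points for inspection
}
# Applied to the attached Toluca variables, if they are available
if (exists("lotsize") && exists("hours")) {
  scatter_with_smooth(lotsize, hours, main = "lotsize vs. hours with lowess smooth")
}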

# 3. Perform simple linear regression : LSE, residual, fitted value

g <- lm(hours ~lotsize)
summary(g)

betahat <- coef(g)    # least squares estimates (intercept, slope)
yhat <- fitted(g)     # fitted values
e <- resid(g)         # residuals
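
# Cross-check of the least squares estimates (a sketch; lse_by_hand is a
# hypothetical helper, not part of the original program). It uses the
# textbook formulas for simple linear regression:
#   b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
lse_by_hand <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  c(intercept = b0, slope = b1)
}
# The result should agree with the coefficients reported by summary(g)
if (exists("lotsize") && exists("hours")) {
  print(lse_by_hand(lotsize, hours))
}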


# You may save the output from summary(g) for later comparison.

# 4. Checking the GM and normality assumptions on the errors.

plot(g)  # dispatches to the lm plot method; plot.lm is not exported in current R
plot(lotsize,hours,main="lotsize vs. hours with fitted line")
abline(g, lty=5)  # abline() accepts the fitted lm object directly
hist(e); plot(density(e))
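
# The step list at the top mentions a normal probability plot. One possible way
# to draw it, together with a residuals vs. fitted values plot, is sketched here.
# residual_checks is a helper name introduced for illustration only.
residual_checks <- function(fit) {
  r <- resid(fit)
  # Residuals vs. fitted values: look for curvature or non-constant spread
  plot(fitted(fit), r, xlab = "fitted values", ylab = "residuals",
       main = "residuals vs. fitted values")
  abline(h = 0, lty = 2)
  # Normal probability (Q-Q) plot: points near the line support normality
  qqnorm(r)
  qqline(r)
  invisible(r)
}
# Applied to the fitted model from step 3, if it is available
if (exists("g") && inherits(g, "lm")) {
  residual_checks(g)
}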


detach(toluca)  # "toluca" is no longer used

# Then exit R without saving the workspace image.
q(save="no")