# Using the Toluca data (CH01TA01.DAT), this sample program shows how to
# get tables similar to Figure 1.11 and (part of) Table 1.2 in NKNW, as well
# as fitted lines and basic residual diagnostics.
# Prepared by Andy Tsao
# Refer also to Practical Regression and ANOVA using R
#
# The program is divided into the following steps:
# 1. Input the data set into R
# 2. Graphical data display: is the model E(Y) = \beta_0 + \beta_1 X
#    (i.e. Y = \beta_0 + \beta_1 X + \epsilon) seemingly acceptable?
# 3. Perform simple linear regression:
#    least-squares estimates (LSE), residuals, fitted values
#
# 4. Check the Gauss-Markov (GM) and normality assumptions on the errors.
#    Graphical: histograms, box plots, scatter plot, normal probability plot.
#    Formal tests are left out for the moment.
#
# 1. Input data into R
# Change the working directory ("File|Change dir" in the Windows GUI, or
# setwd()) to the directory where the data set CH01TA01.DAT is located
# (save it to your local drive as an ASCII file).
dat <- matrix(scan("CH01TA01.DAT"), ncol = 2, byrow = TRUE)
# The data have 2 columns and are read row by row
# ("dat" avoids masking the transpose function t())
toluca <- data.frame(lotsize = dat[, 1], hours = dat[, 2])
attach(toluca)  # Use "toluca" as our working dataset
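# Alternative input (a sketch, assuming CH01TA01.DAT holds two
# whitespace-separated numeric columns with no header row): read.table()
# builds and names the data frame in one call, skipping the intermediate
# matrix. The demo writes a hypothetical two-line file (made-up values,
# not the Toluca data) to a temporary path so it runs without the real file.
demo_file <- tempfile(fileext = ".DAT")
writeLines(c("10 50", "20 90"), demo_file)      # hypothetical rows
demo <- read.table(demo_file, header = FALSE,
                   col.names = c("lotsize", "hours"))
str(demo)                                       # inspect the result
unlink(demo_file)                               # clean up the temp file
# For the real data: read.table("CH01TA01.DAT", col.names = c("lotsize", "hours"))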
# Numerical Summaries
summary(toluca)
colMeans(toluca)  # mean() returns NA with a warning for a data frame
var(toluca); var(lotsize); var(hours)
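# var() on a data frame returns the sample variance-covariance matrix: the
# diagonal holds the variances (divisor n - 1) and the off-diagonal entry is
# cov(lotsize, hours). A hand check on hypothetical numbers (not the Toluca
# data) confirming the n - 1 divisor:
v_demo <- data.frame(x = c(2, 4, 6, 8), y = c(1, 3, 2, 6))
n_demo <- nrow(v_demo)
dx <- v_demo$x - mean(v_demo$x)   # deviations from the mean of x
dy <- v_demo$y - mean(v_demo$y)   # deviations from the mean of y
S_demo <- matrix(c(sum(dx^2),    sum(dx * dy),
                   sum(dx * dy), sum(dy^2)), nrow = 2) / (n_demo - 1)
all.equal(S_demo, var(v_demo), check.attributes = FALSE)  # TRUE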
# 2. Graphical Displays: Linearity? Outliers or influential points?
par(mfrow = c(2, 2))  # Set up a 2 x 2 grid on the graphics device
boxplot(hours, main = "hours"); boxplot(lotsize, main = "lot sizes")
plot(lotsize,hours, main="lotsize vs. hours")
hist(hours)
# 3. Perform simple linear regression : LSE, residual, fitted value
g <- lm(hours ~ lotsize)
summary(g)
betahat <- coef(g)  # b0 (intercept) and b1 (slope)
yhat <- fitted(g)   # fitted values
e <- resid(g)       # residuals
# You may save the output from summary(g) for later comparison.
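# Step 3 by hand (a sketch): the least-squares estimates have the closed form
#   b1 = sum((X - Xbar) * (Y - Ybar)) / sum((X - Xbar)^2),  b0 = Ybar - b1 * Xbar.
# lse() is a hypothetical helper written for this illustration and the demo
# data are made up; lse(lotsize, hours) should reproduce coef(g).
lse <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  c(b0 = b0, b1 = b1)
}
x_demo <- c(1, 2, 3, 4, 5)
y_demo <- c(2.0, 4.1, 5.9, 8.2, 9.8)
b_demo <- lse(x_demo, y_demo)
all.equal(unname(b_demo), unname(coef(lm(y_demo ~ x_demo))))  # TRUE
# Observations, fitted values and residuals side by side, in the layout of
# a table like NKNW Table 1.2:
yhat_demo <- b_demo[["b0"]] + b_demo[["b1"]] * x_demo
data.frame(X = x_demo, Y = y_demo, Yhat = yhat_demo, e = y_demo - yhat_demo)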
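# 4. Checking the GM and normality assumptions on the errors
plot(g)  # four diagnostic plots, including a normal Q-Q plot of the residuals
plot(lotsize, hours, main = "lotsize vs. hours with fitted line")
abline(g, lty = 5)  # add the fitted regression line (dashed)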
hist(e); plot(density(e))
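# Step 4 checks, sketched on made-up data (not the Toluca set). Whenever the
# model has an intercept, the LS residuals satisfy sum(e) = 0 and
# sum(X * e) = 0, so values far from zero signal a computation error rather
# than a violated assumption. The normal probability plot listed in step 4
# is qqnorm() plus a reference line from qqline().
x_chk <- c(10, 20, 30, 40, 50, 60)
y_chk <- c(35, 58, 71, 102, 118, 139)
fit_chk <- lm(y_chk ~ x_chk)
e_chk <- resid(fit_chk)
c(sum_e = sum(e_chk), sum_xe = sum(x_chk * e_chk))  # both ~ 0 up to rounding
qqnorm(e_chk, main = "Normal probability plot of residuals")
qqline(e_chk)
# For the Toluca fit, the same calls apply with e and lotsize.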
detach(toluca)  # "toluca" is no longer used
# Then exit R without saving the workspace image.
q(save = "no")