Now we will import data directly from the web and explore it with plots, including the Q-Q plot (normal probability plot).
First, make sure the data.table package is installed; if it is not, install it with install.packages("data.table"). Then load data.table into R. Its fread function usually works well for reading CSV and other standard ASCII files.
library(data.table)
data.table 1.9.6 For help type ?data.table or https://github.com/Rdatatable/data.table/wiki
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
exam <- fread('http://faculty.ndhu.edu.tw/~chtsao/ftp/stat2016.txt')
head(exam) # Take a quick look at the first few cases
fread creates exam, which is a data.table and therefore also a data frame. Below we assign the variable names year, mid, and final, and convert year to a factor.
summary(exam)
      V1              V2               V3
Min.   :2.000   Min.   :  0.00   Min.   :-10.00
1st Qu.:2.000   1st Qu.: 25.50   1st Qu.: 13.50
Median :2.000   Median : 46.00   Median : 31.00
Mean   :2.476   Mean   : 46.52   Mean   : 34.11
3rd Qu.:3.000   3rd Qu.: 67.50   3rd Qu.: 53.00
Max.   :5.000   Max.   :110.00   Max.   :100.00
is.data.frame(exam)
[1] TRUE
colnames(exam)<-c("year","mid","final")
exam$year<-as.factor(exam$year)
summary(exam)
year        mid              final
2:42   Min.   :  0.00   Min.   :-10.00
3:15   1st Qu.: 25.50   1st Qu.: 13.50
4: 3   Median : 46.00   Median : 31.00
5: 3   Mean   : 46.52   Mean   : 34.11
       3rd Qu.: 67.50   3rd Qu.: 53.00
       Max.   :110.00   Max.   :100.00
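Converting year to a factor is what turns its summary from numeric quantiles (a mean year of 2.476 is not meaningful) into counts per year. The following is a minimal sketch of that effect on a small made-up vector; yr and its values are hypothetical and are not taken from the exam file. It also shows a common pitfall when converting a factor back to numbers.

```r
# Hypothetical year codes, for illustration only
yr <- c(2, 2, 3, 5)

summary(yr)              # numeric: min, quartiles, mean, max
yr_f <- as.factor(yr)
summary(yr_f)            # factor: number of cases per level

# as.numeric() on a factor returns the internal level codes (1, 1, 2, 3),
# so go through as.character() to recover the original values
codes  <- as.numeric(yr_f)
values <- as.numeric(as.character(yr_f))
```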
Now we use ggplot2 to explore the data further; for background, refer to Data visualization with ggplot2.
library(ggplot2)
Need help getting started? Try the cookbook for R:
http://www.cookbook-r.com/Graphs/
scatter <- ggplot(data=exam, aes(x = mid, y = final))
scatter + geom_point(aes(color=year, shape=year)) +
xlab("midterm") + ylab("final") +
ggtitle("Midterm vs Final Plot")
# Construct boxplot of midterm, final (respectively) by year
boxplot(mid~year,data=exam,
xlab="year", ylab="midterm", main="Midterm Boxplots")
#library(ggplot2)
box <- ggplot(data=exam, aes(x=year, y=mid))
box + geom_boxplot(aes(fill=year)) +
ylab("mid") + ggtitle("Midterm Boxplots") +
stat_summary(fun=mean, geom="point", shape=5, size=4) # diamond marks the group mean; fun replaces fun.y, deprecated in ggplot2 3.3.0
boxplot(final~year,data=exam,
xlab="year", ylab="final", main="Final Boxplots")
#library(ggplot2)
box <- ggplot(data=exam, aes(x=year, y=final))
box + geom_boxplot(aes(fill=year)) +
ylab("final") + ggtitle("Final Boxplots") +
stat_summary(fun=mean, geom="point", shape=5, size=4) # diamond marks the group mean; fun replaces fun.y, deprecated in ggplot2 3.3.0
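The diamonds drawn by stat_summary mark the mean of each year group. As a rough sketch, the same group means can be computed numerically with base R; the data frame toy below is made up for illustration and is not the exam data.

```r
# Made-up scores for two year groups (illustration only)
toy <- data.frame(
  year = factor(c(2, 2, 3, 3)),
  mid  = c(10, 20, 30, 50)
)

# Mean midterm score within each year group
group_means <- tapply(toy$mid, toy$year, mean)
group_means

# The same summary returned as a data frame
agg <- aggregate(mid ~ year, data = toy, FUN = mean)
agg
```

With the real data, replacing toy by exam should give the values plotted as diamonds. On a data.table, exam[, mean(mid), by = year] is an equivalent idiom.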
attach(exam)
The following objects are masked from exam (pos = 3):
final, mid, year
The following objects are masked from exam (pos = 4):
final, mid, year
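The masking messages above indicate that exam had already been attached in earlier runs, so several copies now sit on the search path. One way to sidestep this, sketched here on made-up data (the data frame toy is hypothetical), is to skip attach and pass the data frame explicitly.

```r
# Made-up data frame standing in for exam (illustration only)
toy <- data.frame(mid   = c(10, 30, 50, 70, 90),
                  final = c(12, 25, 41, 48, 60))

# Option 1: give lm() the data frame through its data argument
fit1 <- lm(final ~ mid, data = toy)

# Option 2: evaluate an expression inside the data frame with with()
fit2   <- with(toy, lm(final ~ mid))
sd_mid <- with(toy, sd(mid))

# If attach() has been used, detach() removes one copy from the search path
# detach(toy)
```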
m2<-lm(final~mid)
summary(m2)
Call:
lm(formula = final ~ mid)
Residuals:
Min 1Q Median 3Q Max
-36.062 -12.562 -3.029 10.211 61.213
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.44481 5.14339 1.642 0.106
mid 0.55168 0.09554 5.774 2.79e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 20.54 on 61 degrees of freedom
Multiple R-squared: 0.3534, Adjusted R-squared: 0.3428
F-statistic: 33.34 on 1 and 61 DF, p-value: 2.791e-07
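In a regression with a single predictor, the F-statistic equals the square of the slope's t value, and R-squared equals the squared correlation between the two variables. In the output above, 5.774^2 is about 33.34, which matches the reported F-statistic. The sketch below checks both identities on simulated data; the variables x and y and their generating values are assumptions chosen for illustration.

```r
# Simulated data (not the exam scores)
set.seed(2016)
x <- runif(60, 0, 100)
y <- 8 + 0.5 * x + rnorm(60, sd = 20)

fit <- lm(y ~ x)
s   <- summary(fit)

t_slope <- s$coefficients["x", "t value"]   # t value of the slope
f_stat  <- unname(s$fstatistic["value"])    # overall F-statistic
r2      <- s$r.squared                      # multiple R-squared

# Each pair should agree up to floating-point error
c(t_squared = t_slope^2, F = f_stat)
c(r_squared = r2, cor_squared = cor(x, y)^2)
```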
par(mfrow=c(2,2));
plot(m2)
par(mfrow=c(1,1));
plot(final~mid)
abline(m2)
summary(mid); sd(mid)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 25.50 46.00 46.52 67.50 110.00
[1] 27.30228
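The Normal Q-Q panel produced by plot(m2) can also be drawn directly with qqnorm and qqline. The following is a minimal sketch on simulated data so that it runs without the download; the variable names and simulation settings are assumptions. Points lying close to the reference line suggest roughly normal residuals.

```r
# Simulated scores (not the exam data)
set.seed(1)
mid_sim   <- runif(63, 0, 100)
final_sim <- 8 + 0.55 * mid_sim + rnorm(63, sd = 20)
fit_sim   <- lm(final_sim ~ mid_sim)
res       <- residuals(fit_sim)

# Normal Q-Q plot of the residuals with a reference line
# through the first and third quartiles
qqnorm(res, main = "Normal Q-Q Plot of Residuals")
qqline(res)

# The same coordinates by hand: sorted residuals against normal quantiles
theo <- qnorm(ppoints(length(res)))
samp <- sort(res)
```

With the fitted model from this section, qqnorm(residuals(m2)) followed by qqline(residuals(m2)) would be the corresponding calls.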
There are other ways to download data from the web; see Getting Data From One Online Source.
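As one such alternative, base R's read.table can read from a URL as well as from text. The sketch below parses three made-up rows from a string so that it runs offline; the whitespace-separated, header-free layout is an assumption about the file, suggested by the default names V1, V2, V3 in the first summary.

```r
# Made-up rows in the assumed layout: year, midterm, final (illustration only)
raw <- "2 46 31
3 67 53
2 25 13"

# Name the columns while reading, then convert year to a factor
exam_toy <- read.table(text = raw,
                       col.names = c("year", "mid", "final"))
exam_toy$year <- as.factor(exam_toy$year)
str(exam_toy)

# With network access, the same call should accept the URL in place of text=
# exam <- read.table("http://faculty.ndhu.edu.tw/~chtsao/ftp/stat2016.txt",
#                    col.names = c("year", "mid", "final"))
```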