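1. ANOVA
Use an ANOVA test to check whether academic group (the college column) is significantly related to the delayed-graduation rate.
The null hypothesis H0 is that the mean delayed-graduation rate does not differ between academic groups.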
library(readxl)
## Warning: package 'readxl' was built under R version 3.4.4
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(data.table)
## Warning: package 'data.table' was built under R version 3.4.4
s<-data.frame(fread('Student_RPT_19.csv'))
h<-read_excel("高教資料總表串聯3.xlsx")
# colnames(h)  # inspect which columns are available
hd<-h$delay_rate
hc<-h$college
str(hd)
## num [1:1792] 0.082 0.0344 0.0513 0.0316 0 ...
summary(hc)
## Length Class Mode
## 1792 character character
anovatest<-aov(hd~hc,h)
summary(anovatest)
##               Df Sum Sq  Mean Sq F value Pr(>F)
## hc           121 0.4622 0.003820   3.404 <2e-16 ***
## Residuals   1669 1.8730 0.001122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
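The result is marked with three stars,
meaning the p-value lies between 0 and 0.001 (here Pr(>F) is below 2e-16).
The result is therefore highly significant: we reject H0 and conclude that the mean delayed-graduation rate differs between academic groups.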
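As a cross-check on how the table is read, the F value can be recomputed from the printed sums of squares and degrees of freedom. The sketch below only uses the rounded numbers shown in the summary(anovatest) output, so it assumes the rounding error is negligible; the variable names are introduced here for illustration and are not part of the original analysis.

```r
# Values copied from the summary(anovatest) output (rounded as printed)
df_group <- 121
ss_group <- 0.4622
df_resid <- 1669
ss_resid <- 1.8730

# Mean square = sum of squares / degrees of freedom
ms_group <- ss_group / df_group
ms_resid <- ss_resid / df_resid

# F statistic = between-group mean square / residual mean square
f_stat <- ms_group / ms_resid

# Pr(>F) is the upper-tail probability of the F distribution
p_value <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

print(round(f_stat, 2))   # should agree with the F value column up to rounding
print(p_value < 0.001)    # TRUE corresponds to the three-star level
```

A significant F only says that at least one group mean differs from the others; it does not say which groups differ.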
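2. Linear regression
Let the x-axis be the number of students and the y-axis the delayed-graduation rate, then draw a simple scatter plot with plot().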
hs<-h$students_amount
hl<-lm(hd~hs)
plot(hd~hs,data=h)
print(hl)
##
## Call:
## lm(formula = hd ~ hs)
##
## Coefficients:
## (Intercept)           hs
##   5.201e-02   -1.379e-05
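The linear relationship is weak, but the negative slope (-1.379e-05) suggests a slight negative association between student count and delayed-graduation rate. Note that print(hl) reports only the coefficients; summary(hl) would be needed to judge whether the slope is statistically significant.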
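To get a feel for how small the slope is, the fitted line can be evaluated at a few student counts. This is a rough sketch using only the two coefficients printed by print(hl); the student counts below are hypothetical values chosen for illustration and do not come from the data.

```r
# Coefficients copied from the print(hl) output (rounded as printed)
intercept <- 5.201e-02
slope <- -1.379e-05

# Hypothetical student counts, chosen only for illustration
students <- c(100, 1000, 2000)

# Fitted delayed-graduation rate: intercept + slope * students
fitted_rate <- intercept + slope * students

print(data.frame(students, fitted_rate))

# Change in the fitted rate per additional 1000 students
print(slope * 1000)
```

Under these coefficients, each additional 1000 students lowers the fitted rate by about 0.014, that is, roughly 1.4 percentage points.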