    # One-sided t-test
    alpha <- 0.05
    n <- 100
    N <- 1000
    t_list <- NULL

    for (i in 1:N) {
      y <- rnorm(n, 0, 1)
      t <- (mean(y) - 0) / (sd(y) / sqrt(n))  # t statistic
      t_list <- c(t_list, t)                  # append the t statistic to t_list
    }
    crit <- qt(alpha, lower.tail = FALSE, df = n - 1)  # upper critical value
    sum(t_list <= crit) / N  # samples that do NOT reject H0 / number of simulations
    # 0.948
Simulation in R verifying that the Type I error rate is controlled at α:

    # Controlling the Type I error: H0 is true but rejected
    alpha <- 0.05
    n <- 100
    N <- 1000
    p_t_list <- NULL

    for (i in 1:N) {
      y <- rnorm(n, 0, 1)
      t <- (mean(y) - 0) / (sd(y) / sqrt(n))  # t statistic
      p_t <- pt(t, df = n - 1)                # P(X < t) with X ~ t(n-1): p-value of the left-sided test
      p_t_list <- c(p_t_list, p_t)            # append the p-value to p_t_list
    }
    sum(p_t_list <= alpha) / N  # proportion of Type I errors
    # 0.058
The p-value is itself a statistic, and under H0 it follows the uniform distribution U[0,1).
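The uniformity claim can be checked by simulation. The sketch below is an illustration rather than part of the original code: it reuses the settings above (n = 100, N = 1000), adds a fixed seed (an assumption, only for reproducibility), and compares the simulated p-values with U(0, 1) through their quartiles and `ks.test()`.

```r
set.seed(1)   # assumed seed, only for reproducibility
n <- 100
N <- 1000

# One left-sided p-value per sample simulated under H0: mu = 0
p_values <- replicate(N, {
  y <- rnorm(n, 0, 1)
  t_stat <- (mean(y) - 0) / (sd(y) / sqrt(n))
  pt(t_stat, df = n - 1)
})

quantile(p_values, c(0.25, 0.5, 0.75))   # should be near 0.25, 0.5, 0.75
ks_result <- ks.test(p_values, "punif")  # Kolmogorov-Smirnov test against U(0, 1)
ks_result$p.value
mean(p_values <= 0.05)                   # empirical Type I error rate, near 0.05
```

Because the p-values are uniform under H0, the event "p-value <= alpha" has probability alpha, which is exactly why rejecting when p <= alpha controls the Type I error.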
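2 Common Statistical Inference
2.1 Mean of a Normal Population
The first step is to verify the model assumptions, for example by using a Q-Q plot to check whether the data are approximately normal.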
2.1.1 The t.test() function
t-test: t.test() provides the t-test for the mean of a normal population.

    t.test(x, y = NULL,
           alternative = c("two.sided", "less", "greater"),
           mu = 0, paired = FALSE, var.equal = FALSE,
           conf.level = 0.95, ...)
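x, y : one sample (x only) or two samples (x and y)
alternative : direction of the test, one-sided "less" or "greater", or two-sided "two.sided"; it points in the same direction as the alternative hypothesis
mu = 0 : the hypothesized mean mu_0, default 0
conf.level = 0.95 : confidence level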
For example:

    X <- c(159, 280, 101, 212, 224, 379, 179, 264,
           222, 362, 168, 250, 149, 260, 485, 170)
    t.test(X, alternative = "greater", mu = 225)

        One Sample t-test

    data:  X
    t = 0.66852, df = 15, p-value = 0.257
    alternative hypothesis: true mean is greater than 225
    95 percent confidence interval:
     198.2321      Inf
    sample estimates:
    mean of x
        241.5
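t : the value of the t statistic, t = 0.66852
df : the degrees of freedom, n - 1 = 15
p-value : p-value = 0.257 > 0.05, so H0 is not rejected; the data do not support the alternative "true mean is greater than 225"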
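To make the three reported quantities concrete, the following sketch recomputes them by hand from the same data X. It is only an illustration of the textbook one-sample formulas; the variable names (`mu0`, `t_stat`, `p_value`, `lower`) are chosen here and do not come from the original text.

```r
X <- c(159, 280, 101, 212, 224, 379, 179, 264,
       222, 362, 168, 250, 149, 260, 485, 170)
mu0 <- 225
n <- length(X)

t_stat  <- (mean(X) - mu0) / (sd(X) / sqrt(n))                # t statistic
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)         # P(T >= t) for H1: mu > mu0
lower   <- mean(X) - qt(0.95, df = n - 1) * sd(X) / sqrt(n)   # one-sided 95% lower bound

c(t = t_stat, df = n - 1, p.value = p_value, lower = lower)
```

The values should agree with the `t.test()` output above (t = 0.66852, df = 15, p-value = 0.257, lower bound 198.2321).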
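A two-sample example (test output for score by Gender):

    data:  score by Gender
    t = 1.9163, df = 97.963, p-value = 0.05824
    alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
    95 percent confidence interval:
     -0.1422661  8.1422661
    sample estimates:
    mean in group Female   mean in group Male
                   73.12                69.12

Here df = 97.963 is the Welch approximation; df = m + n - 2 applies to the pooled test (var.equal = TRUE).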
Checking that the data are approximately normal:

    x <- df$score[df$Gender == "Female"]
    y <- df$score[df$Gender == "Male"]
    png("./img/qq_score.png", width = 2400, height = 1200, res = 200)
    op <- par(mfrow = c(1, 2))  # two panels side by side
    qqnorm(x, main = "Female")
    qqline(x)
    qqnorm(y, main = "Male")
    qqline(y)
    par(op)    # restore the settings while the png device is still open
    dev.off()
**Checking the variances:** an intuitive empirical approach is to compare the IQRs

    boxplot(score ~ Gender, data = df)
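Alongside the boxplot, the IQRs can also be compared numerically. The sketch below is only an illustration: the original data frame is not shown in the text, so `scores` is a hypothetical stand-in simulated with an assumed seed, group size (50 per group), and standard deviation.

```r
set.seed(1)  # assumed seed, only for reproducibility

# Hypothetical stand-in for the data frame used in the text
scores <- data.frame(
  Gender = rep(c("Female", "Male"), each = 50),
  score  = c(rnorm(50, mean = 73, sd = 10), rnorm(50, mean = 69, sd = 10))
)

iqr_by_group <- tapply(scores$score, scores$Gender, IQR)  # IQR within each group
iqr_by_group
max(iqr_by_group) / min(iqr_by_group)  # a ratio near 1 suggests similar spread
```

This is a rough visual-style check, not a formal test; a formal comparison of the two variances needs an F test.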
Alternatively, use var.test():

    ## Default S3 method:
    var.test(x, y, ratio = 1,
             alternative = c("two.sided", "less", "greater"),
             conf.level = 0.95, ...)

    ## S3 method for class 'formula'
    var.test(formula, data, subset, na.action, ...)
For example:

    var.test(score ~ Gender, data = df)

        F test to compare two variances

    data:  score by Gender
    F = 1.0397, num df = 49, denom df = 49, p-value = 0.892
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.5900309 1.8322278
    sample estimates:
    ratio of variances
              1.039746
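The F statistic reported by `var.test()` is the ratio of the two sample variances, with n1 - 1 and n2 - 1 degrees of freedom. The sketch below recomputes it by hand on hypothetical simulated data (the seed, sample sizes, and distribution parameters are assumptions, since the original data are not shown) and compares the result with `var.test()`.

```r
set.seed(1)                          # assumed seed
x <- rnorm(50, mean = 73, sd = 10)   # hypothetical group 1
y <- rnorm(50, mean = 69, sd = 10)   # hypothetical group 2

f_stat <- var(x) / var(y)            # ratio of sample variances
df1 <- length(x) - 1                 # numerator degrees of freedom
df2 <- length(y) - 1                 # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability of F(df1, df2)
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

res <- var.test(x, y)
c(manual_F = f_stat, var.test_F = unname(res$statistic))
c(manual_p = p_value, var.test_p = res$p.value)
```

A large p-value here means the data give no evidence against equal variances; it does not prove that the variances are equal.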
The F test does not reject equal variances (p-value = 0.892). If equal variances are nevertheless not assumed, use the Welch test with var.equal = FALSE:

    t.test(score ~ Gender, data = df, var.equal = FALSE)

        Welch Two Sample t-test

    data:  score by Gender
    t = 1.9163, df = 97.963, p-value = 0.05824
    alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
    95 percent confidence interval:
     -0.1422661  8.1422661
    sample estimates:
    mean in group Female   mean in group Male
                   73.12                69.12
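The non-integer df in the Welch output (97.963 above) comes from the Welch-Satterthwaite approximation rather than from m + n - 2. The sketch below works through that formula on hypothetical simulated data (seed, sample sizes, and parameters are assumptions, not the original data) and checks it against `t.test()`.

```r
set.seed(1)                          # assumed seed
x <- rnorm(50, mean = 73, sd = 10)   # hypothetical group 1
y <- rnorm(50, mean = 69, sd = 12)   # hypothetical group 2

vx <- var(x) / length(x)             # squared standard error of mean(x)
vy <- var(y) / length(y)             # squared standard error of mean(y)

t_stat   <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_value  <- 2 * pt(-abs(t_stat), df = df_welch)   # two-sided p-value

res <- t.test(x, y, var.equal = FALSE)
c(manual_t = t_stat, manual_df = df_welch, manual_p = p_value)
c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value)
```

The Welch df never exceeds m + n - 2 and approaches it when the two sample variances and sizes are similar, which is consistent with 97.963 being just below 98.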
The correlation test function cor.test():

    ## Default S3 method:
    cor.test(x, y,
             alternative = c("two.sided", "less", "greater"),
             method = c("pearson", "kendall", "spearman"),
             exact = NULL, conf.level = 0.95, continuity = FALSE, ...)

    ## S3 method for class 'formula'
    cor.test(formula, data, subset, na.action, ...)
alternative : one-sided or two-sided
method : which of the three correlation coefficients to use
conf.level : confidence level
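About alternative (it points in the same direction as the alternative hypothesis):
H1: ρ ≠ 0 corresponds to alternative = c("two.sided")
H1: ρ > 0 corresponds to alternative = c("greater")
H1: ρ < 0 corresponds to alternative = c("less")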
The function for computing correlation coefficients, cor():

    cor(x, y = NULL, use = "everything",
        method = c("pearson", "kendall", "spearman"))
method : which of the three correlation coefficients to use
use : how missing values are handled; one of "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs"
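The effect of `use` is easiest to see on a small vector with a missing value. The data `a` and `b` below are hypothetical and chosen only for illustration.

```r
a <- c(1, 2, 3, 4, NA)   # hypothetical data with one missing value
b <- c(2, 4, 5, 4, 5)

r_everything <- cor(a, b)                         # default use = "everything": result is NA
r_complete   <- cor(a, b, use = "complete.obs")   # drops the incomplete pair first
# use = "all.obs" raises an error when NAs are present; capture the message
r_all_obs    <- tryCatch(cor(a, b, use = "all.obs"),
                         error = function(e) conditionMessage(e))

r_everything
r_complete
r_all_obs
```

For two plain vectors, "complete.obs" and "na.or.complete" give the same value whenever at least one complete pair exists; they differ only when no complete pair is left ("complete.obs" raises an error, "na.or.complete" returns NA).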
For example:

    X <- c(7.7, 8.2, 7.8, 6.9, 8.4, 8.1, 7.1, 7.5, 7.6, 7.6, 7.9, 7.6, 7.5, 7.6, 7.6)
    Y <- c(7.2, 6.7, 5.7, 4.0, 5.7, 6.4, 4.5, 5.5, NA, 5.4, 6.1, 6.9, 3.9, 5.7, 3.7)

    cor(X, Y, use = "na.or.complete", method = "pearson")  # 0.5801752

    cor.test(X, Y, method = "pearson")

        Pearson's product-moment correlation

    data:  X and Y
    t = 2.4675, df = 12, p-value = 0.02963
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
     0.07165242 0.84931184
    sample estimates:
          cor
    0.5801752
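The t statistic in this output follows from the sample correlation r through t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, where n counts only the complete pairs. The sketch below recomputes it from the same X and Y; the variable names (`ok`, `r`, `t_stat`, `p_value`) are chosen here for illustration.

```r
X <- c(7.7, 8.2, 7.8, 6.9, 8.4, 8.1, 7.1, 7.5, 7.6, 7.6, 7.9, 7.6, 7.5, 7.6, 7.6)
Y <- c(7.2, 6.7, 5.7, 4.0, 5.7, 6.4, 4.5, 5.5, NA, 5.4, 6.1, 6.9, 3.9, 5.7, 3.7)

ok <- complete.cases(X, Y)    # keep only pairs without missing values
n  <- sum(ok)                 # 14 complete pairs, so df = 12
r  <- cor(X[ok], Y[ok])       # Pearson correlation

t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)         # t statistic for H0: rho = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 2)        # two-sided p-value

c(r = r, t = t_stat, df = n - 2, p.value = p_value)
```

The values should match the `cor.test()` output above (cor = 0.5801752, t = 2.4675, df = 12, p-value = 0.02963).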