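Here is the data.
Subjects are split into two groups, A and B,
and each of the four rows below gives the number of cases of a different disease.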

# First, build a data frame
dat <- data.frame(low=c(13,7,21,6),
                  high=c(77,22,21,71))
# Column low is group A (66 samples in total), column high is group B (128 samples)
total_no <- c(66,128)
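# Build a 2x2 (four-fold) table from the first row of dat:
# row 1 = cases, row 2 = non-cases (group total minus cases)
#        low  high
# case    13    77
# other   53    51
tmp <- chisq.test(rbind(dat[1,], total_no-dat[1,]))
# Extract the chi-square statistic and the p-value
tmp$statistic
tmp$p.value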

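# The other 3 rows could be done by hand, but let's try a loop instead
# First create empty vectors to hold the results
k <- rep(NA, 4)
p <- rep(NA, 4)
# Then loop over the four diseases (the rows of dat)
for (i in seq_len(nrow(dat))) {
  a <- chisq.test(rbind(dat[i,], total_no-dat[i,]))
  k[i] <- a$statistic
  p[i] <- a$p.value
}
results <- rbind(k,p)
results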
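The same per-row computation can also be written without pre-allocating vectors. Below is a minimal sketch using base R's sapply(); the helper name row_test and the result name results2 are hypothetical, and dat and total_no are redefined exactly as above so the block runs on its own.

```r
dat <- data.frame(low = c(13, 7, 21, 6),
                  high = c(77, 22, 21, 71))
total_no <- c(66, 128)

# Hypothetical helper: build the 2x2 table for disease i
# and return its chi-square statistic and p-value
row_test <- function(i) {
  counts <- unlist(dat[i, ])                # cases in group A (low) and group B (high)
  tab <- rbind(counts, total_no - counts)   # row 1: cases, row 2: non-cases
  res <- chisq.test(tab)                    # default settings, same as the loop
  c(k = unname(res$statistic), p = res$p.value)
}

# One column per disease, rows "k" and "p": same layout as rbind(k, p)
results2 <- sapply(seq_len(nrow(dat)), row_test)
results2
```

Functionally this matches the loop; the choice between the two is mostly a matter of taste.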
This gives the final results:

But the story doesn't end there...
The results from SPSS don't match the results from R.

Whereas the chi-square values R produced are:

Why? Why?
Looking for the cause.
Was the data entered wrongly in R?
So I re-entered it to mimic SPSS's layout,
using t() to transpose the table.

dat <- data.frame(low=c(13,7,21,6),
high=c(77,22,21,71))
total_no <- c(66,128)
# Add the t() transpose at this step
tmp <- chisq.test(t(rbind(dat[1,], total_no-dat[1,])))
tmp$statistic
tmp$p.value
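But the result is still the same. In hindsight this is expected: transposing only swaps rows and columns, so every observed and expected count stays the same, and so does the chi-square statistic.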
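A quick check of that transpose invariance on the first disease, with the 2x2 table typed in directly as a matrix. The names tab1, s_orig and s_t and the row/column labels are only for illustration.

```r
# 2x2 table for the first disease: rows = case / non-case, columns = group A / group B
tab1 <- matrix(c(13, 53, 77, 51), nrow = 2,
               dimnames = list(c("case", "non-case"), c("A", "B")))

# Statistic on the original table and on its transpose (default settings)
s_orig <- unname(chisq.test(tab1)$statistic)
s_t    <- unname(chisq.test(t(tab1))$statistic)

# Prints TRUE: transposing does not change the chi-square statistic
isTRUE(all.equal(s_orig, s_t))
```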

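Do R and SPSS use different settings?
Checking R's help page for chisq.test() (?chisq.test) turned up a clue.

It turns out something called the Yates correction was behind it (mostly because my statistics knowledge is pretty weak).
So I ran R once more: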
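The rerun itself isn't reproduced here. The sketch below shows what it presumably looked like, assuming the fix was to pass correct = FALSE, the chisq.test() argument that switches off the continuity correction for 2 x 2 tables. The loop is otherwise the same as before; results_uncorrected is a hypothetical name.

```r
dat <- data.frame(low = c(13, 7, 21, 6),
                  high = c(77, 22, 21, 71))
total_no <- c(66, 128)

k <- rep(NA, 4)
p <- rep(NA, 4)
for (i in seq_len(nrow(dat))) {
  # correct = FALSE: plain Pearson chi-square, no Yates continuity correction
  a <- chisq.test(rbind(dat[i, ], total_no - dat[i, ]), correct = FALSE)
  k[i] <- a$statistic
  p[i] <- a$p.value
}
results_uncorrected <- rbind(k, p)
results_uncorrected
```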

Bingo! Now the chi-square value matches SPSS.
What is the Yates correction?
The following is based on:
https://www.statisticshowto.datasciencecentral.com/what-is-the-yates-correction/
Why use the Yates correction?
The Yates correction is a correction made to account for the fact that both Pearson’s chi-square test and McNemar’s chi-square test are biased upwards for a 2 x 2 contingency table. An upwards bias tends to make results larger than they should be. If you are creating a 2 x 2 contingency table that uses either of these two tests, the Yates correction is usually recommended, especially if the expected cell frequencies are below 10 (some authors put that figure at 5).
Chi2 tests are biased upwards when used on 2 x 2 contingency tables. The reason is that the statistical Chi2 distribution is continuous and the 2 x 2 contingency table is dichotomous (in other words, it isn’t continuous, there are two variables). All you really need to know is that if your expected cell frequencies are below 10, you probably should be using the Yates correction.
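To see concretely what the correction does: the Pearson statistic is chi2 = sum over the four cells of (O - E)^2 / E, where O is the observed count and E the expected count (row total x column total / grand total). The Yates version shrinks each |O - E| by 0.5 before squaring, chi2 = sum of (|O - E| - 0.5)^2 / E, so it always comes out smaller. The sketch below computes both by hand for the first disease and compares them with chisq.test(). It uses the plain 0.5 shift, which is fine here because |O - E| is far larger than 0.5 for this table.

```r
# First disease as a 2x2 table: rows = case / non-case, columns = group A / group B
tab <- matrix(c(13, 53, 77, 51), nrow = 2)

# Expected counts under independence: row total * column total / grand total
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Uncorrected Pearson statistic (the value that matched SPSS above)
pearson <- sum((tab - E)^2 / E)

# Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring
yates <- sum((abs(tab - E) - 0.5)^2 / E)

c(pearson = pearson, yates = yates)

# Both comparisons should print TRUE
isTRUE(all.equal(pearson, unname(chisq.test(tab, correct = FALSE)$statistic)))
isTRUE(all.equal(yates,   unname(chisq.test(tab)$statistic)))
```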
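R applies the Yates correction by default (the correct argument of chisq.test() defaults to TRUE and only takes effect on 2 x 2 tables), which is what led to the story above.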
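The quoted rule of thumb depends on the expected cell counts, which chisq.test() returns in its expected component. Here is a sketch that finds the smallest expected count in each disease's 2x2 table and compares it with the threshold of 10 from the quote (some authors use 5). The name min_expected is hypothetical.

```r
dat <- data.frame(low = c(13, 7, 21, 6),
                  high = c(77, 22, 21, 71))
total_no <- c(66, 128)

# Smallest expected cell count in each disease's 2x2 table
min_expected <- sapply(seq_len(nrow(dat)), function(i) {
  counts <- unlist(dat[i, ])
  tab <- rbind(counts, total_no - counts)
  min(chisq.test(tab)$expected)
})

# Gives FALSE TRUE FALSE FALSE: only disease 2 (min expected count about 9.87) is below 10
min_expected < 10
```

R does not check these counts itself. With the default correct = TRUE, it applies the correction to every 2 x 2 table, whatever the expected counts are.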