ryamadaの遺伝学・遺伝統計学メモ

2016-05-19

Multiple-Comparison Issue

When you run many tests, you cannot take the nominal p-value of each individual test at face value.

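# 10^5 p-values under the null hypothesis are uniform on [0, 1]
p <- runif(10^5)
hist(p)
# the sorted p-values lie on a straight line from 0 to 1
plot(sort(p),pch=20,cex=0.1)
alpha <- 0.05
# about alpha * length(p) of the p-values fall below alpha
abline(v=length(p)*alpha,col="red")
abline(h=alpha,col="blue")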
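
As a numeric check of the plot above, here is a minimal sketch that counts the p-values below alpha. The seed and the names `n.sig` and `frac.sig` are assumptions added for illustration. Under the null hypothesis about 5% of the tests should be nominally significant, which for 10^5 tests means roughly 5000 false positives.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
p <- runif(10^5)                 # p-values of 10^5 tests when every null is true
alpha <- 0.05
n.sig <- sum(p < alpha)          # number of nominally significant tests
frac.sig <- n.sig / length(p)    # expected to be close to alpha
print(c(n.sig = n.sig, frac.sig = frac.sig))
```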

Multiple tests (100 tests)

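n <- 100   # tests per experiment
m <- 1000  # number of simulated experiments
# each column holds the n null p-values of one experiment
ps <- matrix(runif(n*m),n,m)
# smallest p-value within each experiment
min.ps <- apply(ps,2,min)
hist(min.ps)
plot(sort(min.ps),pch=20,cex=0.1)
alpha <- 0.05
# if the minimum behaved like a single p-value, the sorted curve would cross
# the blue line (alpha) at the red line (5% of the experiments); it does not
abline(v=length(min.ps)*alpha,col="red")
abline(h=alpha,col="blue")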
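
The simulation above can be checked against a formula. For n independent tests under the null, the chance that the smallest p-value falls below alpha is 1 - (1 - alpha)^n, about 0.994 for n = 100. The sketch below compares this with the simulated fraction. The Bonferroni threshold alpha / n is not part of the original note and is added only as a point of comparison; the seed and the `fwer.*` names are likewise assumptions for illustration.

```r
set.seed(1)                               # arbitrary seed for reproducibility
n <- 100                                  # tests per experiment
m <- 1000                                 # simulated experiments
alpha <- 0.05
ps <- matrix(runif(n * m), n, m)          # each column: one experiment under the null
min.ps <- apply(ps, 2, min)               # smallest p-value of each experiment

fwer.theory <- 1 - (1 - alpha)^n          # P(at least one p < alpha)
fwer.sim <- mean(min.ps < alpha)          # simulated counterpart
# Bonferroni (assumption, for comparison): use alpha / n as the threshold
fwer.bonf.theory <- 1 - (1 - alpha / n)^n
fwer.bonf.sim <- mean(min.ps < alpha / n)
print(round(c(theory = fwer.theory, simulated = fwer.sim,
              bonf.theory = fwer.bonf.theory,
              bonf.simulated = fwer.bonf.sim), 3))
```

With the threshold alpha, almost every experiment reports at least one "significant" test even though no effect exists; with alpha / n the rate drops back to roughly alpha.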

False Discovery Rate
- Use q-values, not p-values.
- Some tests are expected to be truly "significant"; for example, many genes should be differentially expressed between cancer and normal cells.
- The smallest p-value has to be very small to be convincing, but the 2nd, 3rd, ... smallest need not be as small; each rank is judged against its own threshold.

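n.null <- 80  # tests where the null hypothesis is true
n.alt <- 20   # tests with a real effect
p <- runif(n.null)
# p-values of the alternatives: chi-square (df = 1) statistics with noncentrality 10
p <- c(p,pchisq(rchisq(n.alt,df=1,ncp=10),df=1,lower.tail=FALSE))
p <- sort(p)
plot(p)
# Benjamini-Hochberg adjusted p-values (q-values)
q <- p.adjust(p,"BH")
# rank of the largest p-value that is still significant at FDR 0.05
k <- max(which(q<0.05))
plot(p,pch=20)
# line from the origin through the k-th sorted p-value; the first k p-values are called significant
abline(0,p[k]/k,col="red")

# q-values in the same order as the sorted p-values
plot(q,pch=20)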
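
To show where the q-values come from, the following sketch recomputes the Benjamini-Hochberg adjustment by hand and compares it with `p.adjust`. It assumes the same simulation settings as above; the seed and the names `raw`, `q.manual`, `q.builtin` and `passed` are introduced only for illustration. Each sorted p-value p_(i) is scaled by n / i, and a running minimum taken from the largest p-value downwards keeps the q-values monotone.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
n.null <- 80
n.alt <- 20
p <- c(runif(n.null),
       pchisq(rchisq(n.alt, df = 1, ncp = 10), df = 1, lower.tail = FALSE))
p <- sort(p)
n <- length(p)
alpha <- 0.05

raw <- p * n / seq_len(n)                    # p_(i) * n / i for each rank i
q.manual <- pmin(1, rev(cummin(rev(raw))))   # running minimum from the top, capped at 1
q.builtin <- p.adjust(p, "BH")

# step-up rule: the largest rank i with p_(i) <= i * alpha / n
passed <- which(p <= seq_len(n) * alpha / n)
k <- if (length(passed) > 0) max(passed) else 0
print(c(k = k, n.q.below.alpha = sum(q.builtin <= alpha)))
```

The rank-dependent threshold i * alpha / n is what the bullet about "different thresholds" refers to: the smallest p-value is compared with alpha / n, the second smallest with 2 * alpha / n, and so on.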

- Further reading: Large-scale inference. When many tests are observed together, use their overall distribution to judge each individual one.
