Mendelian Randomization - ryamadaの遺伝学・遺伝統計学メモ

以前、こんな風にまとめた
別の書き方をしてみる

簡単に書き直すと、YにXが影響を与えているのだが、それは隠れた因子UがXとYに影響を与えているからかもしれないので、Uに関係なくXがYに影響を与えているかを知りたい。しかしながらUはどんなものがあるのかさえ分からず、情報はない。その代わりにXには影響を与えている(かもしれない)が、Uには影響を与えていないと信じられるZを使ってXの純粋なYへの影響の有無を調べよう、という話。ZがXに関連している力を使って、Uの影響を排除したXのYへの影響を調べることができる。ZがXに関係していないと、この「力」は出てこない Assume X affects on Y. There might be U that affects both X and Y. We want to know whether X affects on Y conditioned Z. Unfortunately we don't have information on U. Now pick Z that is associated with X but that we believe is not associated with U. The effect of X on Y conditioned U can be studied with the capacity of Z that can extract component of X independent from U.
そんなデータセットを、最も単純な正規分布的なデータ型で線形関係があるものとして作ってみて、とても基本的にこれをやってくれるパッケージAERのivreg()関数で試してみる The following R codes simulate data sets of Z,X,U,and Y that are in normal distribution and they are evaluated with ivreg() function in package "AER".

YとXには相関があるのがわかる(この相関はUを介したものとUを介さないX独自のものとの和)。Y.nullとXにも相関があるのがわかる(この相関はUを介したもの) X and U affect on Y. Y.null is independent from X.

n <- 1000 # 検体数

Z <- rnorm(n) # Zは正規分布
# Xに対するZの線形影響
az <- runif(1) 
bz <- runif(1)
# Uという、隠れた変数
U <- rnorm(n)
# そのUのXへの線形影響
au <- runif(1)
bu <- runif(1)
# XをZとUとの線形影響として作る。乱雑項を入れる
X <- az*Z + bz + au*U + bu + rnorm(n,0,0.1)
# YにはXとUとが影響を与えている
# axyが0でなければ、XはYへの真の影響があるということ
axy <- runif(1)
bxy <- runif(1)
auy <- runif(1)
buy <- runif(1)
# UはXとYとの両方に影響を与えているが
# X自身もYに影響を与えている
Y <- axy * X + bxy + auy * U + buy + rnorm(n,0,0.1)
# 帰無仮説としてXのYへの影響はUを介したもののみの場合も作る
Y.null <- Y - axy * X - bxy
# 作成した複数の変数同士の関連をプロットで見る
ZXUYY. <- cbind(Z,X,U,Y,Y.null)
plot(as.data.frame(ZXUYY.))
# いわゆる線形回帰をしたり、そこに「共変量としてのZ」を入れても
# UがもたらしているXのYへの影響は排除できない
# Y.nullの場合のXの影響は0であるべき(pは大き目であるべき)なのだが
summary(lm(Y ~ X + Z))
summary(lm(Y ~ X))
summary(lm(Y.null ~ X + Z)) # これのXの影響の検定が帰無仮説をサポートしてほしい
summary(lm(Y.null ~ X))

# Mendelian Randomizationをする
library(AER)
summary(ivreg(Y ~ X  ))
# Xの影響はZを考慮しても検出されている(Yではそれが真。Y.nullではそれが偽)
summary(ivreg(Y ~ X | Z))
summary(ivreg(Y.null ~ X  ))
# Zを使ってXのYへの直接影響だけを見ると、Xの直接影響はない、という帰無仮説が棄却されない
summary(ivreg(Y.null ~ X | Z))

結果の大事なところを抜書き Pertinent results

summary(ivreg(Y ~ X  ))
X           0.746010   0.004606   162.0   <2e-16 ***
summary(ivreg(Y ~ X | Z))
X           0.634995   0.007947   79.91   <2e-16 ***
summary(ivreg(Y.null ~ X  ))
X           0.113848   0.004606   24.72   <2e-16 ***
summary(ivreg(Y.null ~ X | Z))
X           0.002833   0.007947   0.357    0.722

ZとXとの関係をなくすと(az=0とすると)、たとえ、XがYに直接の影響があっても、その影響は検出できない When no association between Z and X, X's effect on Y conditioned U can not be observed.

Z <- rnorm(n) # Zは正規分布
# Xに対するZの線形影響
az <- runif(1)
az <- 0 
bz <- runif(1)
# Uという、隠れた変数
U <- rnorm(n)
# そのUのXへの線形影響
au <- runif(1)
bu <- runif(1)
# XをZとUとの線形影響として作る。乱雑項を入れる
X <- az*Z + bz + au*U + bu + rnorm(n,0,0.1)
# YにはXとUとが影響を与えている
# axyが0でなければ、XはYへの真の影響があるということ
axy <- runif(1)
bxy <- runif(1)
auy <- runif(1)
buy <- runif(1)
# UはXとYとの両方に影響を与えているが
# X自身もYに影響を与えている
Y <- axy * X + bxy + auy * U + buy + rnorm(n,0,0.1)
# 帰無仮説としてXのYへの影響はUを介したもののみの場合も作る
Y.null <- Y - axy * X - bxy
# 作成した複数の変数同士の関連をプロットで見る
ZXUYY. <- cbind(Z,X,U,Y,Y.null)
plot(as.data.frame(ZXUYY.))

summary(lm(Y ~ X + Z))
summary(lm(Y ~ X))
summary(lm(Y.null ~ X + Z)) # これのXの影響の検定が帰無仮説をサポートしてほしい
summary(lm(Y.null ~ X))


# Mendelian Randomizationをする
library(AER)
summary(ivreg(Y ~ X  ))
# Xの影響はZを考慮しても検出されている(Yではそれが真。Y.nullではそれが偽)
summary(ivreg(Y ~ X | Z))
summary(ivreg(Y.null ~ X  ))
# Zを使ってXのYへの直接影響だけを見ると、Xの直接影響はない、という帰無仮説が棄却されない
summary(ivreg(Y.null ~ X | Z))

> summary(ivreg(Y ~ X | Z))

Call:
ivreg(formula = Y ~ X | Z)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.390250 -0.293837 -0.009938  0.291430  1.187418 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.3729     3.3893   0.110    0.912
X             0.9494     2.0825   0.456    0.649

> summary(ivreg(Y ~ X  ))

Call:
ivreg(formula = Y ~ X)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.56347 -0.11027 -0.00246  0.12478  0.71595 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.382332   0.006289   219.8   <2e-16 ***
X           0.746010   0.004606   162.0   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1761 on 998 degrees of freedom
Multiple R-Squared: 0.9633,     Adjusted R-squared: 0.9633 
Wald test: 2.623e+04 on 1 and 998 DF,  p-value: < 2.2e-16 

> # Xの影響はZを考慮しても検出されている(Yではそれが真。Y.nullではそれが偽)
> summary(ivreg(Y ~ X | Z))

Call:
ivreg(formula = Y ~ X | Z)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.817847 -0.151576 -0.006081  0.145768  0.803509 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.452700   0.008629  168.36   <2e-16 ***
X           0.634995   0.007947   79.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2215 on 998 degrees of freedom
Multiple R-Squared: 0.942,      Adjusted R-squared: 0.942 
Wald test:  6385 on 1 and 998 DF,  p-value: < 2.2e-16 

> summary(ivreg(Y.null ~ X  ))

Call:
ivreg(formula = Y.null ~ X)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.56347 -0.11027 -0.00246  0.12478  0.71595 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.605179   0.006289   96.23   <2e-16 ***
X           0.113848   0.004606   24.72   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1761 on 998 degrees of freedom
Multiple R-Squared: 0.3797,     Adjusted R-squared: 0.3791 
Wald test: 610.9 on 1 and 998 DF,  p-value: < 2.2e-16 

> # Zを使ってXのYへの直接影響だけを見ると、Xの直接影響はない、という帰無仮説が棄却されない
> summary(ivreg(Y.null ~ X | Z))

Call:
ivreg(formula = Y.null ~ X | Z)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.817847 -0.151576 -0.006081  0.145768  0.803509 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.675547   0.008629  78.290   <2e-16 ***
X           0.002833   0.007947   0.357    0.722    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2215 on 998 degrees of freedom
Multiple R-Squared: 0.01866,    Adjusted R-squared: 0.01768 
Wald test: 0.1271 on 1 and 998 DF,  p-value: 0.7215

さて。このivreg()関数が何をしているのかを知ることで、この手法が何をしているのかを確認してみる。ivreg()関数の中を覗いて、省ける条件分岐を除去してやると以下のようになる。Let's take a look inside of ivreg() as below. Take out the core lines of ivreg() function and let them process our simulational datasets, so that we can understand what the function does.

n <- 1000 # 検体数

Z <- rnorm(n) # Zは正規分布
# Xに対するZの線形影響
az <- runif(1)
az <- 0 
bz <- runif(1)
# Uという、隠れた変数
U <- rnorm(n)
# そのUのXへの線形影響
au <- runif(1)
bu <- runif(1)
# XをZとUとの線形影響として作る。乱雑項を入れる
X <- az*Z + bz + au*U + bu + rnorm(n,0,0.1)
# YにはXとUとが影響を与えている
# axyが0でなければ、XはYへの真の影響があるということ
axy <- runif(1)
bxy <- runif(1)
auy <- runif(1)
buy <- runif(1)
# UはXとYとの両方に影響を与えているが
# X自身もYに影響を与えている
Y <- axy * X + bxy + auy * U + buy + rnorm(n,0,0.1)
# 帰無仮説としてXのYへの影響はUを介したもののみの場合も作る
Y.null <- Y - axy * X - bxy
# 作成した複数の変数同士の関連をプロットで見る
ZXUYY. <- cbind(Z,X,U,Y,Y.null)
plot(as.data.frame(ZXUYY.))


# Mendelian Randomizationをする
library(AER)

# 関数の変数表記に書き換える
x <- matrix(X,ncol=1)
y <- Y
z <- matrix(Z,ncol=1)

n <- NROW(y)
p <- ncol(x)

# ZとXの関係を調べて、Zが関係している部分のみに「純化」して
# 以降の処理に使う(らしい)
auxreg <- lm.fit(z, x)
xz <- as.matrix(auxreg$fitted.values)
colnames(xz) <- colnames(x)

# 純化した変数とyとの関係を調べる
fit <- lm.fit(xz, y)
ok <- which(!is.na(fit$coefficients))
yhat <- drop(x[, ok, drop = FALSE] %*% fit$coefficients[ok])
names(yhat) <- names(y)
res <- y - yhat
ucov <- chol2inv(fit$qr$qr[1:length(ok), 1:length(ok), drop = FALSE])
colnames(ucov) <- rownames(ucov) <- names(fit$coefficients[ok])
rss <- sum(res^2)

rval <- list(coefficients = fit$coefficients, residuals = res, 
    fitted.values = yhat, weights = weights, offset = if (identical(offset, 
        rep(0, n))) NULL else offset, n = n, nobs = n, rank = fit$rank, df.residual = fit$df.residual, 
        cov.unscaled = ucov, sigma = sqrt(rss/fit$df.residual), 
        x = xz)