2016-05-21
Dually Flat Manifolds

情報幾何尤度 Dually flat 指数分布族混合分布族 KL divergence
昨日、階層構造を持つ確率分布のための情報幾何について少し書いた
特に、 $\{0,1\}^d$ なd元分割表のためのe-平坦、m-平坦なパラメタの取り方について、丁寧に確認してみたい
Rでパラメタ変換をしてみるのが、手っ取り早そうなので、それでやってみる
Rmd
2重平坦パラメタリゼーションを学ぶ: 2のd乗組み合わせデータ構造のための情報幾何 (English Edition)
作者: ryamada
発売日: 2016/05/19
メディア: Kindle版
この商品を含むブログ (1件) を見る
---
title: "Dual flat parameterization for {0,1}^d"
author: "ryamada"
date: "Friday, May 20, 2016"
output: html_document
---

# Brief introduction

{0,1}の値を取るd個の変数$X_i;i=1,2,...$があるときに、その値セットの取り方($\mathbf{x} = (0,0,...,0),(1,0,...),...,(1,1,...,1)$)は
$D=2^d$通りある。

今、この$D$通りの取り方の生起確率ベクトルを$freq=(f_1,...,f_D)$とすれば、
$\sum f_i = 1$の制約があり、自由度は$D-1$である。


このような階層構造を持つ組み合わせについては、情報幾何的な観点から有用な、パラメタの取り方が2通り知られている。

It is know that there are tow useful ways to parameterize the frequency with the hierarchical structure.

この文書の目的は、頻度ベクトルを２通りのパラメタに置き換えたり、パラメタの値を指定して、頻度ベクトルを作ったりする道具を提供すること。

This document is written to offer utility functions to transform frequency vectors to two types of parameters and their inverse.




詳しい理論は ["Information Geometry on Hierarchy of Probability
Distributions" (IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 5, JULY 2001)](https://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwja3aaXwOXMAhXHXaYKHVSXCzAQFgghMAA&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F18%2F20133%2F00930911.pdf%3Farnumber%3D930911&usg=AFQjCNFd9avfcHWB3jmeVpfvnuWzCKo5rQ&sig2=OBOG6MXZgpPJvHd2JksQIQ) を参照のこと。

You can learn its theory in ["Information Geometry on Hierarchy of Probability
Distributions" (IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 5, JULY 2001)](https://www.google.co.jp/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwja3aaXwOXMAhXHXaYKHVSXCzAQFgghMAA&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F18%2F20133%2F00930911.pdf%3Farnumber%3D930911&usg=AFQjCNFd9avfcHWB3jmeVpfvnuWzCKo5rQ&sig2=OBOG6MXZgpPJvHd2JksQIQ)

## e-flat parameterization

$\mathbf{\theta}$は長さ$D-1$のパラメタベクトル。

$\mathbf{\theta}$ is a parameter vector with $D-1$ elements.

$\mathbf{\theta} = ((\theta_1,\theta_2,...,\theta_d,),(\theta_{1,2},...,\theta_{d-1,d}),...,(\theta_{1,2,...,d}))$

$\mathbf{x}=(x_1,...,x_d)$というパターンを取る確率
$$
\log{p(\mathbf{x})}=(x_1,...,x_d))= \sum \theta_i x_i + \sum \theta_{i,j} x_i x_j + ... + \theta_{1,2,...,d} x_1 x_2 ... x_d - \psi
$$

$psi$の値は$\mathbf{\theta}$が決まると(自由度がもうないので)一意に決まる。

The value of $\psi$ is not free because $\mathbf{\theta}$ has $D-1=df$ elements.

$\theta$ は1要素の寄与、要素ペアの寄与、要素トリオの寄与…に対応づく。


$\theta$ represents single effect, pairwise effect, triowise... and so on.

## m-flat parameterization

$\mathbf{\eta}$は、長さ$D-1$のパラメタベクトル。

$\mathbf{\eta}$ is a parameter vector with $D-1$ elements.

$\mathbf{\eta} = ((\eta_1,\eta_2,...,\eta_d,),(\eta_{1,2},...,\eta_{d-1,d}),...,(\eta_{1,2,...,d}))$

$\eta$の各要素は、変数(単独・組み合わせの)期待値を表す。(多元分割表で言えば)周辺頻度にも対応する。

$\eta$ corresponds to the expected value of single/combination of variables, or in other words, marginal frequency of multi-way table.

$$
\eta_{i} = Prob(x_i=1)\\
\eta_{i,j} = Prob(x_i=x_j=1)\\
...
\eta_{1,2,...,d} = Prob(x_1=x_2=...=x_d=1)
$$

# R codes in the ryamada22/Ronlyryamada github-package

# Utility functions and their usage example
```{r}
#library(devtools)
#install_github("ryamada22/Ronlyryamada")
library(Ronlyryamada)
example(my.haplotypes)
```

# Vignettes on functions

## Enumeration of $2^d$ patterns
d個の変数、$2^d$の場合の列挙。
1行目から順番に、値が1である変数の数が、0,1,...,dと増えるような並び方になっている。

$2^d$ patterns for d variables.
From the top row to the bottom, number of variables with value 1 increases like 0,1,...,d.


```{}
my.haplotypes <- function(d){
  n <- 2^d
	X <- matrix(0,d,n)
	cnt <- 1
	for(i in 1:d){
		tmp <- combn(d,i)
		for(j in 1:length(tmp[1,])){
			X[tmp[,j],cnt+j] <- 1
		}
		cnt <- cnt + length(tmp[1,])
	}
	return(X)
}
```
```{r}
d <- 3
my.haplotypes(d)
```

## Matrices to convert between frequency, theta and eta.

```{}
my.freq2eta.mat <- function(d){
  X <- my.haplotypes(d)
	n <- 2^d
	M <- matrix(0,n,n)

	for(i in 1:n){
		M[,i] <- apply(X >= X[,i],2,prod)
	}
	return(t(M[2:n,2:n]))
}
my.theta2freq.mat <- function(d){
  K <- my.freq2eta.mat(d)
	n <- 2^d
	M <- matrix(0,n,n)
	M[2:n,2:n] <- K
	M[,1] <- -1
	return(M)
}
my.freq2theta.mat <- function(d){
  M <- my.theta2freq.mat(d)
	return(solve(M))
}
# eta2freq.mat = inverse of freq2eta.mat
```
```{r}
my.freq2eta.mat(d)
my.theta2freq.mat(d)
my.freq2theta.mat(d)
```

## Transformations among three parameterizations

"freq", "$\theta$", "$eta$"は3通りのパラメタ表現。

"$\theta$"の場合、自動的に決まるD番目の値を計算しておく方がよい。ほかの2つのパラメタシステムの場合は足して1になるところからの差、と言う形で簡単に計算できるが、$psi$はそうではなく面倒くさいから。


"freq" stands for regular frequency vector with $2^d=D$ elements.

"$theta$" stands for e-flat parameters.

"$eta$" stands for m-flat parameters.

For $\theta$, the residual parameter "\psi" should be also calculated because it is a bit difficult than other two systems' D-th value, even though it is automatically determined. 

```{}
my.theta2psi <- function(theta){
  len <- length(theta)
	d <- log(len+1,2)
	M <- my.theta2freq.mat(d)
	#tmp <- M %*% c(0,theta)
	tmp <- my.Inf.matrix.mult(M,c(0,theta))
	delta <- sum(exp(tmp))
	psi <- log(delta)
	return(list(psi=psi,M=M))
}
#' @export

my.theta2freq <- function(theta){
	out <- my.theta2psi(theta)
	psi.theta <- c(out$psi,theta)
	M <- out$M
	return(exp(my.Inf.matrix.mult(M,psi.theta)))
}
#' @export

my.freq2thetapsi <- function(freq){
	d <- log(length(freq),2)
	M <- my.freq2theta.mat(d)
	#return(M %*% log(freq))
	return(my.Inf.matrix.mult(M,log(freq)))
}
#' @export

my.freq2eta <- function(freq){
	d <- log(length(freq),2)
	K <- my.freq2eta.mat(d)
	return(K %*% freq[-1])
}
#' @export

my.eta2freq <- function(eta){
	d <- log(length(eta)+1,2)
	K.inv <- solve(my.freq2eta.mat(d))
	tmp <- K.inv %*% eta
	return(c(1-sum(tmp),tmp))
}
#' @export

my.theta2eta <- function(theta){
	freq <- my.theta2freq(theta)
	return(my.freq2eta(freq))
}
#' @export

my.eta2thetapsi <- function(eta){
	freq <- my.eta2freq(eta)
	return(my.freq2thetapsi(freq))
}
```

## Infinite values の扱い

freqベクトルに0が含まれると、$\theta$が無限大になる。

無限大を含めて計算できるようにするために、少し、工夫が必要。そのユーティリティ関数。

無限大と非無限大とを分離して行列計算し、最後にまとめる。

```{r}
my.matrix.Inf.separator <- function(M){
  if(!is.matrix(M)){
		M <- matrix(M,ncol=1)
	}
	M.inf <- M.noninf <- matrix(0,length(M[,1]),length(M[1,]))
	infs <- which(is.infinite(M))
	noninfs <- which(!is.infinite(M))
	M.noninf[noninfs] <- M[noninfs]
	M.inf[infs] <- sign(M[infs])
	return(list(M.noninf=M.noninf,M.inf=M.inf))
}

my.Inf.matrix.mult <- function(M1,M2){
	out1 <- my.matrix.Inf.separator(M1)
	out2 <- my.matrix.Inf.separator(M2)
	
	tmp.n.n <- out1$M.noninf %*% out2$M.noninf
	tmp.i.i <- out1$M.inf %*% out2$M.inf
	tmp.n.i <- out1$M.noninf %*% out2$M.inf
	tmp.i.n <- out1$M.inf %*% out2$M.noninf
	
	tmp.any.i <- sign(tmp.i.i + tmp.n.i + tmp.i.n)
	non.zero <- which(tmp.any.i!=0)
	tmp.any.i[non.zero] <- tmp.any.i[non.zero] * Inf
	tmp <- tmp.n.n + tmp.any.i
	return(tmp)
}
```