いい感じの事前分布 - ryamadaの遺伝学・遺伝統計学メモ

事前分布(Prior)について、Wikiの記事の引用文献は1968年のそれである(Wikiの記事によれば、2009年のに再掲載されている
Maximum entropy Probability Distribution(Wiki)の話である
- 最大エントロピー原理は、ある情報が与えられた状態で(与えられていなくてもよいが)、その情報を満足するような確率分布でエントロピーが最大になるような分布を取りましょう、という話
  - エントロピーが最大ということは、何の決まりもないときのための「決まり」だから
- 置換群(transformation group/permutation group)/対称群的に考えてもよい(し、さらにこのことは別の視点から論じて、同じ結論になってもよいようだが…)
Rに何かパッケージがあるかなーと思ったら、mebootというパッケージがあって、「標本を与えて、それに適合させて乱数を発生」させるものがあった。時系列データのために作られているようだが、その用途に限ってしか使えないわけではないようだ
mebootパッケージのPDFにあるとおり、以下のアルゴリズムである
- Sort the original data in increasing order and store the ordering index vector.
- Compute intermediate points on the sorted series.
- Compute lower limit for left tail (xmin) and upper limit for right tail (xmax). This is done by computing the trim (e.g. 10
- Compute the mean of the maximum entropy density within each interval in such a way that the mean preserving constraint is satisfied. (Denoted as m_t in the reference paper.) The first and last interval means have distinct formulas. See Theil and Laitinen (1980) for details.
- Generate random numbers from the [0,1] uniform interval and compute sample quantiles at those points.
- Apply to the sample quantiles the correct order to keep the dependence relationships of the observed data.
- Repeat the previous steps several times (e.g. 999).
使ってみる
- オリジナル標本データと発生乱数のクオンタイル点の値をコプロットで比較してみる

library(meboot)
## Ensemble for the AirPassenger time series data
    set.seed(345)
    out <- meboot(x=AirPassengers, reps=100, trim=0.10, elaps=TRUE)
qs <- seq(from = 0, to = 1, length = 1000)
q1 <- quantile(AirPassengers,qs)
# out$ensembleが発生した乱数
q2 <- quantile(out$ensemble, qs)
plot(q1,q2)