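- The third base of a codon is said to carry relatively little information about which amino acid is encoded.
- There is a discussion here of how "important" the first and second bases are.
- How might we, roughly, turn that "importance" into a number?
- As a first pass: for each of the three codon positions, tally which amino acid each base corresponds to, build a 4×21 table (21 = the 20 amino acids plus one stop-codon category), and look at the Pearson chi-squared statistic.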
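For reference, `chisq.test()` computes Pearson's statistic below on each table. Here $n_{ij}$ is the number of codons with base $i$ at the position in question that encode amino acid (or stop) $j$, $r_i = 16$ is the row total (each base occupies a given position in 16 of the 64 codons), $c_j$ is the number of codons for $j$, and $N = 64$. Every expected count is at most $6/4 = 1.5$, well under the usual rule of thumb of 5, so R warns that the approximation behind the p-value may be incorrect; the statistic is safer read as a descriptive score. Its maximum, reached when all codons of each amino acid share the same base at that position, is $N(\min(4,21)-1) = 192$. As a hand check for position 1: only L, R and S have codons split across two first bases (2 + 4 each), and every other amino acid contributes $c_j/16$, for $46/16$ in total.

$$
X^2=\sum_{i=1}^{4}\sum_{j=1}^{21}\frac{(n_{ij}-E_{ij})^2}{E_{ij}},\qquad
E_{ij}=\frac{r_i c_j}{N}=\frac{c_j}{4},\qquad
\mathrm{df}=(4-1)(21-1)=60
$$

$$
X^2 = N\left(\sum_{i,j}\frac{n_{ij}^2}{r_i c_j}-1\right)\le N\bigl(\min(4,21)-1\bigr)=192
$$

$$
\text{position 1:}\quad \sum_{i,j}\frac{n_{ij}^2}{16\,c_j}=\frac{46}{16}+3\cdot\frac{2^2+4^2}{16\cdot 6}=3.5
\quad\Rightarrow\quad X^2=64\,(3.5-1)=160
$$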
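```r
# Base labels for codon positions 1, 2 and 3, each in the order T, C, A, G.
# Position 3 uses hiragana labels (た = T, し = C, あ = A, ぐ = G).
b1 <- c("bT", "bC", "bA", "bG")
b2 <- c("bt", "bc", "ba", "bg")
b3 <- c("bた", "bし", "bあ", "bぐ")
# All 64 codons. expand.grid varies its first argument fastest, so the rows run
# TTT, TTC, TTA, TTG, TCT, ... (columns: position 3, position 2, position 1).
codon <- as.matrix(expand.grid(b3, b2, b1))
# Amino acid for each codon in the same order (standard genetic code; Z = stop).
aa <- c("F","F","L","L","S","S","S","S","Y","Y","Z","Z","C","C","Z","W",
        "L","L","L","L","P","P","P","P","H","H","Q","Q","R","R","R","R",
        "I","I","I","M","T","T","T","T","N","N","K","K","S","S","R","R",
        "V","V","V","V","A","A","A","A","D","D","E","E","G","G","G","G")
# Columns: position 1, position 2, position 3, amino acid.
codonaa <- cbind(codon[, 3], codon[, 2], codon[, 1], aa)
codonaa
for (i in 1:3) {
  # 4 x 21 table: base at position i vs. amino acid (or stop).
  tmptable <- table(codonaa[, i], codonaa[, 4])
  # Every expected count is below 5, so R also warns
  # "Chi-squared approximation may be incorrect".
  print(chisq.test(tmptable))
  print(tmptable)
}
```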
```
Pearson's Chi-squared test
data: tmptable
X-squared = 160, df = 60, p-value = 4.903e-11
   A C D E F G H I K L M N P Q R S T V W Y Z
bA 0 0 0 0 0 0 0 3 2 0 1 2 0 0 2 2 4 0 0 0 0
bC 0 0 0 0 0 0 2 0 0 4 0 0 4 2 4 0 0 0 0 0 0
bG 4 0 2 2 0 4 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0
bT 0 2 0 0 2 0 0 0 0 2 0 0 0 0 0 4 0 0 1 2 3

Pearson's Chi-squared test
data: tmptable
X-squared = 176, df = 60, p-value = 2.487e-13
   A C D E F G H I K L M N P Q R S T V W Y Z
ba 0 0 2 2 0 0 2 0 2 0 0 2 0 2 0 0 0 0 0 2 2
bc 4 0 0 0 0 0 0 0 0 0 0 0 4 0 0 4 4 0 0 0 0
bg 0 2 0 0 0 4 0 0 0 0 0 0 0 0 6 2 0 0 1 0 1
bt 0 0 0 0 2 0 0 3 0 6 1 0 0 0 0 0 0 4 0 0 0

Pearson's Chi-squared test
data: tmptable
X-squared = 30.6667, df = 60, p-value = 0.9994
    A C D E F G H I K L M N P Q R S T V W Y Z
bあ 1 0 0 1 0 1 0 1 1 2 0 0 1 1 2 1 1 1 0 0 2
bぐ 1 0 0 1 0 1 0 0 1 2 1 0 1 1 2 1 1 1 1 0 1
bし 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 2 1 1 0 1 0
bた 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 2 1 1 0 1 0
```
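The p-values above rest on a large-sample approximation that these tables do not satisfy (every expected count is at most 1.5), and X² on its own has no fixed scale. Two standard rescalings may be easier to read: Cramér's V, which divides X² by its maximum N(min(r, c) − 1) (192 for a 4×21 table over 64 codons) and takes the square root, and the mutual information between base and amino acid, which measures the "information" mentioned at the top directly (at most 2 bits, since the four bases are equally frequent at each position). The sketch below is a swapped-in alternative, not the method above: it rebuilds the codon table with plain T/C/A/G labels (an assumption for brevity), and the helper names `mutual_info` and `cramers_v` are hypothetical, introduced here rather than taken from any package.

```r
# Plain T/C/A/G labels (a simplification of the bT / bt / bた labels used above).
bases <- c("T", "C", "A", "G")
# Same codon order as before: position 3 varies fastest.
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
# Standard genetic code in the same order (Z = stop).
aa <- c("F","F","L","L","S","S","S","S","Y","Y","Z","Z","C","C","Z","W",
        "L","L","L","L","P","P","P","P","H","H","Q","Q","R","R","R","R",
        "I","I","I","M","T","T","T","T","N","N","K","K","S","S","R","R",
        "V","V","V","V","A","A","A","A","D","D","E","E","G","G","G","G")

# Hypothetical helper: mutual information (in bits) between two categorical
# vectors, computed from their joint relative-frequency table.
mutual_info <- function(x, y) {
  p <- table(x, y) / length(x)
  expected <- outer(rowSums(p), colSums(p))  # product of the marginals
  nz <- p > 0                                # cells with p = 0 contribute 0
  sum(p[nz] * log2(p[nz] / expected[nz]))
}

# Hypothetical helper: Cramer's V, i.e. sqrt(X^2 / (N * (min(rows, cols) - 1))).
cramers_v <- function(x, y) {
  tab <- table(x, y)
  x2 <- suppressWarnings(chisq.test(tab)$statistic)  # silence the small-count warning
  unname(sqrt(x2 / (sum(tab) * (min(dim(tab)) - 1))))
}

pos <- c("b1", "b2", "b3")
v  <- sapply(pos, function(k) cramers_v(grid[[k]], aa))
mi <- sapply(pos, function(k) mutual_info(grid[[k]], aa))
print(round(rbind(cramers_v = v, mutual_info_bits = mi), 3))
```

Worked by hand from the tables above, V should come out at roughly 0.91, 0.96 and 0.40 and the mutual information at roughly 1.74, 1.87 and 0.44 bits for positions 1, 2 and 3, so both scores rank the positions the same way as the raw X² (second > first, third far behind).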
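- Note that the analysis above treats the four bases as unordered categories, and likewise treats the 21 classes (amino acids plus stop) as unordered categories.
- To group amino acids by chemical properties, adjust the contents of the `aa` object.
- To group bases as purines/pyrimidines and so on, adjust the contents of the `b1`, `b2`, `b3` objects.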
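As a sketch of the regrouping just described, assuming one particular classification: bases are collapsed into pyrimidines (T, C → Y) and purines (A, G → R) by editing `b1`, `b2`, `b3`, and amino acids are mapped onto five textbook side-chain classes plus stop through a lookup vector. The lookup vector `aa_class` and its class names are introduced here for illustration; other schemes (for example placing Gly or Cys elsewhere) are equally defensible and will change the numbers. The rest of the pipeline follows the original code.

```r
# Base labels per position, in the original T, C, A, G order:
# pyrimidines (T, C) -> "Y", purines (A, G) -> "R".
b1 <- c("Y", "Y", "R", "R")
b2 <- b1
b3 <- b1
codon <- as.matrix(expand.grid(b3, b2, b1))

# Standard genetic code in the same order as the original aa object (Z = stop).
aa <- c("F","F","L","L","S","S","S","S","Y","Y","Z","Z","C","C","Z","W",
        "L","L","L","L","P","P","P","P","H","H","Q","Q","R","R","R","R",
        "I","I","I","M","T","T","T","T","N","N","K","K","S","S","R","R",
        "V","V","V","V","A","A","A","A","D","D","E","E","G","G","G","G")

# Hypothetical, illustrative amino-acid classification (one of several in use).
aa_class <- c(G = "aliphatic", A = "aliphatic", P = "aliphatic", V = "aliphatic",
              L = "aliphatic", I = "aliphatic", M = "aliphatic",
              F = "aromatic", Y = "aromatic", W = "aromatic",
              S = "polar", T = "polar", C = "polar", N = "polar", Q = "polar",
              K = "positive", R = "positive", H = "positive",
              D = "negative", E = "negative", Z = "stop")
aagrp <- unname(aa_class[aa])

# Columns: position 1, position 2, position 3, amino-acid class.
codonaa <- cbind(codon[, 3], codon[, 2], codon[, 1], aagrp)
x2 <- numeric(3)
for (i in 1:3) {
  tmptable <- table(codonaa[, i], codonaa[, 4])  # 2 x 6 table
  # Expected counts are still small, so silence chisq.test's warning here.
  x2[i] <- suppressWarnings(chisq.test(tmptable)$statistic)
  print(tmptable)
}
print(round(x2, 2))
```

With this grouping the tables become 2×6 and, by hand calculation, the statistic comes out at roughly 13.8, 29.7 and 6.4 for positions 1, 2 and 3. Its maximum is now 64 × (2 − 1) = 64 rather than 192, so these values should not be compared directly with the 4×21 results without rescaling by that maximum.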