2015-02-11

arima()とarima.sim()とからARIMAモデルを確認する

時系列 ARIMA

こちらで時系列データの扱いについて、だらだらと書いている(途中)
filtering,convolution,累積和,ARIMAモデルのことが出てくるが、それについての細かいことを、arima()関数とarima.sim()関数とのコードを眺めるという視点で確認する
いじった結果をshinyアプリでインターラクティブにしつつ文書にしてみました
- shinyアプリ公開にしようかと思ったのですが、エンコードとかで躓いたので、それはせずに、Rmdファイルを以下に(一つ目はShiny形式でインターラクティブ、後のファイルは、インターラクティブ部分を非実行にし、コードのみを記載した通常のRmd形式→epub化したのはこちらの非インターラクティブ版)：

非インターラクティブ版はepub化してryamada@kindleにも置いてあります

作者: ryamada
発売日: 2015/02/11
メディア: Kindle版
この商品を含むブログ (1件) を見る

---
title: "ARIMA Autoregressive integrated and moving average"
author: "ryamada"
date: "Wednesday, February 11, 2015"
output: html_document
runtime: shiny
---

# 基礎知識
ARIMAモデルというのは、時系列値列がMoving average(ma)という要素とAutoregression integrated(ar)という要素とを想定した確率過程

- maはfilter()関数による"convolution"的フィルタリングが働いていると考えるモデル
-- これは、数値列が局所平均があって、それが数値列の位置によって変動するというモデル
- raはfilter()関数による"recursive"的フィルタリングが働いていると考えるモデル
-- これは、未来が現在と過去の値に上乗せする形で決まると考えるモデル

http://wolfweb.unr.edu/~zal/STAT758/Lab_3.pdf
 のAppendixにarima.sim()を使った実習資料があるので、それに沿って、ARIMAモデルに沿ったデータ作成をしながら、ARIMAモデルについて理解していくことにする

# 正規乱数とフィルタリングの基礎

基本は正規乱数
```{r}
Z1<-rnorm(100)
```

## 変化させないフィルタリング、単純累積和を取るフィルタリング

フィルタをかける。convolutionフィルタとrecursiveフィルタの2つがある。
この２つのフィルタタイプは、ARIMA的にはautoregressive integratedとmoving averageとに相当するので、arフィルタ、maフィルタとも呼ぶことにする。

両者はフィルタとして何を用いるかで変化するが、最も基本的なフィルタを使うと、超基本的操作ができる。
maでは、何も変えない、という操作。

```{r}
f1 <- c(1)
X1.ma <- filter(Z1,f1,method="convolution")
plot(Z1,X1.ma)
```

arでは、累積和、という操作。
```{r}
f1 <- c(1)
X1.ar <- filter(Z1,f1,method="recursive")
plot(cumsum(Z1),X1.ar)
```

## 平滑化
maフィルタを掛けて平滑化する

次のようなフィルタを使う。指定数n=1,2,...に対して、長さ2n+1のベクトルを用意し、要素の値が、標準正規分布確率密度関数値(-n),(-n+1),...,0,1,...,nに比例した重みを割り付けるようなフィルタ

フィルタの重みベクトルは、自身とその前後に掛かる

N は2n+1より大きい数。
黒点が乱数。
赤線はmaフィルタ。


```{r,echo=FALSE}
X <- rnorm(10000)
inputPanel(
  sliderInput("N", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("n", label = "value specifies length of filter:",
              min = 0, max = 1000, value = 0, step = 1)
  )
  
renderPlot({
  n <- input$n
  #tmp <- dnorm((-n):n,0,input$r)
  tmp <- rep(1,2*n+1)
  fil <- tmp/sum(tmp)
  
  x <- X[1:input$N]
  fout <- filter(x,fil,method="convolution")
  #fout2 <- filter(x,fil,method="recursive")
  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(x,fout)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(x,pch=20,ylim=ylim)
  points(fout,type="l",col=2,lwd=5)
  #points(cumsum(x),pch=1,col=1)
  #points(fout2,type="l",col=3)

  
})
```

## 再帰フィルタ(Recursive filter)

再帰フィルタは、過去の情報を累積的に、どんどんため込む形をしている。
したがって、単調に増加していくときには、指数関数的な増加が現れる。

逆に、単純な累積和であると、増えたり減ったりが激しく見えても、一貫して増えているわけではなく、増えたり減ったりの繰り返しであるなら、平坦な値列に変換するようなフィルタである。

フィルタベクトルは、自身より前の値に掛かる。

```{r}
x <- rep(1,10)
y1 <- filter(x,c(1),method="recursive")
y2 <- filter(x,c(1,1),method="r")
y3 <- filter(x,c(1,1,1),method="r")
matplot(cbind(y1,y2,y3),type="l")
```

具体的にコードがあった方がわかるので、作ってみると、頭の方から、filter後の値を作り、それを累加していくようになっていることがわかる。

```{r}
my.recursive.filter <- function(x,f){
  ret <- x
	for(i in 1:length(x)){
		for(j in 1:length(f)){
			if(i-j>0){
				ret[i] <- ret[i] + f[j]*ret[i-j]
			}
		}
	}
	ret
}

x <- rnorm(10)
my.recursive.filter(x,c(1,1))
filter(x,c(1,1),"r")
```

N はn+1より大きい数。
黒点が乱数の単純累積和。
赤線はraフィルタ。


```{r,echo=FALSE}
X <- rnorm(10000)
inputPanel(
  sliderInput("N2", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("n2", label = "value specifies length of filter:",
              min = 0, max = 10, value = 0, step = 1)
  )
  
renderPlot({
  n <- input$n2
  #tmp <- dnorm((-n):n,0,input$r)
  tmp <- rep(1,n+1)
  fil <- tmp/sum(tmp)
  
  x <- X[1:input$N2]
  #fout <- filter(x,fil,method="convolution")
  fout2 <- filter(x,fil,method="recursive")
  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(cumsum(x),fout2)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(cumsum(x),pch=20,ylim=ylim)
  points(fout2,type="l",col=2,lwd=5)

})
```

## フィルタを使って時系列データをシミュレーション作成する

正規乱数列を発生し、それをそのまま、値列とするとき、maフィルタを使うなら
```{r}
x <- rnorm(10)
x.ma <- filter(x,c(1),method="convolution",sides = 1L) # sides=1Lは過去側のみのフィルタリングを指定
x
x.ma
plot(x.ma)
```

raモデルを使うなら

```{r}
x.ra <- filter(x,c(0),method="recursive")
x
x.ra
plot(x.ra)
```

フィルタの長さを固定して、フィルタを色々に変えてどのようになるかを見てみよう。
ma = (1,0,0)の場合とar = (0,0,0)の場合は、発生した乱数そのもので、白色ノイズとなっている。

maの方は３パラメタを動かしてもがたぼこの具合に若干の違いが出るだけであるが、arの方は、急に値が発散したりする様子がわかる。

また、ar = (1,0,0)のときは、単純な酔歩になる様子も見て取れる。

```{r,echo=FALSE}
X <- rnorm(10000)
inputPanel(

  sliderInput("N3", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("ar1", label = "ar1:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ar2", label = "ar2:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ar3", label = "ar3:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ma1", label = "ma1:",
              min = -2, max = 2, value = 1, step = 0.01),
  sliderInput("ma2", label = "ma2:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ma3", label = "ma3:",
              min = -2, max = 2, value = 0, step = 0.01)
    
  )
  
renderPlot({
  N <- input$N3
  ar <- c(input$ar1,input$ar2,input$ar3)
  ma <- c(input$ma1,input$ma2,input$ma3)
  x <- X[1:N]
  foutma <- filter(x,ma,method="convolution",sides=1L) # sides=1Lは過去側のみのフィルタリングを指定
  foutar <- filter(x,ar,method="recursive")

  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(foutma,foutar)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(foutma,type="l",ylim=ylim)
  points(foutar,type="l",col=2)

})
```
# ARIMAモデル

## RのARIMAモデル関数
ARIMAモデルはmaフィルタで説明できる部分とarフィルタで説明できる部分とが合わさって時系列データを作るモデルであり、
実データに対してARIMAモデルを適用し、そのmaフィルタ・arフィルタ等を推定する。

RのARIMA関係関数には、その推定をするarima()関数とARIMAモデルに基づいて時系列データをシミュレーション作成する関数arima.sim()がある。

コードは以下で表示できる。

```{r}
# arima
# arima.sim
```

## arima.sim() 関数の引数

まず、arima.sim()の引数を確認しよう
```{r}
args(arima.sim)
```

### model 引数
引数 modelはもっとも基幹になる引数で
```{}
model <- list(ar = c(0.3,0.2),ma = c(0.1,0.1))
```
のように2ベクトルからなるリストである。
これが関数内部で以下のように使われる
```{}
if (length(model$ma)) {
        x <- filter(x, c(1, model$ma), sides = 1L) # method = "convolution" がデフォルト
        x[seq_along(model$ma)] <- 0 # NAを0にする処理
    }
    if (length(model$ar)) 
        x <- filter(x, model$ar, method = "recursive")
```
初めにmaフィルタを掛け、次にarフィルタを掛けている。
maフィルタのsides=1Lとは、フィルタを過去側のみに作用させる手法である。

また、maフィルタの冒頭に1が付加されている。これは、maがなくarもないときに、maフィルタがc(1)となるようにするための処理である。

したがって、arima.sim()関数の骨格は、ma型とar型のフィルタリングの連続実施であることがわかる。

### model$order 引数
それ以外の処理は、"stop("'model' must be list")"のように、処理を中止する条件確認がいくつかあるほかは、次の処理が大きな処理である。



```{}
x <- diffinv(x, differences = d)
```
これは、ma,raフィルタリングの結果であるxを引数dで処理する行である。
このdは基幹引数であるmodelというリストのオプショナルな1成分で、長さ3のベクトルの第二要素である。

ma,arフィルタで生成したxがy[k]-y[k-d]差分となるようなベクトルyを生成している。

```{r}
x.2 <- rnorm(10)
d <- 2
x.0 <- diffinv(x.2,d)
diff(x.0,d)
x.2
```

## model$ar 引数と発散
それ以外に気になる点と言えば、arの条件として、以下のものがある。


```{}
minroots <- min(Mod(polyroot(c(1, -model$ar))))
        if (minroots <= 1) 
            stop("'ar' part of model is not stationary")
```
arフィルタが、累積和的な動きをすることと関係していて、再帰度が上がれば上がるほど発散に向ける力が強くなることことと対応している。

これからわかるのは、$1 = ar[1]x+ar[2]x^2+...$の根はすべて、その絶対値は１より大であることが、stationary(発散しない)条件であるということである。


やってみる。

この多項式条件をクリアする。
根をランダムに作成する。複素根の場合は共役複素数も根であるはず(実数係数多項式だから)であるという条件とともに、以下のように、実根数、複素根数を指定し、絶対値の１つはすべて１より大とすることにする.

これを用いてARIMAシミュレーションをする。filter()を使うバージョンと、arima.sim()を使う場合とを両方試す。
```{r}
library(polynom)
n.real.roots <- 10
nhalf.complex.roots <- 5 # conjugate pairs as roots


thetas <- runif(nhalf.complex.roots)*2*pi
# All mods should be more than 1
mods <- runif(n.real.roots+nhalf.complex.roots) * 5 + 1

#ss <- sample(c(-1,1),n.real.roots+nhalf.complex.roots,replace=TRUE)

mods.real <- mods[1:n.real.roots]
mods.complex <- mods[(1+n.real.roots):(n.real.roots+nhalf.complex.roots)]
roots.R <- mods.real
roots.C1 <- mods.complex * exp(1i*thetas)
roots.C2 <- mods.complex * exp(1i*(-thetas))
roots <- c(roots.R,roots.C1,roots.C2)
polynom <- poly.from.roots(roots)
polynom
ar <- (-polynom/polynom[1])[-1]
min(Mod(polyroot(c(1, -ar))))


x <- rnorm(1000)

x.. <- arima.sim(model=list(ar=ar),n=1000)

x. <- filter(x,ar,method="recursive")
par(mfcol=c(1,2))
plot(x.)
plot(x..)
par(mfcol=c(1,1))

```

## rand.gen 引数

ランダム数列を発生させる際に、デフォルトでは、rnorm() 関数が使われる。

それ以外のランダム関数も使える。

```{r}
x <- rexp(1000,rate=1.3)
x. <- filter(x,ar,method="recursive")

plot(x.)
x.. <- arima.sim(model=list(ar=ar),n=1000,rand.gen=rexp,rate=1.3)
plot(x..)
```

# arima.sim()の資料でarima.sim()について確認する

http://wolfweb.unr.edu/~zal/STAT758/Lab_3.pdf　のAppendixにある例をなぞって、arima.sim()について確認する

二種類の正規乱数列Z1,Z2について、ma/convolutionタイプのフィルタリングをしている。arima.sim()ではconvolution フィルタを過去側にのみ掛けるが、リンク先資料では、両側フィルタとなっている。念のため片側の場合も付け加えてみよう。

値なしの要素が発生するが、その数に違いがあるので、シフトしてみえる。

```{r}
#=================================#
# STAT 758, Fall 2008 #
# Commands used in Lab 3 #
#=================================#
# White noises
#===========================
Z1<-rnorm(100)
Z2<-rnorm(100,mean=10,sd=2)
# MA simulation by filter()
#===========================
X1<-filter(Z1,rep(1,10))
X2<-filter(Z2,c(.6,.3,.1))
X3 <- filter(Z1,rep(1,10),sides=1)
X4 <- filter(Z2,c(.6,.3,.1),sides=1)
par(mfcol=c(1,2))
plot(X1)
points(X3,type="l",col=2)
plot(X2)
points(X4,type="l",col=2)
par(mfcol=c(1,1))
```

ar/recursive タイプのフィルタリングをしてみる

```{r}
# AR simulation by filter()
#===================================
X1.2<-filter(Z1,c(1,1),method='r')
X2.2<-filter(Z2,c(.6,.3,.1),method='r')
par(mfcol=c(1,2))
plot(X1.2)
plot(X2.2)
par(mfcol=c(1,1))
```
```{r}
# ARMA simulation using arima.sim()
#===================================
par(mfrow=c(2,4))
X<-arima.sim(n=100,list(ar=c(0.9),ma=c(0.2))) # ARMA(1,1)
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9))) # AR(1)
plot(X)
X<-arima.sim(n=100,list(ma=c(0.2))) # MA(1)
plot(X)
X<-arima.sim(n=100,list(ma=c(0.2)),sd=10) # MA(1) with WN(0,10)
# sd=10はrnorm()関数の引数として与えられている
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),innov=Z1) # AR(1) with given WN Z1
# innovは乱数列(の途中から後)に特定の数列を与える。前部分はstart.innov引数で与える
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),rand.gen=rt,df=10) # AR(1) with mild long-tail (Student's t with df=10)
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),rand.gen=runif) # AR(1) with U[0,1] WN
plot(X)
```

## ARMAではautocorrelation,partial autocorrelationの理論値がある

ARIMAモデルでは、タイムラグの程度ごとに、autocorrelation,partial autocorrelation の理論値がある。

```{r}
# ACF and PACF computations (theoretical)
#========================================
A<-ARMAacf(ar=c(.2),ma=c(.3,.2),lag.max=20)
P<-ARMAacf(ar=c(.2),ma=c(.3,.2),lag.max=20,pacf=TRUE)
par(mfcol=c(1,2))
plot(A,type="h")
plot(P,type="h")
par(mfcol=c(1,1))
```
## ARMAは、MAのみのモデルに変換することができる

以下では、ARMAtoMAによって、arパラメタが長さ1のベクトルであるときに、そのARMA過程がMAだったらどういうパラメタになるかを算出し、そのパラメタをma=としてARMAacfに与え、autocorrelation, partial autocorrelationが、ARMAのそれと同等になることを確認している

```{r}
# MA representation for ARMA
#============================
ar <- c(0.8)
lag.max <- 20
ma.eq<-ARMAtoMA(ar=ar, lag.max=lag.max)
#P<-ARMAtoMA(ar=c(0.9,.1),ma=c(1,2,3), lag.max=5)

A<-ARMAacf(ar=ar,lag.max=lag.max)
A. <- ARMAacf(ma=ma.eq,lag.max=lag.max)
par(mfcol=c(1,2))
plot(A,A.)
abline(0,1,col=2)
P<-ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
P. <- ARMAacf(ma=ma.eq,lag.max=lag.max,pacf=TRUE)
plot(P,P.)
abline(0,1,col=2)
par(mfcol=c(1,1))
```
同じことをarパラメタベクトルの長さが2以上のときでもやってみる。

```{r}
# MA representation for ARMA
#============================
ar <- c(0.3,0.1,-0.1)
lag.max <- 20
ma.eq<-ARMAtoMA(ar=ar, lag.max=lag.max)
#P<-ARMAtoMA(ar=c(0.9,.1),ma=c(1,2,3), lag.max=5)

A<-ARMAacf(ar=ar,lag.max=lag.max)
A. <- ARMAacf(ma=ma.eq,lag.max=lag.max)
par(mfcol=c(1,2))
plot(A,A.)
abline(0,1,col=2)
P<-ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
P. <- ARMAacf(ma=ma.eq,lag.max=lag.max,pacf=TRUE)
plot(P,P.)
abline(0,1,col=2)
par(mfcol=c(1,1))
```
ARIMA過程では、分散という指標があり、その理論値がある。それをmaとみなし、そのma用パラメタから算出することで近似がうまくいくことを示している。

```{r}
# Variance computation
#================================
P<-ARMAtoMA(ar=0.9, lag.max=100)
sum(P^2)+1
```

# ARIMAモデル推定

## ARIMAもでるのar,maフィルタ推定
ARIMAがどうなっているかは分かったので、それをデータから推定する話。

arima.sim()がやっていたように、
ar部分にいくつのフィルタパラメタを置き、ma部分に、いくつを置き、時間幅として、何区分分のずれを想定するか、とによって、ARIMAデータができているものとして、推定する。

orderという引数がそれで、arパラメタ数、区分ずれ数、maパラメタ数の３整数からなる。

次の例では、order=c(1,0,0)なので、区分ずれはなく、maは関与せず、arが1変数で決まるモデルでの推定となっている。
推定は、いろいろたくさんあるが…
```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100,list(ar=0.9))
est<-arima(X,order=c(1,0,0))
str(est)
```

推定係数は次のように、切片(不動部分)とar1とからなっている。シミュレーションで用いたar=0.9とよく合致している
```{r}
est$coef
```
モデルをar+maと複雑にすると、nを大きくしないと推定値はよくならない。

```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100000,list(ar=c(0.2),ma=c(0.2)))
est<-arima(X,order=c(1,0,1))
est$coef
```
orderパラメタもいじってみよう。

```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100000,list(ar=c(0.2),ma=c(0.2),order=c(1,2,1)))
est<-arima(X,order=c(1,2,1))
est$coef
```

## autocorrelation, partial autocorrelation推定

ARIMAモデルからデータをシミュレーション作成し、その時系列データのautocorrelation, partial autocorrelationを計算し、それが、理論値とどれくらい近いかを確認してみる。


```{r}
# ACF, PACF estimation
#=====================================================================
n <- 100000
ar <- c(0.3,0.2)
ma <- c(0.1)
order <- c(length(ar),0,length(ma))
lag.max <- 20
X<-arima.sim(n=n,list(ar=ar,ma=ma,order=order)) # AR(1) model
out.cor <- acf(X,lag.max = lag.max,type = c("correlation"),plot=FALSE)
theor.cor <- ARMAacf(ar=ar,lag.max=lag.max)
plot(out.cor$lag,out.cor$acf,type="h")
points(out.cor$lag,theor.cor,col=2)
out.cor2 <- pacf(X,lag.max = lag.max,type = c("correlation"),plot=FALSE)
theor.cor2 <- ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
plot(out.cor2$lag,out.cor2$acf,type="h")
points(out.cor2$lag,theor.cor2,col=2)
```

非インターラクティブ版

---
title: "ARIMA Autoregressive integrated and moving average"
author: "ryamada"
date: "Wednesday, February 11, 2015"
output: html_document
---

# 基礎知識
ARIMAモデルというのは、時系列値列がMoving average(ma)という要素とAutoregression integrated(ar)という要素とを想定した確率過程

- maはfilter()関数による"convolution"的フィルタリングが働いていると考えるモデル
-- これは、数値列が局所平均があって、それが数値列の位置によって変動するというモデル
- raはfilter()関数による"recursive"的フィルタリングが働いていると考えるモデル
-- これは、未来が現在と過去の値に上乗せする形で決まると考えるモデル

http://wolfweb.unr.edu/~zal/STAT758/Lab_3.pdf
 のAppendixにarima.sim()を使った実習資料があるので、それに沿って、ARIMAモデルに沿ったデータ作成をしながら、ARIMAモデルについて理解していくことにする

# 正規乱数とフィルタリングの基礎

基本は正規乱数
```{r}
Z1<-rnorm(100)
```

## 変化させないフィルタリング、単純累積和を取るフィルタリング

フィルタをかける。convolutionフィルタとrecursiveフィルタの2つがある。
この２つのフィルタタイプは、ARIMA的にはautoregressive integratedとmoving averageとに相当するので、arフィルタ、maフィルタとも呼ぶことにする。

両者はフィルタとして何を用いるかで変化するが、最も基本的なフィルタを使うと、超基本的操作ができる。
maでは、何も変えない、という操作。

```{r}
f1 <- c(1)
X1.ma <- filter(Z1,f1,method="convolution")
plot(Z1,X1.ma)
```

arでは、累積和、という操作。
```{r}
f1 <- c(1)
X1.ar <- filter(Z1,f1,method="recursive")
plot(cumsum(Z1),X1.ar)
```

## 平滑化
maフィルタを掛けて平滑化する

次のようなフィルタを使う。指定数n=1,2,...に対して、長さ2n+1のベクトルを用意し、要素の値が、標準正規分布確率密度関数値(-n),(-n+1),...,0,1,...,nに比例した重みを割り付けるようなフィルタ

フィルタの重みベクトルは、自身とその前後に掛かる

N は2n+1より大きい数。
黒点が乱数。
赤線はmaフィルタ。


```{r,eval=FALSE}
X <- rnorm(10000)
inputPanel(
  sliderInput("N", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("n", label = "value specifies length of filter:",
              min = 0, max = 1000, value = 0, step = 1)
  )
  
renderPlot({
  n <- input$n
  #tmp <- dnorm((-n):n,0,input$r)
  tmp <- rep(1,2*n+1)
  fil <- tmp/sum(tmp)
  
  x <- X[1:input$N]
  fout <- filter(x,fil,method="convolution")
  #fout2 <- filter(x,fil,method="recursive")
  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(x,fout)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(x,pch=20,ylim=ylim)
  points(fout,type="l",col=2,lwd=5)
  #points(cumsum(x),pch=1,col=1)
  #points(fout2,type="l",col=3)

  
})
```

## 再帰フィルタ(Recursive filter)

再帰フィルタは、過去の情報を累積的に、どんどんため込む形をしている。
したがって、単調に増加していくときには、指数関数的な増加が現れる。

逆に、単純な累積和であると、増えたり減ったりが激しく見えても、一貫して増えているわけではなく、増えたり減ったりの繰り返しであるなら、平坦な値列に変換するようなフィルタである。

フィルタベクトルは、自身より前の値に掛かる。

```{r}
x <- rep(1,10)
y1 <- filter(x,c(1),method="recursive")
y2 <- filter(x,c(1,1),method="r")
y3 <- filter(x,c(1,1,1),method="r")
matplot(cbind(y1,y2,y3),type="l")
```

具体的にコードがあった方がわかるので、作ってみると、頭の方から、filter後の値を作り、それを累加していくようになっていることがわかる。

```{r}
my.recursive.filter <- function(x,f){
  ret <- x
	for(i in 1:length(x)){
		for(j in 1:length(f)){
			if(i-j>0){
				ret[i] <- ret[i] + f[j]*ret[i-j]
			}
		}
	}
	ret
}

x <- rnorm(10)
my.recursive.filter(x,c(1,1))
filter(x,c(1,1),"r")
```

N はn+1より大きい数。
黒点が乱数の単純累積和。
赤線はraフィルタ。


```{r,eval=FALSE}
X <- rnorm(10000)
inputPanel(
  sliderInput("N2", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("n2", label = "value specifies length of filter:",
              min = 0, max = 10, value = 0, step = 1)
  )
  
renderPlot({
  n <- input$n2
  #tmp <- dnorm((-n):n,0,input$r)
  tmp <- rep(1,n+1)
  fil <- tmp/sum(tmp)
  
  x <- X[1:input$N2]
  #fout <- filter(x,fil,method="convolution")
  fout2 <- filter(x,fil,method="recursive")
  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(cumsum(x),fout2)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(cumsum(x),pch=20,ylim=ylim)
  points(fout2,type="l",col=2,lwd=5)

})
```

## フィルタを使って時系列データをシミュレーション作成する

正規乱数列を発生し、それをそのまま、値列とするとき、maフィルタを使うなら
```{r}
x <- rnorm(10)
x.ma <- filter(x,c(1),method="convolution",sides = 1L) # sides=1Lは過去側のみのフィルタリングを指定
x
x.ma
plot(x.ma)
```

raモデルを使うなら

```{r}
x.ra <- filter(x,c(0),method="recursive")
x
x.ra
plot(x.ra)
```

フィルタの長さを固定して、フィルタを色々に変えてどのようになるかを見てみよう。
ma = (1,0,0)の場合とar = (0,0,0)の場合は、発生した乱数そのもので、白色ノイズとなっている。

maの方は３パラメタを動かしてもがたぼこの具合に若干の違いが出るだけであるが、arの方は、急に値が発散したりする様子がわかる。

また、ar = (1,0,0)のときは、単純な酔歩になる様子も見て取れる。

```{r,eval=FALSE}
X <- rnorm(10000)
inputPanel(

  sliderInput("N3", label = "no values:",
              min = 1, max = 5000, value = 1000, step = 1),
  sliderInput("ar1", label = "ar1:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ar2", label = "ar2:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ar3", label = "ar3:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ma1", label = "ma1:",
              min = -2, max = 2, value = 1, step = 0.01),
  sliderInput("ma2", label = "ma2:",
              min = -2, max = 2, value = 0, step = 0.01),
  sliderInput("ma3", label = "ma3:",
              min = -2, max = 2, value = 0, step = 0.01)
    
  )
  
renderPlot({
  N <- input$N3
  ar <- c(input$ar1,input$ar2,input$ar3)
  ma <- c(input$ma1,input$ma2,input$ma3)
  x <- X[1:N]
  foutma <- filter(x,ma,method="convolution",sides=1L) # sides=1Lは過去側のみのフィルタリングを指定
  foutar <- filter(x,ar,method="recursive")

  #ylim <- range(na.omit(c(x,cumsum(x),fout,fout2)))
  ylim <- range(na.omit(c(foutma,foutar)))
  yr <- ylim[2]-ylim[1]
  ylim[1] <- ylim[1] - yr*0.1
  ylim[2] <- ylim[2] + yr*0.1
  plot(foutma,type="l",ylim=ylim)
  points(foutar,type="l",col=2)

})
```
# ARIMAモデル

## RのARIMAモデル関数
ARIMAモデルはmaフィルタで説明できる部分とarフィルタで説明できる部分とが合わさって時系列データを作るモデルであり、
実データに対してARIMAモデルを適用し、そのmaフィルタ・arフィルタ等を推定する。

RのARIMA関係関数には、その推定をするarima()関数とARIMAモデルに基づいて時系列データをシミュレーション作成する関数arima.sim()がある。

コードは以下で表示できる。

```{r}
# arima
# arima.sim
```

## arima.sim() 関数の引数

まず、arima.sim()の引数を確認しよう
```{r}
args(arima.sim)
```

### model 引数
引数 modelはもっとも基幹になる引数で
```{}
model <- list(ar = c(0.3,0.2),ma = c(0.1,0.1))
```
のように2ベクトルからなるリストである。
これが関数内部で以下のように使われる
```{}
if (length(model$ma)) {
        x <- filter(x, c(1, model$ma), sides = 1L) # method = "convolution" がデフォルト
        x[seq_along(model$ma)] <- 0 # NAを0にする処理
    }
    if (length(model$ar)) 
        x <- filter(x, model$ar, method = "recursive")
```
初めにmaフィルタを掛け、次にarフィルタを掛けている。
maフィルタのsides=1Lとは、フィルタを過去側のみに作用させる手法である。

また、maフィルタの冒頭に1が付加されている。これは、maがなくarもないときに、maフィルタがc(1)となるようにするための処理である。

したがって、arima.sim()関数の骨格は、ma型とar型のフィルタリングの連続実施であることがわかる。

### model$order 引数
それ以外の処理は、"stop("'model' must be list")"のように、処理を中止する条件確認がいくつかあるほかは、次の処理が大きな処理である。



```{}
x <- diffinv(x, differences = d)
```
これは、ma,raフィルタリングの結果であるxを引数dで処理する行である。
このdは基幹引数であるmodelというリストのオプショナルな1成分で、長さ3のベクトルの第二要素である。

ma,arフィルタで生成したxがy[k]-y[k-d]差分となるようなベクトルyを生成している。

```{r}
x.2 <- rnorm(10)
d <- 2
x.0 <- diffinv(x.2,d)
diff(x.0,d)
x.2
```

## model$ar 引数と発散
それ以外に気になる点と言えば、arの条件として、以下のものがある。


```{}
minroots <- min(Mod(polyroot(c(1, -model$ar))))
        if (minroots <= 1) 
            stop("'ar' part of model is not stationary")
```
arフィルタが、累積和的な動きをすることと関係していて、再帰度が上がれば上がるほど発散に向ける力が強くなることことと対応している。

これからわかるのは、$1 = ar[1]x+ar[2]x^2+...$の根はすべて、その絶対値は１より大であることが、stationary(発散しない)条件であるということである。


やってみる。

この多項式条件をクリアする。
根をランダムに作成する。複素根の場合は共役複素数も根であるはず(実数係数多項式だから)であるという条件とともに、以下のように、実根数、複素根数を指定し、絶対値の１つはすべて１より大とすることにする.

これを用いてARIMAシミュレーションをする。filter()を使うバージョンと、arima.sim()を使う場合とを両方試す。
```{r}
library(polynom)
n.real.roots <- 10
nhalf.complex.roots <- 5 # conjugate pairs as roots


thetas <- runif(nhalf.complex.roots)*2*pi
# All mods should be more than 1
mods <- runif(n.real.roots+nhalf.complex.roots) * 5 + 1

#ss <- sample(c(-1,1),n.real.roots+nhalf.complex.roots,replace=TRUE)

mods.real <- mods[1:n.real.roots]
mods.complex <- mods[(1+n.real.roots):(n.real.roots+nhalf.complex.roots)]
roots.R <- mods.real
roots.C1 <- mods.complex * exp(1i*thetas)
roots.C2 <- mods.complex * exp(1i*(-thetas))
roots <- c(roots.R,roots.C1,roots.C2)
polynom <- poly.from.roots(roots)
polynom
ar <- (-polynom/polynom[1])[-1]
min(Mod(polyroot(c(1, -ar))))


x <- rnorm(1000)

x.. <- arima.sim(model=list(ar=ar),n=1000)

x. <- filter(x,ar,method="recursive")
par(mfcol=c(1,2))
plot(x.)
plot(x..)
par(mfcol=c(1,1))

```

## rand.gen 引数

ランダム数列を発生させる際に、デフォルトでは、rnorm() 関数が使われる。

それ以外のランダム関数も使える。

```{r}
x <- rexp(1000,rate=1.3)
x. <- filter(x,ar,method="recursive")

plot(x.)
x.. <- arima.sim(model=list(ar=ar),n=1000,rand.gen=rexp,rate=1.3)
plot(x..)
```

# arima.sim()の資料でarima.sim()について確認する

http://wolfweb.unr.edu/~zal/STAT758/Lab_3.pdf　のAppendixにある例をなぞって、arima.sim()について確認する

二種類の正規乱数列Z1,Z2について、ma/convolutionタイプのフィルタリングをしている。arima.sim()ではconvolution フィルタを過去側にのみ掛けるが、リンク先資料では、両側フィルタとなっている。念のため片側の場合も付け加えてみよう。

値なしの要素が発生するが、その数に違いがあるので、シフトしてみえる。

```{r}
# White noises
#===========================
Z1<-rnorm(100)
Z2<-rnorm(100,mean=10,sd=2)
# MA simulation by filter()
#===========================
X1<-filter(Z1,rep(1,10))
X2<-filter(Z2,c(.6,.3,.1))
X3 <- filter(Z1,rep(1,10),sides=1)
X4 <- filter(Z2,c(.6,.3,.1),sides=1)
par(mfcol=c(1,2))
plot(X1)
points(X3,type="l",col=2)
plot(X2)
points(X4,type="l",col=2)
par(mfcol=c(1,1))
```

ar/recursive タイプのフィルタリングをしてみる

```{r}
# AR simulation by filter()
#===================================
X1.2<-filter(Z1,c(1,1),method='r')
X2.2<-filter(Z2,c(.6,.3,.1),method='r')
par(mfcol=c(1,2))
plot(X1.2)
plot(X2.2)
par(mfcol=c(1,1))
```
```{r}
# ARMA simulation using arima.sim()
#===================================
par(mfrow=c(2,4))
X<-arima.sim(n=100,list(ar=c(0.9),ma=c(0.2))) # ARMA(1,1)
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9))) # AR(1)
plot(X)
X<-arima.sim(n=100,list(ma=c(0.2))) # MA(1)
plot(X)
X<-arima.sim(n=100,list(ma=c(0.2)),sd=10) # MA(1) with WN(0,10)
# sd=10はrnorm()関数の引数として与えられている
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),innov=Z1) # AR(1) with given WN Z1
# innovは乱数列(の途中から後)に特定の数列を与える。前部分はstart.innov引数で与える
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),rand.gen=rt,df=10) # AR(1) with mild long-tail (Student's t with df=10)
plot(X)
X<-arima.sim(n=100,list(ar=c(0.9)),rand.gen=runif) # AR(1) with U[0,1] WN
plot(X)
```

## ARMAではautocorrelation,partial autocorrelationの理論値がある

ARIMAモデルでは、タイムラグの程度ごとに、autocorrelation,partial autocorrelation の理論値がある。

```{r}
# ACF and PACF computations (theoretical)
#========================================
A<-ARMAacf(ar=c(.2),ma=c(.3,.2),lag.max=20)
P<-ARMAacf(ar=c(.2),ma=c(.3,.2),lag.max=20,pacf=TRUE)
par(mfcol=c(1,2))
plot(A,type="h")
plot(P,type="h")
par(mfcol=c(1,1))
```
## ARMAは、MAのみのモデルに変換することができる

以下では、ARMAtoMAによって、arパラメタが長さ1のベクトルであるときに、そのARMA過程がMAだったらどういうパラメタになるかを算出し、そのパラメタをma=としてARMAacfに与え、autocorrelation, partial autocorrelationが、ARMAのそれと同等になることを確認している

```{r}
# MA representation for ARMA
#============================
ar <- c(0.8)
lag.max <- 20
ma.eq<-ARMAtoMA(ar=ar, lag.max=lag.max)
#P<-ARMAtoMA(ar=c(0.9,.1),ma=c(1,2,3), lag.max=5)

A<-ARMAacf(ar=ar,lag.max=lag.max)
A. <- ARMAacf(ma=ma.eq,lag.max=lag.max)
par(mfcol=c(1,2))
plot(A,A.)
abline(0,1,col=2)
P<-ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
P. <- ARMAacf(ma=ma.eq,lag.max=lag.max,pacf=TRUE)
plot(P,P.)
abline(0,1,col=2)
par(mfcol=c(1,1))
```
同じことをarパラメタベクトルの長さが2以上のときでもやってみる。

```{r}
# MA representation for ARMA
#============================
ar <- c(0.3,0.1,-0.1)
lag.max <- 20
ma.eq<-ARMAtoMA(ar=ar, lag.max=lag.max)
#P<-ARMAtoMA(ar=c(0.9,.1),ma=c(1,2,3), lag.max=5)

A<-ARMAacf(ar=ar,lag.max=lag.max)
A. <- ARMAacf(ma=ma.eq,lag.max=lag.max)
par(mfcol=c(1,2))
plot(A,A.)
abline(0,1,col=2)
P<-ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
P. <- ARMAacf(ma=ma.eq,lag.max=lag.max,pacf=TRUE)
plot(P,P.)
abline(0,1,col=2)
par(mfcol=c(1,1))
```
ARIMA過程では、分散という指標があり、その理論値がある。それをmaとみなし、そのma用パラメタから算出することで近似がうまくいくことを示している。

```{r}
# Variance computation
#================================
P<-ARMAtoMA(ar=0.9, lag.max=100)
sum(P^2)+1
```

# ARIMAモデル推定

## ARIMAもでるのar,maフィルタ推定
ARIMAがどうなっているかは分かったので、それをデータから推定する話。

arima.sim()がやっていたように、
ar部分にいくつのフィルタパラメタを置き、ma部分に、いくつを置き、時間幅として、何区分分のずれを想定するか、とによって、ARIMAデータができているものとして、推定する。

orderという引数がそれで、arパラメタ数、区分ずれ数、maパラメタ数の３整数からなる。

次の例では、order=c(1,0,0)なので、区分ずれはなく、maは関与せず、arが1変数で決まるモデルでの推定となっている。
推定は、いろいろたくさんあるが…
```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100,list(ar=0.9))
est<-arima(X,order=c(1,0,0))
str(est)
```

推定係数は次のように、切片(不動部分)とar1とからなっている。シミュレーションで用いたar=0.9とよく合致している
```{r}
est$coef
```
モデルをar+maと複雑にすると、nを大きくしないと推定値はよくならない。

```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100000,list(ar=c(0.2),ma=c(0.2)))
est<-arima(X,order=c(1,0,1))
est$coef
```
orderパラメタもいじってみよう。

```{r}
# ARMA estimation
#================================
X<-arima.sim(n=100000,list(ar=c(0.2),ma=c(0.2),order=c(1,2,1)))
est<-arima(X,order=c(1,2,1))
est$coef
```

## autocorrelation, partial autocorrelation推定

ARIMAモデルからデータをシミュレーション作成し、その時系列データのautocorrelation, partial autocorrelationを計算し、それが、理論値とどれくらい近いかを確認してみる。


```{r}
# ACF, PACF estimation
#=====================================================================
n <- 100000
ar <- c(0.3,0.2)
ma <- c(0.1)
order <- c(length(ar),0,length(ma))
lag.max <- 20
X<-arima.sim(n=n,list(ar=ar,ma=ma,order=order)) # AR(1) model
out.cor <- acf(X,lag.max = lag.max,type = c("correlation"),plot=FALSE)
theor.cor <- ARMAacf(ar=ar,lag.max=lag.max)
plot(out.cor$lag,out.cor$acf,type="h")
points(out.cor$lag,theor.cor,col=2)
out.cor2 <- pacf(X,lag.max = lag.max,type = c("correlation"),plot=FALSE)
theor.cor2 <- ARMAacf(ar=ar,lag.max=lag.max,pacf=TRUE)
plot(out.cor2$lag,out.cor2$acf,type="h")
points(out.cor2$lag,theor.cor2,col=2)
```

arima()

> arima
function (x, order = c(0L, 0L, 0L), seasonal = list(order = c(0L, 
    0L, 0L), period = NA), xreg = NULL, include.mean = TRUE, 
    transform.pars = TRUE, fixed = NULL, init = NULL, method = c("CSS-ML", 
        "ML", "CSS"), n.cond, SSinit = c("Gardner1980", "Rossignol2011"), 
    optim.method = "BFGS", optim.control = list(), kappa = 1e+06) 
{
    "%+%" <- function(a, b) .Call(C_TSconv, a, b)
    SSinit <- match.arg(SSinit)
    SS.G <- SSinit == "Gardner1980"
    upARIMA <- function(mod, phi, theta) {
        p <- length(phi)
        q <- length(theta)
        mod$phi <- phi
        mod$theta <- theta
        r <- max(p, q + 1L)
        if (p > 0) 
            mod$T[1L:p, 1L] <- phi
        if (r > 1L) 
            mod$Pn[1L:r, 1L:r] <- if (SS.G) 
                .Call(C_getQ0, phi, theta)
            else .Call(C_getQ0bis, phi, theta, tol = 0)
        else mod$Pn[1L, 1L] <- if (p > 0) 
            1/(1 - phi^2)
        else 1
        mod$a[] <- 0
        mod
    }
    arimaSS <- function(y, mod) {
        .Call(C_ARIMA_Like, y, mod, 0L, TRUE)
    }
    armafn <- function(p, trans) {
        par <- coef
        par[mask] <- p
        trarma <- .Call(C_ARIMA_transPars, par, arma, trans)
        if (is.null(Z <- tryCatch(upARIMA(mod, trarma[[1L]], 
            trarma[[2L]]), error = function(e) NULL))) 
            return(.Machine$double.xmax)
        if (ncxreg > 0) 
            x <- x - xreg %*% par[narma + (1L:ncxreg)]
        res <- .Call(C_ARIMA_Like, x, Z, 0L, FALSE)
        s2 <- res[1L]/res[3L]
        0.5 * (log(s2) + res[2L]/res[3L])
    }
    armaCSS <- function(p) {
        par <- as.double(fixed)
        par[mask] <- p
        trarma <- .Call(C_ARIMA_transPars, par, arma, FALSE)
        if (ncxreg > 0) 
            x <- x - xreg %*% par[narma + (1L:ncxreg)]
        res <- .Call(C_ARIMA_CSS, x, arma, trarma[[1L]], trarma[[2L]], 
            as.integer(ncond), FALSE)
        0.5 * log(res)
    }
    arCheck <- function(ar) {
        p <- max(which(c(1, -ar) != 0)) - 1
        if (!p) 
            return(TRUE)
        all(Mod(polyroot(c(1, -ar[1L:p]))) > 1)
    }
    maInvert <- function(ma) {
        q <- length(ma)
        q0 <- max(which(c(1, ma) != 0)) - 1L
        if (!q0) 
            return(ma)
        roots <- polyroot(c(1, ma[1L:q0]))
        ind <- Mod(roots) < 1
        if (all(!ind)) 
            return(ma)
        if (q0 == 1) 
            return(c(1/ma[1L], rep.int(0, q - q0)))
        roots[ind] <- 1/roots[ind]
        x <- 1
        for (r in roots) x <- c(x, 0) - c(0, x)/r
        c(Re(x[-1L]), rep.int(0, q - q0))
    }
    series <- deparse(substitute(x))
    if (NCOL(x) > 1L) 
        stop("only implemented for univariate time series")
    method <- match.arg(method)
    x <- as.ts(x)
    if (!is.numeric(x)) 
        stop("'x' must be numeric")
    storage.mode(x) <- "double"
    dim(x) <- NULL
    n <- length(x)
    if (!missing(order)) 
        if (!is.numeric(order) || length(order) != 3L || any(order < 
            0)) 
            stop("'order' must be a non-negative numeric vector of length 3")
    if (!missing(seasonal)) 
        if (is.list(seasonal)) {
            if (is.null(seasonal$order)) 
                stop("'seasonal' must be a list with component 'order'")
            if (!is.numeric(seasonal$order) || length(seasonal$order) != 
                3L || any(seasonal$order < 0L)) 
                stop("'seasonal$order' must be a non-negative numeric vector of length 3")
        }
        else if (is.numeric(order)) {
            if (length(order) == 3L) 
                seasonal <- list(order = seasonal)
            else ("'seasonal' is of the wrong length")
        }
        else stop("'seasonal' must be a list with component 'order'")
    if (is.null(seasonal$period) || is.na(seasonal$period) || 
        seasonal$period == 0) 
        seasonal$period <- frequency(x)
    arma <- as.integer(c(order[-2L], seasonal$order[-2L], seasonal$period, 
        order[2L], seasonal$order[2L]))
    narma <- sum(arma[1L:4L])
    xtsp <- tsp(x)
    tsp(x) <- NULL
    Delta <- 1
    for (i in seq_len(order[2L])) Delta <- Delta %+% c(1, -1)
    for (i in seq_len(seasonal$order[2L])) Delta <- Delta %+% 
        c(1, rep.int(0, seasonal$period - 1), -1)
    Delta <- -Delta[-1L]
    nd <- order[2L] + seasonal$order[2L]
    n.used <- sum(!is.na(x)) - length(Delta)
    if (is.null(xreg)) {
        ncxreg <- 0L
    }
    else {
        nmxreg <- deparse(substitute(xreg))
        if (NROW(xreg) != n) 
            stop("lengths of 'x' and 'xreg' do not match")
        ncxreg <- NCOL(xreg)
        xreg <- as.matrix(xreg)
        storage.mode(xreg) <- "double"
    }
    class(xreg) <- NULL
    if (ncxreg > 0L && is.null(colnames(xreg))) 
        colnames(xreg) <- if (ncxreg == 1L) 
            nmxreg
        else paste0(nmxreg, 1L:ncxreg)
    if (include.mean && (nd == 0L)) {
        xreg <- cbind(intercept = rep(1, n), xreg = xreg)
        ncxreg <- ncxreg + 1L
    }
    if (method == "CSS-ML") {
        anyna <- anyNA(x)
        if (ncxreg) 
            anyna <- anyna || anyNA(xreg)
        if (anyna) 
            method <- "ML"
    }
    if (method == "CSS" || method == "CSS-ML") {
        ncond <- order[2L] + seasonal$order[2L] * seasonal$period
        ncond1 <- order[1L] + seasonal$period * seasonal$order[1L]
        ncond <- if (!missing(n.cond)) 
            ncond + max(n.cond, ncond1)
        else ncond + ncond1
    }
    else ncond <- 0
    if (is.null(fixed)) 
        fixed <- rep(NA_real_, narma + ncxreg)
    else if (length(fixed) != narma + ncxreg) 
        stop("wrong length for 'fixed'")
    mask <- is.na(fixed)
    no.optim <- !any(mask)
    if (no.optim) 
        transform.pars <- FALSE
    if (transform.pars) {
        ind <- arma[1L] + arma[2L] + seq_len(arma[3L])
        if (any(!mask[seq_len(arma[1L])]) || any(!mask[ind])) {
            warning("some AR parameters were fixed: setting transform.pars = FALSE")
            transform.pars <- FALSE
        }
    }
    init0 <- rep.int(0, narma)
    parscale <- rep(1, narma)
    if (ncxreg) {
        cn <- colnames(xreg)
        orig.xreg <- (ncxreg == 1L) || any(!mask[narma + 1L:ncxreg])
        if (!orig.xreg) {
            S <- svd(na.omit(xreg))
            xreg <- xreg %*% S$v
        }
        dx <- x
        dxreg <- xreg
        if (order[2L] > 0L) {
            dx <- diff(dx, 1L, order[2L])
            dxreg <- diff(dxreg, 1L, order[2L])
        }
        if (seasonal$period > 1L & seasonal$order[2L] > 0) {
            dx <- diff(dx, seasonal$period, seasonal$order[2L])
            dxreg <- diff(dxreg, seasonal$period, seasonal$order[2L])
        }
        fit <- if (length(dx) > ncol(dxreg)) 
            lm(dx ~ dxreg - 1, na.action = na.omit)
        else list(rank = 0L)
        if (fit$rank == 0L) {
            fit <- lm(x ~ xreg - 1, na.action = na.omit)
        }
        n.used <- sum(!is.na(resid(fit))) - length(Delta)
        init0 <- c(init0, coef(fit))
        ses <- summary(fit)$coefficients[, 2L]
        parscale <- c(parscale, 10 * ses)
    }
    if (n.used <= 0) 
        stop("too few non-missing observations")
    if (!is.null(init)) {
        if (length(init) != length(init0)) 
            stop("'init' is of the wrong length")
        if (any(ind <- is.na(init))) 
            init[ind] <- init0[ind]
        if (method == "ML") {
            if (arma[1L] > 0) 
                if (!arCheck(init[1L:arma[1L]])) 
                  stop("non-stationary AR part")
            if (arma[3L] > 0) 
                if (!arCheck(init[sum(arma[1L:2L]) + 1L:arma[3L]])) 
                  stop("non-stationary seasonal AR part")
            if (transform.pars) 
                init <- .Call(C_ARIMA_Invtrans, as.double(init), 
                  arma)
        }
    }
    else init <- init0
    coef <- as.double(fixed)
    if (!("parscale" %in% names(optim.control))) 
        optim.control$parscale <- parscale[mask]
    if (method == "CSS") {
        res <- if (no.optim) 
            list(convergence = 0L, par = numeric(), value = armaCSS(numeric()))
        else optim(init[mask], armaCSS, method = optim.method, 
            hessian = TRUE, control = optim.control)
        if (res$convergence > 0) 
            warning(gettextf("possible convergence problem: optim gave code = %d", 
                res$convergence), domain = NA)
        coef[mask] <- res$par
        trarma <- .Call(C_ARIMA_transPars, coef, arma, FALSE)
        mod <- makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa, 
            SSinit)
        if (ncxreg > 0) 
            x <- x - xreg %*% coef[narma + (1L:ncxreg)]
        arimaSS(x, mod)
        val <- .Call(C_ARIMA_CSS, x, arma, trarma[[1L]], trarma[[2L]], 
            as.integer(ncond), TRUE)
        sigma2 <- val[[1L]]
        var <- if (no.optim) 
            numeric()
        else solve(res$hessian * n.used)
    }
    else {
        if (method == "CSS-ML") {
            res <- if (no.optim) 
                list(convergence = 0L, par = numeric(), value = armaCSS(numeric()))
            else optim(init[mask], armaCSS, method = optim.method, 
                hessian = FALSE, control = optim.control)
            if (res$convergence == 0) 
                init[mask] <- res$par
            if (arma[1L] > 0) 
                if (!arCheck(init[1L:arma[1L]])) 
                  stop("non-stationary AR part from CSS")
            if (arma[3L] > 0) 
                if (!arCheck(init[sum(arma[1L:2L]) + 1L:arma[3L]])) 
                  stop("non-stationary seasonal AR part from CSS")
            ncond <- 0L
        }
        if (transform.pars) {
            init <- .Call(C_ARIMA_Invtrans, init, arma)
            if (arma[2L] > 0) {
                ind <- arma[1L] + 1L:arma[2L]
                init[ind] <- maInvert(init[ind])
            }
            if (arma[4L] > 0) {
                ind <- sum(arma[1L:3L]) + 1L:arma[4L]
                init[ind] <- maInvert(init[ind])
            }
        }
        trarma <- .Call(C_ARIMA_transPars, init, arma, transform.pars)
        mod <- makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa, 
            SSinit)
        res <- if (no.optim) 
            list(convergence = 0, par = numeric(), value = armafn(numeric(), 
                as.logical(transform.pars)))
        else optim(init[mask], armafn, method = optim.method, 
            hessian = TRUE, control = optim.control, trans = as.logical(transform.pars))
        if (res$convergence > 0) 
            warning(gettextf("possible convergence problem: optim gave code = %d", 
                res$convergence), domain = NA)
        coef[mask] <- res$par
        if (transform.pars) {
            if (arma[2L] > 0L) {
                ind <- arma[1L] + 1L:arma[2L]
                if (all(mask[ind])) 
                  coef[ind] <- maInvert(coef[ind])
            }
            if (arma[4L] > 0L) {
                ind <- sum(arma[1L:3L]) + 1L:arma[4L]
                if (all(mask[ind])) 
                  coef[ind] <- maInvert(coef[ind])
            }
            if (any(coef[mask] != res$par)) {
                oldcode <- res$convergence
                res <- optim(coef[mask], armafn, method = optim.method, 
                  hessian = TRUE, control = list(maxit = 0L, 
                    parscale = optim.control$parscale), trans = TRUE)
                res$convergence <- oldcode
                coef[mask] <- res$par
            }
            A <- .Call(C_ARIMA_Gradtrans, as.double(coef), arma)
            A <- A[mask, mask]
            var <- crossprod(A, solve(res$hessian * n.used, A))
            coef <- .Call(C_ARIMA_undoPars, coef, arma)
        }
        else var <- if (no.optim) 
            numeric()
        else solve(res$hessian * n.used)
        trarma <- .Call(C_ARIMA_transPars, coef, arma, FALSE)
        mod <- makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa, 
            SSinit)
        val <- if (ncxreg > 0L) 
            arimaSS(x - xreg %*% coef[narma + (1L:ncxreg)], mod)
        else arimaSS(x, mod)
        sigma2 <- val[[1L]][1L]/n.used
    }
    value <- 2 * n.used * res$value + n.used + n.used * log(2 * 
        pi)
    aic <- if (method != "CSS") 
        value + 2 * sum(mask) + 2
    else NA
    nm <- NULL
    if (arma[1L] > 0L) 
        nm <- c(nm, paste0("ar", 1L:arma[1L]))
    if (arma[2L] > 0L) 
        nm <- c(nm, paste0("ma", 1L:arma[2L]))
    if (arma[3L] > 0L) 
        nm <- c(nm, paste0("sar", 1L:arma[3L]))
    if (arma[4L] > 0L) 
        nm <- c(nm, paste0("sma", 1L:arma[4L]))
    if (ncxreg > 0L) {
        nm <- c(nm, cn)
        if (!orig.xreg) {
            ind <- narma + 1L:ncxreg
            coef[ind] <- S$v %*% coef[ind]
            A <- diag(narma + ncxreg)
            A[ind, ind] <- S$v
            A <- A[mask, mask]
            var <- A %*% var %*% t(A)
        }
    }
    names(coef) <- nm
    if (!no.optim) 
        dimnames(var) <- list(nm[mask], nm[mask])
    resid <- val[[2L]]
    tsp(resid) <- xtsp
    class(resid) <- "ts"
    structure(list(coef = coef, sigma2 = sigma2, var.coef = var, 
        mask = mask, loglik = -0.5 * value, aic = aic, arma = arma, 
        residuals = resid, call = match.call(), series = series, 
        code = res$convergence, n.cond = ncond, model = mod), 
        class = "Arima")
}

arima.sim()

> arima.sim
function (model, n, rand.gen = rnorm, innov = rand.gen(n, ...), 
    n.start = NA, start.innov = rand.gen(n.start, ...), ...) 
{
    if (!is.list(model)) 
        stop("'model' must be list")
    if (n <= 0L) 
        stop("'n' must be strictly positive")
    p <- length(model$ar)
    if (p) {
        minroots <- min(Mod(polyroot(c(1, -model$ar))))
        if (minroots <= 1) 
            stop("'ar' part of model is not stationary")
    }
    q <- length(model$ma)
    if (is.na(n.start)) 
        n.start <- p + q + ifelse(p > 0, ceiling(6/log(minroots)), 
            0)
    if (n.start < p + q) 
        stop("burn-in 'n.start' must be as long as 'ar + ma'")
    d <- 0
    if (!is.null(ord <- model$order)) {
        if (length(ord) != 3L) 
            stop("'model$order' must be of length 3")
        if (p != ord[1L]) 
            stop("inconsistent specification of 'ar' order")
        if (q != ord[3L]) 
            stop("inconsistent specification of 'ma' order")
        d <- ord[2L]
        if (d != round(d) || d < 0) 
            stop("number of differences must be a positive integer")
    }
    if (!missing(start.innov) && length(start.innov) < n.start) 
        stop(sprintf(ngettext(n.start, "'start.innov' is too short: need %d point", 
            "'start.innov' is too short: need %d points"), n.start), 
            domain = NA)
    x <- ts(c(start.innov[seq_len(n.start)], innov[1L:n]), start = 1 - 
        n.start)
    if (length(model$ma)) {
        x <- filter(x, c(1, model$ma), sides = 1L)
        x[seq_along(model$ma)] <- 0
    }
    if (length(model$ar)) 
        x <- filter(x, model$ar, method = "recursive")
    if (n.start > 0) 
        x <- x[-(seq_len(n.start))]
    if (d > 0) 
        x <- diffinv(x, differences = d)
    as.ts(x)
}