forked from englianhu/binary.com-interview-question
-
Notifications
You must be signed in to change notification settings - Fork 0
/
binary-Q1FiGJRGARCH.Rmd
786 lines (624 loc) · 35.3 KB
/
binary-Q1FiGJRGARCH.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
---
title: "<img src='www/binary-logo-resize.jpg' width='240'>"
subtitle: "[binary.com](https://github.com/englianhu/binary.com-interview-question) 面试试题 I - GARCH模型中的`ARIMA(p,d,q)`参数最优化"
author: "[<span style='color:blue'>®γσ, Lian Hu</span>](https://englianhu.github.io/) <img src='www/ENG.jpg' width='24'> <img src='www/RYO.jpg' width='24'>®"
date: "`r lubridate::today('Asia/Tokyo')`"
output:
html_document:
number_sections: yes
toc: yes
toc_depth: 4
toc_float:
collapsed: yes
smooth_scroll: yes
---
# 简介
近几年开始着手汇市预测与投资模式,分别使用了`ARIMA`、`ETS`、`GARCH`等等统计模型。在比较了多模型后,GJR-GARCH预测最为精准,然而在默认模型下并没有将`ARIMA(p,d,q)`值最优化。
**原文**:[哥哥姐姐,请问IGARCH模型的参数估计怎么编程实现啊.](https://d.cosx.org/d/2689-2689/10),此文章添加解释与一些参考文献,并且测试3年移动数据以确定新GARCH模型是否更为精准。
```{r setup}
suppressPackageStartupMessages(require('BBmisc'))
## 读取程序包
pkg <- c('lubridate', 'plyr', 'dplyr', 'magrittr', 'stringr', 'rugarch', 'forecast', 'quantmod', 'microbenchmark', 'knitr', 'kableExtra', 'formattable')
suppressAll(lib(pkg))
rm(pkg)
```
无意中发现,分享下`rugarch`中的GARCH模式最优化...
# 数据
首先读取[Binary.com Interview Q1 (Extention)](http://rpubs.com/englianhu/binary-Q1E)的汇市数据。
```{r read-data, warning=FALSE}
cr_code <- c('AUDUSD=X', 'EURUSD=X', 'GBPUSD=X', 'CHF=X', 'CAD=X',
'CNY=X', 'JPY=X')
#'@ names(cr_code) <- c('AUDUSD', 'EURUSD', 'GBPUSD', 'USDCHF', 'USDCAD',
#'@ 'USDCNY', 'USDJPY')
names(cr_code) <- c('USDAUD', 'USDEUR', 'USDGBP', 'USDCHF', 'USDCAD', 'USDCNY', 'USDJPY')
price_type <- c('Op', 'Hi', 'Lo', 'Cl')
## 读取雅虎数据。
mbase <- sapply(names(cr_code), function(x) readRDS(paste0('./data/', x, '.rds')) %>% na.omit)
```
数据简介报告。
```{r data-summary}
sapply(mbase, summary) %>%
kable %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
scroll_box(width = '100%', height = '400px')
```
*桌面2.1:数据简介。*
# 统计建模
## 基础模型
> **fGarch** : Various submodels arise from this model, and are passed to the ugarchspec "variance.model" list via the submodel option,
>
- The simple GARCH model of Bollerslev (1986) when $\lambda = \delta = 2$ and $\eta_{1j} = \eta_{2j} = 0$ (submodel = 'GARCH').
- The Absolute Value GARCH (AVGARCH) model of Taylor (1986) and Schwert (1990) when $\lambda = \delta = 1$ and $|\eta_{1j} | ≤ 1$ (submodel = 'AVGARCH').
- The GJR GARCH (GJRGARCH) model of Glosten et al. (1993) when $\lambda = \delta = 2$ and $\eta_{2j} = 0$ (submodel = 'GJRGARCH').
- The Threshold GARCH (TGARCH) model of Zakoian (1994) when $\lambda = \delta = 1, \eta_{2j} = 0$ and $|\eta_{1j} | ≤ 1$ (submodel = 'TGARCH').
- The Nonlinear ARCH model of Higgins et al. (1992) when $\delta = \lambda$ and $\eta_{1j} = \eta_{2j} = 0$ (submodel = 'NGARCH').
- The Nonlinear Asymmetric GARCH model of Engle and Ng (1993) when $\delta = \lambda = 2$ and $\eta_{1j} = 0$ (submodel = 'NAGARCH').
- The Asymmetric Power ARCH model of Ding et al. (1993) when $\delta = \lambda, \eta_{2j} = 0$ and $|\eta_{1j} | ≤ 1$ (submodel = 'APARCH').
- The Exponential GARCH model of Nelson (1991) when $\delta = 1, \lambda = 0$ and $\eta_{2j} = 0$ (not implemented as a submodel of fGARCH).
- The Full fGARCH model of Hentschel (1995) when $\delta = \lambda$ (submodel = 'ALLGARCH').
> The choice of distribution is entered via the 'distribution.model' option of the ugarchspec method. The package also implements a set of functions to work with the parameters of these distributions. These are:
>
- `ddist(distribution = "norm", y, mu = 0, sigma = 1, lambda = -0.5, skew = 1, shape = 5)`. The density (d*) function.
- `pdist(distribution = "norm", q, mu = 0, sigma = 1, lambda = -0.5, skew = 1, shape = 5)`. The distribution (p*) function.
- `qdist(distribution = "norm", p, mu = 0, sigma = 1, lambda = -0.5, skew = 1, shape = 5)`. The quantile (q*) function.
- `rdist(distribution = "norm", n, mu = 0, sigma = 1, lambda = -0.5, skew = 1, shape = 5)`. The sampling (q*) function.
- `fitdist(distribution = "norm", x, control = list())`. A function for fitting data using any of the included distributions.
- `dskewness(distribution = "norm", skew = 1, shape = 5, lambda = -0.5)`. The distribution skewness (analytical where possible else by quadrature integration).
- `dkurtosis(distribution = "norm", skew = 1, shape = 5, lambda = -0.5)`. The distribution excess kurtosis (analytical where it exists else by quadrature integration).
> The family of APARCH models includes the ARCH and GARCH models, and five other ARCH extensions as special cases:
>
- ARCH Model of Engle when $\delta = 2$, $\gamma_{i} = 0$, and $\beta_{j} = 0$.
- GARCH Model of Bollerslev when $\delta = 2$, and $\gamma_{i} = 0$.
- TS-GARCH Model of Taylor and Schwert when $\delta = 1$, and $\gamma_{i} = 0$.
- GJR-GARCH Model of Glosten, Jagannathan, and Runkle when $\delta = 2$.
- T-ARCH Model of Zakoian when $\delta = 1$.
- N-ARCH Model of Higgens and Bera when $\gamma_{i} = 0$, and $\beta_{j} = 0$.
- Log-ARCH Model of Geweke and Pentula when $\delta → 0$.
原文:[Parameter Estimation of ARMA Models with GARCH/APARCH Errors - An R and SPlus Software Implementation](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/Parameter%20Estimation%20of%20ARMA%20Models%20with%20GARCH%20or%20APARCH%20Errors%20-%20An%20R%20and%20SPlus%20Software%20Implementation.pdf)文献中的*2. Mean and Variance Equation*。
有关多种GARCH模式比较,请参考参考文献中的链接3... 包括比较:
- auto.arima
- exponential smoothing models (ETS)
- GARCH (包括GARCH、eGARCH、iGARCH、fGARCH、gjrGARCH等模式)
- exponential weighted models
$$\begin{equation}
\sigma^2_{t} = \omega + \sum_{i=1}^{\rho}(\alpha_{i} + \gamma_{i} I_{t-i}) \varepsilon_{t-i}^{2} + \sum_{j=1}^{q}\beta_{j}\sigma^{2}_{t-j}\ \cdots\ Equation\ 3.1.1
\end{equation}$$
在之前的文章已经分别比较多种统计模式,得知GJR-GARCH模型的预测结果最为精准,以下稍微介绍下平滑移动加权模型。
## ARMA 模型
> ARMA Mean Equation: The `ARMA(p,q)` process of autoregressive order `p` and moving average order `q` can be described as
$$\begin{align*}
x_{t} &= \mu + \sum_{i=1}^{m} \alpha_{i} x_{t-i} + \sum^{n}_{j=1} \beta_{j} \varepsilon_{t-j} + \varepsilon_{t}
\\ &= \mu + \alpha(B)x_{t} + \beta(B) \varepsilon_{t}
\end{align*} \cdots\ Equation\ 3.2.1$$
以上函数乃滑动加权指数,请参阅以下链接以了解更多详情:
- [Computer Lab Sessions 2&3](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/Computer%20Lab%20Sessions%202%263.pdf)
- [時間序列分析 - 總體經濟與財務金融之應用 - 定態時間序列II ARMA模型](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/%E6%99%82%E9%96%93%E5%BA%8F%E5%88%97%E5%88%86%E6%9E%90%20-%20%E7%B8%BD%E9%AB%94%E7%B6%93%E6%BF%9F%E8%88%87%E8%B2%A1%E5%8B%99%E9%87%91%E8%9E%8D%E4%B9%8B%E6%87%89%E7%94%A8%20-%20%E5%AE%9A%E6%85%8B%E6%99%82%E9%96%93%E5%BA%8F%E5%88%97II%20ARMA%E6%A8%A1%E5%9E%8B.pdf)
- [Introduction to the rugarch package](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/Introduction%20to%20the%20rugarch%20package.pdf)
- [Parameter Estimation of ARMA Models with GARCH/APARCH Errors - An R and SPlus Software Implementation](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/Parameter%20Estimation%20of%20ARMA%20Models%20with%20GARCH%20or%20APARCH%20Errors%20-%20An%20R%20and%20SPlus%20Software%20Implementation.pdf)
- [How to choose the order of a GARCH model?](https://stats.stackexchange.com/questions/154754/how-to-choose-the-order-of-a-garch-model)
## GJR-GARCH:ARMA(p,q)值最优化(旧程序)
使用极大似然法计算最优arma order中的`p`值与`q`值,将原本默认值最优化,让模型马尔可夫化,每日的`p`与`q`值最将会使用最佳值... 不包括`d`值。
```{r arma-order1, warning = FALSE}
armaSearch <- suppressWarnings(function(data, .method = 'CSS-ML') {
## I set .method = 'CSS-ML' as default method since the AIC value we got is
## smaller than using method 'ML' while using method 'CSS' facing error.
##
## https://stats.stackexchange.com/questions/209730/fitting-methods-in-arima
## According to the documentation, this is how each method fits the model:
## - CSS minimises the sum of squared residuals.
## - ML maximises the log-likelihood function of the ARIMA model.
## - CSS-ML mixes both methods: first, CSS is run, the starting parameters
## for the optimization algorithm are set to zeros or to the values given
## in the optional argument init; then, ML is applied passing the CSS
## parameter estimates as starting parameter values for the optimization algorithm.
.methods = c('CSS-ML', 'ML', 'CSS')
if(!.method %in% .methods)
stop(paste('Kindly choose .method among ',
paste0(.methods, collapse = ', '), '!'))
armacoef <- data.frame()
for (p in 0:5){
for (q in 0:5) {
#data.arma = arima(diff(data), order = c(p, 0, q))
#'@ data.arma = arima(data, order = c(p, 1, q), method = .method)
if(.method == 'CSS-ML') {
data.arma = tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'CSS-ML')
mth = 'CSS-ML'
list(arma, mth)
}, error = function(e) tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'ML')
mth = 'ML'
list(arma = arma, mth = mth)
}, error = function(e) {
arma = arima(data, order = c(p, 1, q), method = 'CSS')
mth = 'CSS'
list(arma = arma, mth = mth)
}))
} else if(.method == 'ML') {
data.arma = tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'ML')
mth = 'ML'
list(arma = arma, mth = mth)
}, error = function(e) tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'CSS-ML')
mth = 'CSS-ML'
list(arma = arma, mth = mth)
}, error = function(e) {
arma = arima(data, order = c(p, 1, q), method = 'CSS')
mth = 'CSS'
list(arma = arma, mth = mth)
}))
} else if(.method == 'CSS') {
data.arma = tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'CSS')
mth = 'CSS'
list(arma = arma, mth = mth)
}, error = function(e) tryCatch({
arma = arima(data, order = c(p, 1, q), method = 'CSS-ML')
mth = 'CSS-ML'
list(arma = arma, mth = mth)
}, error = function(e) {
arma = arima(data, order = c(p, 1, q), method = 'ML')
mth = 'ML'
list(arma = arma, mth = mth)
}))
} else {
stop(paste('Kindly choose .method among ', paste0(.methods, collapse = ', '), '!'))
}
names(data.arma) <- c('arma', 'mth')
#cat('p =', p, ', q =', q, 'AIC =', data.arma$arma$aic, '\n')
armacoef <- rbind(armacoef,c(p, q, data.arma$arma$aic))
}
}
## ARMA Modeling寻找AIC值最小的p,q
colnames(armacoef) <- c('p', 'q', 'AIC')
pos <- which(armacoef$AIC == min(armacoef$AIC))
cat(paste0('method = \'', data.arma$mth, '\', the min AIC = ', armacoef$AIC[pos],
', p = ', armacoef$p[pos], ', q = ', armacoef$q[pos], '\n'))
return(armacoef)
})
```
然后把以上的函数嵌入以下GARCH模型,将原本固定参数的ARMA值浮动化。
```{r garch-model1, warning = FALSE}
calC <- function(mbase, currency = 'JPY=X', ahead = 1, price = 'Cl') {
# Using "memoise" to automatically cache the results
source('function/filterFX.R')
source('function/armaSearch.R')
mbase = suppressWarnings(filterFX(mbase, currency = currency, price = price))
armaOrder = suppressWarnings(armaSearch(mbase))
armaOrder %<>% dplyr::filter(AIC == min(AIC)) %>% .[c('p', 'q')] %>% unlist
spec = ugarchspec(
variance.model = list(
model = 'gjrGARCH', garchOrder = c(1, 1),
submodel = NULL, external.regressors = NULL,
variance.targeting = FALSE),
mean.model = list(
armaOrder = armaOrder,
include.mean = TRUE, archm = FALSE,
archpow = 1, arfima = FALSE,
## https://stats.stackexchange.com/questions/73351/how-does-one-specify-arima-p-d-q-in-ugarchspec-for-ugarchfit-in-rugarch?answertab=votes#tab-top
## https://d.cosx.org/d/2689-2689/9
external.regressors = NULL,
archex = FALSE),
distribution.model = 'snorm')
fit = ugarchfit(spec, mbase, solver = 'hybrid')
fc = ugarchforecast(fit, n.ahead = ahead)
res = tail(attributes(fc)$forecast$seriesFor, 1)
colnames(res) = names(mbase)
latestPrice = tail(mbase, 1)
#rownames(res) <- as.character(forDate)
latestPrice <- xts(latestPrice)
#res <- as.xts(res)
tmp = list(latestPrice = latestPrice, forecastPrice = res,
AIC = infocriteria(fit))
return(tmp)
}
```
## Fi-GJR-GARCH:ARFIMA(p,d,q)值最优化(新程序)
> **The fractionally integrated GARCH model ('fiGARCH')** : Contrary to the case of the ARFIMA model, the degree of persistence in the FIGARCH model operates in the oppposite direction, so that as the fractional differencing parameter d gets closer to one, the memory of the FIGARCH process increases, a direct result of the parameter acting on the squared errors rather than the conditional variance. When `d = 0` the FIGARCH collapses to the vanilla GARCH model and when `d = 1` to the integrated GARCH model...
>
> Motivated by the developments in long memory processes, and in particular the ARFIMA type models (see section 2.1), Baillie et al. (1996) proposed the fractionally integrated generalized autoregressive conditional heteroscedasticity, or FIGARCH, model to capture long memory (in essence hyperbolic memory). Unlike the standard GARCH where shocks decay at an exponential rate, or the integrated GARCH model where shocks persist forever, in the FIGARCH model shocks decay at a slower hyperbolic rate. Consider the standard GARCH equation:
>
$$\begin{equation}
\sigma^{2}_{t} = \omega + \alpha (L) \varepsilon^{2}_{t} + \beta (L) \sigma^{2} \cdots\ Equation\ 3.1.2
\end{equation}$$
原文:[Introduction to the rugarch package](https://github.com/englianhu/binary.com-interview-question/raw/master/reference/Introduction%20to%20the%20rugarch%20package.pdf)文献中的*2.2.10 The fractionally integrated GARCH model ('fiGARCH')*
然后计算最优arma order... 也包括`d`值。
```{r arma-order2, warning = FALSE}
opt_arma <- function(mbase){
#ARMA Modeling minimum AIC value of `p,d,q`
fit <- auto.arima(mbase)
arimaorder(fit)
}
```
再来就设置Garch模型中的`arfima`参数,将原本固定的`d`值浮动化。
```{r garch-model2, warning = FALSE}
calc_fx <- function(mbase, currency = 'JPY=X', ahead = 1, price = 'Cl') {
## Using "memoise" to automatically cache the results
## http://rpubs.com/englianhu/arma-order-for-garch
source('function/filterFX.R')
#'@ source('function/armaSearch.R') #old optimal arma p,q value searching, but no d value.
source('function/opt_arma.R') #rename the function best.ARMA()
mbase = suppressWarnings(filterFX(mbase, currency = currency, price = price))
armaOrder = opt_arma(mbase)
## Set arma order for `p, d, q` for GARCH model.
#'@ https://stats.stackexchange.com/questions/73351/how-does-one-specify-arima-p-d-q-in-ugarchspec-for-ugarchfit-in-rugarch
spec = ugarchspec(
variance.model = list(
model = 'gjrGARCH', garchOrder = c(1, 1),
submodel = NULL, external.regressors = NULL,
variance.targeting = FALSE),
mean.model = list(
armaOrder = armaOrder[c(1, 3)], #set arma order for `p` and `q`.
include.mean = TRUE, archm = FALSE,
archpow = 1, arfima = TRUE, #set arima = TRUE
external.regressors = NULL,
archex = FALSE),
fixed.pars = list(arfima = armaOrder[2]), #set fixed.pars for `d` value
distribution.model = 'snorm')
fit = ugarchfit(spec, mbase, solver = 'hybrid')
fc = ugarchforecast(fit, n.ahead = ahead)
#res = xts::last(attributes(fc)$forecast$seriesFor)
res = tail(attributes(fc)$forecast$seriesFor, 1)
colnames(res) = names(mbase)
latestPrice = tail(mbase, 1)
#rownames(res) <- as.character(forDate)
latestPrice <- xts(latestPrice)
#res <- as.xts(res)
tmp = list(latestPrice = latestPrice, forecastPrice = res,
AIC = infocriteria(fit))
return(tmp)
}
```
# 模式比较
## 运行时间
首先比较运行时间,哪个比较高效。
```{r processing-time, warning = FALSE}
## 测试运行时间。
#'@ microbenchmark(fit <- calc_fx(mbase[[names(cr_code)[sp]]], currency = cr_code[sp]))
#'@ microbenchmark(fit2 <- calC(mbase[[names(cr_code)[sp]]], currency = cr_code[sp]))
## 随机抽样货币数据,测试运行时间。
sp <- sample(1:7, 1)
system.time(fit1 <- calc_fx(mbase[[names(cr_code)[sp]]], currency = cr_code[sp]))
system.time(fit2 <- calC(mbase[[names(cr_code)[sp]]], currency = cr_code[sp]))
```
由于使用`microbenchmark`非常耗时,而且双方实力悬殊,故此僕使用`system.time()`比较运行速度,结果还是新程序`calc_fx()`比旧程序`calC()`迅速。
## 数据误差率
以下僕运行数据测试后事先储存,然后直接读取。首先过滤`timeID`时间参数,然后才模拟预测汇价。
```{r tidy-data}
#'@ ldply(mbase, function(x) range(index(x)))
# .id V1 V2
#1 USDAUD 2012-01-02 2017-08-30
#2 USDEUR 2012-01-02 2017-08-30
#3 USDGBP 2012-01-02 2017-08-30
#4 USDCHF 2012-01-02 2017-08-30
#5 USDCAD 2012-01-02 2017-08-30
#6 USDCNY 2012-01-02 2017-08-30
#7 USDJPY 2012-01-02 2017-08-30
timeID <- llply(mbase, function(x) as.character(index(x))) %>%
unlist %>% unique %>% as.Date %>% sort
timeID <- c(timeID, xts::last(timeID) + days(1)) #the last date + 1 in order to predict the next day of last date to make whole dataset completed.
timeID0 <- ymd('2013-01-01')
timeID <- timeID[timeID >= timeID0]
## ---------------- 6个R进程并行运作 --------------------
start <- seq(1, length(timeID), ceiling(length(timeID)/6))
#[1] 1 204 407 610 813 1016
stop <- c((start - 1)[-1], length(timeID))
#[1] 203 406 609 812 1015 1217
cat(paste0('\ntimeID <- timeID[', paste0(start, ':', stop), ']'), '\n')
#timeID <- timeID[1:203]
#timeID <- timeID[204:406]
#timeID <- timeID[407:609]
#timeID <- timeID[610:812]
#timeID <- timeID[813:1015]
#timeID <- timeID[1016:1217]
## Some currency data doesn't open market in speficic date.
#Error:
#data/fx/USDCNY/pred1.2015-04-15.rds saved! #only USDJPY need to review
#data/fx/USDGBP/pred1.2015-12-07.rds saved! #only USDCHF need to review
#data/fx/USDCAD/pred1.2016-08-30.rds saved! #only USDCNY need to review
#data/fx/USDAUD/pred1.2016-11-30.rds saved! #only USDEUR need to review
#data/fx/USDCNY/pred1.2017-01-12.rds saved! #only USDJPY need to review
#data/fx/USDEUR/pred1.2017-02-09.rds saved! #only USDGBP need to review
#timeID <- timeID[timeID > ymd('2017-03-08')]
#data/fx/USDCAD/pred2.2015-06-09.rds saved! #only USDCNY need to review
#data/fx/USDCAD/pred2.2015-06-16.rds saved! #only USDCNY need to review
#data/fx/USDCAD/pred2.2015-06-17.rds saved! #only USDCNY need to review
```
模拟`calC()`函数预测汇价数据。
```{r sim-pred1, eval = FALSE, warning = FALSE}
## ------------- 模拟calC()预测汇价 ----------------------
pred1 <- list()
for (dt in timeID) {
for (i in seq(cr_code)) {
smp <- mbase[[names(cr_code)[i]]]
dtr <- xts::last(index(smp[index(smp) < dt]), 1) #tail(..., 1)
smp <- smp[paste0(dtr %m-% years(1), '/', dtr)]
pred1[[i]] <- ldply(price_type, function(y) {
df = calC(smp, currency = cr_code[i], price = y)
df = data.frame(Date = index(df[[1]][1]),
Type = paste0(names(df[[1]]), '.', y),
df[[1]], df[[2]], t(df[[3]]))
names(df)[4] %<>% str_replace_all('1', 'T+1')
df
})
if (!dir.exists(paste0('data/fx/', names(pred1[[i]])[3])))
dir.create(paste0('data/fx/', names(pred1[[i]])[3]))
saveRDS(pred1[[i]], paste0(
'data/fx/', names(pred1[[i]])[3], '/pred1.',
unique(pred1[[i]]$Date), '.rds'))
cat(paste0(
'data/fx/', names(pred1[[i]])[3], '/pred1.',
unique(pred1[[i]]$Date), '.rds saved!\n'))
}; rm(i)
}
```
查询模拟测试进度的函数`task_progress()`如下。
```{r check-progress}
task_progress <- function(scs = 60, .pattern = '^pred1', .loops = TRUE) {
## ------------- 定时查询进度 ----------------------
## 每分钟自动查询与更新以上模拟calC()预测汇价进度(储存文件量)。
if (.loops == TRUE) {
while(1) {
cat('Current Tokyo Time :', as.character(now('Asia/Tokyo')), '\n\n')
z <- ldply(mbase, function(dtm) {
y = index(dtm)
y = y[y >= timeID0]
cr = as.character(unique(substr(names(dtm), 1, 6)))
x = list.files(paste0('./data/fx/', cr), pattern = .pattern) %>%
str_extract_all('[0-9]{4}-[0-9]{2}-[0-9]{2}') %>%
unlist %>% as.Date %>% sort
x = x[x >= y[1] & x <= xts::last(y)]
data.frame(.id = cr, x = length(x), n = length(y)) %>%
mutate(progress = percent(x/n))
})# %>% tbl_df
print(z)
prg = sum(z$x)/sum(z$n)
cat('\n================', as.character(percent(prg)), '================\n\n')
if (prg == 1) break #倘若进度达到100%就停止更新。
Sys.sleep(scs) #以上ldply()耗时3~5秒,而休息时间60秒。
}
} else {
cat('Current Tokyo Time :', as.character(now('Asia/Tokyo')), '\n\n')
z <- ldply(mbase, function(dtm) {
y = index(dtm)
y = y[y >= timeID0]
cr = as.character(unique(substr(names(dtm), 1, 6)))
x = list.files(paste0('./data/fx/', cr), pattern = .pattern) %>%
str_extract_all('[0-9]{4}-[0-9]{2}-[0-9]{2}') %>%
unlist %>% as.Date %>% sort
x = x[x >= y[1] & x <= xts::last(y)]
data.frame(.id = cr, x = length(x), n = length(y)) %>%
mutate(progress = percent(x/n))
})# %>% tbl_df
print(z)
prg = sum(z$x)/sum(z$n)
cat('\n================', as.character(percent(prg)), '================\n\n')
}
}
```
```{r check-files, echo = FALSE, eval = FALSE}
## ------------- 查询缺失文件 ----------------------
## 查询缺失文件。
dts <- sapply(mbase, function(x) {
y = index(x)
y[y >= timeID0]
})
sapply(mbase, function(x) as.character(index(x)) %>% as.Date %>% sort)
fls <- sapply(names(cr_code), function(x) {
list.files(paste0('./data/fx/', x), pattern = '^pred1') %>%
str_extract_all('[0-9]{4}-[0-9]{2}-[0-9]{2}') %>%
unlist %>% as.Date %>% sort
})
sapply(fls, function(x) timeID[!timeID %in% x] %>% sort)
timeID <- llply(fls, function(x) timeID[!timeID %in% x] %>% sort) %>% unlist %>% as.Date %>% sort
names(timeID) <- NULL
timeID %<>% unique
```
模拟`calc_fx()`函数预测汇价数据。
```{r sim-pred2, eval = FALSE, warning = FALSE}
## ------------- 模拟calc_fx()预测汇价 ----------------------
pred2 <- list()
for (dt in timeID) {
for (i in seq(cr_code)) {
smp <- mbase[[names(cr_code)[i]]]
dtr <- xts::last(index(smp[index(smp) < dt]), 1) #tail(..., 1)
smp <- smp[paste0(dtr %m-% years(1), '/', dtr)]
pred2[[i]] <- ldply(price_type, function(y) {
df = calc_fx(smp, currency = cr_code[i], price = y)
df = data.frame(Date = index(df[[1]][1]),
Type = paste0(names(df[[1]]), '.', y),
df[[1]], df[[2]], t(df[[3]]))
names(df)[4] %<>% str_replace_all('1', 'T+1')
df
})
if (!dir.exists(paste0('data/fx/', names(pred2[[i]])[3])))
dir.create(paste0('data/fx/', names(pred2[[i]])[3]))
saveRDS(pred2[[i]], paste0(
'data/fx/', names(pred2[[i]])[3], '/pred2.',
unique(pred2[[i]]$Date), '.rds'))
cat(paste0(
'data/fx/', names(pred2[[i]])[3], '/pred2.',
unique(pred2[[i]]$Date), '.rds saved!\n'))
}; rm(i)
}
```
模拟完毕后,再来就查看数据结果。
```{r data-error}
## calC()模拟数据误差率
task_progress(.pattern = '^pred1', .loops = FALSE)
## calc_fx()模拟数据误差率
task_progress(.pattern = '^pred2', .loops = FALSE)
```
以上结果显示,模拟后的数据的误差率非常渺小^[一些数据模拟时,出现不知名错误。]。以下筛选`pred1`与`pred2`同样日期的有效数据。
```{r tidy-data2}
##数据1
fx1 <- llply(names(cr_code), function(x) {
fls <- list.files(paste0('data/fx/', x), pattern = '^pred1')
dfm <- ldply(fls, function(y) {
readRDS(paste0('data/fx/', x, '/', y))
}) %>% data.frame(Cat = 'pred1', .) %>% tbl_df
names(dfm)[4:5] <- c('Price', 'Price.T1')
dfm
})
names(fx1) <- names(cr_code)
##数据2
fx2 <- llply(names(cr_code), function(x) {
fls <- list.files(paste0('data/fx/', x), pattern = '^pred2')
dfm <- ldply(fls, function(y) {
readRDS(paste0('data/fx/', x, '/', y))
}) %>% data.frame(Cat = 'pred2', .) %>% tbl_df
names(dfm)[4:5] <- c('Price', 'Price.T1')
dfm
})
names(fx2) <- names(cr_code)
#合并,并且整理数据。
fx1 %<>% ldply %>% tbl_df
fx2 %<>% ldply %>% tbl_df
fx <- suppressAll(bind_rows(fx1, fx2) %>% arrange(Date) %>%
mutate(.id = factor(.id), Cat = factor(Cat)) %>%
ddply(.(Cat, Type), function(x) {
x %>% mutate(Price.T1 = lag(Price.T1, 1))
}) %>% tbl_df %>%
dplyr::filter(Date >= ymd('2013-01-01') & Date <= ymd('2017-08-30')))
rm(fx1, fx2)
```
```{r tidy-data3}
## filter all predictive error where sd >= 20%.
notID <- fx %>% mutate(diff = abs(Price.T1/Price), se = ifelse(diff <= 0.8 | diff >= 1.25, 1, 0)) %>% dplyr::filter(se == 1)
ntimeID <- notID %>% .$Date %>% unique
notID %>%
kable(caption = 'Error data') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
scroll_box(width = '100%', height = '400px')
```
僕尝试运行好几次,`USDCHF`都是获得同样的结果。然后将默认的`snorm`分布更换为`norm`就没有出现错误。至于`USDCNY`原始数据有误就不是统计模型的问题了。
```{r tidy-data4}
fx %<>% dplyr::filter(!Date %in% ntimeID)
```
## 精准度
现在就比较下双方的MSE值与AIC值。
```{r aic1}
acc <- ddply(fx, .(Cat, Type), summarise,
mse = mean((Price.T1 - Price)^2),
n = length(Price),
Akaike.mse = (-2*mse)/n+2*4/n,
Akaike = mean(Akaike),
Bayes = mean(Bayes),
Shibata = mean(Shibata),
Hannan.Quinn = mean(Hannan.Quinn)) %>%
tbl_df %>% mutate(mse = round(mse, 6)) %>%
arrange(Type)
acc %>%
kable(caption = 'Group Table Summary') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
group_rows('USD/AUD Open', 1, 2, label_row_css = 'background-color: #e68a00; color: #fff;') %>%
group_rows('USD/AUD High', 3, 4, label_row_css = 'background-color: #e68a00; color: #fff;') %>%
group_rows('USD/AUD Low', 5, 6, label_row_css = 'background-color: #e68a00; color: #fff;') %>%
group_rows('USD/AUD Close', 7, 8, label_row_css = 'background-color: #e68a00; color: #fff;') %>%
group_rows('USD/EUR Open', 9, 10, label_row_css = 'background-color: #6666ff; color: #fff;') %>%
group_rows('USD/EUR High', 11, 12, label_row_css = 'background-color: #6666ff; color: #fff;') %>%
group_rows('USD/EUR Low', 13, 14, label_row_css = 'background-color:#6666ff; color: #fff;') %>%
group_rows('USD/EUR Close', 15, 16, label_row_css = 'background-color: #6666ff; color: #fff;') %>%
group_rows('USD/GBP Open', 17, 18, label_row_css = 'background-color: #339966; color: #fff;') %>%
group_rows('USD/GBP High', 19, 20, label_row_css = 'background-color: #339966; color: #fff;') %>%
group_rows('USD/GBP Low', 21, 22, label_row_css = 'background-color: #339966; color: #fff;') %>%
group_rows('USD/GBP Close', 23, 24, label_row_css = 'background-color: #339966; color: #fff;') %>%
group_rows('USD/CHF Open', 25, 26, label_row_css = 'background-color: #808000; color: #fff;') %>%
group_rows('USD/CHF High', 27, 28, label_row_css = 'background-color: #808000; color: #fff;') %>%
group_rows('USD/CHF Low', 29, 30, label_row_css = 'background-color: #808000; color: #fff;') %>%
group_rows('USD/CHF Close', 31, 32, label_row_css = 'background-color: #808000; color: #fff;') %>%
group_rows('USD/CAD Open', 33, 34, label_row_css = 'background-color: #666; color: #fff;') %>%
group_rows('USD/CAD High', 35, 36, label_row_css = 'background-color: #666; color: #fff;') %>%
group_rows('USD/CAD Low', 37, 38, label_row_css = 'background-color: #666; color: #fff;') %>%
group_rows('USD/CAD Close', 39, 40, label_row_css = 'background-color: #666; color: #fff;') %>%
group_rows('USD/CNY Open', 41, 42, label_row_css = 'background-color: #e60000; color: #fff;') %>%
group_rows('USD/CNY High', 43, 44, label_row_css = 'background-color: #e60000; color: #fff;') %>%
group_rows('USD/CNY Low', 45, 46, label_row_css = 'background-color: #e60000; color: #fff;') %>%
group_rows('USD/CNY Close', 47, 48, label_row_css = 'background-color: #e60000; color: #fff;') %>%
group_rows('USD/JPY Open', 49, 50, label_row_css = 'background-color: #ff3377; color: #fff;') %>%
group_rows('USD/JPY High', 51, 52, label_row_css = 'background-color: #ff3377; color: #fff;') %>%
group_rows('USD/JPY Low', 53, 54, label_row_css = 'background-color: #ff3377; color: #fff;') %>%
group_rows('USD/JPY Close', 55, 56, label_row_css = 'background-color: #ff3377; color: #fff;') %>%
scroll_box(width = '100%', height = '400px')
```
```{r aic2}
acc <- ddply(fx, .(Cat, .id), summarise,
mse = mean((Price.T1 - Price)^2),
n = length(Price),
Akaike.mse = (-2*mse)/n+2*4/n,
Akaike = mean(Akaike),
Bayes = mean(Bayes),
Shibata = mean(Shibata),
Hannan.Quinn = mean(Hannan.Quinn)) %>%
tbl_df %>% mutate(mse = round(mse, 6)) %>%
arrange(.id)
acc %>%
kable(caption = 'Group Table Summary') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
group_rows('USD/AUD', 1, 2, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/CAD', 3, 4, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/CHF', 5, 6, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/CNY', 7, 8, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/EUR', 9, 10, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/GBP', 11, 12, label_row_css = 'background-color: #003399; color: #fff;') %>%
group_rows('USD/JPY', 13, 14, label_row_css = 'background-color: #003399; color: #fff;') %>%
scroll_box(width = '100%', height = '400px')
```
```{r aic3}
acc <- ddply(fx, .(Cat), summarise,
mse = mean((Price.T1 - Price)^2),
n = length(Price),
Akaike.mse = (-2*mse)/n+2*4/n,
Akaike = mean(Akaike),
Bayes = mean(Bayes),
Shibata = mean(Shibata),
Hannan.Quinn = mean(Hannan.Quinn)) %>%
tbl_df %>% mutate(mse = round(mse, 6))
acc %>%
kable(caption = 'Group Table Summary') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))
```
# 结论
结果新的Fi-gjrGARCH函数pred2胜出,比旧的gjrGARCH的pred1更优秀,证明`p`值、`d`值与`q`值仨都可以优化。目前正在编写着[Q1App2](https://beta.rstudioconnect.com/content/3138/)自动交易应用。“商场如战场”,除了模式最优化以外,程序运作上分秒必争... `microbenchmark`测试效率,之前编写了个[DataCollection](https://beta.rstudioconnect.com/content/3153/)应用采集实时数据以方便之后的高频率交易自动化建模^[不过数据量多就会当机,得继续提升才行。]。欲知更多详情,请参阅[Real Time FXCM](https://github.com/scibrokes/real-time-fxcm)。
[Generalised Autoregressive Conditional Heteroskedasticity GARCH(p, q) Models for Time Series Analysis](https://www.quantstart.com/articles/Generalised-Autoregressive-Conditional-Heteroskedasticity-GARCH-p-q-Models-for-Time-Series-Analysis):
- [Discrete White Noise and Random Walks](https://www.quantstart.com/articles/White-Noise-and-Random-Walks-in-Time-Series-Analysis)
- [AR(p)](https://www.quantstart.com/articles/Autoregressive-Moving-Average-ARMA-p-q-Models-for-Time-Series-Analysis-Part-1)
- [MA(q)](https://www.quantstart.com/articles/Autoregressive-Moving-Average-ARMA-p-q-Models-for-Time-Series-Analysis-Part-2)
- [ARMA(p,q)](https://www.quantstart.com/articles/Autoregressive-Moving-Average-ARMA-p-q-Models-for-Time-Series-Analysis-Part-3)
- [ARIMA(p,d,q)](https://www.quantstart.com/articles/Autoregressive-Integrated-Moving-Average-ARIMA-p-d-q-Models-for-Time-Series-Analysis)
**投注模式**
![](www/fractional-kelly.jpg)
除此之外,由于$k=\frac{1}{2}$凯里模式开始时期的增长率比`k=1`高,故此`k`值可设置为`0.5 ≤ k ≤ 1`。[Application of Kelly Criterion model in Sportsbook Investment](https://github.com/scibrokes/kelly-criterion)科研也将着手於凯里模式中的`k`值浮动化。
# 附录
## 文件与系统资讯
以下乃此文献资讯:
- 文件建立日期:2018-08-07
- 文件最新更新日期:`r today('Asia/Tokyo')`
- `r R.version.string`
- R语言版本:`r getRversion()`
- [**rmarkdown** 程序包](https://github.com/rstudio/rmarkdown)版本:`r packageVersion('rmarkdown')`
- 文件版本:1.0.1
- 作者简历:[®γσ, Eng Lian Hu](https://beta.rstudioconnect.com/content/3091/ryo-eng.html)
- GitHub:[源代码](https://github.com/englianhu/binary.com-interview-question)
- 其它系统资讯:
```{r info, echo = FALSE, warning = FALSE, results = 'asis'}
suppressMessages(require('dplyr', quietly = TRUE))
suppressMessages(require('formattable', quietly = TRUE))
sys1 <- devtools::session_info()$platform %>% unlist %>% data.frame(Category = names(.), session_info = .)
rownames(sys1) <- NULL
#'@ sys1 %>% formattable %>% as.htmlwidget
sys2 <- data.frame(Sys.info()) %>% mutate(Category = rownames(.)) %>% .[2:1] %>% rename(Sys.info = Sys.info..)
#'@ sys2 %>% formattable %>% as.htmlwidget
sys1 %<>% rbind(., data.frame(Category = 'Current time', session_info = paste(as.character(now('Asia/Tokyo')), 'JST')))
cbind(sys1, sys2) %>%
kable(caption = 'System Summary') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))
rm(sys1, sys2)
```
## 参考文献
01. [How does one specify arima (p,d,q) in ugarchspec for ugarchfit in rugarch?](https://stats.stackexchange.com/questions/73351/how-does-one-specify-arima-p-d-q-in-ugarchspec-for-ugarchfit-in-rugarch)<img src='www/hot.jpg' width='20'>
02. [How to read p,d and q of auto.arima()?](https://stats.stackexchange.com/questions/178577/how-to-read-p-d-and-q-of-auto-arima)
03. [binary.com : Job Application - Quantitative Analyst](https://github.com/englianhu/binary.com-interview-question)
--------------------
**Powered by - Copyright® Intellectual Property Rights of <img src='www/oda-army2.jpg' width='24'> [Scibrokes®](http://www.scibrokes.com)個人の経営企業**