Updated ✍
shayandavoodii committed Mar 13, 2024
1 parent 3a0b3cf commit 91f0645
Showing 1 changed file with 65 additions and 3 deletions.
68 changes: 65 additions & 3 deletions README.md
@@ -1,6 +1,68 @@
# CepstralCoefficients.jl

[![CI](https://github.com/shayandavoodii/CepstralCoefficients.jl/actions/workflows/ci.yml/badge.svg)](https://github.com/shayandavoodii/CepstralCoefficients.jl/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/shayandavoodii/CepstralCoefficients.jl/graph/badge.svg?token=A70LOIP6F9)](https://codecov.io/gh/shayandavoodii/CepstralCoefficients.jl)
This repo contains a Julia implementation of the study "Distance Measures for Effective Clustering of ARIMA Time-Series" [[1](https://doi.org/10.1109/ICDM.2001.989529)]. The study proposes an approach for clustering time series according to their underlying patterns, which the authors extract using cepstral analysis.
This package provides several methods for calculating cepstral coefficients. The three implemented methods are:

1. Cepstral coefficients based on Auto Regressive Moving Average (ARMA) coefficients
2. Cepstral coefficients based on Auto Regressive (AR) coefficients
3. Real cepstral coefficients

---

- **Cepstral coefficients based on ARMA coefficients**

Consider the following $ARMA(p, q)$ process:

```math
X_t = \sum\limits_{r = 1}^{p} \phi_r X_{t - r} + \varepsilon_t + \sum\limits_{r = 1}^{q} \theta_r \varepsilon_{t - r}
```

where $\phi_r$, $r = 1,2,...,p$, are the autoregressive (AR) parameters, $\theta_r$, $r = 1,2,...,q$, are the moving average (MA) parameters, and $\varepsilon_t$ is a white noise process. The spectral density of an $ARMA(p, q)$ process is defined as

```math
f_x(\omega) = \frac{\sigma^2}{2\pi} \frac{\left| 1 + \sum\limits_{h = 1}^{q} \theta_h e^{-ih\omega} \right|^2}{\left| 1 - \sum\limits_{h = 1}^{p} \phi_h e^{-ih\omega} \right|^2}
```

where $\sigma^2$ is the variance of $\varepsilon_t$. An estimated spectral density function can be approximated by an exponential model $\lambda_x(\omega)$, whose logarithm is a truncated cosine series, namely,

```math
f_x(\omega) \approx \lambda_x(\omega) = \frac{\sigma^2}{2\pi} \exp\left( 2\sum\limits_{h = 1}^{p} \psi_h \cos(h\omega) \right)
```

where $0 < \omega < \pi$, and where $\sigma^2$ and $\psi_1, \psi_2, ..., \psi_p$ are unknown parameters. Taking the logarithm of this model and rescaling the frequency to the unit interval gives the cepstrum of $X_t$, i.e. the Fourier (cosine) expansion of the log spectral density function:

```math
CP(\omega) = \log \lambda_x(\omega) = \psi_0 + 2\sum\limits_{h = 1}^{p} \psi_h \cos(2\pi h\omega)
```

where $\psi_0 = \int\limits_0^1 \log \lambda_x(\omega) d\omega$ is the logarithm of the variance of the white noise process $\varepsilon_t$, and $\omega$ is the frequency normalized to the unit interval. Provided that $\log \lambda_x(\omega)$ is absolutely integrable on $(0,1)$, the Fourier coefficients of its expansion are defined by:

```math
\psi_k = \int\limits_0^1 \log \lambda_x(\omega) \cos(2\pi k\omega) d\omega
```

for $k = 0,1,2,...$, and are referred to as the cepstral coefficients. Because the expansion of $\log f_x(\omega)$ converges in mean square as $p$ increases, a small number of cepstral coefficients is enough to describe the second-order characteristics of a time series [[1](https://doi.org/10.1016/j.eswa.2020.113705)].
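
As a rough numerical illustration of the integral defining $\psi_k$, the sketch below approximates it with a midpoint rule for an $AR(1)$ process. The names `log_spectral_density` and `cepstral_coefficient` are hypothetical and are not part of this package's API. The sketch also assumes the standard ARMA spectral density (MA polynomial in the numerator, AR polynomial in the denominator), evaluated at a normalized frequency $\omega \in (0, 1)$.

```julia
# Hypothetical helper (not the package API): log spectral density of an
# ARMA(p, q) process at normalized frequency ω ∈ (0, 1).
# ϕ holds the AR parameters, θ the MA parameters, σ2 the noise variance.
function log_spectral_density(ϕ, θ, σ2, ω)
    z = cis(-2π * ω)  # e^{-i 2π ω}
    ar = 1 - sum((ϕ[h] * z^h for h in eachindex(ϕ)); init=zero(z))
    ma = 1 + sum((θ[h] * z^h for h in eachindex(θ)); init=zero(z))
    return log(σ2 / (2π)) + 2 * (log(abs(ma)) - log(abs(ar)))
end

# Midpoint-rule approximation of ψ_k = ∫₀¹ log f(ω) cos(2πkω) dω.
# N is the number of quadrature nodes (an arbitrary choice for this sketch).
function cepstral_coefficient(logf, k; N=4096)
    ωs = ((j - 0.5) / N for j in 1:N)
    return sum(logf(ω) * cos(2π * k * ω) for ω in ωs) / N
end

# AR(1) example: X_t = 0.5 X_{t-1} + ε_t with unit noise variance
logf(ω) = log_spectral_density([0.5], Float64[], 1.0, ω)
ψ = [cepstral_coefficient(logf, k) for k in 0:3]
```

For this $AR(1)$ example, the values for $k \geq 1$ should be close to $0.5^k / k$. They decay quickly, which is in line with the remark that a few coefficients are enough.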

---

- **Cepstral coefficients based on AR coefficients**

Consider a time series $X_t$ defined by an $AR(p)$ model $X_t+\alpha_1X_{t-1}+\dots+\alpha_pX_{t-p}=\epsilon_t$, where $\alpha_1, \dots, \alpha_p$ are the autoregression coefficients and $\epsilon_t$ is white noise with mean $0$ and a non-zero variance. Note that for every ARIMA model there exists an equivalent AR model, which can be obtained from the ARIMA model by polynomial division. Hence, without loss of generality, the remainder of this section focuses on AR time series.
The cepstral coefficients for an $AR(p)$ time series can be derived from the autoregression coefficients [[2](https://doi.org/10.1109/ICDM.2001.989529)]:

```math
c_n = \begin{cases}
-\alpha_1, & \text{if } n = 1 \\
-\alpha_n - \sum\nolimits_{m = 1}^{n - 1} \left( 1 - \frac{m}{n} \right) \alpha_m c_{n - m}, & \text{if } 1 < n \leqslant p \\
-\sum\nolimits_{m = 1}^{p} \left( 1 - \frac{m}{n} \right) \alpha_m c_{n - m}, & \text{if } p < n
\end{cases}
```
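
The recursion above can be sketched in a few lines of Julia. This is only a minimal illustration. The function name `ar_cepstrum` is hypothetical and is not the package's API. The vector `α` is assumed to hold $\alpha_1, \dots, \alpha_p$ with the sign convention of the $AR(p)$ model written above.

```julia
# Hypothetical helper (not the package API): first n cepstral coefficients
# of an AR(p) model X_t + α_1 X_{t-1} + ... + α_p X_{t-p} = ε_t.
function ar_cepstrum(α::AbstractVector{<:Real}, n::Integer)
    p = length(α)
    c = zeros(float(eltype(α)), n)
    for k in 1:n
        # The direct term -α_k is present only while k ≤ p
        acc = k <= p ? -float(α[k]) : zero(eltype(c))
        # Recursive term over the coefficients computed so far
        for m in 1:min(k - 1, p)
            acc -= (1 - m / k) * α[m] * c[k - m]
        end
        c[k] = acc
    end
    return c
end

# AR(1) example: X_t - 0.5 X_{t-1} = ε_t, i.e. α_1 = -0.5
c = ar_cepstrum([-0.5], 4)
```

For an $AR(1)$ model, the recursion reduces to $c_n = (-\alpha_1)^n / n$, which gives a quick sanity check on the output.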

---

- **Real cepstral coefficients**

The (real) cepstrum is defined as the inverse Fourier transform of the (real) logarithm of the Fourier transform of the time series.
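
A minimal sketch of this definition is given below. To stay dependency-free it uses a naive $O(N^2)$ discrete Fourier transform. In practice an FFT library would be the natural choice. The names `dft` and `real_cepstrum` are hypothetical and are not the package's API.

```julia
# Naive O(N^2) discrete Fourier transform, used only to keep the sketch
# self-contained. `inverse=true` applies the inverse transform (with 1/N scaling).
function dft(x::AbstractVector; inverse::Bool=false)
    N = length(x)
    s = inverse ? 1 : -1
    X = [sum(x[n + 1] * cis(s * 2π * k * n / N) for n in 0:N-1) for k in 0:N-1]
    return inverse ? X ./ N : X
end

# Real cepstrum: inverse DFT of the log magnitude of the DFT.
# Note: a zero in the magnitude spectrum yields -Inf under the logarithm.
real_cepstrum(x::AbstractVector{<:Real}) =
    real.(dft(log.(abs.(dft(x))); inverse=true))

# A scaled unit impulse has a flat spectrum, so only the first
# cepstral value (log of the scale) is non-zero.
rc = real_cepstrum([2.0, 0.0, 0.0, 0.0])
```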

## How to use?

@@ -39,7 +101,7 @@ querry = [get_prices(ticker, startdt="2019-01-01", enddt="2020-01-01")["adjclose
prices = stack(querry, dims=1);
```

Afterward, the [`cc`](https://github.com/shayandavoodii/TimeSeries-Cepstral-Clustering/blob/b586666d6764ac4e742cc07549c0247be30baa1b/src/CepstralClustering.jl#L12-L46) function is employed to calculate the cepstral coefficients.
Afterward, the [`cc`](https://github.com/shayandavoodii/TimeSeries-Cepstral-Clustering/blob/b586666d6764ac4e742cc07549c0247be30baa1b/src/CepstralClustering.jl#L15-L96) function is employed to calculate the cepstral coefficients.

### Calculate cepstral coefficients

@@ -95,6 +157,7 @@ plot(
left_margin=6mm,
)
```

![img](https://github.com/shayandavoodii/TimeSeries-Cepstral-Clustering/blob/main/assets/StockPrices.png)

The results are not satisfactory. This is not unexpected, since PAM clustering is sensitive to its random initialization, which may lead to a non-optimal solution. As seen in the figure above, the 'ABDE' and 'NVDA' series follow similar patterns but are assigned to different clusters, although one would expect them to be grouped together.
@@ -113,4 +176,3 @@ The results are not satisfactory, which is expected since the PAM clustering is
doi={10.1109/ICDM.2001.989529}
}
```
