[Question] Different results every run #1128

Open
gusamarante opened this issue Jan 13, 2025 · 1 comment

@gusamarante

Description

I am trying to estimate a 4-state Gaussian Hidden Markov Model. Two things are happening:

  1. Every time I run the same code, I get different estimates for the parameters, even when setting a random seed. Is this the expected behavior?
  2. Even though the transition matrix edges are different every time, the computed steady-state distribution always ends up being the uniform distribution.

It may very well be the case that I am doing something wrong, but I went deep into the documentation and could not find anything that helps. Given the sample size, I believe there is no identification problem.

Reproduction Code

I am reading data from this Excel worksheet.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM

n_states = 4

# READ DATA
file_path = r"path/to/file/NAVs.xlsx"
df = pd.read_excel(file_path, index_col=0)
df.index = pd.to_datetime(df.index)
rets = df.resample("M").last().pct_change(1).dropna()  # about 330 rows and 5 columns

# THE MODEL
hmm = DenseHMM(
    distributions=[Normal() for _ in range(n_states)],
    verbose=True,
)
hmm = hmm.fit(X=np.array([rets.values]))  # to make sure that X is 3D

# Transition Probability (changes every time I run)
trans_prob = pd.DataFrame(np.exp(np.array(hmm.edges)))
trans_prob = trans_prob.div(trans_prob.sum(axis=1), axis=0)  # Reduce numerical error

# Stationary distribution (always outputs a uniform distribution)
vals, vecs = np.linalg.eig(trans_prob)
stat_dist = pd.Series(vecs[:, np.argmax(vals)])
stat_dist = stat_dist * np.sign(stat_dist)
stat_dist = stat_dist / stat_dist.sum()
stat_dist.plot(kind="bar")
plt.show()

# States probabilities (changes every time I run)
state_probs = pd.DataFrame(data=hmm.predict_proba(np.array([rets.values]))[0], index=rets.index)
state_probs.plot()
plt.show()

# Predicted / Most likely State (changes every time I run)
state_pred = pd.Series(data=hmm.predict(np.array([rets.values]))[0], index=rets.index)
state_pred.plot()
plt.show()

gusamarante changed the title from "Different results every run" to "[Question] Different results every run" on Jan 13, 2025
@jmschrei (Owner)

Sorry you're encountering this issue.

The only source of randomness in pomegranate's HMMs should be in the initial clustering. The predictions do not involve randomness at all. How are you setting a seed? You might need to set both the numpy and torch random seeds. Unfortunately, torch is a bit challenging to make fully deterministic. Maybe you could try the first-k initialization and see? Can you post the code where you set the random seed?
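
As a rough sketch, seeding both libraries before the model is constructed would look something like the snippet below (the seed value 0 is arbitrary, and the torch.use_deterministic_algorithms call is optional; whether this is enough to make the initialization fully reproducible on your setup is something to verify):

import numpy as np
import torch

# Seed both RNGs before constructing and fitting the HMM.
np.random.seed(0)
torch.manual_seed(0)

# Optional: force torch to use only deterministic kernels. This can be
# slower and raises an error for ops without a deterministic implementation.
torch.use_deterministic_algorithms(True)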
