Convergence of Generators
Estimating a generator from data is more subtle than estimating a transfer operator from data, due to the presence of a timescale in the former. For a fixed timestep size $\Delta t$ and infinite data, the empirical estimate converges to the transfer operator associated with that timestep. Taking the limit $\Delta t \rightarrow 0$ afterwards then yields convergence to the generator.
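In symbols: for a generator $Q$, the transfer operator over a time interval $\Delta t$ is the matrix exponential, and the generator is recovered in the small-timestep limit,
\[\mathcal{M}_{\Delta t} = \exp(Q \Delta t), \qquad Q = \lim_{\Delta t \rightarrow 0} \frac{\mathcal{M}_{\Delta t} - I}{\Delta t}.\]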
Given the exact generator

```julia
Q_exact = [-1.0 2.0; 1.0 -2.0]
Q_exact
```

```
2×2 Matrix{Float64}:
 -1.0   2.0
  1.0  -2.0
```
We generate Markov chains of increasing size at a fixed timestep $dt$ and compute the empirical transition matrix, then verify that it converges to the exact transfer operator. Let us first generate a Markov chain of size 100.
```julia
using MarkovChainHammer.Trajectory: generate
dt = 1.0
steps = 100
markov_chain = generate(Q_exact, steps; dt=dt);
```
We now load the empirical operator constructors for both the generator and the transfer operator in order to check for convergence separately.
First let us construct the transfer operator
```julia
using MarkovChainHammer.TransitionMatrix: perron_frobenius
ℳ_empirical = perron_frobenius(markov_chain)
```

```
2×2 Matrix{Float64}:
 0.625  0.657143
 0.375  0.342857
```
We see that the above matrix is close to the exact transfer operator for the timestep $dt = 1.0$
```julia
ℳ_exact = exp(Q_exact)
```

```
2×2 Matrix{Float64}:
 0.683262  0.633475
 0.316738  0.366525
```
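The closeness can be quantified with the same `norm` from the `LinearAlgebra` standard library that appears later in this section; a quick relative error check looks like

```julia
using LinearAlgebra: norm

# Relative Frobenius-norm error between empirical and exact operators
norm(ℳ_empirical - ℳ_exact) / norm(ℳ_exact)
```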
Now let us construct the generator
```julia
using MarkovChainHammer.TransitionMatrix: generator
Q_empirical = generator(markov_chain; dt = dt)
```

```
2×2 Matrix{Float64}:
 -0.375   0.666667
  0.375  -0.666667
```
We see that the above matrix is quite far from the exact generator, which leads to the question "what happened?" The resolution to this dilemma is to observe that the timestep of the Markov chain was too large compared to the expected holding times of the exact generator, which are $1.0$ for state 1 and $1/2$ for state 2.
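Those holding times can be read off the generator itself: for a continuous-time Markov chain the expected holding time of state $i$ is $-1/Q_{ii}$, which a one-line check confirms.

```julia
using LinearAlgebra: diag

# Expected holding times from the diagonal of the exact generator
holding_times = -1 ./ diag(Q_exact)  # yields [1.0, 0.5]
```

If the chain is instead generated with smaller timesteps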
```julia
dt = 0.1
markov_chain = generate(Q_exact, steps; dt=dt);
Q_empirical = generator(markov_chain; dt = dt)
```

```
2×2 Matrix{Float64}:
 -0.694444   1.42857
  0.694444  -1.42857
```
We can see that the estimate of the generator has improved. Note that we can also estimate the generator by taking the matrix logarithm of the empirical transfer operator and dividing by the timestep, which gives a similar answer
```julia
log(perron_frobenius(markov_chain)) / dt
```

```
2×2 Matrix{Float64}:
 -0.628269   1.59311
  0.628269  -1.59311
```
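With exact operators the matrix logarithm route is exact, since $\log(\exp(Q \, dt))/dt = Q$; the discrepancy above is purely a finite-data effect. A quick check (the matrix `log` and `exp` methods are provided by the `LinearAlgebra` standard library):

```julia
using LinearAlgebra  # provides log and exp for matrices

log(exp(Q_exact * dt)) / dt  # recovers Q_exact up to floating point error
```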
Let us now automate the process of checking convergence as a function of the fixed timestep size. For each timestep we generate Markov chains of increasing size, compute the empirical matrices, and compare against the exact operators. We use the function `norm` from the `LinearAlgebra` package to compute the norm of the difference between the empirical and exact matrices, divided by the norm of the exact matrix, to get a relative error. At small timesteps the transfer operator converges to the identity matrix, hence we divide its error by a factor of $dt$ in order to make a fair comparison with the relative error of the generator matrix. The takeaway message is that the error of the generator is bounded by both the timestep size and the available number of independent samples in the data. Here is the code to verify this statement
```julia
using LinearAlgebra: norm

println("Transfer Operator Convergence")
for dt in [1.0, 0.1, 0.01, 0.001]
    ℳ_exact_local = exp(Q_exact * dt)
    println("For a timestep of $dt")
    for i in 2:2:6
        n = 10^i
        markov_chain_local = generate(ℳ_exact_local, n)
        number_of_states = 2
        ℳ_empirical_local = perron_frobenius(markov_chain_local, number_of_states)
        empirical_error = norm(ℳ_empirical_local - ℳ_exact_local) / norm(ℳ_exact_local) / dt
        println("A chain of size 10^$i yields a relative empirical error of $(empirical_error)")
    end
    println("---------------------------")
end

println("---------------------------")
println("Generator Convergence")
for dt in [1.0, 0.1, 0.01, 0.001]
    ℳ_exact_local = exp(Q_exact * dt)
    println("For a timestep of $dt")
    for i in 2:2:6
        n = 10^i
        markov_chain_local = generate(ℳ_exact_local, n)
        number_of_states = 2
        Q_empirical_local = generator(markov_chain_local, number_of_states; dt = dt)
        empirical_error = norm(Q_empirical_local - Q_exact) / norm(Q_exact)
        println("A chain of size 10^$i yields a relative empirical error of $(empirical_error)")
    end
    println("---------------------------")
end
```
```
Transfer Operator Convergence
For a timestep of 1.0
A chain of size 10^2 yields a relative empirical error of 0.2052272765020095
A chain of size 10^4 yields a relative empirical error of 0.0009399099727159542
A chain of size 10^6 yields a relative empirical error of 0.00044363459027215016
---------------------------
For a timestep of 0.1
A chain of size 10^2 yields a relative empirical error of 0.6406130397167876
A chain of size 10^4 yields a relative empirical error of 0.024372108654314568
A chain of size 10^6 yields a relative empirical error of 0.00969501376207933
---------------------------
For a timestep of 0.01
A chain of size 10^2 yields a relative empirical error of 13.527939471702238
A chain of size 10^4 yields a relative empirical error of 0.1376897600902883
A chain of size 10^6 yields a relative empirical error of 0.024057157177490955
---------------------------
For a timestep of 0.001
A chain of size 10^2 yields a relative empirical error of 2.2360632258701383
A chain of size 10^4 yields a relative empirical error of 1.3390790093116096
A chain of size 10^6 yields a relative empirical error of 0.06102579821063059
---------------------------
---------------------------
Generator Convergence
For a timestep of 1.0
A chain of size 10^2 yields a relative empirical error of 0.6603436539783516
A chain of size 10^4 yields a relative empirical error of 0.6791138371288149
A chain of size 10^6 yields a relative empirical error of 0.6838007838954553
---------------------------
For a timestep of 0.1
A chain of size 10^2 yields a relative empirical error of 1.3652450841832677
A chain of size 10^4 yields a relative empirical error of 0.139696015339922
A chain of size 10^6 yields a relative empirical error of 0.13947707201691414
---------------------------
For a timestep of 0.01
A chain of size 10^2 yields a relative empirical error of 1.8027440423029175
A chain of size 10^4 yields a relative empirical error of 0.05086091818483745
A chain of size 10^6 yields a relative empirical error of 0.01982217935355222
---------------------------
For a timestep of 0.001
A chain of size 10^2 yields a relative empirical error of 1.0
A chain of size 10^4 yields a relative empirical error of 0.1861324264839971
A chain of size 10^6 yields a relative empirical error of 0.033407350060732074
---------------------------
```
In particular, for the convergence of the generator, we see that the error is roughly on the order of the timestep size, except in the last case. A heuristic explanation for this behavior is as follows. If we think of each entry of the matrix as a random variable, then convergence to the expected value (the exact entry of the matrix) is inversely proportional to the square root of the number of independent samples. As one decreases the timestep size, the number of independent samples decreases, since the number of timesteps is fixed; that is to say, we are observing the simulation for an increasingly short amount of physical time. Roughly speaking one would expect the error to be
\[\text{error} \propto \max \left(dt, \frac{1}{\sqrt{S}} \right)\]
where $S$ is the number of independent samples. In the last case, since the largest decorrelation time of the Markov chain is on the order of 1 time unit, we expect the number of independent samples in a physical time interval of $10^6 dt = 10^3$ to be roughly on the order of $10^3 / 5$, where 5 time units is taken as a decorrelation threshold for the Markov chain. We see that the error is
```julia
approximate_error = 1 / sqrt(10^3 / 5)
```

```
0.07071067811865475
```
which is roughly the same order of magnitude as the final error.
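To sketch the heuristic across every combination of timestep and chain size used above, one can tabulate $\max(dt, 1/\sqrt{S})$ directly; the helper `heuristic_error` below is made up for illustration, with the 5 time unit decorrelation threshold taken as the same rough assumption as before:

```julia
# Heuristic error bound: max(dt, 1/sqrt(S)), where the number of
# independent samples S is estimated as physical time n*dt divided
# by an assumed decorrelation time of 5 time units.
decorrelation_time = 5.0
heuristic_error(dt, n) = max(dt, 1 / sqrt(max(n * dt / decorrelation_time, 1)))

for dt in [1.0, 0.1, 0.01, 0.001], i in 2:2:6
    println("dt = $dt, chain size 10^$i: heuristic error ≈ $(heuristic_error(dt, 10^i))")
end
```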
We have not discussed what happens as the number of states increases, where the behavior of both the generator and the transfer operator becomes more nuanced. In general, the more states one has, the more data is needed to estimate the entries of the matrix; however, this need not always be the case, as there are some interesting exceptions to this rule. In particular, if there is an underlying (or approximate) tensor product structure to the generator or transfer operator, then the number of independent samples needed to estimate the entries of the matrix can be reduced.
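As a minimal sketch of the tensor product remark, assuming the dynamics factor into two independent two-state processes: the joint generator is the Kronecker sum of the factors, $Q_1 \otimes I + I \otimes Q_2$, so only the $2 \times 2$ factors need to be estimated rather than the full $4 \times 4$ matrix. The second factor `Q2` below is made up for illustration.

```julia
using LinearAlgebra: kron

# Joint generator of two independent chains via the Kronecker sum
Q1 = [-1.0 2.0; 1.0 -2.0]              # the example generator from above
Q2 = [-0.5 0.5; 0.5 -0.5]              # a hypothetical second factor
Id = [1.0 0.0; 0.0 1.0]                # 2×2 identity
Q_joint = kron(Q1, Id) + kron(Id, Q2)  # 4×4 generator from 2×2 factors
```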