The R package ssar is a fast implementation of Gillespie's Stochastic Simulation Algorithm. It combines R's graphical and statistical capabilities with the speed of C++. In addition, the package allows for simulation of stochastic processes with time-dependent propensity functions Thus, ssar represents an improvement over the previous package available at CRAN GillespieSSA.
This package is for you if:
-
You want to simulate Continuous Time Markov Chains (CTMC), Stochastic Compartmental Models (like the ones in chemistry, ecology, epidemiology).
-
You want to use Gillespie's Stochastic Simulation Algorithm with time-dependent parameters, random parameters and/or time-dependent random parameters. (Areas might include Bayesian MonteCarlo parameter estimation of some Stochastic Processes or Inhomogeneous Continuous Time Markov Chains)
-
You are tired of the current packages being too slow.
-
You want to have fun simulating stuff!
ssar is still at its developmental stage. You need to install from github:
install.packages("devtools")
devtools::install_github("INSP-RH/ssar")
To adecquately run ssar
you need to install a C++
compiler:
- On Windows, install Rtools.
- On Mac, install Xcode from the App store.
- On Linux,
sudo apt-get install r-base-dev
or similar.
After installing you need to call the ssar package for using it.
library(ssar)
First, we set the seed for the simulation.
set.seed(123)
Initial data must be inputed as a matrix.
X <- matrix(c(N=500), nrow = 1)
The propensity vector should also be in matrix form:
v <- matrix( c(+1, -1), ncol = 2)
The propensity scores must also be a matrix-valued function depdendent on 3 parameters: time (t
), the state of the system (X
) and additional parameters (params
) which we discuss later.
pfun <- function(t,X,params){ cbind(2 * X[, 1], (1 + 1*X[, 1]/1000)*X[, 1]) }
The model runs automatically from 0 to 1 conducting 10 simulations and generating a plot.
simulation <- ssa(X, pfun, v)
The nsim
variable specifies the number of simulations. Suppose we want 20 of them:
simulation <- ssa(X, pfun, v, nsim = 20)
The tmin
and tmax
variables specify the initial time and final time of the process. Suppose we want to simulate from time 2 to time 10 with 20 simulations:
simulation <- ssa(X, pfun, v, tmin = 2, tmax = 10, nsim = 20)
Plot characteristics can be specified by title
, xlab
and ylab
:
simulation <- ssa(X, pfun, v, tmin = 2, tmax = 10, nsim = 20,
title = "Logistic Growth: Example 1",
xlab = "Time", ylab = "Individuals")
Making plots can really slow down the process. The option: plot.sim
when set to FALSE
allows us to keep the data without making any plot:
simulation <- ssa(X, pfun, v, tmin = 2, tmax = 10, nsim = 20, plot.sim = FALSE)
The simulation
dataframe looks like this:
## Simulation Iteration Time Var1
## 1 1 0 2 500
## 2 2 0 2 500
## 3 3 0 2 500
## 4 4 0 2 500
## 5 5 0 2 500
## 6 6 0 2 500
Suppose we are using almost the same model as in the previous example:
set.seed(322)
X <- matrix(c(N=500), nrow = 1) #Initial values
v <- matrix( c(+1, -1), ncol = 2) #Propensity scores
But the propensity function now depends on time:
pfun <- function(t,X,params){ cbind(2 * X[, 1],
(2 + sin(t*pi)*X[, 1]/1000)*X[, 1]) }
Simulation is done in exactly the same manner as previously done. No change needed!
simulation <- ssa(X, pfun, v, tmin = 2, tmax = 10, nsim = 20,
title = "Time-dependent Logistic Growth: Example2",
xlab = "Time", ylab = "Individuals")
This is a new model given by the following parameters:
#Start the parameters
X <- matrix(c(N = 10), nrow = 1)
pfun <- function(t, X, params){ cbind(1.1 + sin(pi*t/0.01))*X[,1] }
v <- matrix( c(+1), ncol=1)
simulation <- ssa(X, pfun, v,
title = "Example 3", xlab = "Time", ylab = "Value")
The option maxiter
establishes the maximum number of iterations done by the model before stopping. For example if we wish to know where the model is after 100 changes set maxiter = 100
:
simulation <- ssa(X, pfun, v, maxiter = 100,
title ="Model after 100 changes: Example 3",
xlab = "Time", ylab = "Value")
The option print.time
prints to screen at what in time of the simulation we are. For example if the model goes from tmin = 0
to tmax = 1
setting print.time = TRUE
will print at which moment in time the model is simulating:
simulation <- ssa(X, pfun, v, maxiter = 100, tmin = 0, tmax = 1,
plot.sim = FALSE, print.time = TRUE)
## Time = 0
## Time = 0
## Time = 0.00284842
## Time = 0.0210763
## Time = 0.115414
## Time = 0.219506
## Time = 0.277526
The option maxtime
establishes how much computer-time (in seconds) will be used for the model. This is specially useful for models which might take a lot of time to run. In the following example, we run the model for 2 seconds:
simulation <- ssa(X, pfun, v, maxtime = 2,
title ="Model after 2 seconds: Example 3",
xlab = "Time", ylab = "Value" )
We find it easier to assign the parameters (constants) used by the propensity function as a separate vector. This is done in the following simulation:
#Set seed
set.seed(3289650)
#Get initial parameters
params <- c(a = 3, b = 0.01, c = 2)
X <- matrix(c(100, 100), ncol = 2)
#Propensity function
pfun <- function(t, X, params){ cbind(params[1]*t*X[,1] + 1,
params[2]*X[,1]*X[,2],
params[3]*X[,2]) }
#Propensity score
v <- matrix(c(+1,-1,0,0,+1,-1),nrow=2,byrow=TRUE)
#Simulate
simulation <- ssa(X, pfun, v, params,
title = "Example 4: Time-dependent Lotka-Volterra",
xlab = "Time", ylab = "Number of individuals")
The ssa
function works by creating a file called "Temporary_File_ssa.txt". Setting to TRUE
option keep.file
does not remove the temporary file. Furthermore, the option fname
allows you to rename the file. This option is really helpful if you want to keep a database of your simulation:
simulation <- ssa(X, pfun, v, params, keep.file = TRUE, fname ="My_simulation.txt",
plot.sim = FALSE)
You can read the file with the read.table
function:
sim <- read.table("My_simulation.txt", header = TRUE)
## Simulation Iteration Time Var1 Var2
## 1 1 0 0 100 100
## 2 2 0 0 100 100
## 3 3 0 0 100 100
## 4 4 0 0 100 100
## 5 5 0 0 100 100
## 6 6 0 0 100 100
If you are a ggplot2
kind of person you can plot easily your simulations:
library(ggplot2)
ggplot(data = sim, aes(x = Time, group = as.factor(Simulation))) +
geom_line(aes(y = Var1, color = "Prey")) +
geom_line(aes(y = Var2, color = "Predator")) +
ggtitle("Example 4: Lotka Volterra with ggplot2") +
xlab("Time") + ylab("Individuals") +
scale_color_manual("Creature",
values = c("Prey" = "deepskyblue4","Predator" = "tomato3"))
This is almost the same Lotka-Volterra model; however in this case the parameters a and b are random variables.
#Set seed
set.seed(3289650)
#Get initial parameters
params <- c(amu = 3, asd = 0.01, bmin = 0.001, bmax = 0.015, c = 2)
X <- matrix(c(100, 100), ncol = 2)
#Propensity function
pfun <- function(t, X, params){ cbind(rnorm(1,params[1], params[2])*X[,1] + 1,
runif(1,params[3],params[4])*X[,1]*X[,2],
params[5]*X[,2]) }
#Propensity score
v <- matrix(c(+1,-1,0,0,+1,-1),nrow=2,byrow=TRUE)
#Simulate
simulation <- ssa(X, pfun, v, params,
title = "Example 5: Lotka-Volterra with random variables",
xlab = "Time", ylab = "Number of individuals")
Notice that the random variables in the model can also be time-dependent:
#Propensity function
pfun <- function(t, X, params){
cbind(rnorm(1,t + params[1], params[2])*X[,1] + 1,
runif(1,params[3],params[4])*X[,1]*X[,2], params[5]*X[,2]) }
#Simulate
simulation <- ssa(X, pfun, v, params,
title = "Example 5: Lotka-Volterra with time-dependent random variables",
xlab = "Time", ylab = "Number of individuals")
Sometimes your model might take a lot of time to run. The following list of options might help you speed it up:
- Do not print the current time:
print.time = FALSE
The fastest way to speed up your code is via the file.only
and kthsave
options
As we said in the previous section, the program generates a Temporary File. The file.only
option generates the file but does not return any values to R
nor does it generate a plot. It is meant for making fast simulations in which the user might not be interested in generating a plot inside the function.
As an example, consider the Lotka-Volterra model.
#Set seed
set.seed(3289650)
#Get initial parameters
params <- c(a = 3, b = 0.01, c = 2)
X <- matrix(c(100, 100), ncol = 2)
#Propensity function
pfun <- function(t, X, params){ cbind(params[1]*t*X[,1] + 1,
params[2]*X[,1]*X[,2],
params[3]*X[,2]) }
#Propensity score
v <- matrix(c(+1,-1,0,0,+1,-1),nrow=2,byrow=TRUE)
Without the file.only
option:
#Simulate
simulation <- ssa(X, pfun, v, params, plot.sim = FALSE)
With the file.only
option:
#Simulate
simulation <- ssa(X, pfun, v, params, file.only = TRUE)
## [1] "********** OVERALL TIME EVALUATION **********"
## [1] "Normal eval: 0.0787589550018311"
## [1] "file.only = TRUE: 0.0568051338195801"
## [1] "*********************************************"
This might not look as fast; However in bigger files, it is really important. Additional benchmarks are provided in the Benchmarking section
The Stochastic Simulation Algorithm computes and saves every transition made in the model. This might not be a problem for short simulations; but in the long run generates large databases which are pretty memory intensive. The kthsave
option is here to help.
Consider the following model which is a variant of the SIS model for epidemics:
#Initial parameters
k <- 24576.5529836797
delta <- 0.0591113454895868 + 0.208953907151055
gamma_ct <- 0.391237630231631
params <- c(k = k, delta = delta, gamma_ct = gamma_ct)
X <- matrix(c(S = 1000000000, I = 1000), ncol = 2)
pfun <- function(t, X, params){
#Value to return
matreturn <- matrix(NA, nrow = length(t), ncol = 6)
#Create birth function
lambda <- function(t){ return(4.328e-4 - (2.538e-7)*t -
(3.189e-7)*sin(2 * t * pi/52) -
(3.812e-7)*cos(2 * t * pi/52) ) }
#Create death function
mu <- function(t){ return(9.683e-5 + (1.828e-8)*t +
(2.095e-6)*sin(2 * t * pi/52) -
(8.749e-6)*cos(2 * t * pi/52))}
#Create infectives function
beta_fun <- function(t){ return( 0.479120824267286 +
0.423263042762498*sin(-2.82494252560096 + 2*t*pi/52) )}
#Estimate values
matreturn[,1] <- lambda(t)*(X[,1] + X[,2])
matreturn[,2] <- mu(t)*X[,1]
matreturn[,3] <- beta_fun(t)*X[,1]*X[,2]/(1 + params[1]*X[,2])
matreturn[,4] <- mu(t)*X[,2]
matreturn[,5] <- params[2]*X[,2]
matreturn[,6] <- params[3]*X[,2]
#Return
return(matreturn)
}
v <- matrix(c(1,-1, -1, 0, 0, 1, 0, 0, 1, -1, -1, -1), nrow = 2, byrow = TRUE)
tmin <- 0
tmax <- 2
nsim <- 100
Running 100 simulations for 2 days generates over 4 GB of information:
DO NOT RUN: MIGHT TAKE SEVERAL MINUTES
#DO NOT RUN
simulation <- ssa(X, pfun, v, params, tmin, tmax, nsim = nsim, print.time = FALSE,
plot.sim = FALSE, keep.file = TRUE)
#DO NOT RUN
Running the simulation for 52 days generates over 30 GB of information. In order to speed the program and reduce the simulation time we can only save every kth iteration. The command kthsave does te trick.
The first 1000 iterations of the model look like this:
set.seed(123)
simulation1 <- ssa(X, pfun, v, params, tmin, tmax, nsim = 10, print.time = FALSE,
plot.sim = FALSE, maxiter = 5000, keep.file = TRUE,
fname = "sim1.txt")
We now consider saving only every 10 iterations of the model:
set.seed(123)
simulation2 <- ssa(X, pfun, v, params, tmin, tmax, nsim = 10, print.time = FALSE,
plot.sim = FALSE, maxiter = 5000, kthsave = 10, keep.file = TRUE,
fname = "sim2.txt")
There are almost no noticable differences between the models:
ggplot(simulation1, aes(x = Time, y = Var2, group=as.factor(Simulation))) +
geom_point(data = simulation2,
aes(color = "Every 10 values")) +
geom_step(data = simulation1,
aes(color = "All values"), size = 0.5) +
theme(legend.position="none") + theme_bw() +
ggtitle(paste0("SIS example; Infected cases ", 10, " simulations")) +
xlab("Time") + ylab("Individuals")
Changing kthsave
to 10 reduces 10 times the file size. In addition, it almost halves the modeling speed:
## [1] "********** OVERALL TIME EVALUATION **********"
## [1] "All values: 0.843676090240479"
## [1] "Every 10 values: 0.372447967529297"
## [1] "*********************************************"
In order to show the advantage or this package over the existing GillespieSSA we show several benchmarks. Running this in your computer requires installation of the GillespieSSA package and the microbenchmark package.
library(microbenchmark)
First, we run the program from GillespieSSA:
#Running the program from GillespieSSA
set.seed(1)
parms <- c(c=0.5)
x0 <- c(X=10000)
a <- c("c*X")
nu <- matrix(-1)
gilltime <- microbenchmark(
out1 <- GillespieSSA::ssa(x0,a,nu,parms,tf = 100)
)
Notice that running this program results in only one simulation:
In the case of our model: we can make 5 simulations faster than 1 simulation from GillespieSSA:
set.seed(1)
parms <- c(0.5)
x0 <- matrix(c(X=10000), ncol = 1)
pfun <- function(t,X,params){ return(as.matrix(params[1]*X[,1])) }
nu <- matrix(-1)
#Keeping all the information
alltime1 <- microbenchmark(
out2 <- ssar::ssa(x0, pfun, nu, parms, tmin = 0,
tmax = 100, nsim = 5, plot.sim = FALSE))
#All the iterations and file.only option
alltime2 <- microbenchmark(
out2 <- ssar::ssa(x0, pfun, nu, parms, tmin = 0,
tmax = 100, nsim = 5, file.only = TRUE))
#Keeping every 10 iterations. This is really fast in comparison.
tentime1 <- microbenchmark(
out2 <- ssar::ssa(x0, pfun, nu, parms, tmin = 0,
tmax = 100, nsim = 5, plot.sim = FALSE,
kthsave = 10))
#10 iterations and file.only option
tentime2 <- microbenchmark(
out2 <- ssar::ssa(x0, pfun, nu, parms, tmin = 0,
tmax = 100, nsim = 5, file.only = TRUE,
kthsave = 10))
Notice that ssar creates 5 simulations:
The overall times (seconds):
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | |
---|---|---|---|---|---|---|
GillespieSSA | 0.9532 | 0.9894 | 1.0040 | 1.0090 | 1.0270 | 1.0790 |
All times | 0.7357 | 0.7781 | 0.8007 | 0.8047 | 0.8254 | 1.0560 |
All times file.only | 0.5058 | 0.5280 | 0.5347 | 0.5373 | 0.5405 | 0.6135 |
kthsave 10 | 0.3357 | 0.3546 | 0.3668 | 0.3701 | 0.3810 | 0.4668 |
kthsave 10 and file.only | 0.2926 | 0.3127 | 0.3243 | 0.3280 | 0.3384 | 0.4751 |
The function pfun
is not returning a matrix. You can use as.matrix
or cbind
(depending on your function) to return a matrix value. As an example:
#THIS IS INCORRECT (USING c NOT cbind):
pfun <- function(t,X,params){ c(params[1] *(1 + sin(t))* X[,1],
(params[2] + (params[1]-params[2])*X[,1]/params[3])*X[,1]) }
#THIS IS CORRECT:
pfun <- function(t,X,params){ cbind(params[1] *(1 + sin(t))* X[,1],
(params[2] + (params[1]-params[2])*X[,1]/params[3])*X[,1] ) }
The value xinit
is probably a vector and not a matrix.
#THIS IS INCORRECT (USING c NOT cbind):
xinit <- c(X = 1, Y = 2)
#THIS IS CORRECT:
xinit <- matrix(c(X = 1, Y = 2), ncol = 1)
The pfun
function is probably missing one of the arguments that go into the function (either t
, X
or params
). As an example:
#THIS IS INCORRECT (MISSING T):
pfun <- function(X,params){ cbind(params[1] *(1 + sin(t))* X[,1],
(params[2] + (params[1]-params[2])*X[,1]/params[3])*X[,1] ) }
#THIS IS CORRECT:
pfun <- function(t,X,params){ cbind(params[1] *(1 + sin(t))* X[,1],
(params[2] + (params[1]-params[2])*X[,1]/params[3])*X[,1] ) }
If the number of simulations, nsim
is smaller than 2, or is not an integer, the program automatically chooses a new value for nsim
looking for the closest integer.
If the number indicating after how many iterations to save, kthsave
is smaller than 2, or is not an integer, the program automatically chooses a new value for kthsave
looking for the closest positive integer.
The time at which the simulation starts tmin
is bigger or equal than the time at which the simulation ends tmax
The project still needs a lot of testing and debugging. Furthermore, we are developing an automatic tau-leaping algorithm to compliment our package.
Please feel free to contribute to the project.
As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available from http://contributor-covenant.org/version/1/0/0/
This package is free and open source software, licensed under GPL-3.
If you use this package please don't forget to cite it.
- Rodrigo Zepeda-Tello [email protected]
- Dalia Camacho-García-Formentí [email protected]