Tracking potential performance improvement #72
Replies: 5 comments 7 replies
-
Nice finds! I think these would be worth exploring. If they seem useful I can add a performance tips page in the documentation. As far as I can tell, there are three varieties of performance tips:
The two packages you found fall into the third category. I think I should have some time tomorrow to look at one of them and do a comparison to NUTS.
-
@DominiqueMakowski, I am seeking clarification on AD requirements and parallel capabilities. So far, I have only been able to run MuseInference with Zygote AD on a single thread, where its performance is comparable to NUTS with ReverseDiff AD (but we could get higher ESS by running multiple chains across threads).
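For reference, Turing can run multiple NUTS chains in parallel via `MCMCThreads`, which is one way to buy extra ESS per unit of wall-clock time. A minimal sketch (the `demo` model is a placeholder, not the model from this thread, and Julia must be started with multiple threads, e.g. `julia -t 4`):

```julia
using Turing

# Toy model used only to illustrate the parallel-sampling call
@model function demo(y)
    μ ~ Normal(0, 1)
    y ~ Normal(μ, 1)
end

# Run 4 chains in parallel, one per available thread;
# total ESS scales roughly linearly with the number of chains
chains = sample(demo(1.5), NUTS(), MCMCThreads(), 1000, 4)
```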
-
Just wanted to flag Tor's fantastic talk on what's new in Turing. Some questions that I have:
-
I was just chatting with him over lunch! The syntax change seems cool. We also spoke about getting TuringGLM off the ground. I am going to play around with the new syntax on a few examples.
-
Today I performed a quick benchmark with Enzyme. The first piece of good news is that it ran without error and produced the correct output. The second is that it ran faster than ForwardDiff and ReverseDiff.

```julia
cd(@__DIR__)
using Pkg
Pkg.activate("")

using Distributions
using Enzyme
using LinearAlgebra
using ReverseDiff
using SequentialSamplingModels
using Turing

Enzyme.API.runtimeActivity!(true)

n_choices = 20
ν = fill(1, n_choices)

# Generate some data with known parameters
dist = LBA(; ν, A = 0.8, k = 0.2, τ = 0.3)
data = rand(dist, 100)

# Specify LBA model
@model function model(data, n_choices; min_rt = minimum(data.rt))
    # Priors
    ν ~ MvNormal(zeros(n_choices), I * 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)
    # Likelihood
    data ~ LBA(; ν, A, k, τ)
end

# ForwardDiff (Turing's default AD): 97.95 seconds
chains_forward = sample(model(data, n_choices), NUTS(1000, 0.85), 1000)

# Enzyme: 47.58 seconds
chains_enzyme = sample(model(data, n_choices), NUTS(1000, 0.85; adtype = AutoEnzyme()), 1000)

# ReverseDiff:
#   compile = false ≈ 3960 seconds (early termination)
#   compile = true  ≈ 960 seconds (early termination, potentially unsafe caching)
chains_reverse = sample(model(data, n_choices), NUTS(1000, 0.85; adtype = AutoReverseDiff()), 1000)
```

The downside is that there are still some known problems with Distributions.jl. Nonetheless, progress is being made.
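Wall-clock time alone can be misleading if the backends produce chains with different mixing, so ESS per second may be the fairer comparison. A sketch of that calculation, assuming the `chains_forward`/`chains_enzyme` objects and timings from the benchmark above, and MCMCChains' `ess` summary (column names may differ across MCMCChains versions):

```julia
using MCMCChains
using Statistics

# Wall-clock times (seconds) reported in the benchmark above
times = Dict("ForwardDiff" => 97.95, "Enzyme" => 47.58)
chains = Dict("ForwardDiff" => chains_forward, "Enzyme" => chains_enzyme)

for (name, chn) in chains
    # `ess` returns one ESS estimate per model parameter;
    # averaging gives a single rough efficiency figure
    mean_ess = mean(ess(chn)[:, :ess])
    println(name, ": ", round(mean_ess / times[name]; digits = 2), " ESS/second")
end
```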
-
As we know, performance is currently the main limitation. Maybe we can use this thread to collect potential directions, optimization tips, benchmarks, etc., and track how things evolve on the performance front.