Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple improvements to pymcmodels #83

Merged
merged 24 commits into from
Apr 22, 2017
Merged

Multiple improvements to pymcmodels #83

merged 24 commits into from
Apr 22, 2017

Conversation

jchodera
Copy link
Member

@jchodera jchodera commented Apr 17, 2017

I didn't have time to update any examples to try specifying the F_PL and its uncertainty dF_PL as optional arguments to pymcmodels.make_model(), but feel free to add to this branch/PR directly as you test it out, @sonyahanson!

@jchodera
Copy link
Member Author

@sonyahanson : I'm having trouble figuring out how to test any of these improvements.

What can I run to actually test things? I can't find anything in examples/direct-fluorescence-assay that will actually generate some output to test, and the Jupyter notebooks I've found (like 3a Bayesian fit xml file - SrcGefitinib) don't seem to actually use make_model from pymcmodels.py.

@jchodera jchodera changed the title Add support for specified F_PL and its uncertainty dF_PL Multiple improvements to pymcmodels Apr 18, 2017
@jchodera
Copy link
Member Author

@sonyahanson : Regarding [L]=0 support for +/- protein: Do we want to support only a single well for [L]=0 with protein and a single well without protein, or do you want to be able to add an arbitrary number of [L]=0 control wells with or without protein?

@sonyahanson
Copy link
Contributor

You need to make sure the xml file path in inputs_p38_singlet actually points to your xml files. I can make the behavior here, more intuitive. (e.g. throw an error saying it didn't find the xml files).

We definitely need more/any tests, and better documentation on the examples. This is on my to do list to work on in the next week anyway. Let me know of suggestions.

Regarding [L]=0, whichever is easiest for now. A single well is perfectly fine for now, but maybe in the future we would want an arbitrary number, but maybe that future will see a pretty large overhaul of the API anyway...

@jchodera
Copy link
Member Author

You need to make sure the xml file path in inputs_p38_singlet actually points to your xml files. I can make the behavior here, more intuitive. (e.g. throw an error saying it didn't find the xml files).

That would be super useful!

We definitely need more/any tests, and better documentation on the examples. This is on my to do list to work on in the next week anyway. Let me know of suggestions.

Priorities for tests are probably:

  • Does it run at all with python 2/3?
  • Does it produce correct results for synthetic data?
  • Can it process a variety of XML files (with short runs) without dying?

Regarding [L]=0, whichever is easiest for now. A single well is perfectly fine for now, but maybe in the future we would want an arbitrary number, but maybe that future will see a pretty large overhaul of the API anyway...

The difference in API between one and multiple wells is very little, but it does change the implementation under the hood.

@jchodera
Copy link
Member Author

OK, things seem to work locally now, so I'm going to finish the [L]=0 case and then add a very rudimentary test just to make sure things run.

@jchodera
Copy link
Member Author

I think I was going about the [L]=0 case the wrong way by trying to break it out as a separate set of five things to pass into the API. Instead, I think I can just detect which elements of Lstated are zero and handle those as special cases internally, which doesn't require any changes to the API at all. Trying this now.

@jchodera
Copy link
Member Author

jchodera commented Apr 19, 2017

I've implemented the scheme I suggested above for [L]=0. We autodetect when either Lstated or Pstated have zero concentration entries and deal with those appropriately.

I've also added a quickmodel run as a travis test.

This PR should now be complete.

delg_bosutinib-ab-2017-04-18 2342

@jchodera
Copy link
Member Author

@sonyahanson : Review and merge when ready!

.travis.yml Outdated
@@ -30,6 +30,8 @@ script:
- conda install --yes --quiet nose nose-timer
# Test the package
- cd devtools && nosetests $PACKAGENAME --nocapture --verbosity=2 --with-timer -a '!slow' && cd ..
# Run quickmodel
- pushd . && cd examples/direct-fluorescence-assay && env PYTHONPATH="./" quickmodel --inputs 'inputs_p38_singlet' && popd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that quickmodel has been added to travis. However, quickmodel runs the default number of MCMC steps, currently set as 20000 PyMC moves, which may take a while on travis. How about we augment the argparser on quickmodel's so that we can specify far fewer moves on travis? Parsing the number of MCMC moves to quickmodel will make it easier to use as well.

Copy link
Collaborator

@gregoryross gregoryross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me! My comment on quickmodel is minor, and could be added in another PR if we want to use these changes asap.

@sonyahanson
Copy link
Contributor

I'm going to work through this now. One thing that pops out immediately: why were the defaults for run_mcmc changed from nthin=50, nburn=500, niter=1000 to nthin=50, nburn=0, niter=20000?

I agree with Greg that travis should be run with fewer iterations.

@jchodera
Copy link
Member Author

One thing that pops out immediately: why were the defaults for run_mcmc changed from nthin=50, nburn=500, niter=1000 to nthin=50, nburn=0, niter=20000?

The output of quickmodel was absolute garbage otherwise. Runtime was ~1 min/dataset on my machine, and total processing time for the quickmodel example was < 10 min, which seems reasonable for travis.

I'm not certain why travis isn't running, however. We might have to fix that in a different branch.

I like the idea of adding the command-line option to quickmodel, but think the new defaults are sensible. I'll add the command-line argument idea as an issue.

@sonyahanson
Copy link
Contributor

@sonyahanson
Copy link
Contributor

I will be careful to use the same nthin,niter,nburn to compare the master branch to this updated 'improved' branch just to make sure the comparisons are fair.

@jchodera
Copy link
Member Author

Looks like you're discarding the initial non-equilibrated DeltaG values, but not discarding the initial non-equilibrated traces when plotting:
https://github.com/choderalab/assaytools/blob/master/scripts/quickmodel.py#L129-L134

Can I fix that too?

@sonyahanson
Copy link
Contributor

Yes, that's correct, feel free to fix.

@sonyahanson
Copy link
Contributor

Maybe include but colored differently?

@jchodera
Copy link
Member Author

I will be careful to use the same nthin,niter,nburn to compare the master branch to this updated 'improved' branch just to make sure the comparisons are fair.

I totally understand. How about this: The current settings are configured to do the same amount of sampling. Maybe run the data and compare it to what you have printed out?

@sonyahanson
Copy link
Contributor

on it

@jchodera
Copy link
Member Author

Here's the Bosutinib-CD example from above:
delg_bosutinib isomer-cd-2017-04-19 1345

@jchodera
Copy link
Member Author

The pre-equilibration traces are still shown, just very lightly shaded. They're mostly lying underneath the darker traces, but you can see them a little bit in the salmon traces here:
delg_erlotinib-ef-2017-04-19 1352

@jchodera
Copy link
Member Author

I'm moving on to other projects now, so I'll let you folks take the PR over from here.

I can tackle the outlier detection in a separate PR when I have time to cycle back.

@sonyahanson
Copy link
Contributor

yeah, I think this is fine for now, but don't think we should merge quite yet.

@jchodera
Copy link
Member Author

I've just fixed a bug in the code where the same ligand concentration was being used for both the +protein and -protein rows. This prevented the true ligand concentrations from adjusting to deviations from the expected binding curve even when dispensing errors occurred.

I've also temporarily disabled the Metropolis step methods, and am testing how things work without them.

@sonyahanson
Copy link
Contributor

So it seems like the big problem was allowing the metropolis tuning throughout, since we've now taken out the burnin step.

But making this changed improved our overlap between repeats from this:
compare_delg_jdc_improve
To this:
comparing_src_erl_3iter_small

Merge away. We can address further problems in a future PR.

@sonyahanson
Copy link
Contributor

This change fcde1cf doesn't quite match the commit message, what was the motivation? I'm curious if this also effected the change in our results...

@jchodera
Copy link
Member Author

Whoops---there were two changes in rapid succession:

  • Tune throughout (this was the major remedy)
  • Also adjust the scale of the AdaptiveMetropolis method that makes joint proposals between F_PL and DeltaG. This was not being tuned before, but once tuning was working, the exact choice became less relevant. I changed it to 0.1 to match all the other Metropolis step methods

@jchodera
Copy link
Member Author

Can we merge this and forge ahead?

@jchodera
Copy link
Member Author

Merging this so I can forge ahead.

@jchodera jchodera merged commit eded3f8 into master Apr 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants