-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
higher resolution nucpos.bed.gz is empty #55
Comments
Sorry, didn't see your followup on that issue! Can you run NucleoATAC on the test data in the example folder?
|
I didn't notice there was an example dataset. I tried it. I got an empty I didn't see any errors. This was the output:
|
Thanks for running the test data. Number of lines in test_results.occpeaks.bed is correct, but there should be 130 lines in the test_results.nucpos.bed.gz file, so something is definitely going wrong in execution of code independent of particulars of input data. A couple more Q's: NucleoATAC version? Within NucleoATAC directory can you try: |
NucleoATAC version: 0.3.3 (just reinstalled it to be safe) nosetests result:
|
Thanks for the additional info. The versions of those packages or the python version don't seem to be a problem on my system (ubuntu), and I haven't been able to reproduce the failing test or erroneous file output. Could you additionally give the versions of all the other dependencies? And is your version installed in a virtual environment? |
I am not using a virtual environment. Is that a problem? I am not sure sure how to get just the dependencies, but here is my full
|
I've found that installing in a virtual environment can help with some dependency issues. In particular, in some cases I have found that when installing Cython using pip that an older version of Cython installed previously would actually get imported instead of the newly installed version (but the presence of the new version was what was registered when checking package dependencies upon installation). Setting up a virtual environment and installing all the dependencies fresh in that environment solved the issue. I haven't gotten a chance yet to test the package against all the versions you've listed, but that will be my next step in trying to reproduce the error. |
I had trouble using |
I resolved my
I updated to the latest version and ran again:
Unfortunately, I am still getting 0 in |
I unfortunately still can't reproduce that output or the failing test with those package versions, making getting to the bottom of this issue difficult. Do any of the other tests fail? With v0.3.4 all the tests can be run with "python tests.py" in the main directory? |
I got some fails with that.
|
Thanks, it looks like those are the same tests from tests/test_var.py run earlier. Seem like the problem is likely with the calculateCov function in the nucleoatac/multinomial_cov.pyx file, although I am confused as to why it would work differently on your system instead of mine (Ubuntu) |
Do you know how I might be able to troubleshoot this? Is there a specific function that might be responsible that I can quickly test without requiring the whole test dataset? |
For troubleshooting, perhaps figuring out more specifically why the tests are failing will help. If you try the following code within python in the main NucleoATAC directory (or edit file paths if in another directory): import numpy as np
import nucleoatac.NucleosomeCalling as Nuc
import pyatac.VMat as V
from pyatac.chunkmat2d import BiasMat2D
from pyatac.chunk import ChunkList
from pyatac.bias import InsertionBiasTrack
bed_list = ChunkList.read('example/example.bed')
chunk = bed_list[0]
vmat = V.VMat.open('example/example.VMat')
biastrack = InsertionBiasTrack(chunk.chrom, chunk.start, chunk.end)
biastrack.read_track('example/example.Scores.bedgraph.gz')
biasmat = BiasMat2D(chunk.chrom,chunk.start+200,chunk.end-200,100,250)
biasmat.makeBiasMat(biastrack)
signaldist = Nuc.SignalDistribution(chunk.start+300,vmat,biasmat,35)
signaldist.simulateDist(5000)
sd1 = np.std(signaldist.scores)
sd2 = signaldist.analStd()
var_term = np.sum(signaldist.prob_mat*(1-signaldist.prob_mat)*signaldist.vmat.mat**2)
tmp = signaldist.prob_mat *signaldist.vmat.mat
cov_term = np.sum(np.outer(tmp,tmp))-np.sum(tmp**2)
sd3 = np.sqrt(signaldist.reads * (var_term - cov_term)) What are the values of sd1, sd2, and sd3? (should all be approx equal, about 0.72) |
Thanks for the test code. I got an error for sd2:
Everything else was fine I think:
|
Ok that helps confirm that the problem is with calculateCov function specifically. After running the first two blocks of code above (through signaldist = ...) can you try: import pyximport; pyximport.install(setup_args={"include_dirs":np.get_include()})
from nucleoatac.multinomial_cov import calculateCov
flatv = np.ravel(signaldist.vmat.mat)
var = calculateCov(signaldist.probs, flatv, signaldist.reads) What does var give? If we can't figure out why that function is not working on your system, one thing we can possibly do as a temporary measure is to create a branch where that function is replaced by the method used for calculating "sd3". That is the same calculation, just a slower implementation. One thing I will try is updating the way the Cython code is included in the package to better follow recommendations (http://cython.readthedocs.io/en/latest/src/reference/compilation.html). I will also look into why compilation of the file on different system could potentially lead to problems. |
I think you meant sd2, not sd3, right? Here are the latest results:
Is that the expected value? |
Yes that is the expected value. 'signaldist.analStd()' should just do those two computations and then take the square root of var -- very weird that it seems to work when calculateCov is called directly but not when it is called within 'signaldist.analStd()'. Perhaps the issue is with way pyximport is being used sd2 in the example above is computed using the actual method employed in NucleoATAC. sd3 is computed using an alternate, slower method |
Thanks for clarifying. I guess we at least know where the problem is. Maybe the solution could be to just add an if-else based on |
I am not sure what that ifelse would test for, as i am not sure whether in other cases the function could give an other type of spurious signal. I am trying to update the code to change the way the Cython code is imported, although have run into some roadblocks and unfortunately don't have a good estimate of when i'll be able to get that working and/or see if it might fix this issue |
By if-else, I meant if Anyway, thanks for following up. Hopefully you will be able to fix this at some point. |
Same issue, nucpos.bed.gz files empty. looking forward to the updates. Thank you so much! |
One possible fix is using a docker container... I started looking into how to make one, and found that it seems like someone else has already created one: |
Thanks for a very prompt response. Seems the issue is OS specific and docker could be possible fix. I will keep you posted. |
Unfortunately, I am running this on a cluster and can't use Docker. Obviously it would be nice to have a Docker image for people who can use that option. |
I tried to solve the issue. Found something interesting. It has to do with the version of your gcc C compiler. In your home are there is a directory called .pyxbld. There is where the .c and .so files compiled from multinomial_cov.pyx are placed. I deleted that .pyxbld directory from my home area (which will force nucleoatac to re-compile multinomial_cov.pyx) and changed my gcc version. By default CentOS 6 uses gcc version 4.4.7. I have environment modules, so I am able to load other gcc versions I have compiled in the past. When I loaded gcc 4.9.3, it works fine after multinomial_cov.pyx was built again. Maybe the problem is that the C compiler has some issues with cython, which builds a non-functional multinomial_cov.pyx (even though it compiles fine). Hope this helps to find the exact cause. Cheers! |
Thanks for the suggestion @vatese ! I was also on CentOS 6 with gcc 4.4. I switched to gcc 6.1. Everything seems to be working now.
|
Thanks @vatese for figuring this out! I guess I'm still not sure how to actually make the code compatible with the earlier version of gcc, but at least now we know the source of the problem and one way to get around it |
Hi @AliciaSchep ,
My
|
@sameet Unfortunately I'm not sure what the best next step would be 🙁 I still don't know the underlying cause of this bug, other than that the gcc version seems important. Given that someone reported using gcc 6.1 without this problem, if it is feasible to try that version out that might be one possible thing to try. |
I originally posted a comment on a previous issue: #38 , but the original issue is closed, so maybe it wasn't seen. Thus, I am reposting as a new issue since there have been no updates for a few months. Sorry if you just didn't get around to responding.
In the meantime, I ran NucleoATAC on other samples and encounter the same problem every time. I tried both human and much smaller fly genome (so at least coverage should not be a problem).
The text was updated successfully, but these errors were encountered: