missing misc.bio_utils.bam_utils #34

Closed
tmargus opened this issue Jan 31, 2017 · 11 comments

@tmargus

tmargus commented Jan 31, 2017

Hi,

I am on OS X 10.11.6 with Python 3.5.2 (Anaconda).
First,
pip install .
did not install misc or flexbar, yet it exited with status 0.

I then added the misc library from git:
git clone https://bitbucket.org/bmmalone/misc.git
cd misc
pip install .
That worked, but then the next problem appeared: this misc distribution does not include bio_utils.
Part of the error message:
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/rpbp/orf_profile_construction/create_orf_profiles.py", line 11, in <module>
import misc.bio_utils.bam_utils as bam_utils
ImportError: No module named 'misc.bio_utils'
........................

My question: where should I download a misc library that contains the needed module?

Tonu
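
One quick way to confirm whether an installed misc package actually contains the submodule is to ask Python's import machinery directly. This is a minimal sketch using only the standard library; the helper name has_module is hypothetical and is not part of misc or rpbp.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if dotted_name can be located by the import system.

    has_module is a hypothetical helper for illustration only.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (for example 'misc') is itself missing.
        return False


# Report each level of the dotted path from the error message.
for name in ("misc", "misc.bio_utils", "misc.bio_utils.bam_utils"):
    print(name, "found" if has_module(name) else "MISSING")
```

If "misc" is found but "misc.bio_utils" is missing, the package was installed without that subpackage, which matches the ImportError above.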

@bmmalone bmmalone self-assigned this Jan 31, 2017
@bmmalone
Contributor

Hi Tonu,

Our package does treat flexbar as an external dependency, so, unfortunately, you will have to manually install it (https://github.com/seqan/flexbar). STAR, bowtie2 and samtools are also required dependencies that our installation process does not install.
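
Since these tools are not installed by pip, it can help to check that they are on the PATH before running the pipeline. The sketch below uses only the standard library. It assumes the executables are named exactly as the tools are written above (flexbar, STAR, bowtie2, samtools), and missing_tools is a hypothetical helper, not part of rpbp.

```python
import shutil

# Assumption: executable names match the tool names mentioned in this thread.
REQUIRED_TOOLS = ["flexbar", "STAR", "bowtie2", "samtools"]


def missing_tools(tools):
    """Return the subset of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external dependencies were found.")
```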

However, it should install the misc.bio_utils module (the one you linked to is correct). I will investigate this in the morning to identify the problem.

In the meantime, as a possible work-around, you can try installing the packages as editable ("pip3 install -e .")

Also, we have not tested in the anaconda environment, only with a standard python3 installation. If it is possible to test outside of anaconda, that may also be a temporary solution.

Have a good day,
Brandon

@tmargus
Author

tmargus commented Jan 31, 2017 via email

@bmmalone
Contributor

bmmalone commented Feb 1, 2017

Hi Tonu,

There seems to be some problem finding the input files. The output is a bit dense, but the lines like: "WARNING misc.utils 2017-01-31 20:31:50,529 : Some input files ['/Users/tmargus/projects/Rp-Bp/c-elegans-chrI-examplec-elegans.test-chrI.rep-1.fastq.gz'] are missing." give this information. Later on, it also can't find the bowtie indices, etc.

Just looking at the paths in the error logs and your command prompt (":c-elegans-chrI-example$"), it looks like there may be a "/" missing in the paths in the config files. For example, "/Users/tmargus/projects/Rp-Bp/c-elegans-chrI-examplec-elegans.test-chrI.rep-1.fastq.gz" looks like it should be "/Users/tmargus/projects/Rp-Bp/c-elegans-chrI-example/c-elegans.test-chrI.rep-1.fastq.gz", with a "/" between "example" and "c-elegans".

Could you please take a look and let me know if that is the case? If so, then you can just do a find-and-replace in the two config files. It may be easiest to just delete the entire directory and re-run both the genome index creation script and then the actual pipeline.
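
To illustrate how this kind of path is produced, here is a small sketch comparing plain string concatenation with os.path.join. The directory and file name come from the log line above; the variable names are only for illustration, and this is not how the rpbp config files are actually processed.

```python
import os

base = "/Users/tmargus/projects/Rp-Bp/c-elegans-chrI-example"
name = "c-elegans.test-chrI.rep-1.fastq.gz"

# Plain concatenation silently drops the separator when base has no trailing "/".
broken = base + name

# os.path.join inserts the separator only when it is needed.
fixed = os.path.join(base, name)

print("broken:", broken)
print("fixed: ", fixed)

# A quick check that reports which of the candidate paths exist on disk.
for path in (broken, fixed):
    print(path, "exists" if os.path.exists(path) else "is missing")
```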

Of course, this doesn't fix the "bio_utils" installation issue, but hopefully it at least gets you started.

Have a good day,
Brandon

P.S. You can also reduce the logging output by adding "--logging-level WARNING" to the command line call; additionally, if you'd like to write (append) the log to a file, you can use "--log-file log.txt". Also, the "deprecated" warnings are fixed in the dev version, and I hope to merge that into master soon, so that will also cut down on the output.

@bmmalone
Contributor

bmmalone commented Feb 1, 2017

Hi Tonu,

I was able to reproduce the "ImportError: No module named 'misc.bio_utils'" error on my machine. I believe it was the result of a missing configuration file in the misc repository. This commit resolved the problem for me.

In addition to the missing "/", could you please let me know if this also resolves the installation problem for you? That is, after cloning this repo, "pip3 install ." should be all that is required to install.

Have a good day,
Brandon

@tmargus
Author

tmargus commented Feb 1, 2017

Hi Brandon,
Yes, the missing "/" was my mistake. I added the slash and ran the script in a new folder.
It ran up to the STAR alignment and finished with empty BAM files.
I found that the cause was how the input is passed to STAR. It uses zcat, which assumes the input FASTQ files are compressed as *.Z, but the input files are actually gzip'ed (*.gz).

  1. I changed the file create_base_genome_profile.py
    line: star_compression_str = "--readFilesCommand zcat"
    to: star_compression_str = "--readFilesCommand gzcat"
    With that change, the STAR alignment runs smoothly.

  2. The next problem appears when running create_orf_profiles.py, which does not find the (non)periodic models:
    ................
    estimate-metagene-profile-bayes-factors: error: argument --periodic-models: expected at least one argument
    ...................

create_orf_profiles.py can't pick up the location of the models. Where in the configuration files can these be specified, or should the program figure this out itself?
Anyway, after hard-coding the models in
create_orf_profiles.py
periodic_models_str = "/Users/tmargus/git/rp-bp/rpbp_models/periodic/start-high-low-low.stan"
non_periodic_models_str = "/Users/tmargus/git/rp-bp/rpbp_models/nonperiodic/no-periodicity.stan"

it runs further.

I am not sure this was the right thing to do, because it might have caused the following errors, which are related to estimate-metagene-profile-bayes-factors.

I am including only part of the message because it is long.
Part of the error message:
.......
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/joblib/_parallel_backends.py", line 344, in __call__
return self.func(*args, **kwargs)
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/riboutils/estimate_metagene_profile_bayes_factors.py", line 72, in estimate_profile_bayes_factors
periodic_models = [pickle.load(open(pm, 'rb')) for pm in args.periodic_models]
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/riboutils/estimate_metagene_profile_bayes_factors.py", line 72, in <listcomp>
periodic_models = [pickle.load(open(pm, 'rb')) for pm in args.periodic_models]
_pickle.UnpicklingError: invalid load key, 'f'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/tmargus/anaconda/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/tmargus/anaconda/lib/python3.5/site-packages/joblib/_parallel_backends.py", line 353, in __call__
raise TransportableException(text, e_type)
joblib.my_exceptions.TransportableException: TransportableException


UnpicklingError Wed Feb 1 16:43:32 2017
PID: 23231 Python 3.5.2: /Users/tmargus/anaconda/bin/python
...........................................................................
/Users/tmargus/anaconda/lib/python3.5/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
self.items = [(, ( position count type length
0 -50 ... 20 0 start 16

[71 rows x 4 columns], Namespace(chains=2, count_field='count', file_lo...stdout_logging_level='NOTSET', type_field='type')), {})]
132
133 def __len__(self):
134 return self._size
135

................................................................................
.............


Traceback (most recent call last):
File "/Users/tmargus/anaconda/bin/create-orf-profiles", line 11, in <module>
load_entry_point('rpbp', 'console_scripts', 'create-orf-profiles')()
File "/Users/tmargus/git/rp-bp/rpbp/orf_profile_construction/create_orf_profiles.py", line 212, in main
file_checkers=file_checkers, overwrite=args.overwrite, call=call)
File "/Users/tmargus/git/rp-bp/src/misc/misc/utils.py", line 661, in call_if_not_exists
ret_code = check_call(cmd, call=call, raise_on_error=raise_on_error)
File "/Users/tmargus/git/rp-bp/src/misc/misc/utils.py", line 526, in check_call
return check_call_step(cmd, call=call, raise_on_error=raise_on_error)
File "/Users/tmargus/git/rp-bp/src/misc/misc/utils.py", line 510, in check_call_step
raise subprocess.CalledProcessError(ret_code, cmd)
subprocess.CalledProcessError: Command 'estimate-metagene-profile-bayes-factors /Users/tmargus/projects/rpbp/data/metagene-profiles/c-elegans-rep-1.test-unique.metagene-profile.csv.gz /Users/tmargus/projects/rpbp/data/metagene-profiles/c-elegans-rep-1.test-unique.metagene-periodicity-bayes-factors.csv.gz --num-cpus 4 --periodic-models /Users/tmargus/git/rp-bp/rpbp_models/periodic/start-high-low-low.stan --nonperiodic-models /Users/tmargus/git/rp-bp/rpbp_models/nonperiodic/no-periodicity.stan --periodic-offset-start -20 --periodic-offset-end 0 --metagene-profile-length 21 --seed 8675309 --chains 2 --iterations 500 --log-file log_4.txt --logging-level WARNING --stderr-logging-level NOTSET --file-logging-level NOTSET --stdout-logging-level NOTSET' returned non-zero exit status 1
Traceback (most recent call last):

................

Cheers,
Tonu
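
The traceback above shows estimate-metagene-profile-bayes-factors calling pickle.load on each --periodic-models path, so it appears to expect pickled model objects rather than .stan source text. Below is a minimal sketch for checking a file before passing it on the command line. The helper name is_loadable_pickle and the file contents in the demo are hypothetical, and pickle.load should only be used on files you trust.

```python
import os
import pickle
import tempfile


def is_loadable_pickle(path):
    """Return True if the file at path can be read by pickle.load.

    Hypothetical helper; only call this on trusted files, since
    unpickling can execute arbitrary code.
    """
    try:
        with open(path, "rb") as handle:
            pickle.load(handle)
        return True
    except (pickle.UnpicklingError, EOFError):
        return False


# Demo with two throwaway files: a real pickle and a plain-text stand-in
# for a model source file (contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    pickled = os.path.join(tmp, "model.pkl")
    source = os.path.join(tmp, "model.stan")
    with open(pickled, "wb") as handle:
        pickle.dump({"example": 1}, handle)
    with open(source, "w") as handle:
        handle.write("functions {\n}\n")
    print("pickle file:", is_loadable_pickle(pickled))
    print("text file:  ", is_loadable_pickle(source))
```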

@bmmalone
Contributor

bmmalone commented Feb 1, 2017

Hi Tonu,

Okay, glad to hear that the first set of things seems to be taken care of. I'm going to close this issue and open two separate ones about the STAR readFilesCommand option (#35) and the model locations (#36).

The readFilesCommand issue is straightforward; I will implement a fix within the next day.
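
As a rough sketch of one possible approach (not necessarily what will be implemented for #35), the decompression command could be chosen at run time by preferring gzcat when it exists and falling back to zcat otherwise. The function below reuses the variable name star_compression_str from the thread as a function name purely for illustration; the which parameter is an assumption added so the lookup can be replaced in tests.

```python
import shutil


def star_compression_str(which=shutil.which):
    """Build the STAR option for reading gzip'ed FASTQ files.

    Illustrative sketch only: prefers gzcat (which handles *.gz on OS X)
    and falls back to zcat (which handles *.gz with GNU gzip).
    """
    for cmd in ("gzcat", "zcat"):
        if which(cmd) is not None:
            return "--readFilesCommand " + cmd
    raise RuntimeError("neither gzcat nor zcat was found on the PATH")


# Demo with a stand-in lookup that pretends every command is installed.
print(star_compression_str(which=lambda cmd: "/usr/bin/" + cmd))
```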

The model locations issue is trickier. We use appdirs to determine where the compiled models should go. According to their docs, this should be ~/Library/Application Support/rpbp/rpbp_models/. Could you please have a look in that location and let me know at #36 whether anything is there?
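
For reference, a small standard-library sketch that lists whatever is under that directory. The path is the one quoted above; list_compiled_models is a hypothetical helper and does not call appdirs or any rpbp code.

```python
import os


def list_compiled_models(models_dir):
    """Return sorted relative paths of all files under models_dir.

    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(models_dir):
        return []
    found = []
    for root, _dirs, files in os.walk(models_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, models_dir))
    return sorted(found)


models_dir = os.path.expanduser("~/Library/Application Support/rpbp/rpbp_models")
files = list_compiled_models(models_dir)
if files:
    print("\n".join(files))
else:
    print("nothing found under " + models_dir)
```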

I appreciate you working through the installation process here. We have tested pretty thoroughly in the debian/ubuntu environments we have locally, but not much in others.

Have a good day,
Brandon

@bmmalone bmmalone closed this as completed Feb 1, 2017
@tmargus
Author

tmargus commented Feb 1, 2017

Hi Brandon,

I am most interested in the results of the modules that select a range of read lengths (those with good 3-nt periodicity) and apply a correction for each read length. I even started to write some code, and so far it can make nice metagenomic plots, but which strategy to use for selecting a usable read-length range and finding an automatic correction wasn't very clear to me. Then your work came out, which I hope will help me get close to the most suitable solution. That is part of my motivation to go through this installation process and help here.

Cheers,
Tonu
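
To make the idea of "good 3-nt periodicity" per read length concrete, here is a naive sketch that computes the fraction of counts falling in each of the three frames, separately for each read length. This is a simple heuristic for illustration only and is not the Bayes-factor model used by Rp-Bp. The input is assumed to be rows of (position, count, length), loosely following the position/count/length columns visible in the metagene profile output above; the function name and the example numbers are made up.

```python
from collections import defaultdict


def frame_fractions(rows):
    """Compute per-read-length frame fractions.

    rows: iterable of (position, count, length) tuples, where position
    is relative to the annotated start and may be negative.
    Returns {length: [fraction_frame0, fraction_frame1, fraction_frame2]}.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for position, count, length in rows:
        # Python's % always returns a value in {0, 1, 2} here, even for
        # negative positions.
        totals[length][position % 3] += count

    fractions = {}
    for length, frames in totals.items():
        total = sum(frames)
        if total > 0:
            fractions[length] = [frame / total for frame in frames]
    return fractions


# Made-up example: read length 29 is strongly periodic, 32 is not.
example = [
    (-3, 10, 29), (0, 8, 29), (1, 1, 29), (2, 1, 29),
    (0, 3, 32), (1, 3, 32), (2, 3, 32),
]
for length, fracs in sorted(frame_fractions(example).items()):
    print(length, [round(f, 2) for f in fracs])
```

A read length where one frame clearly dominates is a candidate for further analysis; where the three fractions are roughly equal, there is little evidence of periodicity.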

@bmmalone
Contributor

bmmalone commented Feb 1, 2017

Hi,

We also felt the selection of read lengths and P-site offsets so far had been rather ad hoc. I'm glad to hear that others are also interested in principled selection techniques :)

It is a bit buried in the documentation, but the output of the read length periodicity analysis is described in the "Output files/Metagene profiles" part.

If you have any questions about that part (either output, code, models, etc.) please just let me know.

Also, your readme mentions yeast. In some prokaryotes, I've heard it is more common to use the 3' end of the reads for everything. Is that the case with your data?

Have a good day,
Brandon

@tmargus
Author

tmargus commented Feb 1, 2017 via email

@tmargus
Author

tmargus commented Feb 1, 2017 via email

@bmmalone
Contributor

bmmalone commented Feb 2, 2017

Hi,

Ah, okay. I'm interested to see how your results turn out.

When you look at the offsets for the 3' ends, is everything still in-frame w.r.t. the 5' offsets?

Have a good day,
Brandon

P.S. While I don't imagine I'll get to it anytime soon, I did add an issue here (#42) to note where most of the 5' selection stuff happens. If that is something you revisit, it could be nice to include it here, so that it integrates seamlessly with the rest of the periodicity selection stuff.
