Performance issue with large files (~200 MB) #9
I think the issue here is very large memory usage as the parse tree is being constructed. One fix would be to allow extracting information as soon as a syntactic item is matched, so that only the interesting items are preserved and the rest thrown away, instead of building a massive parse tree first. The package Lerche, which does the parsing, doesn't allow this yet. I'll raise an issue on that package.
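To illustrate the idea, a streaming scan might invoke a user callback as each (data name, value) pair is matched, so only the pairs of interest are kept. This is just a minimal sketch of the concept on a toy line-oriented pattern; the names here (`scan_pairs`, etc.) are hypothetical and Lerche does not currently expose such a hook.

```julia
# Sketch: call `f(name, value)` for each matched pair instead of
# accumulating a parse tree. Only what the callback keeps survives.
function scan_pairs(f, lines)
    for line in lines
        tok = split(strip(line))
        # A minimal pattern: lines of the form "_name value"
        if length(tok) == 2 && startswith(tok[1], "_")
            f(tok[1], tok[2])
        end
    end
end

# Keep only the data names we care about; everything else is discarded
wanted = Dict{String,String}()
scan_pairs(["_cell.length_a 10.0", "# comment", "_cell.length_b 20.0"]) do name, value
    name == "_cell.length_a" && (wanted[String(name)] = String(value))
end
```

With this shape, memory stays proportional to the extracted data rather than to the whole file.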
In case this helps, I noticed after reporting this issue that the BioStructures.jl package can also read mmCIF (and simple STAR) files, and that its …
Interesting. That parser works by splitting the file into whitespace-separated tokens (handling quoted strings), then working through these tokens to allocate them to data blocks and data names. A different paradigm to the general one used here, and clearly super fast.
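The token-splitting paradigm described above could look roughly like this. This is a minimal sketch, not the BioStructures.jl implementation; `tokenize_star` and its behavior (single-character quotes, no loop handling) are illustrative assumptions only.

```julia
# Split input into whitespace-separated tokens, treating a quoted
# run ('...' or "...") as a single token.
function tokenize_star(text::AbstractString)
    tokens = String[]
    i, n = firstindex(text), lastindex(text)
    while i <= n
        c = text[i]
        if isspace(c)
            i = nextind(text, i)
        elseif c == '\'' || c == '"'
            # Quoted string: consume up to the matching closing quote
            j = nextind(text, i)
            start = j
            while j <= n && text[j] != c
                j = nextind(text, j)
            end
            push!(tokens, text[start:prevind(text, j)])
            i = j <= n ? nextind(text, j) : j
        else
            # Bare token: consume up to the next whitespace
            j = i
            while j <= n && !isspace(text[j])
                j = nextind(text, j)
            end
            push!(tokens, text[i:prevind(text, j)])
            i = j
        end
    end
    return tokens
end
```

For example, `tokenize_star("data_test _name 'quoted value' 42")` yields `["data_test", "_name", "quoted value", "42"]`. The speed presumably comes from doing one linear pass with no backtracking and no intermediate tree.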
Hello,
As mentioned in #8, trying to read a STAR file about 200 MB in size hung "forever" until I cancelled the command (I waited a bit more than one hour). The Julia process doing this ate up to 14 GB of RAM (out of 16), and was still slowly claiming more RAM when I decided to cancel. This happened when I tried the commands below in a freshly opened Julia session (maybe I should have read a small file with a similar structure first, to get compilation out of the way before trying the large one?).
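On the compilation point: in Julia the first call to a function includes JIT compilation, so timing a second call isolates the parsing cost itself. A sketch of that pattern, where `parse_dummy` is just a stand-in for the package's actual reading function:

```julia
# Stand-in for the real file reader (hypothetical name)
parse_dummy(s) = sum(length, split(s))

parse_dummy("warm up")               # first call compiles the method
stats = @timed parse_dummy("a b c")  # second call measures runtime only
stats.value                          # the parsed result
stats.time                           # elapsed seconds, compilation excluded
```

That said, compilation overhead would be seconds at most; an hour-plus hang on a 200 MB file points at the memory behavior, not warm-up.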
Here is this particles.star file (link valid for 5 days): https://drop.chapril.org/download/311b4a22f7b03565/#X9xYmmEtcD4A4WZQKjbxvg

I can share even larger STAR files (up to ~800 MB) if you want to really stress test the package.