### bash scripts

`bin/run_search.sh` uses up to 12 CPUs. This could be extended by re-organizing the blobs created in `./blobs` by `./bin/create_blobs.sh` into one folder, continuously enumerated `blob_001` ... `blob_720`. Then `./bin/run_search.sh` could assign each search process a fraction of the blobs as input: ca. 720 blobs divided by the number of processes.

Speed up the creation of blobs by multiprocessing over the `.smiles` input files, or by splitting them into several files and starting more parallel processes in `./bin/run_search.sh`.
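The batching idea above could be sketched as follows; `partition_blobs` and the single-folder layout are hypothetical, assuming enumerated blob files in one directory:

```python
import math
from pathlib import Path

def partition_blobs(blob_dir, n_procs):
    """Split an enumerated blob folder (blob_001 ... blob_720) into
    n_procs roughly equal batches, one batch per search process."""
    blobs = sorted(Path(blob_dir).glob("blob_*"))
    size = math.ceil(len(blobs) / n_procs)  # ca. 720 / n_procs blobs each
    return [blobs[i:i + size] for i in range(0, len(blobs), size)]
```

Each batch could then be handed as the input list to one worker started by `./bin/run_search.sh`.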
### Keyword Arguments

- add a keyword argument for the fingerprint type
- add a keyword argument for the similarity metric
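A minimal sketch of how the two keyword arguments might look with `argparse`; the flag names and choice lists here are placeholders, not the project's actual CLI:

```python
import argparse

def build_parser():
    # Placeholder flag names and choices for the two proposed options.
    parser = argparse.ArgumentParser(description="similarity search over blobs")
    parser.add_argument("--fingerprint", default="morgan",
                        choices=["morgan", "maccs", "rdkit"],
                        help="fingerprint type used to encode molecules")
    parser.add_argument("--metric", default="tanimoto",
                        choices=["tanimoto", "dice", "cosine"],
                        help="similarity metric used for ranking hits")
    return parser
```

For example, `build_parser().parse_args(["--fingerprint", "maccs"])` selects MACCS keys while keeping the default Tanimoto metric.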
### Add Tests

- add publicly available `.smiles` files of a few thousand lines for testing, from somewhere
- write unit tests for functions
- write a procedural test for the scripts
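As a sketch of what a unit test could look like, here is a pytest-style test for a hypothetical pure-Python Tanimoto helper; the function itself is an assumption for illustration, not the project's implementation:

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of
    on-bit indices: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# pytest-style unit tests (run with `pytest`)
def test_identical():
    assert tanimoto({1, 2, 3}, {1, 2, 3}) == 1.0

def test_disjoint():
    assert tanimoto({1, 2}, {3, 4}) == 0.0

def test_partial_overlap():
    # intersection {2, 3} has 2 bits, union {1, 2, 3, 4} has 4
    assert tanimoto({1, 2, 3}, {2, 3, 4}) == 0.5
```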
### Python Multiprocessing

- repair multiprocessing in Python
- reading from file: compare performance of processing line by line (no) vs. chunk by chunk in Python
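A minimal way to compare the two reading strategies; both helpers are sketches, and the chunked variant has to carry the partial line across chunk boundaries:

```python
def read_lines(path):
    """Baseline: iterate the file line by line."""
    with open(path, "r") as fh:
        for line in fh:
            yield line.rstrip("\n")

def read_chunks(path, chunk_size=1 << 20):
    """Read fixed-size chunks and split them into lines, carrying the
    incomplete last line over to the next chunk."""
    with open(path, "r") as fh:
        tail = ""
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            lines = (tail + chunk).split("\n")
            tail = lines.pop()  # incomplete last line, if any
            yield from lines
        if tail:
            yield tail
```

Timing both with `time.perf_counter` on a real `.smiles` file would settle the performance question; for correctness they must yield identical lines.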
### Logging

- add logging of runs?
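If run logging is added, a small helper along these lines would do; the logger name, file name, and format are placeholders:

```python
import logging

def get_run_logger(logfile="search_run.log"):
    """Create a logger that records each search run to a file.
    File name and log format are placeholders."""
    logger = logging.getLogger("search")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(logfile)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Each script could then log its parameters and timings at the start and end of a run, e.g. `log.info("run started: %d blobs, %d processes", 720, 12)`.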
### Use zipped files

- `create_blobs.py` currently reads `.smiles` files, but the original data is zipped. Check whether reading directly from the zipped files keeps performance similar; see the `zipfile` module and its `zipfile.ZipFile.open` method.
- The blobs created by `create_blobs.py` are very large (500-600 GB) for the full Enamine REAL dataset. Check whether compressing them keeps performance similar, e.g. using zip files.
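Reading straight from the zipped archives could be sketched with the standard-library `zipfile` module; the `.smiles` member-name filter is an assumption about how the archives are laid out:

```python
import io
import zipfile

def iter_smiles_from_zip(zip_path):
    """Stream SMILES lines straight from a zip archive without
    extracting it; ZipFile.open decompresses on the fly."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".smiles"):
                continue  # skip non-SMILES members
            with zf.open(name) as raw:
                # wrap the binary stream for line-oriented text reading
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    yield line.rstrip("\n")
```

The same pattern, reversed (`ZipFile.writestr` or writing through `zf.open(name, "w")`), could be benchmarked for compressing the blobs themselves.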
### Benchmarking against other solutions

- solution using chemfp
- solution using RDKit functionality