Tasks and Enhancements #1

Open
16 tasks
enryH opened this issue Nov 4, 2019 · 0 comments
Labels
enhancement New feature or request

Comments

Owner

enryH commented Nov 4, 2019

bash scripts

  • The search script bin/run_search.sh uses up to 12 CPUs. This could be extended by re-organizing the blobs created in ./blobs by ./bin/create_blobs.sh into one folder, continuously enumerated blob_001 ... blob_720. Then ./bin/run_search.sh could assign each process it creates a fraction of the blobs as inputs: ca. 720 blobs divided by the number of processes.
  • Speed up the creation of blobs by multiprocessing the .smiles input files, or by splitting them up into several files and starting more parallel processes in ./bin/run_search.sh.
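
The blob assignment described above could look roughly like the following sketch. It assumes the blobs have already been renamed to blob_001 ... blob_720; the helper names `blob_names` and `split_evenly` are hypothetical and not part of the repository.

```python
def blob_names(n_blobs=720):
    """Return the names blob_001 ... blob_<n_blobs> (zero-padded to three digits)."""
    return [f"blob_{i:03d}" for i in range(1, n_blobs + 1)]


def split_evenly(items, n_parts):
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for part in range(n_parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if part < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical setup: 720 blobs shared between 12 search processes.
    for rank, chunk in enumerate(split_evenly(blob_names(), n_parts=12)):
        print(rank, chunk[0], chunk[-1], len(chunk))
```

Each chunk could then be passed to one search process as its list of input blobs, so the number of processes is no longer tied to the folder layout.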

Keyword Arguments

  • add keyword argument for fingerprints
  • add keyword argument for similarity metric
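
One possible shape for the similarity-metric keyword is a small registry keyed by name. This is only a sketch: it assumes fingerprints are represented as Python sets of on-bit indices, which may not match the project's actual representation, and `METRICS` and `similarity` are hypothetical names. A fingerprint keyword could follow the same pattern.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient of two sets of on-bit indices."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical registry: maps the keyword value to the metric function.
METRICS = {"tanimoto": tanimoto, "dice": dice}


def similarity(fp_query, fp_candidate, metric="tanimoto"):
    """Compare two fingerprints with the metric selected by keyword."""
    try:
        func = METRICS[metric]
    except KeyError:
        raise ValueError(
            f"unknown metric {metric!r}, expected one of {sorted(METRICS)}"
        ) from None
    return func(fp_query, fp_candidate)
```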

Add Tests

  • add publicly available .smiles files of a few thousand lines for testing
  • write unit tests for functions
  • write procedural tests for scripts

Python Multiprocessing

  • repair multiprocessing in python
  • reading from file: compare performance of processing line by line (no) vs chunk by chunk in python
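
The two reading strategies can be compared with a sketch like the one below. The function names and the default chunk size are assumptions for illustration; neither is taken from create_blobs.py.

```python
from itertools import islice


def read_lines(path):
    """Yield one stripped line at a time."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


def read_chunks(path, chunk_size=10_000):
    """Yield lists of up to chunk_size stripped lines."""
    with open(path) as handle:
        while True:
            # islice continues from the current position of the file iterator.
            chunk = [line.rstrip("\n") for line in islice(handle, chunk_size)]
            if not chunk:
                break
            yield chunk
```

Timing both generators on the same .smiles file, for example with `time.perf_counter`, would show whether chunking pays off. Chunks are also a natural unit of work to hand to a `multiprocessing.Pool`.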

Logging

  • add logging of runs?

Use zipped files

  • create_blobs.py currently reads .smiles files, but the original data is zipped. Check whether reading directly from the zipped files gives similar performance, see the zipfile module and its ZipFile.open method.
  • blobs created by create_blobs.py are very large (500-600 GB) for the full Enamine REAL dataset. Check whether compressing them gives similar performance, e.g. using zip files.
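
Reading lines straight from a zip archive could be sketched as follows, using only the standard library. The function name `iter_zipped_lines`, the choice of the first archive member as default, and the UTF-8 encoding are assumptions for illustration.

```python
import io
import zipfile


def iter_zipped_lines(zip_path, member=None, encoding="utf-8"):
    """Yield stripped text lines from one member of a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: if no member is given, read the first file in the archive.
        name = member if member is not None else archive.namelist()[0]
        with archive.open(name) as raw:
            # ZipFile.open returns a binary stream; wrap it to decode text lines.
            for line in io.TextIOWrapper(raw, encoding=encoding):
                yield line.rstrip("\n")
```

Timing this generator against plain file reading on the same data would answer the performance question above.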

To check

Benchmarking against other solutions

Solution using chemfp

Solution using rdkit functionality

@enryH enryH added the enhancement New feature or request label Nov 4, 2019