Add memory benchmarks #142
base: master
Conversation
Added all the competing libraries to the list in the README, and added the additional dependencies required for running 'make setup' directly. This may not be an exhaustive listing of dependencies; cross-checking on different platforms is still required. Changes to be committed: modified: README.md
Changes to be committed:
modified: libraries/ann_install.sh
modified: libraries/annoy_install.sh
modified: libraries/dlibml_install.sh
modified: libraries/download_packages.sh
modified: libraries/dtimeout_install.sh
modified: libraries/flann_install.sh
modified: libraries/hlearn_install.sh
modified: libraries/install_all.sh
modified: libraries/milk_install.sh
modified: libraries/mlpack_install.sh
modified: libraries/mlpy_install.sh
modified: libraries/mrpt_install.sh
modified: libraries/nearpy_install.sh
modified: libraries/r_install.sh
modified: libraries/scikit_install.sh
modified: libraries/weka_install.sh
Shogun's script installed the Python 3.5 interface for Shogun; it has now been generalized to install for the system's Python 3 version. Changes to be committed: modified: libraries/shogun_install.sh
Changes to be committed: new file: libraries/mprofiler_install.sh modified: libraries/package-urls.txt modified: libraries/install_all.sh
Changes to be committed: modified: run.py
```diff
@@ -77,7 +78,13 @@ def run_timeout_wrapper():
     logging.info('Run: %s' % (str(instance)))

     # Run the metric method.
-    result = instance.metric()
+    mem_use, result = memory_usage(instance.metric,
```
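The new call is cut off in the diff view. Based on memory_profiler's memory_usage API, a plausible completion could look like the sketch below; the exact keyword arguments are an assumption, not copied from the PR, and `instance` is the method instance from the surrounding run_timeout_wrapper code:

```python
from memory_profiler import memory_usage

# Run the metric method under memory_profiler. With max_usage=True and
# retval=True, memory_usage returns the peak memory (in MiB) together with
# the method's return value; include_children=True also counts processes
# spawned via subprocess.
mem_use, result = memory_usage(instance.metric,
                               include_children=True,
                               max_usage=True,
                               retval=True)
```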
Does mprofiler track a process that is called with subprocess as well, or does it only track the Python process?
Yes, it can track processes spawned using subprocess; the include_children option here in the function is specifically for this purpose. I have also gone through the source code of memory_profiler a little bit, so I am sure of it. I will leave a link to it here.
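To make the claim concrete, here is a small self-contained sketch (the 100 MB allocation and the sleep duration are arbitrary choices) that compares the peak reported with and without include_children for a function whose work happens in a child started with subprocess:

```python
import subprocess
import sys

from memory_profiler import memory_usage

def spawn_child():
    # Start a separate Python process that allocates roughly 100 MB and
    # holds it long enough for memory_profiler's sampler to observe it.
    subprocess.run(
        [sys.executable, "-c",
         "import time; buf = bytearray(100 * 1024 * 1024); time.sleep(2)"],
        check=True)

# include_children=True adds the memory of child processes to each sample,
# so the first peak should be roughly 100 MB higher than the second.
with_children = memory_usage(spawn_child, include_children=True, max_usage=True)
without_children = memory_usage(spawn_child, include_children=False, max_usage=True)
print(with_children, without_children)
```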
Thanks for looking into this, excited to see some first results.
@zoq So, I have some questions regarding the memory benchmarking system.
For the runtime benchmarks we do not include data loading, since the focus is on the method runtime; I think we should do the same for the memory benchmarks. Regarding data loading, some libraries support formats such as binary input, which is often faster to load, and some libraries perform data encoding that isn't part of the method itself, so to make a fair comparison it makes sense to exclude the data loading part.
If we can avoid that, which I think we can by putting the memory benchmark inside the method script, we should do that.
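For illustration, a minimal sketch of what a measurement inside a method script could look like, using scikit-learn's NearestNeighbors purely as a stand-in for a real method wrapper (the function name and dataset-path handling are illustrative, not the benchmark system's actual API):

```python
import numpy as np
from memory_profiler import memory_usage
from sklearn.neighbors import NearestNeighbors

def benchmark_memory(dataset_path):
    # Data loading stays outside the measured region, mirroring the runtime
    # benchmarks where loading is not part of the reported time.
    data = np.genfromtxt(dataset_path, delimiter=',')

    def run_method():
        # Only the method itself is measured.
        model = NearestNeighbors(n_neighbors=5).fit(data)
        return model.kneighbors(data)

    peak_mem, result = memory_usage(run_method,
                                    max_usage=True,
                                    include_children=True,
                                    retval=True)
    return peak_mem, result
```

Note that even with this split, the already-loaded data still contributes to the process's resident memory while the method runs, so the measurement is not completely independent of the dataset size.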
Actually, I think we want to track child processes as well, to cover methods that split the work across multiple child processes.
I felt the same initially, but then I realized that when the benchmark scripts use So,
Cool.
I see, that is tricky; in the past I used
We time the data loading/saving part and subtract it from the overall runtime; see benchmarks/methods/mlpack/allkfn.py, line 76 in c11f08d, for an example. We do the same for MATLAB, R, etc.
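For illustration only, here is a simplified, self-contained sketch of that subtraction idea; the real scripts time the loading/saving around the actual library call (or read the library's own timers), not a NumPy stand-in like this:

```python
import time
import numpy as np

def run_wrapper(dataset_path):
    # End-to-end timing: data loading plus the method itself.
    total_start = time.time()

    load_start = time.time()
    data = np.genfromtxt(dataset_path, delimiter=',')
    load_time = time.time() - load_start

    # Stand-in for the actual method call (e.g. a neighbor search).
    np.sort(data, axis=0)

    total_time = time.time() - total_start

    # Report only the method portion by subtracting the loading time.
    return total_time - load_time
```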
So we have the following options:
That's great. I didn't see that line. :)
Personally I would go with option 2, as holding the data is part of how a method performs memory-wise. We might want to take a look into
Hi @zoq and @rcurtin, pardon me for contributing to this repository after a long break. The last time we talked, on IRC, we discussed benchmarking the memory usage of the machine learning algorithms; I have now made some changes to the repository to do this, which I am presenting here. I made some minor fixes in some of the initial commits. The main commits corresponding to memory benchmarking are c33047a and db9df17. Let me know what you think of these changes.
The f1ee3aa commit was made because running 'make setup' doesn't install the packages directly and can throw errors if dependencies are not satisfied. Also, I don't think these dependencies apply to all Linux distributions; mine is Ubuntu 18.04, and I have limited knowledge of other systems, so I can't speak about these dependencies confidently.

PS: This is not the final pull request. Feedback is needed.