Restructure parallelization and Caching documentation #609
base: docs/restructure
Just a couple of comments. Haven't reviewed all of it.
A more general note is: does this really count as a "how to" guide? If I understood correctly, those should focus on answering the question, not on explaining how things work. Opinions anyone?
I would also say that in its current form it mixes a how-to guide with explanations. An ideal how-to guide according to the Diataxis framework is extremely focused on solving one specific user need. Users are usually in a hurry and don't read anything carefully, so the how-to guide needs to be very easy to skim. Less text and more code snippets is better. But you do need a bit of text (headings and introductory sentences) to reassure readers that they are in the right place, or to tell them where they should go instead.

I think I have also made the mistake of adding explanations in almost all the how-to guides I wrote. But I'll try to give an example of how I understand how-to guides now.

An example of a specific situation could be: a user wants to do data valuation. It's slow, but they have a big computer and wonder how they can use it. Then the how-to guide could be structured as follows:

How to speed up data valuation with parallelization

This guide will show you how to speed up data valuation algorithms by using parallel hardware. For alternative ways to speed up data valuation see [caching]. For parallelization of influence functions see [...].

Parallelization on a single computer

This approach is a good idea if you are working on a laptop or small server.

    # code snippet here
    # the code should already make it obvious how one sets the number of cores
    # no need to mention default backends or stuff

(Explain what the code does, but do not describe in words how it could be changed or how joblib works. If you want to show variations, add a second code snippet. If there are many code snippets, use tabs.)

For advanced configuration see [joblib docs]

Parallelization on multiple machines

...
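To make the "single computer" placeholder above more concrete, here is a minimal sketch of the kind of snippet such a section could contain. It is not the project's API: `toy_value` and `compute_values` are hypothetical names, and the standard library's `concurrent.futures` is used as a stand-in for joblib so that the example runs without extra dependencies. The point is only that the number of workers is visible in the code itself.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_value(index: int) -> float:
    """Hypothetical stand-in for one expensive, independent valuation step."""
    return float(sum(i * i for i in range(index * 1000)))


def compute_values(indices, n_workers: int = 4) -> list[float]:
    """Evaluate toy_value for every index using n_workers parallel workers."""
    # max_workers is the single knob a reader needs to change.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(toy_value, indices))


values = compute_values(range(8), n_workers=4)
```

A thread pool is used here purely for illustration. For CPU-bound work in Python, a process-based pool (or joblib, as the placeholder suggests) is usually the better fit, and the calling pattern stays much the same.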
Thanks for the feedback! I have restructured and updated the content. I want your opinion on a few ideas:
I think it's very good now. Just have two minor comments we can discuss later.
## Parallelization
This section is very good but it is explaining background instead of showing how to get stuff done. I would keep it in this document but switch the order of "Local Parallelization" and "Parallelization" and rename "Parallelization" to something like "Understanding the pattern" or "Understanding what happened". That way people first see the code example they can copy and paste and then can decide on their own if they want to continue reading.
see [[speed-up-value-with-parallel]].

### Sequential Computation
I think the sequential computation should be explained in a separate document that is marked as prerequisite for this one.
Description
This PR addresses part of #581
Changes
Checklist
- Wrote Unit tests (if necessary)
- Updated Changelog
- If notebooks were added/changed, added boilerplate cells are tagged with `"tags": ["hide"]` or `"tags": ["hide-input"]`