GetBestRecogTrainExp job and related helpers #108
Comments
I think for this to consistently work, we would either have to have a Job that does a variable number of recognitions (and scorings) in itself or use the async workflow in sisyphus that then would make the graph dependent on some intermediate output (i.e. the scores of the epochs).
Well, I do in fact only run recognitions on a fixed set of epochs independent of their score. This might be sub-optimal, but then my experiments have shown that once you are in the converged phase of model training WER and cv-score become decorrelated. Therefore I always have a fixed graph that is pre-determined ab initio.
See my solution in my current implementation. I just pass a generic function, like (model -> score), which is then executed for all relevant models, i.e. all relevant epochs. This does not need the async workflow. I'm personally also not a fan of the async workflow. (See here.)
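The callback idea described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the actual implementation: the names `best_epoch` and `recog_and_score` are hypothetical, epochs stand in for models, and the score is assumed to be lower-is-better (like WER).

```python
from typing import Callable, Dict, Iterable, Tuple


def best_epoch(
    epochs: Iterable[int],
    recog_and_score: Callable[[int], float],
) -> Tuple[int, Dict[int, float]]:
    """Apply a plain (model -> score) callback to every relevant epoch.

    Hypothetical helper: returns the epoch with the lowest score together
    with all collected scores.
    """
    scores = {epoch: recog_and_score(epoch) for epoch in epochs}
    best = min(scores, key=scores.get)  # lower is better (assumption)
    return best, scores


# Made-up numbers, only to show the call pattern.
fake_wer = {40: 7.9, 80: 7.1, 120: 7.3}
best, scores = best_epoch(fake_wer.keys(), lambda epoch: fake_wer[epoch])
```

The callback here is an ordinary function, so the caller does not need to know anything about how or when the scoring is scheduled.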
Yea, that was a bit misleading. I only use
This could be an option to the |
Yes, you do not explicitly use the async workflow functions of sisyphus, but you are still passing a callback function that only gets executed once the corresponding
Even on a high level, your proposed workflow would not work without some form of asynchronicity, because the number of epochs that need to be recognized is not known until some part of the graph has already been computed.
There is a possibility, if we are willing to give up this bit of efficiency. Then we would always know exactly which Jobs need to run. Pseudo-pipeline code:
Where |
Why is this problematic? I don't see any problem with this.
No, it's a bit different in the blog post. This is a well known metaphor, the coloring of functions w.r.t. async. It means that functions are distinguished as either "async" or "normal". See also here, here, many more follow-up thoughts on this aspect.
These blog posts, and many others, give many arguments for why and how this complicates everything, along with other downsides. A callback function is just a normal function. It is not an async function. So the callback can be called anywhere without any complicated overhead. I'm not really speaking about asynchronous execution in general (which we have all the time with Sisyphus) but specifically about the Python async concept, which the Sisyphus async workflow uses. I don't have any issue with asynchronous execution in general.
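To make the "colored functions" point concrete, here is a small sketch using only the Python standard library. It does not use Sisyphus at all; `score_sync`, `score_async` and `use_callback` are hypothetical names chosen for illustration.

```python
import asyncio


def score_sync(epoch: int) -> float:
    """A normal function: callable from anywhere."""
    return 10.0 - epoch / 100.0


async def score_async(epoch: int) -> float:
    """An async function: calling it only creates a coroutine object."""
    await asyncio.sleep(0)
    return 10.0 - epoch / 100.0


def use_callback(callback, epoch: int):
    """Generic code that just calls whatever callback it was given."""
    return callback(epoch)


sync_result = use_callback(score_sync, 100)        # a float, right away
maybe_coroutine = use_callback(score_async, 100)   # not a float yet
is_coro = asyncio.iscoroutine(maybe_coroutine)
# To get the value, the caller must await it or drive an event loop.
async_result = asyncio.run(maybe_coroutine)
```

The generic `use_callback` works unchanged with the normal function, but with the async one every caller up the chain has to become async-aware, which is the complication the blog posts describe.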
Yes I know. Why is that a problem? My code handles that just fine.
I don't see how that could happen. A job will only run when its inputs are available. So I cannot access anything before it is computed.
When you want it to be dynamic, as I outlined in my initial post, this logic must be within the manager. There is not really any way around it. You could potentially defer it to another job, to determine the list of relevant epochs, but then, again in the manager, you need to read that list of relevant epochs. At some point in the manager, you need to get that dynamic list of epochs. But again, if this is done carefully, I don't see too much of a problem in that. I think in
Multiple problems with this:
And I don't really see the advantage of this. The code is maybe slightly simpler, but what else? And this issue is about having some common code that everyone can use, so that with such common pipeline code, it is also just as simple.
On a high level, my training setup works like:
I want the recog on fixed epochs to run as soon as those epochs are ready. I do that via Job.update. For the other epochs, this needs the final learning-rate file with the scores, so it depends on that. This is then also done via Job.update, to dynamically add some recogs. Note that the number of epochs on which recog is performed is variable, because there might be overlaps between those sets.
I assume this is a quite reasonable and common pipeline, which you are probably also doing like this, or similar.
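Why the number of recogs is variable can be seen in a small sketch. This is plain Python with made-up scores and hypothetical names (`epochs_to_recog`, `num_best`); it assumes the learning-rate file boils down to a mapping from epoch to a lower-is-better score, and it does not show the Job.update mechanism itself.

```python
from typing import Dict, Iterable, List


def epochs_to_recog(
    fixed_epochs: Iterable[int],
    cv_scores: Dict[int, float],
    num_best: int,
) -> List[int]:
    """Union of the fixed epochs and the num_best epochs by score.

    Hypothetical helper: overlapping epochs are only counted once,
    so the result can be shorter than len(fixed) + num_best.
    """
    best = sorted(cv_scores, key=cv_scores.get)[:num_best]
    return sorted(set(fixed_epochs) | set(best))


# Made-up scores (lower is better), known only after training finished.
cv_scores = {20: 1.30, 40: 1.12, 60: 1.05, 80: 1.07, 100: 1.04}
selected = epochs_to_recog([40, 80, 100], cv_scores, num_best=2)
```

Here three fixed epochs plus two best epochs yield four recogs, because epoch 100 is in both sets.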
I think it would be good to have some common pipeline or helper code for this, rather than everyone having their own custom solution.
So I want to discuss this here. We can implement something new, or use some existing code. For example, I have implemented exactly that already. See my GetBestRecogTrainExp job, the recog_training_exp function, and related code.