We currently log some stage info at the end of every stage. This is useful, but finding the relevant information means reading through a lot of logs.
It would be more convenient to print a summary at the end of the job that points out the long-running stages. This would also make it much easier to compare runs of the same process graph.
We can implement the onApplicationEnd method to achieve this.
We'll need to store relevant stage statistics to be able to summarize them.
The summary should show the total stage runtime and the total number of stages, and then list the stages that together account for 80% of the runtime. The threshold is somewhat arbitrary; we need to see how it works out in practice. The idea is that we should not list 'marginal' stages.
Another relevant metric is total executor allocation time, which determines job cost, compared with actual task time, but it is not yet clear how we can compute that.
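A minimal sketch of how the 80% selection could work, assuming the listener has collected one record per completed stage. `StageStat`, `StageSummary`, `dominantStages` and `render` are hypothetical names for illustration and are not part of the existing listener; Spark itself is deliberately left out so that the logic can be tried in isolation.

```scala
// Hypothetical record of what the listener would keep per completed stage.
final case class StageStat(stageId: Int, name: String, runtimeMs: Long)

object StageSummary {
  // Returns the longest-running stages, in descending order of runtime,
  // up to and including the one at which the cumulative runtime reaches
  // `threshold` (a fraction of the total, 0.8 by default).
  def dominantStages(stats: Seq[StageStat], threshold: Double = 0.8): Seq[StageStat] = {
    val total = stats.map(_.runtimeMs).sum
    if (total <= 0) Seq.empty
    else {
      val sorted = stats.sortBy(-_.runtimeMs)
      // Runtime accumulated *before* each stage in the sorted order.
      val before = sorted.scanLeft(0L)(_ + _.runtimeMs)
      sorted.zip(before)
        .takeWhile { case (_, acc) => acc < threshold * total }
        .map(_._1)
    }
  }

  // Builds the summary text: one header line, then one line per dominant stage.
  def render(stats: Seq[StageStat], threshold: Double = 0.8): String = {
    val total = stats.map(_.runtimeMs).sum
    val header = s"Stages: ${stats.size}, total stage runtime: $total ms"
    val lines = dominantStages(stats, threshold).map { s =>
      val pct = 100.0 * s.runtimeMs / total
      f"  stage ${s.stageId}%d (${s.name}%s): ${s.runtimeMs}%d ms ($pct%.1f%%)"
    }
    (header +: lines).mkString("\n")
  }
}
```

Keeping the threshold as a parameter makes it easy to tune once we see how it works out in practice.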
To compute executor allocation time: there are also 'executor added/removed' events, and they seem to provide a 'time'. This would allow us to compute the total executor allocated time (the difference between the two timestamps, summed over executors).
The ratio of total task time to allocated time is then the really interesting number. A low ratio means paying for executors while they are not doing anything.
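The computation could look roughly like this, assuming the 'added' and 'removed' timestamps have been collected per executor as epoch milliseconds. `ExecutorLifetime` and `ExecutorUtilization` are hypothetical names; executors for which no 'removed' event was seen are assumed to have lived until the end of the application.

```scala
// Hypothetical record: one executor's lifetime, in epoch milliseconds.
// `removedAt` is None when no 'executor removed' event was seen.
final case class ExecutorLifetime(addedAt: Long, removedAt: Option[Long])

object ExecutorUtilization {
  // Total allocated time: sum of (removed - added) over all executors.
  // Executors that were never removed are counted up to `applicationEnd`.
  def allocatedMs(executors: Seq[ExecutorLifetime], applicationEnd: Long): Long =
    executors.map { e =>
      math.max(0L, e.removedAt.getOrElse(applicationEnd) - e.addedAt)
    }.sum

  // Ratio of total task time to allocated time; None when nothing was allocated.
  def utilization(totalTaskMs: Long, allocated: Long): Option[Double] =
    if (allocated > 0) Some(totalTaskMs.toDouble / allocated) else None
}
```

One caveat: an executor with several cores runs tasks in parallel, so summed task time can exceed the allocated wall-clock time. Multiplying each lifetime by the executor's core count would keep the ratio between 0 and 1.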
The current logging implementation is here:
https://github.com/Open-EO/openeo-geotrellis-extensions/blob/49ed4eb9fe03dae666587bfe6fb419cec8cb039a/openeo-geotrellis/src/main/java/org/openeo/sparklisteners/BatchJobProgressListener.scala#L19