log performance summary at end of job #989

Open
jdries opened this issue Jan 9, 2025 · 1 comment
@jdries
Contributor

jdries commented Jan 9, 2025

We currently log some stage info at the end of every stage. This is useful, but we have to read through a lot of logs to find the information that actually matters.
It would be more convenient to print a summary at the end of the job that points out the long-running stages. This would also make it much easier to compare runs of the same process graph.

The current logging implementation is here:
https://github.com/Open-EO/openeo-geotrellis-extensions/blob/49ed4eb9fe03dae666587bfe6fb419cec8cb039a/openeo-geotrellis/src/main/java/org/openeo/sparklisteners/BatchJobProgressListener.scala#L19

We can implement the onApplicationEnd method to achieve this.
We'll need to store relevant stage statistics to be able to summarize them.

The summary should show the total stage runtime and the total number of stages, and then list the stages that together account for 80% of the runtime. The 80% threshold is somewhat arbitrary, so we need to see how it works out in practice. The idea is that we should not list 'marginal' stages.
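A rough sketch of the selection logic, written in Python only to illustrate the idea (the real listener is Scala). The names `StageStat` and `summarize_stages` are hypothetical, and the sketch assumes each stage's runtime in milliseconds has already been collected when the stage completed:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StageStat:
    """Hypothetical per-stage record stored when a stage completes."""
    stage_id: int
    name: str
    runtime_ms: int


def summarize_stages(stages: List[StageStat], threshold: float = 0.8) -> Dict:
    """Return totals plus the longest stages that together cover `threshold` of the runtime."""
    total = sum(s.runtime_ms for s in stages)
    # Longest stages first, so the list stays as short as possible.
    ranked = sorted(stages, key=lambda s: s.runtime_ms, reverse=True)

    dominant = []
    covered = 0
    for stage in ranked:
        # Stop once the stages picked so far already cover the threshold.
        if total == 0 or covered / total >= threshold:
            break
        dominant.append(stage)
        covered += stage.runtime_ms

    return {
        "total_runtime_ms": total,
        "stage_count": len(stages),
        "dominant_stages": dominant,
    }


if __name__ == "__main__":
    demo = [
        StageStat(0, "load", 600),
        StageStat(1, "reduce", 250),
        StageStat(2, "write", 100),
        StageStat(3, "cleanup", 50),
    ]
    summary = summarize_stages(demo)
    print(summary["total_runtime_ms"], summary["stage_count"])
    for s in summary["dominant_stages"]:
        print(s.stage_id, s.name, s.runtime_ms)
```

Sorting by runtime before accumulating keeps the marginal stages out of the list. The threshold is a parameter so it can be tuned once we see real jobs.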

Another relevant metric is total executor allocation time, which determines job cost, versus actual task time, but I'm not sure yet how we can compute that.

@jdries
Contributor Author

jdries commented Jan 9, 2025

To compute executor allocation time: there are 'executor added/removed' events as well, and they seem to provide a 'time'. This would allow us to compute the total executor allocated time (for each executor, the difference between its removed and added times).
The interesting number is then the ratio of total task time to allocated time. A low ratio means we are paying for executors while they are not doing anything.
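A minimal sketch of that computation, again in Python purely for illustration. The event tuples, the function names and the `cores_per_executor` parameter are assumptions rather than anything in the current listener. In particular, it assumes that executors still alive at the end of the job are counted up to the job end time, and that task time summed over all tasks should be compared against allocated time multiplied by the number of cores, because tasks can run concurrently on one executor:

```python
from typing import Iterable, Tuple

# (time_ms, executor_id, kind) where kind is "added" or "removed"
ExecutorEvent = Tuple[int, str, str]


def executor_allocated_ms(events: Iterable[ExecutorEvent], job_end_ms: int) -> int:
    """Sum the time each executor was allocated, based on added/removed events."""
    added_at = {}
    total = 0
    for time_ms, executor_id, kind in sorted(events):
        if kind == "added":
            added_at[executor_id] = time_ms
        elif kind == "removed" and executor_id in added_at:
            total += time_ms - added_at.pop(executor_id)
    # Executors that were never removed are counted until the job ended.
    total += sum(job_end_ms - start for start in added_at.values())
    return total


def utilization(total_task_ms: int, allocated_ms: int, cores_per_executor: int = 1) -> float:
    """Ratio of task time to available executor time; low values mean idle executors."""
    capacity = allocated_ms * cores_per_executor
    return total_task_ms / capacity if capacity else 0.0


if __name__ == "__main__":
    events = [
        (0, "1", "added"),
        (1000, "2", "added"),
        (5000, "1", "removed"),
    ]
    allocated = executor_allocated_ms(events, job_end_ms=6000)
    print(allocated, utilization(4000, allocated))
```

Whether to normalize by cores depends on how the task time is measured, so that part needs checking against real job logs.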
