LogExtension
Vitalii Koshura edited this page Apr 11, 2023
The currently available traces contain availability information about the participants. More detailed information would help improve job allocation and the checkpoint mechanism. For example, the time wasted by interruptions could be analyzed to determine the best checkpoint period.
The following information could be saved on each client:
- job scheduling actions (start/stop/exit of jobs with the ID of the related jobs)
- checkpointing events
- change in fraction done of running jobs
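
As an illustration only, the events listed above could be recorded as one small record per line. The sketch below is not an existing BOINC format: the names `EventKind`, `LogEvent`, and `wasted_time` are hypothetical. It also assumes that the work done between the last checkpoint (or the job start) and a stop is lost, which is one possible definition of the time wasted by an interruption.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Iterable, Optional


class EventKind(str, Enum):
    """Hypothetical event types covering the three categories listed above."""
    JOB_START = "job_start"
    JOB_STOP = "job_stop"
    JOB_EXIT = "job_exit"
    CHECKPOINT = "checkpoint"
    FRACTION_DONE = "fraction_done"


@dataclass
class LogEvent:
    timestamp: float                       # seconds since the epoch
    kind: EventKind
    job_id: str
    fraction_done: Optional[float] = None  # only set for FRACTION_DONE events

    def to_line(self) -> str:
        """Serialize the event as one JSON line of the client log."""
        record = asdict(self)
        record["kind"] = self.kind.value
        return json.dumps(record, sort_keys=True)


def wasted_time(events: Iterable[LogEvent]) -> float:
    """Sum the time between each stop and the preceding checkpoint or start.

    Assumption: work done since the last checkpoint is lost when a job stops.
    """
    last_safe = {}  # job_id -> timestamp of the last start or checkpoint
    wasted = 0.0
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind in (EventKind.JOB_START, EventKind.CHECKPOINT):
            last_safe[ev.job_id] = ev.timestamp
        elif ev.kind is EventKind.JOB_STOP and ev.job_id in last_safe:
            wasted += ev.timestamp - last_safe.pop(ev.job_id)
        elif ev.kind is EventKind.JOB_EXIT:
            last_safe.pop(ev.job_id, None)
    return wasted
```

Comparing the value returned by `wasted_time` across clients with different checkpoint periods is the kind of analysis such a log would make possible.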
Retrieving this data generates traffic that the server may not be able to support. Several non-exclusive solutions are possible:
- retrieve only a subset of the log (e.g., only job scheduling actions)
- retrieve only data that are new
- compact the log
- sample the clients for which the log is retrieved
  - based on a period, but the retrieved log will not be representative of participants that have a low rate of connection to the server
  - based on the user ID
- request the log only when it reaches a significant size to avoid multiple small transfers
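
Because the solutions above are non-exclusive, they can be combined in a single client-side decision. The following is a minimal sketch under stated assumptions, not a description of existing BOINC code: the function names, the 64 KiB threshold, the 10% sampling rate, and the dictionary layout of events (`timestamp`, `kind`) are all hypothetical. Sampling is done by hashing the user ID so that the same clients are always selected, and compaction simply uses `zlib`.

```python
import hashlib
import json
import zlib

MIN_UPLOAD_BYTES = 64 * 1024  # hypothetical "significant size" threshold
SAMPLE_RATE = 0.1             # hypothetical fraction of users whose log is retrieved


def in_sample(user_id: int, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically select a fraction `rate` of users from their ID."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < rate * 2**64


def build_upload(events, user_id, last_sent_ts, kinds=None,
                 min_bytes=MIN_UPLOAD_BYTES, rate=SAMPLE_RATE):
    """Return a compressed log to send, or None if nothing should be sent."""
    if not in_sample(user_id, rate):
        return None  # this client is not part of the sample
    # Keep only new events, optionally restricted to a subset of event kinds.
    fresh = [e for e in events
             if e["timestamp"] > last_sent_ts
             and (kinds is None or e["kind"] in kinds)]
    raw = "\n".join(json.dumps(e, sort_keys=True) for e in fresh).encode()
    if len(raw) < min_bytes:
        return None  # wait until the log is large enough to be worth a transfer
    return zlib.compress(raw)  # compact the log before sending
```

Sampling on the user ID, unlike sampling on a period, does not depend on how often a client connects, so it avoids the bias against participants with a low connection rate.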