Update job statuses #26

Open: wants to merge 2 commits into base: master
86 changes: 29 additions & 57 deletions docs/agave/guides/jobs/job-monitoring.rst
@@ -128,96 +128,68 @@ The list of all possible job statuses is given in table 2.
</thead>
<tbody>
<tr>
<td>CREATED</td>
<td>The job was updated</td>
<td>ACCEPTED</td>
<td>The job accepted for processing</td>
</tr>
<tr>
<td>UPDATED</td>
<td>The job was updated</td>
</tr>
<tr>
<td>DELETED</td>
<td>The job was deleted</td>
</tr>
<tr>
<td>PERMISSION_GRANT</td>
<td>User permission was granted</td>
</tr>
<tr>
<td>PERMISSION_REVOKE</td>
<td>Permission was removed for a user on this job</td>
<td>PENDING</td>
<td>The job processing beginning</td>
</tr>
<tr>
<td>PENDING</td>
<td>Job accepted and queued for submission.</td>
<td>PROCESSING_INPUTS</td>
<td>Identifying input files for staging</td>
</tr>
<tr>
<td>STAGING_INPUTS</td>
<td>Transferring job input data to execution system</td>
</tr>
<tr>
<td>CLEANING_UP</td>
<td>Job completed execution</td>
</tr>
<tr>
<td>ARCHIVING</td>
<td>Transferring job output to archive system</td>
</tr>
<tr>
<td>STAGING_JOB</td>
<td>STAGED</td>
<td>Job inputs staged to execution system</td>
</tr>
<tr>
<td>FINISHED</td>
<td>Job complete</td>
</tr>
<tr>
<td>KILLED</td>
<td>Job execution killed at user request</td>
<td>STAGING_JOB</td>
<td>Staging runtime assets to execution system.</td>
</tr>
<tr>
<td>FAILED</td>
<td>Job failed</td>
<td>SUBMITTING</td>
<td>Submitting job to execution system</td>
</tr>
<tr>
<td>STOPPED</td>
<td>Job execution intentionally stopped</td>
<td>QUEUED</td>
<td>Job queued in execution system queue</td>
</tr>
<tr>
<td>RUNNING</td>
<td>Job started running</td>
</tr>
<tr>
<td>PAUSED</td>
<td>Job execution paused by user</td>
<td>Job running on execution system</td>
</tr>
<tr>
<td>QUEUED</td>
<td>Job successfully placed into queue</td>
<td>CLEANING_UP</td>
<td>Job completed execution</td>
</tr>
<tr>
<td>SUBMITTING</td>
<td>Preparing job for execution and staging binaries to execution system</td>
<td>ARCHIVING</td>
<td>Transferring job output to archive system</td>
</tr>
<tr>
<td>STAGED</td>
<td>Job inputs staged to execution system</td>
<td>BLOCKED</td>
<td>Job blocked</td>
</tr>
<tr>
<td>PROCESSING_INPUTS</td>
<td>Identifying input files for staging</td>
<td>PAUSED</td>
<td>Job processing suspended</td>
</tr>
<tr>
<td>ARCHIVING_FINISHED</td>
<td>Job archiving complete</td>
<td>FINISHED</td>
<td>Job completed successfully</td>
</tr>
<tr>
<td>ARCHIVING_FAILED</td>
<td>Job archiving failed</td>
<td>STOPPED</td>
<td>Job execution intentionally stopped</td>
</tr>
<tr>
<td>HEARTBEAT</td>
<td>Job heartbeat received</td>
<td>FAILED</td>
<td>Job failed</td>
</tr>
</tbody>
</table>
@@ -229,7 +201,7 @@ The list of all possible job statuses is given in table 2.
<p class="table-caption">Table 2. Job statuses listed in progressive order from job submission to completion.</p>


Polling is an incredibly effective approach, but it is bad practice for two reasons. First, it does not scale well. Querying for one job status every few seconds does not take much effort, but querying for 100 takes quite a bit of time and puts unnecessary load on Tapis's servers. Second, polling provides what is effectively a binary response. It tells you whether a job is done or not done, it does not give you any information on what is actually going on with the job or where it is in the overall execution process.
Polling is an incredibly effective approach, but it is bad practice for two reasons. First, it does not scale well. Querying for one job status every few seconds does not take much effort, but querying for 100 takes quite a bit of time and puts unnecessary load on Tapis's servers. Second, polling provides what is effectively a binary response. It tells you whether a job is done or not done, it does not give you any information on what is actually going on with the job or where it is in the overall execution process. Subscribing to any job event notification will automatically subscribe you to: `FINISHED`, `STOPPED`, `BLOCKED`, `PAUSED` and `FAILED` notifications.
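
The subscription rule in the last sentence above can be modeled as a small set operation. This is only a sketch of the rule as the paragraph states it, not Tapis's implementation: the helper name `effective_subscriptions` and the constant `AUTO_SUBSCRIBED` are hypothetical and do not come from any Tapis library, and the upper-casing of input is an assumption made for convenience.

```python
from typing import Iterable, Set

# Statuses the paragraph above names as automatically included.
AUTO_SUBSCRIBED = frozenset({"FINISHED", "STOPPED", "BLOCKED", "PAUSED", "FAILED"})


def effective_subscriptions(requested: Iterable[str]) -> Set[str]:
    """Hypothetical helper: return the events a user would end up subscribed to.

    Assumption: subscribing to at least one job event adds the five
    statuses in AUTO_SUBSCRIBED; subscribing to nothing adds nothing.
    """
    # Normalize input so "running" and " RUNNING " are treated alike.
    wanted = {s.strip().upper() for s in requested if s.strip()}
    if not wanted:
        return set()
    return wanted | AUTO_SUBSCRIBED


if __name__ == "__main__":
    print(sorted(effective_subscriptions(["running"])))
```

Under this reading, a client that subscribes only to `RUNNING` would still be notified when the job finishes, stops, blocks, pauses, or fails, which is the information a polling loop would otherwise be used to obtain.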

The job history URL provides much more detailed information on the various state changes, system messages, and progress information associated with data staging. The syntax of the job history URL is as follows:
