fix: sbatch stderr parsing #2 #165
Conversation
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…t failure of a job submission
Caution: Review failed. The pull request is closed.

Walkthrough

The changes in this pull request focus on enhancing error handling in the …

Changes
Possibly related PRs
I'm happy to report that it works as expected!
@fgvieira Hoi Filipe, time for another go? ;-)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)
236-239: Consider making the error detection more robust.

While checking for "submission failed" is a good start, the error detection could be more comprehensive. Consider:

- Making the check case-insensitive
- Including other common SLURM error patterns
- Adding debug logging for stderr when it's non-empty but doesn't match the pattern

```diff
-if "submission failed" in err:  # any other error message indicating failure?
+# Check for various SLURM error patterns
+if err:
+    self.logger.debug(f"SLURM stderr output: {err}")
+    error_patterns = [
+        "submission failed",
+        "SUBMISSION FAILED",
+        "error:",
+        "invalid",
+    ]
+    if any(pattern.lower() in err.lower() for pattern in error_patterns):
+        raise WorkflowError(
+            f"SLURM job submission failed. The error message was {err}"
+        )
```
Line range hint 227-239: Consider enhancing the error handling architecture.

The current error handling structure could be improved for better maintainability and debugging:

- Consider extracting error handling into a dedicated method
- Add structured logging for error contexts
- Separate SLURM-specific from general subprocess errors

Example structure:

```python
def _handle_submission_error(self, err: str, returncode: int = None) -> None:
    """Handle SLURM submission errors with structured logging."""
    self.logger.debug(
        f"SLURM submission error context: returncode={returncode}, stderr={err}"
    )
    if returncode is not None:
        # Handle subprocess errors
        raise WorkflowError(
            f"SLURM job submission process failed with code {returncode}. Error: {err}"
        )
    # Handle SLURM-specific errors
    if self._is_slurm_submission_error(err):
        raise WorkflowError(f"SLURM job submission failed. Error: {err}")

def _is_slurm_submission_error(self, err: str) -> bool:
    """Check for SLURM-specific submission errors."""
    error_patterns = ["submission failed", "error:", "invalid"]
    return any(pattern.lower() in err.lower() for pattern in error_patterns)
```

🧰 Tools
🪛 Ruff

233-235: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling. (B904)
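For illustration, here is a minimal, hypothetical sketch (not the plugin's actual code; `parse_jobid` and the `WorkflowError` stand-in are invented for the example) of the exception chaining that B904 asks for:

```python
# Hypothetical sketch of the `raise ... from err` pattern flagged by B904:
# chaining records the original exception as __cause__, so the traceback
# shows both the low-level failure and the higher-level WorkflowError.

class WorkflowError(Exception):
    """Stand-in for snakemake's WorkflowError."""

def parse_jobid(out: str) -> int:
    # sbatch --parsable prints "<jobid>;<cluster>"; take the first field
    try:
        return int(out.strip().split(";")[0])
    except ValueError as err:
        raise WorkflowError(f"Could not parse SLURM job ID from {out!r}") from err
```

With `from err`, the `WorkflowError` carries the original `ValueError` as its `__cause__`, and the traceback reads "The above exception was the direct cause of the following exception" rather than the misleading "During handling of the above exception, another exception occurred".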
Line range hint 241-245: Enhance job ID validation after submission.

The current job ID extraction could be more robust by adding format validation:

```diff
 slurm_jobid = out.strip().split(";")[0]
 if not slurm_jobid:
     raise WorkflowError("Failed to retrieve SLURM job ID from sbatch output.")
+# Validate job ID format (typically numeric)
+if not slurm_jobid.isdigit():
+    self.logger.warning(
+        f"Unexpected SLURM job ID format: {slurm_jobid}. "
+        "Please verify if this is correct for your SLURM configuration."
+    )
 slurm_logfile = slurm_logfile.replace("%j", slurm_jobid)
```

🧰 Tools
🪛 Ruff

233-235: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling. (B904)
Sure! :) The two error messages are identical, and that might make it a bit hard to figure out why it failed.
Hm, I am open to suggestions, because, yes: to a human reader it looks redundant. I assume that you are referring to lines 232-238? The thing is: one is catching an error from launching the process itself. If it occurs, the likely cause is a cluster failure of some kind, where …
I did not mean merging them, but maybe having slightly different error messages, so it is easier to identify where the error comes from. EDIT: made a suggestion
Co-authored-by: Filipe G. Vieira <[email protected]>
Yeah, fine. Inserting a clarification and distinction is just good.
@fgvieira I truncated the error string again - otherwise the linter will fail. Breaking the string into a multiline string yielded ugly output. The current string is a compromise.
🤖 I have created a release *beep* *boop*

## [0.11.2](v0.11.1...v0.11.2) (2024-11-07)

### Bug Fixes

* sbatch stderr parsing ([#161](#161)) ([0368197](0368197))
* sbatch stderr parsing [#2](#2) ([#165](#165)) ([348e537](348e537))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
next attempt to fix the behaviour described in #161
Summary by CodeRabbit

Bug Fixes

* … `sbatch` command failures.

Chores