Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to be able to print shell commands for final_report rule #139

Closed
Blosberg opened this issue Jan 9, 2019 · 13 comments
Closed

Need to be able to print shell commands for final_report rule #139

Blosberg opened this issue Jan 9, 2019 · 13 comments

Comments

@Blosberg
Copy link
Collaborator

Blosberg commented Jan 9, 2019

The final report rule in BS seq uses run instead of shell, which means we cannot see what's happening inside snakemake with the option printshellcmds.

This enormously increases the difficulty of debugging when issues are encountered (e.g. Issue #119).

We need to either change "run" to "shell" in this rule, or find some other way to output the command.

@rekado
Copy link
Member

rekado commented Jan 9, 2019

Note that the rule calls generateReports, which does use shell in the end.

@Blosberg
Copy link
Collaborator Author

Blosberg commented Jan 9, 2019

yes, but at the level of job submission, --printshellcmds will not propagate into the generateReports function. If we're using a shell command anyway, then I don't see any reason why we can't just have this directly in the rule. What is the point of the dumps variable?

@Blosberg
Copy link
Collaborator Author

Blosberg commented Jan 9, 2019

More generally though, the problem is just that it makes things non-standard. If all of our other output is using shell to produce output, then we should avoid changing that for just one specific rule, unless we really need to. Doing things the same way every time just keeps everything cleaner. irregularities are messy and invariably lead to problems like this.

@Blosberg
Copy link
Collaborator Author

resolved by commit f7508cb

Final report generation shell command is now sent to the snakemake log file.

@rekado
Copy link
Member

rekado commented Jan 10, 2019

More generally though, the problem is just that it makes things non-standard. If all of our other output is using shell to produce output, then we should avoid changing that for just one specific rule, unless we really need to. Doing things the same way every time just keeps everything cleaner. irregularities are messy and invariably lead to problems like this.

I disagree. There's nothing wrong with using whatever mechanism there is if it allows us to solve problems. We ended up using run for a reason. Calling it an irregularity makes it sound like there is no reason to do it this way in the first place. git blame may help you answer this question.

(Besides, if anything is messy it's using shell at all ;))

@Blosberg
Copy link
Collaborator Author

makes it sound like there is no reason to do it this way in the first place.

That was the main point of this question: why exactly it was being done this way. Git blame on the file didn't provide any illuminating answer on this --perhaps I missed something. Can you explain why this is run stuff is necessary? ( or post a more specific git blame command that would provide that answer)

@alexg9010
Copy link
Member

I think this solution of printing into stdout is very nice, only nice thing would be to make it conditional ( only when printShellCmd).

About the why are we using it? :

I think run is used to execute python code, which was needed in this case, because we built the command using python. Especially this dump:q was to place the json formatted params dict into the command.

The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.

@Blosberg
Copy link
Collaborator Author

The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.

ok, I'm a little foggy on some of the details in there, but that sounds like a sufficiently good reason for setting it up in this way.

only nice thing would be to make it conditional ( only when printShellCmd).

Agreed, that would be ideal, but passing the flag into the function is an extra step of work. Moreover, this output isn't going to screen; I just tested it and saw that this it is actually sent to the snakemake output log file for the job in question (which is, itself, going to be relegated to a sub-folder of pigx_work/), so it's not really cluttering up any output. The only time a user will see the output of this print statement is if they are specifically looking into the log files (in which case, it would usually make sense to see the command), so I'm ok with just leaving it as is.

@Blosberg
Copy link
Collaborator Author

Unfortunately, a simple print statement doesn't actually convey all the necessary information. This is what the output looks like currently:

==== The present shell command being submitted by the generateReport function is as follows: 
nice -19 /home/bosberg/projects/pigx/pigx_bsseq_production/guix/.guix-profile/bin/Rscript --vanilla {DIR_scripts}/generate_report.R --scriptsDir=/home/bosberg/projects/pigx/pigx_bsseq_production/scripts/ --reportFile={input.template} --outFile={output.report} --finalReportDir=/scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/ --report.params={dumps:q} --logFile={log}
==== End of shell command submitted by the generateReport function.

And then when the job crashes, the only additional information comes from the SM run:

Error in rule final_report:
    jobid: 121
    output: /scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/N2_RDML00564_HT2LKCCXY_L1_1_val_1_bt2.sorted.deduped_hg19_final.html
    log: /scratch/AG_Akalin/bosberg/cfDNA_cardiac_pigx_output/Final_Reports/N2_RDML00564_HT2LKCCXY_L1_1_val_1_bt2.sorted.deduped_hg19_final.log
    cluster_jobid: Your job 3166421 ("snakejob.final_report.121.sh") has been submitted

So at least we know what the output is, but it's still a massive pain to figure out what the inputs and all the parameters were in order to run the job manually in an R session.

Especially this dump:q was to place the json formatted params dict into the command.

Why do we need to dump this all in json format? We used to just include the parameters directly in the shell command.

@Blosberg Blosberg reopened this Jan 14, 2019
@alexg9010
Copy link
Member

The dumping was because:

The benefit of this functionality is that we do not have to explicitly have to hardcode the params of the two reports (final, diffmeth), but can define them in the rule and pass them as dict.

If we have so many issues with the generate_report() function we might aswell trash it and write explicit shell commands inside the rule.

@rekado @katwre @Blosberg what do you think?

@katwre
Copy link
Contributor

katwre commented Jan 15, 2019

@alexg9010 I don't mind any, using explicit shell commands or run, if there are more advantages of using shell lets go for it. I remember that for final and diffmeth it was a bit tricky with the code and it was the only way to go (passing parameters) and so we had to use run.

@Blosberg
Copy link
Collaborator Author

Print ArgsL rds file to pigx_work, and then make a note of that fact in the SM message.

@alexg9010
Copy link
Member

should be fixed with #159

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants