Request: more information in SEACR log output #91

Open
SolKatzman opened this issue Jan 11, 2023 · 0 comments

During its run, SEACR outputs some helpful progress information.

First, a nitpick: For better readability, each output line should be preceded (not followed) by a timestamp.

Currently:
Normalizing control to experimental bedgraph
Using relaxed threshold
Creating experimental AUC file: Wed Jan 11 08:53:11 PST 2023
Creating control AUC file: Wed Jan 11 08:53:28 PST 2023
Calculating optimal AUC threshold: Wed Jan 11 08:53:39 PST 2023

Preferred:
Wed Jan 11 08:53:01 PST 2023: Normalizing control to experimental bedgraph
Wed Jan 11 08:53:05 PST 2023: Using relaxed threshold
Wed Jan 11 08:53:11 PST 2023: Creating experimental AUC file
Wed Jan 11 08:53:28 PST 2023: Creating control AUC file
Wed Jan 11 08:53:39 PST 2023: Calculating optimal AUC threshold
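To illustrate the preferred layout, here is a minimal sketch of a timestamp-first log helper. It is written in Python only for illustration and is not SEACR's code; the names `format_log_line` and `log` are hypothetical, and the format string is an assumption chosen to match the lines shown above.

```python
from datetime import datetime, timezone


def format_log_line(message, when=None):
    """Return `message` prefixed (not followed) by a timestamp.

    `when` is assumed to be a timezone-aware datetime; if omitted,
    the current local time is used.
    """
    if when is None:
        when = datetime.now(timezone.utc).astimezone()  # local time with tz name
    # Assumed format, e.g. "Wed Jan 11 08:53:11 PST 2023"
    stamp = when.strftime("%a %b %d %H:%M:%S %Z %Y")
    return f"{stamp}: {message}"


def log(message):
    """Print one progress line with a leading timestamp."""
    print(format_log_line(message), flush=True)


if __name__ == "__main__":
    log("Normalizing control to experimental bedgraph")
    log("Using relaxed threshold")
    log("Creating experimental AUC file")
```

Routing every progress message through one helper would also give the first two lines, which currently carry no timestamp at all, the same prefix as the rest.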

More substantially, it would be very useful to get information about the processing as it proceeds, to help debug unexpected or questionable results in the output set of peaks. Currently, the only such item that I see in the log is "Empirical false discovery rate".
Some items of interest are listed below, but basically any parameter that the program computes would be worth reporting:

threshold values calculated for target and control
number of "raw" peaks at threshold in target and control
number of peaks after merging nearby features in target and control
number of peaks for target that were filtered out due to overlap with control
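The kind of summary I have in mind could look roughly like the sketch below. This is not how SEACR works internally: the input layout (`(chrom, start, end, auc)` tuples), the strict `>` comparison against the threshold, the `max_gap` merging rule, and every function name are assumptions made purely to show which counts would be reported.

```python
def merge_nearby(intervals, max_gap):
    """Merge (chrom, start, end) intervals on the same chromosome
    whose gap is at most `max_gap` (assumed merging rule)."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps_any(peak, others):
    """True if `peak` shares at least one base with any interval in `others`."""
    chrom, start, end = peak
    return any(c == chrom and s < end and start < e for c, s, e in others)


def summarize(target, control, target_thr, control_thr, max_gap):
    """Collect the counts requested above into a dict.

    `target` and `control` are lists of (chrom, start, end, auc) tuples.
    """
    raw_t = [(c, s, e) for c, s, e, auc in target if auc > target_thr]
    raw_c = [(c, s, e) for c, s, e, auc in control if auc > control_thr]
    merged_t = merge_nearby(raw_t, max_gap)
    merged_c = merge_nearby(raw_c, max_gap)
    kept = [p for p in merged_t if not overlaps_any(p, merged_c)]
    return {
        "target_threshold": target_thr,
        "control_threshold": control_thr,
        "raw_peaks_target": len(raw_t),
        "raw_peaks_control": len(raw_c),
        "merged_peaks_target": len(merged_t),
        "merged_peaks_control": len(merged_c),
        "target_peaks_filtered_by_control": len(merged_t) - len(kept),
        "final_peaks": len(kept),
    }


if __name__ == "__main__":
    # Made-up example data, only to exercise the functions.
    target = [("chr1", 100, 200, 5.0), ("chr1", 250, 300, 6.0),
              ("chr1", 1000, 1100, 1.0), ("chr2", 10, 50, 9.0)]
    control = [("chr2", 40, 60, 4.0), ("chr1", 5000, 5100, 0.5)]
    for key, value in summarize(target, control, 2.0, 1.0, 100).items():
        print(f"{key}: {value}")
```

One `key: value` line per quantity in the log would be enough to see at which step an unexpected peak set arises.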

A more demanding request would be for an output file with (data and) figures, such as the graphs (with actual labeled axes) in Figure 2a in the SEACR paper:

"Peak calling by Sparse Enrichment Analysis for CUT&RUN chromatin profiling"

Thanks for your attention,
Sol Katzman
UC Santa Cruz Genomics Institute
