Additional resources #10

Open
ytakemon opened this issue Jul 19, 2019 · 2 comments
Comments

@ytakemon

Hello cancerdatasci team,

I read the CERES publication by Meyers et al. in Nat Genet, and I am excited to try CERES to correct for the copy number effect in our CRISPR KO screen (which didn't use the Gecko or Wang library). I'd like to learn more about this tool so that I can adapt it for use in our lab. Are there any tutorials besides what is currently in the README.md file? Running through the examples in README.md left me with a few questions, for example:

  • What are the input file format requirements?
  • How were the example data generated (for example, Gecko.gct)?
  • What is the "zmad" argument given to dep_normalize in prepare_ceres_input()?
  • How was list(lambda_g=0.68129207) decided as an argument for params in wrap_ceres()?

Thanks for your time!
-Yuka

@joshdempster
Collaborator

Hi Yuka,

To run CERES you will need:

  • Log fold change data in GCT format.

  • A replicate map (table) with the columns "Replicate" and "CellLine". Columns must be tab-separated.

  • A gene annotation file from CCDS. If you have successfully run the demo, you already have this. It must match the genome assembly you pass in prepare_inputs.

  • Segmented copy number (table) with the columns "CCLE_name", "Chromosome", "Start", "End", "Num_Probes", and "Segment_Mean". CCLE_name should match a cell line name in your replicate map. Chromosome is 1 or 2 characters, e.g. "1", "21", "X". Start and End are integers bounding the segment.
    They must correspond to the genome assembly (e.g. hg19) you choose for running prepare_inputs. Num_Probes is irrelevant and can be filled with any integer. Segment_Mean is the inferred copy number of the segment divided by 2 (so a segment with two copies has Segment_Mean 1), not log-transformed.
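
For anyone building the segmented copy number table from their own data, here is a minimal Python sketch of the layout described above. It is not part of CERES: the helper names (seg_row, write_seg_table, unmatched_cell_lines) are hypothetical, and writing the table tab-separated is an assumption carried over from the replicate map requirement.

```python
import csv
import io

# Column names as listed in the reply above.
SEG_COLUMNS = ["CCLE_name", "Chromosome", "Start", "End", "Num_Probes", "Segment_Mean"]


def seg_row(cell_line, chromosome, start, end, copy_number):
    """Build one row from an absolute (not log-transformed) copy number."""
    chrom = str(chromosome)
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # this file uses "1", "21", "X", without "chr"
    if not 1 <= len(chrom) <= 2:
        raise ValueError(f"unexpected chromosome name: {chromosome!r}")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("Start must not exceed End")
    return {
        "CCLE_name": cell_line,
        "Chromosome": chrom,
        "Start": start,
        "End": end,
        "Num_Probes": 1,  # placeholder; described above as irrelevant
        "Segment_Mean": copy_number / 2,  # copy number divided by 2
    }


def write_seg_table(rows, handle):
    """Write rows with a header line (tab-separated is an assumption)."""
    writer = csv.DictWriter(
        handle, fieldnames=SEG_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def unmatched_cell_lines(seg_rows, replicate_map_rows):
    """Return CCLE_name values that have no matching CellLine entry."""
    known = {row["CellLine"] for row in replicate_map_rows}
    return sorted({row["CCLE_name"] for row in seg_rows} - known)


if __name__ == "__main__":
    rows = [seg_row("CELL_A", "chr7", 1000, 250000, copy_number=4)]
    replicate_map = [{"Replicate": "CELL_A rep 1", "CellLine": "CELL_A"}]
    buffer = io.StringIO()
    write_seg_table(rows, buffer)
    print(buffer.getvalue())
    print("unmatched:", unmatched_cell_lines(rows, replicate_map))
```

The cell line and replicate names in the usage part are made up for illustration; a non-empty result from unmatched_cell_lines would point to names that need fixing before running CERES.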

You will also want to pass arguments specifying the genome assembly, which chromosomes to use (alignments to chromosomes not listed will be ignored), and how to normalize the log fold change ("zmad" sets each cell line to have a median of zero and a median absolute deviation of 1). Chromosomes must be listed with the prefix "chr", unlike in the segmented copy number file.
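
To make the "zmad" description concrete, here is a small Python sketch of that kind of normalization for the values of a single cell line, plus a helper that adds the "chr" prefix. This illustrates the definition given above, not the CERES code itself; the function names are hypothetical, and whether CERES applies any extra scaling constant to the median absolute deviation is not stated here, so none is used.

```python
from statistics import median


def zmad(values):
    """Shift to median 0 and scale to median absolute deviation 1."""
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero; cannot scale")
    return [(v - center) / mad for v in values]


def with_chr_prefix(chromosomes):
    """Turn "1", "21", "X" into "chr1", "chr21", "chrX"."""
    return [c if c.startswith("chr") else "chr" + c for c in chromosomes]


if __name__ == "__main__":
    # Made-up log fold changes for one cell line.
    lfc = [0.0, 2.0, 4.0, 6.0, 20.0]
    print(zmad(lfc))
    print(with_chr_prefix(["1", "21", "X"]))
```

Because the median and the median absolute deviation are both robust to outliers, a few strongly depleted guides should have limited influence on the scaling.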

For the data preparation procedures used to create log fold change, please refer to Meyers et al.

Lambda_g specifies the strength of the hierarchical regularization. Higher values reduce gene score variance. The value given in the README was chosen to maximize out-of-sample predictive accuracy. However, it is probably too conservative for most use cases. We use 0.4 in Achilles runs.

Regards,

Josh

@ytakemon
Author

Hi Josh,

Thank you for the detailed response! Looking forward to using CERES on our dataset.

Kind regards,
Yuka
