Additional resources #10

ytakemon · 2019-07-19T16:22:37Z

Hello cancerdatasci team,

I read the CERES publication by Meyers et al. in Nat Genet, and I am excited to try CERES to correct for the copy number effect in our CRISPR KO screen (which didn't use the Gecko or Wang library). I'd like to learn more about this tool so that I can adapt it for use in our lab. Are there any tutorials out there besides what is currently in the README.md file? Running through the examples in README.md left me with a few questions, for example:

What are the input file format requirements?
How were the example data generated (for example, Gecko.gct)?
What is the "zmad" argument given to dep_normalize in prepare_ceres_input()?
How was list(lambda_g=0.68129207) decided as an argument for params in wrap_ceres()?

Thanks for your time!
-Yuka

joshdempster · 2019-07-19T21:11:54Z

Hi Yuka,

To run CERES you will need:

Log fold change data in GCT format.
A replicate map (table) with the columns "Replicate" and "CellLine". Columns must be tab-separated.
A gene annotation file from CCDS. If you have successfully run the demo, you already have this. It must match the genome assembly you pass in prepare_inputs.
Segmented copy number (table) with columns "CCLE_name", "Chromosome", "Start", "End", "Num_Probes", and "Segment_Mean". CCLE_name should match a cell line name in your replicate file. Chromosome is 1 or 2 characters, e.g. "1", "21", "X". Start and End are integers bounding the segment; they must correspond to the genome assembly (e.g. hg19) you choose for running prepare_inputs. Num_Probes is irrelevant and can be filled with any integer. Segment_Mean is the inferred copy number of the segment divided by 2, not log-transformed.
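
As a rough illustration of the segmented copy number requirements above, here is a minimal Python sketch that checks a table before handing it to CERES. It is not part of CERES (which is an R package); the function names `check_segment_table` and `copy_number_to_segment_mean` are hypothetical, and the tab delimiter is an assumption carried over from the replicate map, since only that file is stated to be tab-separated.

```python
import csv
import io

REQUIRED_COLUMNS = ["CCLE_name", "Chromosome", "Start", "End",
                    "Num_Probes", "Segment_Mean"]


def copy_number_to_segment_mean(copy_number):
    """Segment_Mean as described above: inferred copy number / 2, no log."""
    return copy_number / 2.0


def check_segment_table(text, known_cell_lines):
    """Return a list of problems found in a segment table given as text.

    Hypothetical helper, not a CERES function.
    Assumption: the table is tab-separated, like the replicate map.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for line_no, row in enumerate(reader, start=2):
        name = row["CCLE_name"]
        chrom = row["Chromosome"] or ""
        # CCLE_name should match a cell line in the replicate map.
        if name not in known_cell_lines:
            problems.append(f"line {line_no}: unknown cell line {name!r}")
        # Chromosome is 1 or 2 characters here, so no "chr" prefix.
        if not 1 <= len(chrom) <= 2:
            problems.append(f"line {line_no}: bad chromosome {chrom!r}")
        # Start and End are integers bounding the segment.
        try:
            int(row["Start"])
            int(row["End"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: Start/End must be integers")
    return problems


# Made-up example row: "chr1" is flagged because of the "chr" prefix.
example = "\t".join(REQUIRED_COLUMNS) + "\nLINE_A\tchr1\t100\t200\t1\t1.0\n"
print(check_segment_table(example, {"LINE_A"}))
```

A diploid segment (copy number 2) would get a Segment_Mean of `copy_number_to_segment_mean(2)`, i.e. 1.0.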

You will also want to pass arguments specifying the genome assembly, which chromosomes to use (alignments to chromosomes not listed will be ignored), and how to normalize the log fold change ("zmad" sets each cell line to have zero median and median absolute deviation 1). Chromosomes must be listed with the characters "chr" in front, unlike in the segmented copy number file.
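
To make the "zmad" description concrete, here is a small Python sketch of that normalization applied to one cell line's log fold changes, plus the "chr" prefixing of chromosome names. This is an illustration of the idea, not the CERES implementation: `zmad_normalize` and `to_chr_names` are hypothetical names, and the sketch assumes the plain (unscaled) median absolute deviation, whereas some libraries multiply the MAD by a constant by default.

```python
from statistics import median


def zmad_normalize(values):
    """Shift to zero median, then scale to median absolute deviation 1.

    Assumption: unscaled MAD, i.e. median(|x - median(x)|).
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; cannot scale")
    return [(v - med) / mad for v in values]


def to_chr_names(chromosomes):
    """Turn segment-file style names ("1", "X") into "chr1", "chrX"."""
    return ["chr" + c for c in chromosomes]


# Made-up log fold changes for a single cell line.
normalized = zmad_normalize([1.0, 3.0, 5.0, 7.0, 9.0])
print(normalized)
print(to_chr_names(["1", "21", "X"]))
```

In a real run this would be applied per cell line (per column of the log fold change matrix), which is what "sets each cell line to have zero median" implies.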

For the data preparation procedures used to create log fold change, please refer to Meyers et al.

lambda_g specifies the strength of the hierarchical regularization. Higher values reduce gene score variance. The value given in the README was chosen to maximize out-of-sample predictive accuracy. However, it is probably too conservative for most use cases. We use 0.4 in Achilles runs.
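
The effect of a larger penalty on score variance can be seen in a toy Python example. This is deliberately not the CERES objective: it assumes a single quadratic penalty pulling each score toward a fixed center, and `shrink_toward` and the input scores are made up for illustration. It only shows why a stronger penalty yields less spread in the estimates.

```python
from statistics import pvariance


def shrink_toward(values, center, lam):
    """Per-value minimizer of (y - g)**2 + lam * (g - center)**2.

    Toy penalty only; not the regularization used inside CERES.
    """
    return [(y + lam * center) / (1.0 + lam) for y in values]


raw_scores = [-1.2, -0.3, 0.1, 0.4]  # made-up numbers

# Compare the two lambda_g values mentioned above.
for lam in (0.4, 0.68129207):
    shrunk = shrink_toward(raw_scores, 0.0, lam)
    print(lam, pvariance(shrunk))
```

In this toy setting the variance of the shrunk scores is the raw variance divided by (1 + lam)^2, so the larger value always gives the tighter distribution.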

Regards,

Josh

ytakemon · 2019-07-22T15:22:05Z

Hi Josh,

Thank you for the detailed response! Looking forward to using CERES on our dataset.

Kind regards,
Yuka
