Skip to content

Commit

Permalink
gsva poisson kernel, #226
Browse files Browse the repository at this point in the history
  • Loading branch information
zqfang committed Oct 20, 2023
1 parent cbd959c commit edb3459
Show file tree
Hide file tree
Showing 8 changed files with 33,054 additions and 11 deletions.
1 change: 1 addition & 0 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ GSEApy has six sub-commands available: ``gsea``, ``prerank``, ``ssgsea``, ``repl
:gsea: The ``gsea`` module produces `GSEA <http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Main_Page>`_ results. The input requries a txt file(FPKM, Expected Counts, TPM, et.al), a cls file, and gene_sets file in gmt format.
:prerank: The ``prerank`` module produces **Prerank tool** results. The input expects a pre-ranked gene list dataset with correlation values, provided in .rnk format, and gene_sets file in gmt format. ``prerank`` module is an API to `GSEA` pre-rank tools.
:ssgsea: The ``ssgsea`` module performs **single sample GSEA(ssGSEA)** analysis. The input expects a pd.Series (indexed by gene name), or a pd.DataFrame (include ``GCT`` file) with expression values and a ``GMT`` file. For multiple sample input, ssGSEA reconigzes gct format, too. ssGSEA enrichment score for the gene set is described by `D. Barbie et al 2009 <http://www.nature.com/nature/journal/v462/n7269/abs/nature08460.html>`_.
:gsva: The ``gsva`` module performs `GSVA <https://github.com/rcastelo/GSVA>` method. The input is same to ssgsea.
:replot: The ``replot`` module reproduce GSEA desktop version results. The only input for GSEApy is the location to ``GSEA`` Desktop output results.
:enrichr: The ``enrichr`` module enable you perform gene set enrichment analysis using ``Enrichr`` API. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr . It runs very fast.
:biomart: The ``biomart`` module helps you convert gene ids using BioMart API.
Expand Down
18 changes: 14 additions & 4 deletions gseapy/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -652,13 +652,23 @@ def gsva(
:param outdir: Results output directory. If None, nothing will write to disk.
:param str kcdf: "Gaussian", "Possion" or None.
:param str kcdf: "Gaussian", "Possion" or None.
The non-parametric estimation of the cumulative distribution function
:param str weighted_score_type: tau of gsva. Default: 1.
:param bool mx_diff: Offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic. see GSVA doc
:param bool abs_rnk: Flag used only when mx_diff=True. see GSVA doc: https://rdrr.io/bioc/GSVA/man/gsva.html
:param bool mx_diff: Offers two approaches to calculate the enrichment statistic (ES) from the KS random walk statistic.
If True (default), ES is calculated as the difference
between the maximum positive and negative random walk deviations.
If False, ES is calculated as the maximum positive to 0.
:param bool abs_rnk: Flag used only when mx_diff=True.
If False (default), the enrichment statistic (ES) is calculated taking the magnitude difference
between the largest positive and negative random walk deviations.
If True, feature sets with features enriched on either extreme (high or low)
will be regarded as 'highly' activated.
see GSVA doc: https://rdrr.io/bioc/GSVA/man/gsva.html
:param int min_size: Minimum allowed number of genes from gene set also the data set. Default: 15.
Expand Down
10 changes: 10 additions & 0 deletions gseapy/gsva.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,16 @@ def run(self):
self._logger.info("Parsing data files for GSVA.............................")
# load data
df = self.load_data()
if self.rnaseq:
self._logger.debug(
"Poisson kernel selected. round input values to intergers!"
)
df = df.astype(int)
self._logger.debug(
"Poisson kernel selected. convert negative values to 0 !"
)
df = df.clip(lower=0)

self.data = df
# normalized samples, and rank
# filtering out gene sets and build gene sets dictionary
Expand Down
5 changes: 3 additions & 2 deletions src/gsva.rs
Original file line number Diff line number Diff line change
Expand Up @@ -121,9 +121,10 @@ impl GSVA {
for i in 0..x.len() {
if self.rnaseq {
// ppois(y[j], x[i]+bw, TRUE, FALSE):
// this function returns the value of the inverse Poisson cumulative density function
// this function returns the value of the Poisson cumulative density function
// NOTE: input has to be intergers
let pois = Poisson::new(x[i] + bw).unwrap();
left_tail += pois.inverse_cdf(y[j]) as f64;
left_tail += pois.cdf(y[j] as u64) as f64;
} else {
left_tail += self.precomputed_cdfs(y[j] - x[i], bw, pre_cdf);
}
Expand Down
Loading

0 comments on commit edb3459

Please sign in to comment.