This project provides a framework to reanalyse public proteomics data from, among other sources, PRIDE and PeptideAtlas. Among other tools, the framework provides methods for predicting the search parameters of public submissions, reanalysis pipelines for peptide/protein identification, de novo search, and quality assessment of the final results.
Please contact us through GitHub issues: https://github.com/PRIDE-Cluster/cluster-data-generation/issues or by email to Yasset Perez-Riverol.
Contributors: Marc Vaudel, Kenneth Verheggen
To build the project, the developer should first clone it together with its submodules:
git clone --recursive https://github.com/PRIDE-Cluster/cluster-data-generation
Once the project has been downloaded, change into the project folder and run:
$ mvn clean
$ mvn install
All the tools and their corresponding scripts are stored in the resources folder.
A set of tools has been developed to let the user perform the following tasks:
- Download a protein database from an external repository (e.g. UniProt Proteomes): FastaDownloadTool
- Process a FASTA file (FastaProcessingTool), including the following tasks:
  - Append a database to the original database (e.g. a contaminants database)
  - Add decoys to the resulting database
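The two FASTA processing steps above (appending a contaminants database and adding decoys) can be sketched in Java, the project's own language. This is a minimal, hypothetical sketch and not the FastaProcessingTool implementation: the class and method names, the `DECOY_` header prefix, and the use of reversed sequences as decoys (one common decoy strategy) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (NOT the project's FastaProcessingTool): merge a contaminants
 * database into a target database, then append one reversed-sequence decoy per entry.
 */
public class FastaDecoySketch {

    /** Parses FASTA text into an insertion-ordered map of header (without '>') to sequence. */
    public static Map<String, String> parseFasta(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\R")) {
            line = line.strip();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                // A new header closes the previous entry.
                if (header != null) entries.put(header, seq.toString());
                header = line.substring(1);
                seq.setLength(0);
            } else {
                // Multi-line sequences are concatenated.
                seq.append(line);
            }
        }
        if (header != null) entries.put(header, seq.toString());
        return entries;
    }

    /** Appends the extra entries (e.g. contaminants), then one reversed decoy per entry. */
    public static Map<String, String> appendAndAddDecoys(Map<String, String> target,
                                                         Map<String, String> extra,
                                                         String decoyPrefix) {
        Map<String, String> result = new LinkedHashMap<>(target);
        // If a header already exists in the target, the target's entry is kept.
        extra.forEach(result::putIfAbsent);
        // Build decoys in a separate map so 'result' is not modified while iterating.
        Map<String, String> decoys = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : result.entrySet()) {
            String reversed = new StringBuilder(e.getValue()).reverse().toString();
            decoys.put(decoyPrefix + e.getKey(), reversed);
        }
        result.putAll(decoys);
        return result;
    }

    /** Serialises entries back to FASTA, one sequence line per entry. */
    public static String toFasta(Map<String, String> entries) {
        StringBuilder out = new StringBuilder();
        entries.forEach((h, s) -> out.append('>').append(h).append('\n').append(s).append('\n'));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> target = parseFasta(">sp|P1|PROT1\nMKTAY\nIAK\n>sp|P2|PROT2\nPEPTIDE\n");
        Map<String, String> contaminants = parseFasta(">CONT_TRYP\nMKWVT\n");
        System.out.print(toFasta(appendAndAddDecoys(target, contaminants, "DECOY_")));
    }
}
```

Running `main` prints the two target proteins, the appended contaminant, and then one reversed decoy for each of the three entries. Plain reversal is only the simplest choice; some pipelines use shuffled or pseudo-reversed decoys instead.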
The resources folder contains all the tools, scripts and Python tools needed to work with the data. Most of the scripts are designed to run as LSF jobs.
If you have problems pulling the latest version, force-reset the repository with:
git reset --hard origin/master
followed by:
git pull