KEGGCharter as a proper tool of science
Implemented COG2KO
This idea belongs to Lovro Grum. For each KO, the associated COGs are extracted from its KEGG HTML page. This KO-to-COG mapping is then inverted to obtain a COG-to-KO conversion.
New database, making KEGGCharter far more powerful! Makes for a great synergy with reCOGnizer.
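The inversion step can be sketched as follows. This is a minimal illustration, not KEGGCharter's actual code: the function name `invert_ko_to_cogs` and the identifiers in the example data are hypothetical placeholders, and it assumes the scraped information is already available as a dictionary from KO to a list of COGs.

```python
from collections import defaultdict


def invert_ko_to_cogs(ko_to_cogs):
    """Turn {KO: [COG, ...]} into {COG: [KO, ...]}.

    Hypothetical helper: KOs are collected in a set so that a KO listed
    twice for the same COG is only kept once, then sorted for stable output.
    """
    cog_to_kos = defaultdict(set)
    for ko, cogs in ko_to_cogs.items():
        for cog in cogs:
            cog_to_kos[cog].add(ko)
    return {cog: sorted(kos) for cog, kos in cog_to_kos.items()}


# Placeholder identifiers, not real KEGG or COG entries
ko_to_cogs = {"KO_a": ["COG_x", "COG_y"], "KO_b": ["COG_x"]}
cog_to_kos = invert_ko_to_cogs(ko_to_cogs)
```

A COG shared by several KOs ends up mapped to all of them, which is why the values of the inverted dictionary are lists rather than single KOs.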
Because this relies on web scraping, 403 (Forbidden) errors and timeouts may occur often.
KEGGCharter waits some time between failed tries, and at the end checks for any KOs whose HTML pages were not retrieved. It then tries to retrieve those as well.
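A retry strategy of this kind might look like the sketch below. It is written under assumptions: `fetch_all`, its parameters, and the number of tries are hypothetical, and the download function is passed in as `fetch` so that no particular HTTP library or KEGG URL is implied.

```python
import time


def fetch_all(kos, fetch, max_tries=3, wait_seconds=1.0, sleep=time.sleep):
    """Fetch one page per KO, pausing between failed tries.

    `fetch` is any callable that returns the page for a KO or raises on
    failure (for example on a 403 or a timeout). Returns the retrieved
    pages and the list of KOs that are still missing after a final pass.
    """
    pages, failed = {}, []
    for ko in kos:
        for attempt in range(max_tries):
            try:
                pages[ko] = fetch(ko)
                break
            except Exception:
                # Wait before the next try, but not after the last one
                if attempt < max_tries - 1:
                    sleep(wait_seconds)
        else:
            # The loop finished without a successful fetch
            failed.append(ko)

    # Final pass: one more attempt for every KO that was not retrieved
    still_missing = []
    for ko in failed:
        try:
            pages[ko] = fetch(ko)
        except Exception:
            still_missing.append(ko)
    return pages, still_missing
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting, and the returned list of missing KOs makes it possible to report what could not be retrieved.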
Sanitization of input file
Checks if:
- inputted columns exist in the input file
- the --kegg-column, --ko-column, --ec-column and --cog-column columns don't have invalid values / bad characters (" " and ";").
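The two checks above could be sketched as below. This is only an illustration using plain dictionaries for rows: `check_input` and its messages are hypothetical, the column names in the example are placeholders, and the set of bad characters is taken from the description (" " and ";").

```python
def check_input(rows, header, id_columns, bad_chars=(" ", ";")):
    """Return a list of human-readable problems found in the input table.

    rows       -- list of dicts, one per line of the input file
    header     -- column names present in the input file
    id_columns -- columns named by the user on the command line
    """
    problems = []
    for col in id_columns:
        # Check 1: the column must exist in the input file
        if col not in header:
            problems.append(f"column '{col}' not found in input file")
            continue
        # Check 2: no value in the column may contain a bad character
        for i, row in enumerate(rows, start=1):
            value = row.get(col) or ""
            if any(ch in value for ch in bad_chars):
                problems.append(f"row {i}, column '{col}': invalid value {value!r}")
    return problems


# Placeholder table, not real annotation data
header = ["KO", "COG"]
rows = [
    {"KO": "KO_a", "COG": "COG_x"},
    {"KO": "KO_a;KO_b", "COG": "COG y"},
]
problems = check_input(rows, header, ["KO", "COG", "EC"])
```

Collecting all problems before reporting, rather than stopping at the first one, lets a user fix the input file in a single pass.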
Added parameter for dividing quantification of each enzyme by the KOs assigned to it
When set, the --distribute-quantification parameter will instruct KEGGCharter to split the quantification of each enzyme by all the KOs that were assigned to it.
This information is outputted in data_for_charting.tsv.
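As a small worked example of the idea, assuming the split is an even division of an enzyme's quantification by the number of KOs assigned to it (the function name and the identifiers are hypothetical):

```python
def distribute_quantification(quantification, kos):
    """Split one enzyme's quantification evenly among its assigned KOs.

    Assumption for this sketch: every KO receives the same share.
    """
    if not kos:
        return {}
    share = quantification / len(kos)
    return {ko: share for ko in kos}


# An enzyme quantified at 12.0 and assigned to three placeholder KOs
shares = distribute_quantification(12.0, ["KO_a", "KO_b", "KO_c"])
```

With this kind of split, the shares of an enzyme add up to its original quantification, so an enzyme with several KOs does not count more than once in the totals.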
New tests for several parameter combinations
- show-available-maps for the --show-available-maps parameter.
- input-quantification-and-taxonomy for the --input-taxonomy and --input-quantification parameters.
- include-missing-genomes for the --include-missing-genomes parameter.
- map-all for the --map-all parameter.
New output folders and writing of JSON information
KEGGCharter now stores the metabolic map representations in a maps folder. No brainer.
KEGGCharter additionally stores the information concerning the maps in a json folder. This folder will contain the dictionaries used for generating both the potential and differential maps.
"Potential" JSONs come in the form {box_id: [tax1, tax2, ...]}.
"Differential" JSONs come in the form {box_id: [col1, col2, ...]}. In the future, these should include the quantification value instead.
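To make the two shapes concrete, here is a small sketch of what such dictionaries might look like once serialized and read back. The box identifiers, taxon names, and column names are placeholders chosen for illustration, and no file names are implied.

```python
import json

# "Potential" shape: each map box lists the taxa that have the function
potential = {"box_1": ["taxon_A", "taxon_B"], "box_2": ["taxon_A"]}

# "Differential" shape: each map box lists the quantification columns
differential = {"box_1": ["sample_1", "sample_2"], "box_2": ["sample_2"]}

# Serialize to JSON text, as would be written into the json folder
potential_text = json.dumps(potential, indent=2)
differential_text = json.dumps(differential, indent=2)

# Reading the text back gives the same dictionaries
taxa_for_box_1 = json.loads(potential_text)["box_1"]
columns_for_box_2 = json.loads(differential_text)["box_2"]
```

Note that JSON object keys are always strings, so box identifiers that are numbers in memory would come back as strings after a round trip.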