Where did the train_data.pkl and test_data.pkl uploaded on 02-Dec-2019 come from? #37
Comments
Hi,
Hi!

    rm results/deepgoplus_mf.txt
    python diamond_data.py -df data/train_data.pkl -o data/train_data.fa
    python diamond_data.py -df data/test_data.pkl -o data/test_data.fa
    diamond makedb --in data/train_data.fa -d data/train_data  # creates train_data.dmnd
    diamond blastp -d data/train_data.dmnd --more-sensitive -t /tmp -q data/test_data.fa --outfmt 6 qseqid sseqid bitscore -o data/test_diamond.res

Then, using the generated test_diamond.res, the train_data.pkl and test_data.pkl files I have been talking about, and the go.obo file uploaded to your data web page on 01-Dec-2019 (one day before the pkl files), I tried to run evaluate_diamondscore.py. I assumed that this go.obo file matches the pkl files of 02-Dec-2019, but the DIAMOND evaluation has some problems.

First, it raises an error in the evaluate_annotations function (line 151) because the variable "ru" is divided by the variable "total", which is zero (line 158). I have been studying your code, and this happens because the filter you use to keep only the GO terms that belong to the chosen GO subontology (mf, bp or cc) removes every GO term from the labels and preds lists. So after the filters on lines 84 and 107 both lists are empty, and the "for" loop of the evaluation never runs. I therefore suspect that the GO terms in go.obo are in a different format from the GO terms in train_data.pkl and test_data.pkl (the go.obo set appears with |IDA and similar details, but the sets of labels and preds do not), but I am not sure.

I would really appreciate your help. I need to use precisely these pkl files because my research team invested a lot of time in simulating the structural properties of the proteins contained in them, and we now want to submit our paper, in which, of course, we cite you and your paper. But first I need to make your DIAMOND evaluation work with these pkl files.
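For anyone hitting the same division by zero, here is a minimal sketch of how a format mismatch like the one suspected above can empty the filtered sets, and how to check for it. The GO IDs, the helper names and the `|IDA` suffix handling are illustrative assumptions and are not taken from evaluate_diamondscore.py; whichever side carries the suffix would need the same normalization.

```python
def strip_evidence(term):
    """Drop an evidence-code suffix such as '|IDA' from a GO term string."""
    return term.split("|", 1)[0]


def filter_by_namespace(terms, namespace_terms):
    """Keep only the terms that belong to the chosen sub-ontology."""
    return {t for t in terms if t in namespace_terms}


def mean_or_zero(total_value, count):
    """Average that returns 0.0 instead of raising when nothing was counted."""
    return total_value / count if count > 0 else 0.0


# Hypothetical data: one side holds plain IDs, the other carries evidence codes.
mf_terms = {"GO:0003674", "GO:0005488"}
annotations = {"GO:0003674|IDA", "GO:0005488|IEA", "GO:0008150|IDA"}

# Comparing the raw strings matches nothing, so every protein ends up empty.
raw = filter_by_namespace(annotations, mf_terms)

# Normalizing first restores the overlap.
normalized = filter_by_namespace({strip_evidence(t) for t in annotations}, mf_terms)

print(len(raw))            # 0
print(sorted(normalized))  # ['GO:0003674', 'GO:0005488']
```

Printing the size of the overlap before evaluating makes this kind of mismatch visible early, and a guarded average such as `mean_or_zero` turns the crash into an obviously suspicious zero.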
I solved it by replacing the string 'annotations' with 'prop_annotations' in lines 48 and 52 of evaluate_diamondscore.py. You use this column in the other evaluation files. Is this right?
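As a rough illustration of why the two columns could behave differently, the sketch below assumes that 'prop_annotations' holds each protein's annotations propagated to all ancestor terms in the ontology. That is an assumption based on the column name, not something confirmed in this thread. The toy hierarchy and the `propagate` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy hierarchy: child term -> set of direct parents (is_a).
PARENTS = {
    "GO:C": {"GO:B"},
    "GO:B": {"GO:A"},
    "GO:A": set(),
}


def propagate(terms, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    queue = deque(terms)
    while queue:
        term = queue.popleft()
        if term in seen:
            continue
        seen.add(term)
        # Terms missing from the hierarchy simply have no parents to add.
        queue.extend(parents.get(term, ()))
    return seen


print(sorted(propagate({"GO:C"}, PARENTS)))  # ['GO:A', 'GO:B', 'GO:C']
```

Under that assumption, evaluating against propagated annotations compares predictions with the full set of implied terms rather than only the directly assigned ones.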
Hi,
Thanks!
Where is data-deepgo2016/test-mf-preds.pkl?
Could you please remind me where this file is referenced?
Indeed, I want to evaluate our annotation results using CAFA3. However, I found that the cafa3 evaluation can only calculate Fmax. I found that another program (deepgoplus) can generate other values (such as Smin), and the two files were contained in deepgoplus.
At 2023-04-03 13:26:12, "Maxat Kulmanov" ***@***.***> wrote:
Could you please remind me where is this file referenced?
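For reference, the Fmax and Smin measures mentioned in this comment can be sketched as follows. This follows the usual CAFA-style definitions (precision averaged over proteins with at least one prediction, recall and the information-content terms averaged over all proteins) and is not taken from the deepgoplus scripts, which may differ in details. The function name, the input layout and the toy data are assumptions.

```python
import math


def fmax_smin(labels, preds, ic, thresholds=None):
    """labels: {protein: set of GO IDs}; preds: {protein: {GO ID: score}};
    ic: {GO ID: information content}. Returns (fmax, smin)."""
    if not labels:
        return 0.0, math.inf
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    n = len(labels)
    fmax, smin = 0.0, math.inf
    for t in thresholds:
        precisions, recall_sum, ru, mi = [], 0.0, 0.0, 0.0
        for protein, true_terms in labels.items():
            predicted = {go for go, s in preds.get(protein, {}).items() if s >= t}
            tp = predicted & true_terms
            if predicted:
                precisions.append(len(tp) / len(predicted))
            if true_terms:
                recall_sum += len(tp) / len(true_terms)
            # Remaining uncertainty: information content of missed true terms.
            ru += sum(ic.get(go, 0.0) for go in true_terms - predicted)
            # Misinformation: information content of wrongly predicted terms.
            mi += sum(ic.get(go, 0.0) for go in predicted - true_terms)
        precision = sum(precisions) / len(precisions) if precisions else 0.0
        recall = recall_sum / n
        if precision + recall > 0:
            fmax = max(fmax, 2 * precision * recall / (precision + recall))
        smin = min(smin, math.sqrt((ru / n) ** 2 + (mi / n) ** 2))
    return fmax, smin


# Hypothetical toy data.
labels = {"P1": {"A", "B"}, "P2": {"A"}}
preds = {"P1": {"A": 0.9, "C": 0.3}, "P2": {"A": 0.6}}
ic = {"A": 1.0, "B": 2.0, "C": 3.0}
fmax, smin = fmax_smin(labels, preds, ic)
print(round(fmax, 4), round(smin, 4))
```

Note the early return for an empty label set, which avoids dividing by a protein count of zero.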
To evaluate CAFA3, we ran the cafa3_data.py script to generate the CAFA test_data.pkl and then ran our model to get predictions.pkl. The data is available here: https://deepgo.cbrc.kaust.edu.sa/data/data-cafa.tar.gz
Thank you! I will try it again!
At 2023-04-03 17:23:55, "Maxat Kulmanov" ***@***.***> wrote:
To evaluate CAFA3, we ran cafa3_data.py script to generate cafa test_data.pkl and the run our model to get the predictions.pkl. The data is available here https://deepgo.cbrc.kaust.edu.sa/data/data-cafa.tar.gz
Our annotation results contain only gene IDs and GO IDs. However, the input file for evaluate_cafa3.py must contain the labels and preds. Following your suggestion, we ran cafa3_data.py, but the labels and preds information does not seem to have been added to our file. Could you tell me how to add this information to our files? Thanks.
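One possible way to get from flat (gene ID, GO ID) pairs to per-protein records is sketched below. The exact layout that evaluate_cafa3.py expects is not stated in this thread, so the field names 'proteins', 'labels' and 'preds' and the helper functions are assumptions for illustration only.

```python
from collections import defaultdict


def group_by_protein(pairs):
    """Collect (protein_id, go_id) pairs into {protein_id: set of GO IDs}."""
    grouped = defaultdict(set)
    for protein_id, go_id in pairs:
        grouped[protein_id].add(go_id)
    return dict(grouped)


def build_rows(true_pairs, scored_predictions):
    """Build one record per protein with its true labels and scored predictions."""
    labels = group_by_protein(true_pairs)
    preds = defaultdict(dict)
    for protein_id, go_id, score in scored_predictions:
        # Keep the highest score if a term is predicted more than once.
        preds[protein_id][go_id] = max(score, preds[protein_id].get(go_id, 0.0))
    return [
        {
            "proteins": pid,
            "labels": sorted(labels[pid]),
            "preds": dict(preds.get(pid, {})),
        }
        for pid in sorted(labels)
    ]


# Hypothetical inputs: experimental annotations and scored predictions.
rows = build_rows(
    [("P1", "GO:1"), ("P1", "GO:2"), ("P2", "GO:1")],
    [("P1", "GO:1", 0.9), ("P2", "GO:3", 0.4)],
)
print(rows[0])
```

A list of records like this can be turned into a table and pickled. If the annotation results have no confidence scores, a constant score of 1.0 per predicted term is one option, though threshold-based measures then become less informative.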
I would like to know where the files train_data.pkl and test_data.pkl came from, specifically the train_data.pkl and test_data.pkl uploaded on 02-Dec-2019 to the data page https://deepgo.cbrc.kaust.edu.sa/data/. These files are not the same as the train and test files in data-cafa.tar, data-2016.tar, or any other folder available on the webpage. I have been using these files in some experiments, and I only recently realized that they are not the data used to generate the tables presented in the DeepGOPlus paper. Even so, to interpret my results I need to know the origin of these data files: whether they are a merge or subset of the other datasets on the data webpage, or come from some version of UniProt, CAFA, or another source. Thanks for your help.