researchgraph/covid19-graph

covid19-graph

Apply Non-Negative Matrix Factorization (NMF) topic modelling to COVID research articles.

Prerequisites

  • Download the "search_results" zip file from COVID-19 Graph and unzip it in the coronavirus_twenty_years_of_research directory.

  • Install the required packages listed in requirements.txt (for example, with pip install -r requirements.txt).

About program process

Step1. Data preprocessing.ipynb: Load the JSON files and preprocess the text data.

  • INPUT:
    • coronavirus_twenty_years_of_research/search_results/covid_19.json
    • coronavirus_twenty_years_of_research/search_results/covid19.json
    • coronavirus_twenty_years_of_research/search_results/sars_cov_2.json
  • OUTPUT: coronavirus_twenty_years_of_research/technical_validation/merged_covid_articles.pkl
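A minimal sketch of what this step might look like, using only the standard library. The field names `id`, `title`, and `abstract`, the helper names, and the cleaning rules are assumptions for illustration; the README does not describe the JSON schema or the preprocessing the notebook actually performs.

```python
import json
import re


def preprocess(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def merge_articles(json_strings, text_fields=("title", "abstract")):
    """Parse several JSON documents (each a list of article dicts) and merge them.

    Articles are de-duplicated on a hypothetical "id" field; the first copy wins.
    """
    merged = {}
    for raw in json_strings:
        for article in json.loads(raw):
            key = article["id"]
            if key in merged:
                continue
            # Missing or null fields are treated as empty strings.
            text = " ".join(article.get(field) or "" for field in text_fields)
            merged[key] = {"id": key, "text": preprocess(text)}
    return list(merged.values())


# Two tiny in-memory "files" standing in for the search_results JSON inputs.
sample_a = json.dumps(
    [{"id": "a1", "title": "SARS-CoV-2: Spike Protein!", "abstract": "Structure & function."}]
)
sample_b = json.dumps(
    [
        {"id": "a1", "title": "SARS-CoV-2: Spike Protein!", "abstract": "Structure & function."},
        {"id": "b2", "title": "COVID-19 Vaccines", "abstract": None},
    ]
)
articles = merge_articles([sample_a, sample_b])
```

With real data, each input would be read from disk (for example `Path(...).read_text()`), and the merged list could be saved with `pickle.dump` to produce a file such as merged_covid_articles.pkl.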

Step2. Extract Features.ipynb: Convert the preprocessed data into a TF-IDF matrix and train a word2vec model.

  • INPUT: coronavirus_twenty_years_of_research/technical_validation/merged_covid_articles.pkl
  • OUTPUT:
    • coronavirus_twenty_years_of_research/technical_validation/covid_100d.model
    • coronavirus_twenty_years_of_research/technical_validation/vocid_100d.txt
    • coronavirus_twenty_years_of_research/technical_validation/covid_tfidf_d.pkl
    • coronavirus_twenty_years_of_research/technical_validation/covid_tfidf_v.pkl
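The TF-IDF part of this step can be sketched with scikit-learn's `TfidfVectorizer`. This is an illustration under assumptions: the README does not say which library or parameters the notebook uses, the function name `build_tfidf` is hypothetical, and the word2vec training is not shown here.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer


def build_tfidf(documents):
    """Return a sparse document-term TF-IDF matrix and its vocabulary.

    The vocabulary is ordered by column index, so vocab[j] names column j.
    """
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents)
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    return matrix, vocab


# Toy corpus standing in for the preprocessed article texts.
docs = [
    "coronavirus spike protein structure",
    "coronavirus vaccine trial",
    "vaccine immune response",
]
matrix, vocab = build_tfidf(docs)

# Both objects can be pickled, e.g. to separate files for matrix and vocabulary.
matrix_bytes = pickle.dumps(matrix)
vocab_bytes = pickle.dumps(vocab)
```

Keeping the matrix and the vocabulary together (or in paired files) matters because the matrix columns are only interpretable through the vocabulary's ordering.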

Step3. Apply topic modelling (NMF).ipynb: Apply NMF topic modelling and write the results to a CSV file.

  • INPUT:
    • coronavirus_twenty_years_of_research/technical_validation/merged_covid_articles.pkl
    • coronavirus_twenty_years_of_research/technical_validation/covid_100d.model
    • coronavirus_twenty_years_of_research/technical_validation/covid_tfidf_d.pkl
    • coronavirus_twenty_years_of_research/technical_validation/covid_tfidf_v.pkl
  • OUTPUT:
    • coronavirus_twenty_years_of_research/technical_validation/NMF_topic_modelling_results(8clusters).csv
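As a rough sketch of NMF topic modelling, the example below factorizes a TF-IDF matrix with scikit-learn's `NMF` and assigns each document to its highest-weighted topic. The function name, the toy corpus, and the parameter choices (`init="nndsvd"`, `max_iter=500`) are assumptions; the notebook's actual settings, and how it uses the word2vec model, are not described in this README.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer


def nmf_topics(documents, n_topics, top_n=3):
    """Cluster documents with NMF; return per-document labels and top terms per topic."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(documents)

    model = NMF(n_components=n_topics, init="nndsvd", random_state=0, max_iter=500)
    doc_topic = model.fit_transform(tfidf)  # shape: (n_documents, n_topics)

    # vocabulary ordered by column index, so terms[j] names column j
    terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    top_terms = [
        [terms[i] for i in row.argsort()[::-1][:top_n]]
        for row in model.components_  # shape: (n_topics, n_terms)
    ]
    # Hard assignment: each document goes to its highest-weighted topic.
    labels = doc_topic.argmax(axis=1).tolist()
    return labels, top_terms


# Toy corpus with two clearly separated themes.
docs = [
    "vaccine trial immune response antibody",
    "antibody vaccine immune dose trial",
    "lockdown policy travel restriction economy",
    "economy policy lockdown travel school",
]
labels, top_terms = nmf_topics(docs, n_topics=2)
```

For the dataset described here, `n_topics` would be 8 to match the eight clusters named in the output file.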

Step4. Produce outputs of topic modelling.ipynb: Read the topic modelling results and produce outputs in JSON format.

  • INPUT: coronavirus_twenty_years_of_research/technical_validation/NMF_topic_modelling_results(8clusters).csv
  • OUTPUT: coronavirus_twenty_years_of_research/clusters/cluster*/*.json
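One way this conversion could work is sketched below with the standard library: rows are grouped by cluster and each group is written to its own directory. The column names `id` and `cluster`, the file name `articles.json`, and the helper names are hypothetical; the real CSV layout and JSON file naming are not given in this README.

```python
import csv
import io
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_cluster(csv_file, cluster_column="cluster"):
    """Read CSV rows from an open file and group them by cluster label."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[cluster_column]].append(row)
    return dict(groups)


def write_clusters(groups, out_dir):
    """Write each group to <out_dir>/cluster<label>/articles.json; return the paths."""
    out_dir = Path(out_dir)
    written = []
    for cluster, rows in sorted(groups.items()):
        cluster_dir = out_dir / f"cluster{cluster}"
        cluster_dir.mkdir(parents=True, exist_ok=True)
        path = cluster_dir / "articles.json"  # hypothetical file name
        path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
        written.append(path)
    return written


# In-memory CSV standing in for the NMF results file.
sample_csv = io.StringIO("id,cluster\na1,0\nb2,1\nc3,0\n")
groups = group_by_cluster(sample_csv)

with tempfile.TemporaryDirectory() as tmp:
    paths = write_clusters(groups, tmp)
    reloaded = json.loads(paths[0].read_text(encoding="utf-8"))
```

With a real file, `group_by_cluster` would be called on `open(path, newline="", encoding="utf-8")` as the `csv` module recommends.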

About data directories

  • The 'search_results' folder contains the extracted article metadata.
  • The 'clusters' folder is the core of our dataset; it contains the articles classified into the eight clusters.
  • The 'technical_validation' folder includes all CSV files used to create the figures in the paper.
  • The 'time_lapse_visualization' folder contains a video that animates the cluster trends.
