A database, visualization, and overview of environmental research topics through available sources

branisk/Eco-Sourced

Eco-Sourced

What is this?

The goal of this project is to aggregate a large number of environmental research papers and organize them so that the topics and taxonomy of the environmental sciences are easier to understand. This will help answer questions such as:

  • What Environmental Science fields are being studied?
  • What Environmental Science research has already been done?
  • What Environmental Science topics can I contribute to?
  • What Environmental Science topics need to be worked on?
  • Which topics are the most heavily studied?

Relevant Papers:

Relevant Sources:

Technical Flowchart

classDiagram
    class ArXiv_AWS_S3 {
        5.6 TB
        Updates Monthly
    }
    class PDF_Storage {
        2.7 TB
        +100 GB/month
    }
    class Source_Storage {
        2.9 TB
    }
    class Preprocessing {
        Dask
    }
    class Analysis {
        Dask
    }
    class Visualization {
        Vaex
    }
    class Clustering {
        HDBSCAN
    }
    class Database {
        AWS_RDS
    }

    ArXiv_AWS_S3 -- PDF_Storage: boto3
    ArXiv_AWS_S3 -- Source_Storage: boto3
    PDF_Storage --> PDF_to_Text: PyMuPDF
    Source_Storage --> TEX_to_Text: TEX_EXTRACTOR
    Source_Storage --> Other_Text: Other_EXTRACTOR
    PDF_to_Text --> Processed_Data_S3
    TEX_to_Text --> Processed_Data_S3
    Other_Text --> Processed_Data_S3
    Processed_Data_S3 --> Preprocessing
    Preprocessing --> Analysis
    Analysis --> Visualization
    Analysis --> Clustering
    Clustering --> Visualization
    Clustering --> Categorization
    Clustering --> Database
    Categorization --> Visualization
    Categorization --> Database
    Database --> Query
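
The first stages of the flowchart (storage, then one extractor per file type) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the suffix-to-extractor mapping, and the requester-pays setting are assumptions, and only the boto3 and PyMuPDF libraries come from the diagram itself.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the extractor node in the flowchart.
# Anything that is neither PDF nor TeX falls through to Other_Text.
EXTRACTORS = {".pdf": "PDF_to_Text", ".tex": "TEX_to_Text"}


def route(filename: str) -> str:
    """Return the name of the flowchart node that should handle a file."""
    return EXTRACTORS.get(Path(filename).suffix.lower(), "Other_Text")


def download(bucket: str, key: str, dest: str) -> None:
    """Copy one object from S3 to local disk with boto3."""
    import boto3  # third-party; imported lazily so route() works without it

    s3 = boto3.client("s3")
    # Assumption: the source bucket is requester-pays, so the caller's
    # AWS account is billed for the transfer.
    s3.download_file(bucket, key, dest, ExtraArgs={"RequestPayer": "requester"})


def pdf_to_text(path: str) -> str:
    """Extract plain text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily for the same reason as boto3

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

Calling `route("main.tex")` returns `"TEX_to_Text"`, while a file such as `figure.eps` is routed to `"Other_Text"`. Keeping the routing separate from the extractors means new file types can be added without touching the download or PDF code.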

Contributions are very welcome! Please submit a pull request, or feel free to reach out at [email protected].
