Skip to content

Mofek/PDF_TO_Knowledge_Graph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

program Overview This program automates the retrieval of open-access articles from PubMed Central (PMC), extracts PDF files from each downloaded package, and transforms the extracted text into a knowledge graph. The process involves:

Searching PMC for articles matching specified criteria (e.g., open-access or containing certain keywords). Downloading article packages in .tar.gz format from the official PMC Open Access Subset. Extracting PDF files from the downloaded archive, then converting the PDF text into manageable chunks. Generating a knowledge graph from these text chunks using a large language model (LLM), identifying entities and relationships for further analysis. By combining these steps, the program offers a streamlined way to gather, parse, and structure scholarly content for more efficient discovery and analysis.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages