Commit

updated for sot abstract
tomlue committed Nov 2, 2024
1 parent 9f258e3 commit ad40292
Showing 4 changed files with 17 additions and 3 deletions.
5 changes: 4 additions & 1 deletion README.md
@@ -3,4 +3,7 @@ Data evaluation records from EPA and potentially other sources

## Assets
1. **brick/riskder.pdf**: a directory with data evaluation records
-2. **brick/riskder.parquet**: a parquet file linking urls to the pdfs in `riskder.pdf`
+2. **brick/riskder.parquet**: a parquet file linking urls to the pdfs in `riskder.pdf`
+
+## Links
+- [google drive folder](https://drive.google.com/drive/folders/1-UgyHD8XCbuhfwXxyqcZfO0QtNsf0hda?usp=drive_link)
10 changes: 10 additions & 0 deletions documents/sot_abstract/sot_abstract.md
@@ -0,0 +1,10 @@
# Automating the Extraction of Experimental Outcomes from Regulatory Guidance Documents
Regulatory agencies, including the EPA and FDA, publish a large number of guidance documents that provide essential data on chemical properties, toxicity risks, and testing protocols. These documents are valuable for regulatory, research, and risk assessment applications but remain difficult to access and use efficiently. In this work, we aggregated and processed 1,812 "Data Evaluation Records" for various chemicals and developed an automated pipeline that extracts core information such as substance names, test guidelines, testing metrics (e.g., Lowest Observed Adverse Effect Level (LOAEL) and No Observed Adverse Effect Level (NOAEL)), and the corresponding metric values. Our system accurately identifies and organizes these data points, creating a structured dataset that is readily usable for regulatory and safety analysis. This approach provides an on-ramp for unstructured regulatory documents, enabling scalable chemical safety monitoring and supporting data-driven decision-making in regulatory science.
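
To make the idea of pulling a testing metric and its value out of free text concrete, here is a minimal sketch. It is not the pipeline in this repository: the regular expression, the `MetricValue` record, and the `extract_metrics` helper are hypothetical names introduced for illustration, and the pattern assumes simple statements of the form "NOAEL = 10 mg/kg/day", which real Data Evaluation Records may not follow.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical pattern (an assumption, not taken from the repository):
# a metric abbreviation, an optional connector, a number, and a dose unit.
METRIC_PATTERN = re.compile(
    r"\b(?P<metric>LOAEL|NOAEL)\b"        # metric abbreviation
    r"\s*(?:=|:|is|was|of)?\s*"           # optional connector such as "=" or "was"
    r"(?P<value>\d+(?:\.\d+)?)"           # integer or decimal value
    r"\s*(?P<unit>mg/kg(?:\s*bw)?/day)",  # assumed unit spelling
    re.IGNORECASE,
)


@dataclass(frozen=True)
class MetricValue:
    """One extracted metric, e.g. NOAEL = 10.0 mg/kg/day."""
    metric: str
    value: float
    unit: str


def extract_metrics(text: str) -> List[MetricValue]:
    """Return every metric/value/unit triple the pattern finds, in text order."""
    return [
        MetricValue(
            metric=match.group("metric").upper(),
            value=float(match.group("value")),
            unit=match.group("unit"),
        )
        for match in METRIC_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The NOAEL was 10 mg/kg/day and the LOAEL = 50.5 mg/kg/day in rats."
    for record in extract_metrics(sample):
        print(record)
```

A pattern like this only covers the easy cases; tables, ranges, and values split across lines in a PDF would need a more robust approach than a single regular expression.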

<!-- generate some kind of alphanumeric password -->
import secrets
import string

# Generate a random 12-character alphanumeric password (secrets is the stdlib module intended for this)
password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12))
print(f"Generated password: {password}")
3 changes: 2 additions & 1 deletion notebook.py
@@ -1,3 +1,4 @@
import pandas as pd

-results = pd.read_parquet('brick/riskder.parquet')
+results = pd.read_parquet('brick/riskder.parquet')
+extraction = pd.read_parquet('brick/extraction.parquet')
2 changes: 1 addition & 1 deletion stages/03_data_extractor.py
@@ -128,5 +128,5 @@ def extract_testing_results(pdf_path):
continue


-aggdf.to_csv('brick/extraction.csv')
+aggdf.to_parquet('brick/extraction.parquet')
# endregion
