updated for sot abstract

biobricks-ai · Nov 2, 2024 · ad40292
1 parent 9f258e3

Showing 4 changed files with 17 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -3,4 +3,7 @@ Data evaluation records from EPA and potentially other sources
 
 ## Assets
 1. **brick/riskder.pdf**: a directory with data evaluation records
-2. **brick/riskder.parquet**: a parquet file linking urls to the pdfs in `riskder.pdf`
+2. **brick/riskder.parquet**: a parquet file linking urls to the pdfs in `riskder.pdf`
+
+## Links
+- [google drive folder](https://drive.google.com/drive/folders/1-UgyHD8XCbuhfwXxyqcZfO0QtNsf0hda?usp=drive_link)
diff --git a/documents/sot_abstract/sot_abstract.md b/documents/sot_abstract/sot_abstract.md
@@ -0,0 +1,10 @@
+# Automating the Extraction of Experimental Outcomes from Regulatory Guidance Documents
+Regulatory agencies, including the EPA and FDA, publish a large number of guidance documents that provide essential data on chemical properties, toxicity risks, and testing protocols. These documents are valuable for regulatory, research, and risk assessment applications but remain difficult to access and use efficiently. In this work, we aggregated and processed 1,812 "Data Evaluation Records" for various chemicals and developed an automated pipeline that extracts core information such as substance names, test guidelines, testing metrics (e.g., Lowest Observed Adverse Effect Level (LOAEL), No Observed Adverse Effect Level (NOAEL)), and specific metric values. Our system accurately identifies and organizes these data points, creating a structured dataset that is readily usable for regulatory and safety analysis. This approach provides an on-ramp for unstructured regulatory documents, enabling scalable chemical safety monitoring and supporting data-driven decision-making in regulatory science.
+
+<!-- generate some kind of alphanumeric password -->
+import secrets
+import string
+
+# Generate a random 12-character alphanumeric password (secrets draws from the OS CSPRNG, unlike random)
+password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(12))
+print(f"Generated password: {password}")
diff --git a/notebook.py b/notebook.py
@@ -1,3 +1,4 @@
 import pandas as pd
 
-results = pd.read_parquet('brick/riskder.parquet')
+results = pd.read_parquet('brick/riskder.parquet')
+extraction = pd.read_parquet('brick/extraction.parquet')
diff --git a/stages/03_data_extractor.py b/stages/03_data_extractor.py
@@ -128,5 +128,5 @@ def extract_testing_results(pdf_path):
         continue
 
 
-aggdf.to_csv('brick/extraction.csv')
+aggdf.to_parquet('brick/extraction.parquet')
 # endregion
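
The abstract added in this commit describes pulling metrics such as NOAEL and LOAEL, together with their values, out of Data Evaluation Records. The body of `extract_testing_results` is not shown in this diff, so the following is only a minimal sketch of what one text-level step could look like, assuming the PDF text has already been extracted to a string. The regex, the `extract_metrics` helper, the unit list, and the sample sentence are all hypothetical and are not taken from `stages/03_data_extractor.py`.

```python
import re

# Hypothetical pattern (an assumption, not the repository's code):
# a metric abbreviation, a short run of filler, then a number and a unit.
METRIC_PATTERN = re.compile(
    r"\b(?P<metric>NOAEL|LOAEL)\b"         # metric abbreviation
    r"[^0-9]{0,40}?"                        # short non-digit filler such as " was " or " = "
    r"(?P<value>\d+(?:\.\d+)?)\s*"          # numeric value
    r"(?P<unit>mg/kg(?:\s*bw)?/day|ppm)",   # small, assumed set of units
    re.IGNORECASE,
)


def extract_metrics(text):
    """Return one dict per metric mention found in text."""
    rows = []
    for match in METRIC_PATTERN.finditer(text):
        rows.append({
            "metric": match.group("metric").upper(),
            "value": float(match.group("value")),
            "unit": match.group("unit").lower(),
        })
    return rows


# Made-up sentence for illustration only.
sample = "The NOAEL was 10 mg/kg/day; the LOAEL = 50 mg/kg/day based on liver effects."
rows = extract_metrics(sample)
for row in rows:
    print(row)
```

Rows shaped like this could be collected into a DataFrame and written with `to_parquet`, as the last hunk does for `aggdf`. Real records vary far more in wording and units than a single regex can cover, so this is best read as an illustration of the target output shape rather than a workable extractor.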