Demonstrates how to build AI apps on top of mixed data, where documents are connected by structured data. This is akin to structured tables linking to unstructured text fields. The example uses United States Securities and Exchange Commission (SEC) company filings: the semi-structured data comes from Form 13 asset manager ownership data, while the unstructured text comes from Form 10-K sections.
This project is based on the work from this blog. It has been adjusted here to demonstrate Amazon Bedrock integration.
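To make the mixed-data idea concrete, here is a minimal sketch of structured ownership rows pointing at unstructured filing text. The class names, field names, and sample records are hypothetical and made up for illustration; they are not the schema used by the notebooks in this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """Structured row: one manager's position in one company (hypothetical fields)."""
    manager: str
    company: str
    shares: int


@dataclass(frozen=True)
class FilingSection:
    """Unstructured text: one section of a company's Form 10-K (hypothetical fields)."""
    company: str
    section: str
    text: str


def sections_for_manager(manager, holdings, sections):
    """Follow structured links (manager -> company) to reach unstructured text."""
    held = {h.company for h in holdings if h.manager == manager}
    return [s for s in sections if s.company in held]


# Made-up sample records, for illustration only.
holdings = [
    Holding("Example Capital", "Acme Corp", 1000),
    Holding("Example Capital", "Globex Inc", 250),
    Holding("Other Advisors", "Initech LLC", 75),
]
sections = [
    FilingSection("Acme Corp", "business", "Acme makes ..."),
    FilingSection("Initech LLC", "risk factors", "Initech faces ..."),
]

for s in sections_for_manager("Example Capital", holdings, sections):
    print(s.company, s.section)  # prints: Acme Corp business
```

In a graph database the same traversal would presumably be expressed as relationships between nodes rather than list filtering; the sketch only shows why the structured links matter for finding the right text.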
- See configuration prerequisites from the main README.md
- Create a Neo4j database. The easiest way is through an AuraDB Pro free trial.
- Create and fill in a cred.env file using cred.env.template as a template
- Run extract-pdf-to-json.ipynb to create extracted, structured JSON data
- Run ingest-json-to-graph.ipynb to ingest into Neo4j
- In a terminal, run
export $(grep -v '^#' cred.env | xargs)
- Run the agent with
python test_agent.py
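If exporting variables in the shell is inconvenient (for example, when running from a notebook), the credentials file could instead be loaded from Python. The following is a minimal sketch, assuming cred.env contains plain KEY=VALUE lines with optional # comments; the helper names `load_env_file` and `apply_env` are hypothetical and not part of this project.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from an env file, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=os.environ):
    """Copy parsed values into the environment without overwriting existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)
    return environ


if __name__ == "__main__":
    # Only load the file when it is present in the working directory.
    if Path("cred.env").exists():
        apply_env(load_env_file("cred.env"))
```

Using `setdefault` means values already exported in the shell take precedence over the file, which is a design choice of this sketch rather than a requirement of the project.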