This is an experiment to create an AI agent that can crawl and interact with web pages to achieve a desired goal, built (almost) fully on Cloudflare.
Services used:
- Cloudflare Workers - responding to HTTP requests
- Cloudflare Durable Objects - running agent's core loop
- Cloudflare Browser Rendering - programmatically control a web browser
- Cloudflare AI Gateway - monitor requests to OpenAI (to be replaced with Workers AI 🤞; all we need is a model with a bigger context window and function calling)
- Cloudflare R2 - store screenshots of the interactions
- Cloudflare D1 - store jobs and the logs
- Baselime (acquired by Cloudflare) - tracing using OpenTelemetry & logging
Requirements:
- Node 18 or greater
- pnpm
- Cloudflare account with Workers Paid Plan
- OpenAI API Key
Setup:
pnpm i
npx wrangler secret put OPENAI_API_KEY # and paste your OpenAI key when prompted
# You can also put it in .dev.vars (use .dev.vars.template as a reference)
# Run migrations on the local SQLite instance
npx wrangler d1 execute ai-agent-jobs --local --file=migrations/0000_init.sql
# Run migrations on the remote SQLite instance
npx wrangler d1 execute ai-agent-jobs --remote --file=migrations/0000_init.sql
# You can use `pnpm run dev` as well, but Browser Rendering does not work locally
pnpm run deploy
Usage:
# Replace the URL and goal with your own
curl -X POST \
  <URL to your deployed worker> \
  -d '{"baseUrl": "https://chatwithcloud.ai", "goal": "Extract pricing data"}' \
  --no-buffer
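The request body in the curl example carries two fields, `baseUrl` and `goal`. A minimal sketch of validating that body before handing it to the agent is shown below, assuming a hypothetical `parseJobRequest` helper and `JobRequest` type (neither is taken from this project's code). It rejects non-JSON bodies, missing or empty fields, and URLs that are not http(s).

```typescript
// Hypothetical shape of the JSON body sent in the curl example.
interface JobRequest {
  baseUrl: string;
  goal: string;
}

// Hypothetical helper: parses and validates the raw body, throwing on bad input.
function parseJobRequest(rawBody: string): JobRequest {
  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    throw new Error("Body must be valid JSON");
  }
  if (typeof data !== "object" || data === null) {
    throw new Error("Body must be a JSON object");
  }
  const { baseUrl, goal } = data as Record<string, unknown>;
  if (typeof baseUrl !== "string" || typeof goal !== "string" || goal.trim() === "") {
    throw new Error("Both baseUrl and goal must be non-empty strings");
  }
  // new URL() throws a TypeError for malformed URLs.
  const url = new URL(baseUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("baseUrl must use http or https");
  }
  // url.href is the normalized form, e.g. a trailing "/" is added to a bare origin.
  return { baseUrl: url.href, goal: goal.trim() };
}
```

Inside a Worker this would typically be called as `parseJobRequest(await request.text())`, returning a 400 response when it throws; how this project actually handles invalid input is not documented here.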
How it works:
1. The user sends a request to the Cloudflare Worker.
2. The Cloudflare Worker passes it to the Durable Object.
3. The Durable Object starts (or reuses) a browser and loads the `baseUrl` from the request's body.
4. The goal and the HTML are passed via AI Gateway to the LLM. The LLM responds with either:
   - the final answer, if the goal is met, or
   - one of three actions:
     - click something on the page
     - type something
     - select something
5. After each interaction, a screenshot of the current browser window is stored in R2. The resulting HTML (or error) is passed to the LLM to generate the next step (back to step 4).
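The loop in steps 3 to 5 can be sketched in TypeScript as follows. This is a minimal sketch, not this project's implementation: the `Browser`, `Llm` and `ScreenshotStore` interfaces are hypothetical stand-ins for Browser Rendering, the OpenAI call through AI Gateway, and the R2 bucket, and the `maxSteps` cap and `step-N.png` key format are illustrative assumptions.

```typescript
// Hypothetical action types mirroring "click", "type" and "select".
type Action =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "select"; selector: string; value: string };

// The LLM either returns a final answer or asks for one action.
type LlmReply =
  | { kind: "answer"; answer: string }
  | { kind: "action"; action: Action };

// Hypothetical stand-in for a Browser Rendering session.
interface Browser {
  load(url: string): Promise<string>; // resolves to the page HTML
  perform(action: Action): Promise<string>; // resolves to the HTML after the action
  screenshot(): Promise<Uint8Array>;
}

// Hypothetical stand-in for the OpenAI call routed through AI Gateway.
interface Llm {
  next(goal: string, html: string): Promise<LlmReply>;
}

// Hypothetical stand-in for an R2 bucket binding.
interface ScreenshotStore {
  put(key: string, data: Uint8Array): Promise<void>;
}

async function runAgent(
  baseUrl: string,
  goal: string,
  browser: Browser,
  llm: Llm,
  store: ScreenshotStore,
  maxSteps = 10, // assumption: a hard cap so a confused LLM cannot loop forever
): Promise<string> {
  // Step 3: load the start page.
  let html = await browser.load(baseUrl);
  for (let step = 0; step < maxSteps; step++) {
    // Step 4: ask the LLM for the final answer or the next action.
    const reply = await llm.next(goal, html);
    if (reply.kind === "answer") return reply.answer;
    // Step 5: perform the action; on failure, feed the error text back instead.
    try {
      html = await browser.perform(reply.action);
    } catch (err) {
      html = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    // Store a screenshot of the current window after every interaction.
    await store.put(`step-${step}.png`, await browser.screenshot());
  }
  throw new Error(`Goal not met within ${maxSteps} steps`);
}
```

Passing the error text back to the LLM, rather than aborting, gives it a chance to try a different selector. For brevity the sketch sends only the latest HTML; a real agent would likely also keep the history of previous actions so the LLM does not repeat itself.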
Limitations:
- To prevent huge bills, the Cloudflare Worker is capped at 2 requests per 10 seconds (adjustable in `wrangler.toml`).
- The GPT-4o context window allows up to 128K tokens, and the HTML of many pages exceeds that.
- A Browser Rendering session is limited to 180 seconds (this can be changed in code by adjusting `KEEP_BROWSER_ALIVE_IN_SECONDS`).
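One common way to work around the 128K-token limit is to strip parts of the HTML that rarely help the LLM pick an action (comments, `<script>` and `<style>` blocks, extra whitespace) and then truncate to a budget. The sketch below is not something this README says the project does; the 4-characters-per-token ratio is a rough assumption rather than a real tokenizer, and the regexes approximate HTML handling instead of parsing it.

```typescript
// Assumption: roughly 4 characters per token; use a real tokenizer for accuracy.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Removes comments, scripts and styles, then collapses whitespace.
// Regex-based, so it is an approximation, not a real HTML parser.
function shrinkHtml(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Shrinks the HTML, then hard-truncates it to fit the token budget.
// Truncation may cut through a tag; the LLM usually tolerates that.
function fitToContext(html: string, maxTokens: number): string {
  const shrunk = shrinkHtml(html);
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  return shrunk.length <= maxChars ? shrunk : shrunk.slice(0, maxChars);
}
```

Because the prompt, the goal and the model's reply share the same context window, the HTML budget passed to `fitToContext` should be set somewhat below 128K (for example `fitToContext(html, 100_000)`, an illustrative value).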
Drizzle Studio:
- Create a `.env` file and fill it with your Cloudflare account ID, D1 database ID and a Cloudflare D1 token with edit permissions. Use `.env.template` as a reference:
cp .env.template .env
npx drizzle-kit studio
- Head to https://local.drizzle.studio
Learn more about Drizzle Studio and its D1 configuration.
Ideas:
- A nice frontend to display the results from the DB, maybe using Hono?