AI-Controlled Browser on Cloudflare

This is an experiment to create an AI agent that can crawl and interact with webpages to achieve a desired goal. Fully on Cloudflare (almost).

Services used:

Cloudflare Workers - responding to HTTP requests
Cloudflare Durable Objects - running agent's core loop
Cloudflare Browser Rendering - programmatically control a web browser
Cloudflare AI Gateway - monitor requests to OpenAI (to be replaced with Workers AI 🤞; all we need is a model with a bigger context window and function calling)
Cloudflare R2 - store screenshots of the interactions
Cloudflare D1 - store jobs and the logs
Baselime (acquired by Cloudflare) - tracing using OpenTelemetry & logging

Prerequisites

Node 18 or greater
pnpm
Cloudflare account with Workers Paid Plan
OpenAI API Key

Setup & Usage

pnpm i
npx wrangler secret put OPENAI_API_KEY # and fill with your OpenAI key
# You can also put it inside .dev.vars (copy .dev.vars.template as a reference)

# Run migrations on local SQLite instance
npx wrangler d1 execute ai-agent-jobs --local --file=migrations/0000_init.sql

# Run migrations on remote SQLite instance
npx wrangler d1 execute ai-agent-jobs --remote --file=migrations/0000_init.sql

# You can also use `pnpm run dev`, but Browser Rendering does not work locally
pnpm run deploy

# Replace the URL, baseUrl and goal with your own
curl -X POST \
  <URL to your deployed worker> \
  -d '{"baseUrl": "https://chatwithcloud.ai", "goal": "Extract pricing data" }' \
  --no-buffer
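
The same request can also be sent from code. The following is a minimal TypeScript sketch, not part of the repository: `JobRequest`, `buildJobInit` and `runJob` are hypothetical names, the input checks are an assumption about what a sensible request looks like, and the body fields are taken from the curl example above. It reads the response chunk by chunk, roughly what `--no-buffer` does for curl.

```typescript
// Hypothetical shape of the request body, copied from the curl example.
interface JobRequest {
  baseUrl: string; // page the agent starts from
  goal: string; // plain-language description of what to achieve
}

// Build the fetch options for a POST to the deployed Worker.
// The validation here is an assumption, not something the Worker is known to enforce.
function buildJobInit(job: JobRequest): { method: string; body: string } {
  if (!/^https?:\/\//.test(job.baseUrl)) {
    throw new Error("baseUrl must be an absolute http(s) URL");
  }
  if (job.goal.trim() === "") {
    throw new Error("goal must not be empty");
  }
  return { method: "POST", body: JSON.stringify(job) };
}

// Send the job and print each chunk of the response as it arrives.
async function runJob(workerUrl: string, job: JobRequest): Promise<void> {
  const response = await fetch(workerUrl, buildJobInit(job));
  if (!response.ok || response.body === null) {
    throw new Error(`Worker responded with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(decoder.decode(value, { stream: true }));
  }
}
```

Call `runJob("<URL to your deployed worker>", { baseUrl: "https://chatwithcloud.ai", goal: "Extract pricing data" })` from any runtime with a global `fetch`, such as Node 18 or later.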

The loop

1. User sends a request to the Cloudflare Worker
2. Cloudflare Worker passes it to the Durable Object
3. Durable Object starts or reuses a browser and loads baseUrl from the request body
4. The goal and HTML are passed via AI Gateway to the LLM. The LLM responds in one of two ways:

- Either the goal is met and the final answer is returned
- Or the LLM decides to do one of three things:
  - Click something on the page
  - Type something
  - Select something

5. After each interaction, a screenshot of the current browser window is stored in R2. The resulting HTML (or error) is passed to the LLM to generate the next step (back to 4).
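
The loop described above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the code in src: `Decision`, `PageLike`, `AgentDeps` and `runLoop` are hypothetical names, the browser and the LLM are passed in as plain functions so the control flow stays visible, and the `maxSteps` cap is an assumption added so the sketch always terminates.

```typescript
// One decision returned by the model: a final answer or one of three page actions.
type Decision =
  | { kind: "done"; answer: string }
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "select"; selector: string; value: string };

// Hypothetical, minimal view of a browser page; a real page object offers far more.
interface PageLike {
  content(): Promise<string>;
  click(selector: string): Promise<void>;
  type(selector: string, text: string): Promise<void>;
  select(selector: string, value: string): Promise<void>;
  screenshot(): Promise<Uint8Array>;
}

interface AgentDeps {
  page: PageLike;
  // Stands in for the LLM call that goes through AI Gateway.
  decide(goal: string, observation: string): Promise<Decision>;
  // Stands in for the write to R2.
  saveScreenshot(step: number, image: Uint8Array): Promise<void>;
}

// Returns the final answer, or null if maxSteps (an assumed cap) is reached first.
async function runLoop(goal: string, deps: AgentDeps, maxSteps = 10): Promise<string | null> {
  let observation = await deps.page.content();
  for (let step = 1; step <= maxSteps; step++) {
    const decision = await deps.decide(goal, observation);
    if (decision.kind === "done") return decision.answer;
    try {
      if (decision.kind === "click") {
        await deps.page.click(decision.selector);
      } else if (decision.kind === "type") {
        await deps.page.type(decision.selector, decision.text);
      } else {
        await deps.page.select(decision.selector, decision.value);
      }
      observation = await deps.page.content();
    } catch (err) {
      // A failed interaction is reported to the model instead of aborting the job.
      observation = `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
    await deps.saveScreenshot(step, await deps.page.screenshot());
  }
  return null;
}
```

Feeding the error text back as the next observation lets the model pick a different selector on the following turn.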

Limitations

To prevent huge bills, the Cloudflare Worker is capped at 2 requests per 10 seconds (adjustable in wrangler.toml)
The GPT-4o context window allows up to 128K tokens. The HTML of many pages exceeds that
The Browser Rendering session is limited to 180 seconds (this can be changed in code by adjusting KEEP_BROWSER_ALIVE_IN_SECONDS)
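
One possible way to work around the context window limit is to shrink the HTML before sending it to the model. The sketch below is an illustration only and is not confirmed to be what this project does: `estimateTokens` and `fitHtmlToBudget` are hypothetical helpers, the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and `reservedTokens` is an assumed allowance for the prompt and the reply.

```typescript
const CONTEXT_WINDOW_TOKENS = 128000; // GPT-4o limit quoted above
const CHARS_PER_TOKEN = 4; // rough heuristic (assumption); real tokenizers vary

// Cheap estimate of how many tokens a string will consume.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Drop markup the model rarely needs to pick an action, then truncate to the budget.
function fitHtmlToBudget(html: string, reservedTokens = 8000): string {
  const stripped = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, "") // inline and external scripts
    .replace(/<style\b[\s\S]*?<\/style>/gi, "") // stylesheets
    .replace(/<!--[\s\S]*?-->/g, "") // comments
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim();
  const maxChars = (CONTEXT_WINDOW_TOKENS - reservedTokens) * CHARS_PER_TOKEN;
  return stripped.length <= maxChars ? stripped : stripped.slice(0, maxChars);
}
```

Plain truncation can cut off the part of the page that matters, so treat this as a starting point rather than a fix.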

Using Drizzle Studio to view and edit the database

Create a .env file and fill it with your Cloudflare account ID, D1 database ID, and a Cloudflare D1 token with edit permissions. Use .env.template as a reference.

cp .env.template .env

npx drizzle-kit studio
Head to https://local.drizzle.studio

Learn more about Drizzle Studio and its D1 configuration
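
If Drizzle Studio fails to connect, a missing value in .env is a likely cause. The sketch below shows one way to check the three values up front. The variable names and the `readD1Credentials` helper are hypothetical; use the names from .env.template. The returned fields mirror the accountId, databaseId and token credentials used by drizzle-kit's D1 HTTP driver, but verify them against drizzle.config.ts.

```typescript
// Hypothetical variable names; check .env.template for the ones this project uses.
const REQUIRED_VARS = [
  "CLOUDFLARE_ACCOUNT_ID",
  "CLOUDFLARE_DATABASE_ID",
  "CLOUDFLARE_D1_TOKEN",
] as const;

interface D1Credentials {
  accountId: string;
  databaseId: string;
  token: string;
}

// Fail early with a readable message if any of the three values is absent or empty.
function readD1Credentials(env: Record<string, string | undefined>): D1Credentials {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing in .env: ${missing.join(", ")}`);
  }
  return {
    accountId: env.CLOUDFLARE_ACCOUNT_ID as string,
    databaseId: env.CLOUDFLARE_DATABASE_ID as string,
    token: env.CLOUDFLARE_D1_TOKEN as string,
  };
}
```

In a Node script this would typically be called as `readD1Credentials(process.env)` after the .env file has been loaded.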

Todo

A nice frontend to display the results from the DB, maybe using Hono?

RafalWilinski/cloudflare-agentic-ai-browser
