LLMs don't have perfect knowledge of Convex, so they require some prompting to help them along. This repo contains a set of prompts for coding a Convex backend, a set of human-curated solutions, and a script for evaluating the LLM's output.
Install the Python and JavaScript dependencies:

```sh
pip install pdm
pdm install
npm install -g bun
bun install
```

Then put your Anthropic API key in a `.env` file at the repo root:

```sh
echo "ANTHROPIC_API_KEY=<your ANTHROPIC_API_KEY>" > .env
```
Then run the evals:

```sh
pdm run python runner/main.py
```
If you'd like to grade the evaluations again without regenerating them, run:

```sh
pdm run python runner/main.py --skip-generation
```
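A hypothetical sketch of how a flag like this is typically wired up with argparse; the real parser in runner/main.py may define more options:

```python
# Hypothetical: mirrors the --skip-generation flag, not the actual parser.
import argparse

parser = argparse.ArgumentParser(description="Run and grade the Convex evals")
parser.add_argument(
    "--skip-generation",
    action="store_true",
    help="grade existing LLM outputs instead of regenerating them",
)
args = parser.parse_args()
if args.skip_generation:
    print("Skipping generation; grading existing outputs")
```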
You can also write out a JSON report for pretty-printing:

```sh
pdm run python runner/main.py --report=/tmp/report.json
pdm run python print_report.py /tmp/report.json
```
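The report is plain JSON, so you can also inspect it directly. A minimal sketch; the schema is whatever runner/main.py emits, so the structure is treated as opaque here:

```python
# Sketch only: dumps the raw report; print_report.py does the real formatting.
import json

with open("/tmp/report.json") as f:
    report = json.load(f)
print(json.dumps(report, indent=2))
```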
To add a new eval, run:

```sh
pdm run python create_eval.py <name> <category>
```

For example, adding a new fundamentals eval for using HTTP actions and file storage would be:

```sh
pdm run python create_eval.py http_actions_file_storage 000-fundamentals
```
Note that eval names cannot contain dashes; use underscores instead, as in the example above.
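A hypothetical check that mirrors this rule; create_eval.py's actual validation may differ:

```python
# Hypothetical: not the actual validation in create_eval.py.
def check_name(name: str) -> None:
    if "-" in name:
        raise ValueError(f"{name!r} contains a dash; use underscores instead")

check_name("http_actions_file_storage")  # ok
```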