This repository implements the AI-Maintainer agent harness for benchmarking.
If you want to rapidly benchmark your agent or get test-driven-development-style results for it, this guide shows you how to get set up quickly.
The library it references is the AI-Maintainer Coder Evals Repository.
To see what benchmarks we have started with, check out the AI-Maintainer Benchmark.
If you want additional benchmarks, please open an issue on the benchmarks repo; anything added there will automatically appear in the list of benchmarks anyone can run.
You can also find additional documentation at docs.ai-maintainer.com.
- Clone this repo
- Install the dependencies with `pip install -r requirements.txt`
- Copy the `.env.example` file to `.env` and fill in the values
- Run the agent with `python main.py`
- Watch the agent run and see the results in the console! (A consolidated command sketch follows below.)
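
For convenience, the whole quickstart is collected below as a single shell session. This is a minimal sketch: the clone URL is a placeholder for this repo's actual URL, and the values you fill into `.env` depend on what `.env.example` asks for.

```bash
# Quickstart sketch. The clone URL is a placeholder; substitute this repo's real URL.
git clone https://github.com/AI-Maintainer/<this-repo>.git
cd <this-repo>

# Install the Python dependencies
pip install -r requirements.txt

# Create your local config, then open .env and fill in the values
cp .env.example .env

# Run the agent; progress and results print to the console
python main.py
```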
To run this agent, you will need the AI Maintainer Platform running locally. You can find instructions on how to do that here.
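
Before starting the agent, it can help to confirm the platform is actually reachable. The sketch below assumes the platform listens at `http://localhost:8080`; that address is an assumption, so substitute whatever base URL your local platform setup (and your `.env`) actually uses.

```bash
# Sanity check. http://localhost:8080 is an ASSUMED address; replace it with
# the base URL your local AI Maintainer Platform actually listens on.
if curl -sf http://localhost:8080 > /dev/null; then
  echo "Platform reachable; OK to run the agent."
else
  echo "Platform not reachable; start the AI Maintainer Platform first."
fi
```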