local-cat provides a completely local setup for CheshireCat. It uses local runners and Qdrant to run your preferred LLM, embedder, and vector database locally.
Warning
- Technical Expertise Required: Setting up and running local-cat requires some technical know-how.
- Hardware Requirements: Performance may be slow without a recent GPU or NPU.
Important
Ollama can be unstable with the latest models or with less common ones (such as qwen or deepseek). If you encounter inference problems, downgrade the Ollama image or open an issue with the Ollama project.
- Clone the Repository:
git clone https://github.com/cheshire-cat-ai/local-cat.git
- Navigate to the Directory:
cd local-cat
- Start local-cat:
docker-compose up -d
- Pull Your Desired Model:
docker exec ollama_cat ollama pull <model_name:tag>
- Replace <model_name:tag> with the specific model you want to use.
- Your Setup is Complete!
- You can now install additional plugins or start interacting with local-cat.
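Before moving on, it can help to confirm that the model pull actually succeeded. The sketch below is one possible check, not part of local-cat itself: `model_is_pulled` is a hypothetical helper name, `llama3:8b` is only an example model, and the check assumes that `ollama list` prints a header line followed by one model per line with the model name in the first column.

```shell
# Hypothetical helper (not shipped with local-cat): reads `ollama list`
# output on stdin and succeeds if the given model is in the first column.
# Assumes the first line of the output is a header row.
model_is_pulled() {
  # $1: model name with tag, e.g. "llama3:8b" (example value only)
  awk -v model="$1" 'NR > 1 && $1 == model { found = 1 } END { exit !found }'
}

# Usage (requires the running ollama_cat container from the steps above):
#   docker exec ollama_cat ollama list | model_is_pulled "llama3:8b" && echo "model ready"
```

If the check fails, re-running the `ollama pull` step with the exact `name:tag` is usually the first thing to try.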
Ollama normally handles running the model with GPU acceleration. To use GPU acceleration on macOS, it is recommended to run Ollama directly on the host machine rather than inside Docker. More info here.
Note
This is recommended until GPU acceleration is supported by Docker Desktop on macOS.
To use local-cat with GPU acceleration on Mac:
- Install the menu bar app version of Ollama, which is the currently recommended setup for macOS users.
- Start local-cat using the following command:
docker compose -f docker-compose-macos.yml up
- Configure the Ollama Base URL in the cat's LLM settings to http://host.docker.internal:11434.
Note: This configuration allows Docker containers to communicate with the Ollama service running on your host and to take advantage of macOS GPU acceleration.
To use local-cat with an AMD GPU that supports ROCm, use the following command:
docker compose -f docker-compose-amd.yml up
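The three start commands in this guide differ only in which compose file they use. As a rough sketch, a small wrapper could pick the right one from a platform keyword. The function name `compose_command` and the keywords `macos` and `amd` are hypothetical choices made for this example. The commands it prints are the ones given above.

```shell
# Hypothetical helper: print the start command from this guide that
# matches the given platform keyword ("macos", "amd", or anything else
# for the default setup).
compose_command() {
  case "$1" in
    macos) echo "docker compose -f docker-compose-macos.yml up" ;;
    amd)   echo "docker compose -f docker-compose-amd.yml up" ;;
    *)     echo "docker-compose up -d" ;;
  esac
}

# Usage: print the command for an AMD/ROCm machine, then run it yourself.
#   compose_command amd
```

Printing the command rather than executing it keeps the helper safe to source in any shell and lets you inspect what will run first.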