You can run SmolLM2 models locally with frameworks like Transformers.js, llama.cpp, MLX, and MLC.
Here you can find the code for running SmolLM2 locally with each of these libraries. You can also find the conversions of SmolLM and SmolLM2 for these frameworks in these collections: SmolLM1 and SmolLM2.
Please install each library first by following its documentation.
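For example, on Apple silicon you can run SmolLM2 through MLX with the `mlx-lm` package. This is a minimal sketch, assuming `mlx-lm` is installed (`pip install mlx-lm`) and that a converted checkpoint exists under the repo id shown (the exact repo name is an assumption; check the collections above):

```python
# Minimal sketch: run SmolLM2 locally with MLX on Apple silicon.
# The repo id below is an assumption; look up the exact conversion in the SmolLM2 collection.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SmolLM2-1.7B-Instruct-4bit")  # assumed repo id

# Build a chat-formatted prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

The other libraries follow the same pattern: load a converted checkpoint, apply the chat template, and generate.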
Below are some demos we built for running SmolLM models on-device. The models are also available on Ollama and PocketPal-AI.
- WebGPU chat demo of SmolLM2 1.7B Instruct, powered by Transformers.js and ONNX Runtime Web.
- Instant SmolLM, powered by MLC, for real-time generation with SmolLM-360M-Instruct.
- GitHub Issue Generator running locally with SmolLM2 and WebGPU showcases how to use SmolLM2 1.7B for structured text extraction, converting complaints into structured GitHub issues. The demo leverages MLC WebLLM and XGrammar for structured generation: you define a JSON schema, input free text, and get structured data back in your browser.
- Bunny B1 maps natural language requests to local application calls using function calling and structured generation with Outlines (see the sketch after this list).
- You can also leverage function calling (without structured generation) by following the instructions in the model card or by using SmolAgent from smol-tools.
- Check out the rewriting and summarization tools in smol-tools, which run the models with llama.cpp; a rough sketch of a similar call follows below.
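To get a feel for the structured generation used in the demos above, here is a rough sketch with Outlines (not the demo code: the schema, prompt, and model choice are illustrative assumptions, and the `generate.json` interface shown is the Outlines 0.x API, which may differ in newer releases):

```python
# Rough sketch of JSON-schema-constrained generation with Outlines (0.x API).
# Schema, prompt, and model choice are illustrative assumptions, not the demo code.
from pydantic import BaseModel
import outlines

class GitHubIssue(BaseModel):
    title: str
    labels: list[str]
    body: str

model = outlines.models.transformers("HuggingFaceTB/SmolLM2-1.7B-Instruct")
generator = outlines.generate.json(model, GitHubIssue)

issue = generator("Turn this complaint into a GitHub issue: the app freezes whenever I open settings.")
print(issue.title, issue.labels)
```

The output is guaranteed to match the schema, so it can be passed directly to downstream code such as an issue tracker API.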
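As a rough illustration of the kind of llama.cpp call such tools make (not the smol-tools code; the GGUF repo id and quantization filename below are assumptions), using the llama-cpp-python bindings:

```python
# Rough sketch of a local summarization call through llama-cpp-python (not the smol-tools code).
# The GGUF repo id and quant filename are assumptions; check the SmolLM2 collection for the real ones.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF",  # assumed repo id
    filename="*q4_k_m.gguf",                             # assumed quantization file pattern
    n_ctx=4096,
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Summarize the user's text in two sentences."},
        {"role": "user", "content": "Paste the text you want summarized here."},
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```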