
# Local inference

You can use SmolLM2 models locally with frameworks like Transformers.js, llama.cpp, MLX and MLC.

Here you can find the code for running SmolLM locally with each of these libraries. Converted checkpoints of SmolLM and SmolLM2 are available in the SmolLM1 and SmolLM2 collections.

Please first install each library by following its documentation.
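
As a quick illustration, here is a minimal sketch of local generation with MLX via the `mlx-lm` Python package; the converted checkpoint id is an assumption, so check the SmolLM2 collection for the exact repo name.

```python
# Minimal sketch: local generation with MLX via mlx-lm (pip install mlx-lm).
# The checkpoint id below is an assumption; use a converted model from the SmolLM2 collection.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SmolLM2-1.7B-Instruct-4bit")

# Format the request with the model's chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Run generation fully on-device.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```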

## Demos

Below are some demos we built for running SmolLM models on-device.

### In-browser chat assistants

- WebGPU chat demo of SmolLM2 1.7B Instruct, powered by Transformers.js and ONNX Runtime Web.
- Instant SmolLM, powered by MLC, for real-time generation with SmolLM-360M-Instruct.

The models are also available on Ollama and PocketPal-AI.
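
For example, with Ollama running locally and the `ollama` Python client installed, a chat call might look like the sketch below; the `smollm2` model tag is an assumption, so check the Ollama library page for the exact name.

```python
# Sketch: chat with SmolLM2 through a local Ollama server (pip install ollama).
# Assumes `ollama pull smollm2` has already been run; the tag name is an assumption.
import ollama

response = ollama.chat(
    model="smollm2",
    messages=[{"role": "user", "content": "Give me a one-sentence summary of WebGPU."}],
)
print(response["message"]["content"])
```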

## Other use cases

### Text extraction

- GitHub Issue Generator running locally with SmolLM2 and WebGPU shows how to use SmolLM2 1.7B for structured text extraction, converting free-form complaints into structured GitHub issues. The demo leverages MLC WebLLM and XGrammar for structured generation: you define a JSON schema, paste in free text, and get structured data back in your browser.
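
The same schema-constrained extraction idea can be sketched in plain Python, here using the outlines library in place of WebLLM/XGrammar; the schema, prompt, and 0.x-style API calls are illustrative assumptions rather than the demo's actual code.

```python
# Sketch: JSON-schema-constrained extraction with SmolLM2 and outlines (0.x-style API).
# In the in-browser demo, MLC WebLLM and XGrammar play this role instead.
import outlines
from pydantic import BaseModel

class GithubIssue(BaseModel):
    # Illustrative schema, not the demo's exact one.
    title: str
    body: str
    labels: list[str]

model = outlines.models.transformers("HuggingFaceTB/SmolLM2-1.7B-Instruct")
generator = outlines.generate.json(model, GithubIssue)

complaint = "The app crashes every time I open the settings page on Android 14."
issue = generator(f"Convert this complaint into a GitHub issue:\n{complaint}")
print(issue.title, issue.labels)
```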

### Function calling

- Bunny B1 maps natural-language requests to local application calls using function calling and structured generation with outlines.
- You can also use function calling (without structured generation) by following the instructions in the model card or by using SmolAgent from smol-tools; a minimal sketch of the model-card approach follows below.
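
As a rough sketch of that approach with the transformers library, you can pass tool definitions through the chat template and let the model emit a call as text; the tool, prompt, and template behavior here are assumptions, so follow the model card for the exact format.

```python
# Sketch: prompt-based function calling with transformers (pip install transformers).
# The tool below is hypothetical; the exact prompt format is defined by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical tool; only its signature and docstring feed the prompt

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],          # tool schema is rendered into the prompt
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```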

### Rewriting and summarization

- Check out the rewriting and summarization tools in smol-tools, which use llama.cpp.
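
A rough sketch of that kind of summarization flow with the llama-cpp-python bindings might look as follows; the GGUF repo id and filename pattern are assumptions, and smol-tools wires this up with its own prompts.

```python
# Sketch: summarization with SmolLM2 via llama-cpp-python (pip install llama-cpp-python).
# The GGUF repo id and filename pattern are assumptions; point these at the files you use.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF",
    filename="*q4_k_m.gguf",
    n_ctx=4096,
)

text = "..."  # the document to summarize
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You summarize text concisely."},
        {"role": "user", "content": f"Summarize the following text:\n\n{text}"},
    ],
    max_tokens=200,
)
print(result["choices"][0]["message"]["content"])
```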