
# Local inference

You can use SmolLM2 models locally with frameworks like Transformers.js, llama.cpp, MLX and MLC.

Here you can find the code for running SmolLM locally with each of these libraries. Converted checkpoints of SmolLM and SmolLM2 are available in the SmolLM1 and SmolLM2 collections.

Please first install each library by following its documentation.
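
As a quick illustration, here is a minimal sketch of local generation with MLX via the `mlx-lm` Python package; the converted checkpoint id is an assumption, so check the SmolLM2 collection for the exact repo name.

```python
# Minimal sketch: local generation with MLX via mlx-lm (pip install mlx-lm).
# The checkpoint id below is an assumption; use a converted model from the SmolLM2 collection.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/SmolLM2-1.7B-Instruct-4bit")

# Format the request with the model's chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Run generation fully on-device.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```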

## Demos

Below are some demos we built for running SmolLM models on-device.

### In-browser chat assistants

- WebGPU chat demo of SmolLM2 1.7B Instruct, powered by Transformers.js and ONNX Runtime Web.
- Instant SmolLM, powered by MLC, for real-time generation with SmolLM-360M-Instruct.

The models are also available on Ollama and PocketPal-AI.
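
For example, with Ollama running locally and the `ollama` Python client installed, a chat call might look like the sketch below; the `smollm2` model tag is an assumption, so check the Ollama library page for the exact name.

```python
# Sketch: chat with SmolLM2 through a local Ollama server (pip install ollama).
# Assumes `ollama pull smollm2` has already been run; the tag name is an assumption.
import ollama

response = ollama.chat(
    model="smollm2",
    messages=[{"role": "user", "content": "Give me a one-sentence summary of WebGPU."}],
)
print(response["message"]["content"])
```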

## Other use cases

### Text extraction

- GitHub Issue Generator running locally with SmolLM2 and WebGPU shows how to use SmolLM2 1.7B for structured text extraction, converting free-form complaints into structured GitHub issues. The demo leverages MLC WebLLM and XGrammar for structured generation: you define a JSON schema, paste in free text, and get structured data back in your browser.
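
The same schema-constrained extraction idea can be sketched in plain Python, here using the outlines library in place of WebLLM/XGrammar; the schema, prompt, and 0.x-style API calls are illustrative assumptions rather than the demo's actual code.

```python
# Sketch: JSON-schema-constrained extraction with SmolLM2 and outlines (0.x-style API).
# In the in-browser demo, MLC WebLLM and XGrammar play this role instead.
import outlines
from pydantic import BaseModel

class GithubIssue(BaseModel):
    # Illustrative schema, not the demo's exact one.
    title: str
    body: str
    labels: list[str]

model = outlines.models.transformers("HuggingFaceTB/SmolLM2-1.7B-Instruct")
generator = outlines.generate.json(model, GithubIssue)

complaint = "The app crashes every time I open the settings page on Android 14."
issue = generator(f"Convert this complaint into a GitHub issue:\n{complaint}")
print(issue.title, issue.labels)
```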

### Function calling

- Bunny B1 maps natural-language requests to local application calls using function calling and structured generation with outlines.
- You can also use function calling (without structured generation) by following the instructions in the model card or by using SmolAgent from smol-tools; a minimal sketch of the model-card approach follows below.
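
As a rough sketch of that approach with the transformers library, you can pass tool definitions through the chat template and let the model emit a call as text; the tool, prompt, and template behavior here are assumptions, so follow the model card for the exact format.

```python
# Sketch: prompt-based function calling with transformers (pip install transformers).
# The tool below is hypothetical; the exact prompt format is defined by the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # hypothetical tool; only its signature and docstring feed the prompt

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],          # tool schema is rendered into the prompt
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```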

### Rewriting and summarization

- Check out the rewriting and summarization tools in smol-tools, which use llama.cpp.
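
A rough sketch of that kind of summarization flow with the llama-cpp-python bindings might look as follows; the GGUF repo id and filename pattern are assumptions, and smol-tools wires this up with its own prompts.

```python
# Sketch: summarization with SmolLM2 via llama-cpp-python (pip install llama-cpp-python).
# The GGUF repo id and filename pattern are assumptions; point these at the files you use.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF",
    filename="*q4_k_m.gguf",
    n_ctx=4096,
)

text = "..."  # the document to summarize
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You summarize text concisely."},
        {"role": "user", "content": f"Summarize the following text:\n\n{text}"},
    ],
    max_tokens=200,
)
print(result["choices"][0]["message"]["content"])
```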