SentenceTransformers Documentation

SentenceTransformers is a Python framework for state-of-the-art sentence and text embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared, e.g. with cosine similarity, to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.
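
To make the comparison step concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors and the helper name ``cosine_similarity`` are illustrative assumptions, not part of the library; real models produce embeddings with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; the values are made up for illustration.
emb_cat = [0.9, 0.1, 0.0]
emb_kitten = [0.8, 0.2, 0.1]
emb_car = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(emb_cat, emb_kitten)
sim_far = cosine_similarity(emb_cat, emb_car)

# Vectors pointing in similar directions score closer to 1.
print("cat vs kitten:", round(sim_close, 3))
print("cat vs car:", round(sim_far, 3))
```

Scores near 1 indicate vectors pointing in nearly the same direction, which is how sentences with similar meaning are expected to show up.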

The framework is based on PyTorch and Transformers and offers a large collection of pre-trained models tuned for various tasks. Further, it is easy to fine-tune your own models.

Installation

You can install it using pip:

pip install -U sentence-transformers

We recommend Python 3.6 or higher and PyTorch 1.6.0 or higher. See installation for further installation options, especially if you want to use a GPU.

Usage

The usage is as simple as:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('paraphrase-distilroberta-base-v1')

# Sentences we want to encode
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of strings.',
    'The quick brown fox jumps over the lazy dog.']

# Encode the sentences by calling model.encode()
embeddings = model.encode(sentences)

# Print the embeddings
for sentence, embedding in zip(sentences, embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
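
Once embeddings are available, a simple semantic search ranks corpus vectors by their similarity to a query vector. The following is a sketch under stated assumptions: the helper names ``normalize`` and ``top_k_similar`` are hypothetical and not part of the library, and small hand-written vectors stand in for real embeddings so the snippet runs without downloading a model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_similar(query_emb, corpus_embs, k=2):
    """Return (corpus_index, score) pairs for the k most similar corpus vectors."""
    q = normalize(query_emb)
    scored = []
    for idx, emb in enumerate(corpus_embs):
        score = sum(a * b for a, b in zip(q, normalize(emb)))
        scored.append((idx, score))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for sentence embeddings (hypothetical values)
corpus_embs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
query_emb = [0.8, 0.3, 0.0]

hits = top_k_similar(query_emb, corpus_embs, k=2)
for idx, score in hits:
    print(f"corpus[{idx}] score={score:.3f}")
```

Normalizing each vector first means a plain dot product gives the cosine score. For larger corpora, a vectorized implementation would be the more practical choice.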

Performance

Our models are evaluated extensively and achieve state-of-the-art performance on various tasks. Further, the code is tuned to provide the highest possible speed.

================================== ============= ========
Model                              STS benchmark SentEval
================================== ============= ========
Avg. GloVe embeddings              58.02         81.52
BERT-as-a-service avg. embeddings  46.35         84.04
BERT-as-a-service CLS-vector       16.50         84.66
InferSent - GloVe                  68.03         85.59
Universal Sentence Encoder         74.92         85.10
**Sentence Transformer Models**
nli-bert-base                      77.12         86.37
nli-bert-large                     79.19         87.78
stsb-bert-base                     85.14         86.07
stsb-bert-large                    85.29         86.66
stsb-roberta-base                  85.44         \-
stsb-roberta-large                 86.39         \-
stsb-distilbert-base               85.16         \-
================================== ============= ========

Contact

Contact person: Nils Reimers, reimers@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Citing & Authors

If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
  title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2019",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/1908.10084",
}

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

@inproceedings{reimers-2020-multilingual-sentence-bert,
  title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
  author = "Reimers, Nils and Gurevych, Iryna",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
  month = "11",
  year = "2020",
  publisher = "Association for Computational Linguistics",
  url = "https://arxiv.org/abs/2004.09813",
}

If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:

@article{thakur-2020-AugSBERT,
  title = "Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
  author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna",
  journal = "arXiv preprint arXiv:2010.08240",
  month = "10",
  year = "2020",
  url = "https://arxiv.org/abs/2010.08240",
}

.. toctree::
   :maxdepth: 2
   :caption: Overview

   docs/installation
   docs/quickstart
   docs/pretrained_models
   docs/pretrained_cross-encoders
   docs/publications

.. toctree::
   :maxdepth: 2
   :caption: Usage

   examples/applications/computing-embeddings/README
   docs/usage/semantic_textual_similarity
   examples/applications/clustering/README
   examples/applications/paraphrase-mining/README
   examples/applications/parallel-sentence-mining/README
   examples/applications/semantic-search/README
   examples/applications/information-retrieval/README
   examples/applications/cross-encoder/README

.. toctree::
   :maxdepth: 2
   :caption: Training

   docs/training/overview
   examples/training/multilingual/README
   examples/training/distillation/README
   examples/training/cross-encoder/README
   examples/training/data_augmentation/README

.. toctree::
   :maxdepth: 2
   :caption: Training Examples

   examples/training/sts/README
   examples/training/nli/README
   examples/training/quora_duplicate_questions/README
   examples/training/ms_marco/README

.. toctree::
   :maxdepth: 1
   :caption: Package Reference

   docs/package_reference/SentenceTransformer
   docs/package_reference/util
   docs/package_reference/models
   docs/package_reference/losses
   docs/package_reference/evaluation
   docs/package_reference/datasets
   docs/package_reference/cross_encoder
