


Models

Models in this category


  • bert-base-cased

    The BERT model is a pre-trained model trained on a large corpus of English-language data using a masked language modeling (MLM) objective, meaning it learns to predict words that were randomly masked in an input sentence. The BERT model can also p... (A fill-mask usage sketch appears after this list.)

  • bert-base-uncased

    BERT is a pre-trained model in the field of NLP (natural language processing) released by Google. It is an AI language model that has been trained on a large corpus of English data using a self-supervised method, learning to predict masked words in a sentence and to predict if two sentences are c...

  • bert-large-cased

    BERT is a pre-trained language model, released by Google, that uses masked language modeling (MLM) on a large corpus of English data. Its primary uses are for sequence classification and question answering, and it is not intended for text generation. It is important to note that this ...

  • bert-large-uncased

    BERT is a pre-trained language model, released by Google, that uses masked language modeling (MLM) on a large corpus of English data. Its primary uses are for sequence classification and question answering, and it is not intended for text generation. It is important to note that this ...

  • camembert-base

    CamemBERT is a state-of-the-art language model for French developed by a team of researchers. It is based on the RoBERTa model and is available in 6 different versions on Hugging Face. It can be used for fill-in-the-blank tasks. However, it has been pretrained on a subcorpus of OSCAR which may co...

  • compvis-stable-diffusion-v1-4

    CompVis/stable-diffusion-v1-4 is a latent text-to-image diffusion model known for generating highly realistic images from textual input. This model incorporates a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Im... (A text-to-image usage sketch appears after this list.)

  • deepset-minilm-uncased-squad2

    The MiniLM-L12-H384-uncased model is a Microsoft language model for extractive question answering in English. It was trained on the SQuAD 2.0 dataset and has been evaluated on the SQuAD 2.0 dev set with the official eval script. The model's performance results were an exact match of 76.13 and F1 ...

  • deepset-roberta-base-squad2

    This is a fine-tuned language model for extractive question answering in English, trained on the SQuAD 2.0 dataset. It is based on the "roberta-base" model, was developed by deepset, and can be used with Haystack and Transformers. Training used 4 Tesla V100s with a batch size of 96, 2 e... (An extractive question-answering sketch appears after this list.)

  • deformable_detr_twostage_refine_r50_16x2_50e_coco

    deformable_detr_twostage_refine_r50_16x2_50e_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository...

  • distilbert-base-cased

    The DistilBERT model is a smaller, faster version of BERT for Transformer-based language modeling, with 40% fewer parameters and a 60% faster run time, while retaining 95% of BERT's performance on the GLUE language understanding benchmark. This English language question answering model has ...

  • distilbert-base-cased-distilled-squad

    The DistilBERT model is a distilled, smaller, faster, and cheaper Transformer-based version of the BERT language model. It is specifically trained for question answering in English and has been fine-tuned using knowledge distillation on SQuAD v1.1. It has 40% fewer parameters than bert-b...

  • distilbert-base-uncased

    The DistilBERT base model (uncased) is a distilled version of the BERT base model that is smaller and faster than BERT. It was introduced in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", and the code for creating the model is available in the Hugging Face Transformers repository. The model is uncased, so it doesn't differentiate between lower and ...

  • distilbert-base-uncased-distilled-squad

    The DistilBERT model is a distilled version of the BERT language model with 40% fewer parameters and a 60% faster run time, while retaining 95% of BERT's performance. It is trained for question answering and has an F1 score of 87.1 on SQuAD V1.1. The model is licensed under the Apache 2.0 license and is devel...

  • distilbert-base-uncased-finetuned-sst-2-english

    This is a fine-tuned version of DistilBERT-base-uncased, trained on SST-2, which reached 91.3% accuracy on the dev set. Developed by Hugging Face, it's mainly intended to be used for topic classification and can be fine-tuned on downstream tasks, but it's important to keep in mind that it has ce... (A text-classification usage sketch appears after this list.)

  • distilgpt2

    DistilGPT2 is a distilled, English-language version of GPT-2, a transformer-based language model; it was trained with the supervision of the smallest (124 million parameter) version of GPT-2. It is intended for similar uses as GPT-2, with the added benefit of being smaller and easier to run than the base model. DistilGPT2 was t... (A text-generation usage sketch appears after this list.)

  • distilroberta-base

    DistilRoBERTa base is a distilled version of the RoBERTa-base model. With 6 layers, a hidden dimension of 768, 12 attention heads, and 82M parameters, it is faster than RoBERTa-base. The model is primarily intended for fine-tuning on whole sentence-based tasks such as sequence classification, token classification,...

  • facebook-bart-large-cnn

    The BART model is a transformer encoder-decoder model trained on English language data and fine-tuned on CNN Daily Mail. It is used for text summarization and has been trained to reconstruct text that has been corrupted using an arbitrary noising function. The model is effective for text generat... (A summarization usage sketch appears after this list.)

  • facebook-deit-base-patch16-224

    This model is a more efficiently trained Vision Transformer (ViT). The Vision Transformer (ViT) is a transformer encoder model that is pre-trained and fine-tuned on a large collection of images in a supervised fashion. It is presented with images as sequences of fixed-size patches, which are line...

  • finiteautomata-bertweet-base-sentiment-analysis

    The pysentimiento library is an open-source tool for non-commercial use and scientific research purposes, used for sentiment analysis and social NLP tasks. It was trained on about 40k tweets from the SemEval 2017 corpus, using BERTweet, a RoBERTa model trained on English tweets, and processes...

  • google-vit-base-patch16-224

    The Vision Transformer (ViT) is a BERT-like transformer encoder model pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k. It was then fine-tuned on ImageNet, a dataset comprising 1 million images and 1,000 classes, at a resolution of 224x224. ... (An image-classification usage sketch appears after this list.)

  • gpt2

    GPT-2 is a transformer-based language model intended for AI researchers and practitioners. It was trained on unfiltered web content linked from Reddit posts and may have biases. It is best used for text generation, but the training data has not been publicly released. It has several limitations and should be used...

  • gpt2-large

    The OpenAI GPT-2 is a language model that is intended to be used primarily by AI researchers and practitioners. It supports various uses, including writing assistance and creative writing, but is not recommended to be deployed in human interaction systems without a thorough study ...

  • gpt2-medium

    The GPT-2 Transformer-based language model is designed primarily for use by AI researchers and practitioners. The intended uses of the language model include understanding the behavior, capability, biases, and constraints of large-scale generative language models. Secondary use cases of the langu...

  • Jean-Baptiste-camembert-ner

    camembert-ner is an NER model fine-tuned from camemBERT on the Wikiner-fr dataset and validated on email/chat data. It shows better performance on entities that do not start with an uppercase letter. The model predicts four entity classes (MISC, PER, ORG, LOC) in addition to the O tag. The model can be loaded using Hugging... (A named-entity-recognition sketch appears after this list.)

  • mask_rcnn_swin-t-p4-w7_fpn_1x_coco

    This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual ...

  • microsoft-beit-base-patch16-224-pt22k-ft22k

    BEiT is a vision transformer that applies a BERT-like pre-training approach to images. The model is pre-trained on a large collection of images and uses patches to analyze them. It uses relative position embeddings and mean-pooling to classify images, and can be used to ext...

  • microsoft-deberta-base

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned on NLU tasks and has achieved dev results on SQuAD 1.1...

  • microsoft-deberta-base-mnli

    DeBERTa is a version of the BERT model that has been improved through the use of disentangled attention and enhanced mask decoders. It outperforms BERT and RoBERTa on a majority of NLU tasks using 80GB of training data. It has been fine-tuned for NLU tasks and has achieved dev r...

  • microsoft-deberta-large

    DeBERTa (Decoding-enhanced BERT with Disentangled Attention) is an improvement of the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With 80GB of training data, it outperforms the BERT and RoBERTa models on many Natural Language Understanding (NLU) tasks. Key result...

  • microsoft-deberta-large-mnli

    DeBERTa is an improvement of BERT and RoBERTa using disentangled attention and enhanced mask decoder. With 80GB training data, it outperforms BERT and RoBERTa on the majority of NLU tasks. The fine-tuned DeBERTa with MNLI task results in the best performance on SQuAD 1.1/2.0 and GLUE benchmark ta...

  • microsoft-deberta-xlarge

    DeBERTa is a model that improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder. It performs better than RoBERTa on several NLU tasks, using 80GB of training data. The DeBERTa XLarge model has 48 layers and a hidden size of 1024 with 750 million paramete...

  • microsoft-phi-1-5

    Phi-1.5 is a Transformer-based language model with 1.3 billion parameters. It was trained on a combination of data sources, including an additional source of NLP synthetic texts. Phi-1.5 performs exceptionally well on benchmarks testing common sense, language understandin...

  • microsoft-swinv2-base-patch4-window12-192-22k

    The Swin Transformer is a type of Vision Transformer used in both image classification and dense recognition tasks. It builds hierarchical feature maps by merging image patches in deeper layers and has computational complexity that is linear in the input image size, because self-attention is computed only wit...

  • OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image...

  • openai-clip-vit-base-patch32

    The CLIP model was developed by OpenAI researchers to learn about what contributes to robustness in computer vision tasks and to test the ability of models to generalize to arbitrary image classification tasks in a zero-shot manner. The model uses a ViT-B/32 Transformer architecture as an image... (A zero-shot image-classification sketch using CLIP appears after this list.)

  • openai-whisper-large

    Whisper is an OpenAI pre-trained speech recognition model with potential applications in ASR solutions for developers. However, due to weak supervision and large-scale noisy data, it should be used with caution in high-risk domains. The model has been trained on 680k hours of audio data represen... (A speech-recognition usage sketch appears after this list.)

  • roberta-base

    RoBERTa base is a transformer-based language model pretrained on a large corpus of English data in a self-supervised fashion using the masked language modeling (MLM) objective. It is case-sensitive and is primarily intended to be fine-tuned on downstream tasks such as sequence classification, token classification, or question answering. It is important to note that the model was traine...

  • roberta-base-openai-detector

    RoBERTa Base OpenAI Detector is a language model developed by OpenAI that is fine-tuned using outputs from the 1.5B GPT-2 model. It is designed to detect text generated by GPT-2 and is not meant to be used for malicious purposes or to evade detection. The main focus of the model is to aid in synt...

  • roberta-large

    The RoBERTa Large model is a pretrained language model developed by Facebook AI, based on the transformer architecture. It was trained on a large corpus of English data in a self-supervised manner using the masked language modeling (MLM) objective. The model is case-sensitive and primar...

  • roberta-large-mnli

    Roberta-large-MNLI is a fine-tuned version of the RoBERTa large model on the Multi-Genre Natural Language Inference (MNLI) corpus. It is a transformer-based language model for English. The model's code is available in its GitHub repository and it is released under the MIT license. The fine-tuned model can be u... (A zero-shot classification sketch appears after this list.)

  • roberta-large-openai-detector

    RoBERTa Large OpenAI Detector is a fine-tuned transformer-based language model developed by OpenAI to detect text generated by GPT-2 models. The model has an accuracy of approximately 95% for detecting 1.5B GPT-2-generated text, but the developers note that accuracy may decrease as model sizes in...

  • runwayml-stable-diffusion-inpainting

    runwayml/stable-diffusion-inpainting is a versatile text-to-image model capable of producing realistic images from text input and performing inpainting using masks. It was initialized with Stable-Diffusion-v-1-2 weights and underwent two training phases: 595k steps of regular training and 4...

  • runwayml-stable-diffusion-v1-5

    runwayml/stable-diffusion-v1-5 is a powerful text-to-image latent diffusion model capable of generating photo-realistic images given any text input. The model uses a fixed pretrained text encoder (CLIP ViT-L/14) as suggested in the ...

  • sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco

    sparse_rcnn_r101_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository...

  • sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco

    sparse_rcnn_r50_fpn_300_proposals_crop_mstrain_480-800_3x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository...

  • sshleifer-distilbart-cnn-12-6

    sshleifer/distilbart-cnn-12-6 is a distilled version of the facebook/bart-large-cnn summarization model, with 12 encoder layers and 6 decoder layers. It is intended for abstractive summarization of English text, was fine-tuned on the CNN/DailyMail dataset, and is smaller and faster to run than the original BART large model...

  • stabilityai-stable-diffusion-2-1

    stabilityai/stable-diffusion-2-1 model is a fine-tuned version of the Stable Diffusion v2 model, with additional training steps on the same dataset. It's designed for generating and modifying images based on text prompts, utilizing a Latent Diffusion Model with a fixed, pretrained text encode...

  • stabilityai-stable-diffusion-2-inpainting

    stabilityai/stable-diffusion-2-inpainting model is a continuation of the stable-diffusion-2-base model, with an additional 200,000 steps of training. It utilizes a mask-generation strategy introduced in LAMA and combines this with latent Variational Autoencoder (VAE) representations of the ...

  • t5-base

    T5 Base is a text-to-text transformer model that can be used for a variety of NLP tasks, such as machine translation, document summarization, question answering, and classification tasks such as sentiment analysis. It was developed by a team at Google and is pre-trained on the Colossal Clean Craw... (A text-to-text usage sketch appears after this list.)

  • t5-large

    The T5-Large is a text-to-text transfer transformer (T5) model with 770 million parameters. It has been developed by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. The T5 model is a language model that is pre-trained on a ...

  • t5-small

    T5 Small is a text-to-text transformer model with 60 million parameters. It is developed by a group of researchers and is based on the Text-To-Text Transfer Transformer (T5) framework, which allows for a unified text-to-text format for input and output of all NLP tasks. T5-Small can be trained fo...

  • vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco

    vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository...

  • vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco

    vfnet_x101_64x4d_fpn_mdconv_c3-c5_mstrain_2x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository...

  • yolof_r50_c5_8x8_1x_coco

    yolof_r50_c5_8x8_1x_coco model is from OpenMMLab's MMDetection library. This model is reported to obtain the benchmark results published in the MMDetection repository... (An object-detection usage sketch appears after this list.)
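
Usage examples

The sketches below illustrate one common way to load models like those listed above with the Hugging Face transformers library (and, where noted, diffusers or MMDetection). They are minimal, hedged examples: model IDs follow the Hub names of the entries above, while prompts, file paths, and generation settings are illustrative assumptions rather than values taken from the model cards.

Fill-mask (e.g. bert-base-cased, bert-base-uncased, camembert-base, distilbert-base-uncased, distilroberta-base, roberta-large). A minimal sketch, assuming the transformers library is installed and the checkpoint can be downloaded from the Hub:

```python
from transformers import pipeline

# Fill-mask: predict the token hidden behind the [MASK] placeholder.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

# BERT-style models use [MASK]; RoBERTa/CamemBERT-style models use <mask>.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```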
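
Extractive question answering (e.g. deepset-roberta-base-squad2, deepset-minilm-uncased-squad2, distilbert-base-cased-distilled-squad, distilbert-base-uncased-distilled-squad). A sketch with the transformers pipeline; the question and context strings are invented examples:

```python
from transformers import pipeline

# Extractive QA: the model selects an answer span from the supplied context.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

result = qa(
    question="Which dataset was the model trained on?",
    context=(
        "roberta-base-squad2 was developed by deepset and fine-tuned on the "
        "SQuAD 2.0 dataset for extractive question answering."
    ),
)
print(result["answer"], round(result["score"], 3))
```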
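
Text classification (e.g. distilbert-base-uncased-finetuned-sst-2-english, finiteautomata-bertweet-base-sentiment-analysis). A minimal sketch; the input sentence is illustrative:

```python
from transformers import pipeline

# Sentiment / text classification: returns a label and a confidence score.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("This wiki page is surprisingly useful."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```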
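
Text generation (e.g. distilgpt2, gpt2, gpt2-medium, gpt2-large, microsoft-phi-1-5). The sampling settings below are arbitrary illustrative choices, not recommendations from the model cards:

```python
from transformers import pipeline, set_seed

# Causal text generation with a GPT-2-family checkpoint.
generator = pipeline("text-generation", model="distilgpt2")
set_seed(0)  # make the sampled continuation reproducible

outputs = generator(
    "Transformer models are",
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
)
print(outputs[0]["generated_text"])
```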
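
Summarization (e.g. facebook-bart-large-cnn, sshleifer-distilbart-cnn-12-6). A sketch; the article text is a placeholder:

```python
from transformers import pipeline

# Abstractive summarization with a BART checkpoint fine-tuned on CNN/DailyMail.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Replace this placeholder with the article you want to condense. "
    "The model reads the full text and generates a shorter abstractive summary."
)
summary = summarizer(article, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```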
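
Named-entity recognition (Jean-Baptiste-camembert-ner). A sketch; the French sentence is an invented example:

```python
from transformers import pipeline

# Token classification / NER. aggregation_strategy="simple" merges sub-word
# tokens into whole entity spans (PER, ORG, LOC, MISC).
ner = pipeline(
    "ner",
    model="Jean-Baptiste/camembert-ner",
    aggregation_strategy="simple",
)

sentence = "Apple est fondée en 1976 dans le garage de la maison de Steve Jobs."
for entity in ner(sentence):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```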
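
Zero-shot text classification (e.g. roberta-large-mnli, microsoft-deberta-base-mnli, microsoft-deberta-large-mnli). The candidate labels are invented for the example:

```python
from transformers import pipeline

# Zero-shot classification reuses an NLI model: each candidate label becomes a
# hypothesis ("This example is about {label}.") scored for entailment.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

result = classifier(
    "Quarterly revenue grew by 12% compared to last year.",
    candidate_labels=["finance", "sports", "weather"],
)
print(list(zip(result["labels"], result["scores"])))
```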
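
Image classification (e.g. google-vit-base-patch16-224, facebook-deit-base-patch16-224, microsoft-beit-base-patch16-224-pt22k-ft22k, microsoft-swinv2-base-patch4-window12-192-22k). cat.jpg is a placeholder path for any local image:

```python
from PIL import Image
from transformers import pipeline

# Image classification; the pipeline handles resizing and normalization to the
# resolution the checkpoint was trained at (224x224 for ViT base).
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

image = Image.open("cat.jpg")  # placeholder: any local image file
for prediction in classifier(image, top_k=3):
    print(prediction["label"], round(prediction["score"], 3))
```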
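
Zero-shot image classification and image-text embeddings with CLIP (openai-clip-vit-base-patch32, OpenAI-CLIP-Image-Text-Embeddings-vit-base-patch32). The image path and candidate captions are illustrative:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder image path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into
# probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```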
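
Automatic speech recognition (openai-whisper-large). A sketch using the transformers ASR pipeline; audio.wav is a placeholder path, and decoding audio files requires ffmpeg:

```python
from transformers import pipeline

# Speech-to-text with Whisper; the pipeline resamples the audio and decodes
# the predicted transcript.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large")

result = asr("audio.wav")  # placeholder: path to a local audio file
print(result["text"])
```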
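
Text-to-image generation (e.g. compvis-stable-diffusion-v1-4, runwayml-stable-diffusion-v1-5, stabilityai-stable-diffusion-2-1). A sketch assuming the diffusers library and a CUDA GPU; the prompt is illustrative and half precision is an optional memory-saving choice:

```python
import torch
from diffusers import StableDiffusionPipeline

# Latent text-to-image diffusion; fp16 roughly halves GPU memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```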
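
Text-to-text tasks with T5 (t5-small, t5-base, t5-large). T5 frames every task as text in, text out, selected by a task prefix; the translation prefix below is one of the standard prefixes from the original T5 setup:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# The task is chosen with a text prefix, e.g. translation or summarization.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```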
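
Object detection with the MMDetection checkpoints (deformable_detr_twostage_refine_r50_16x2_50e_coco, the sparse_rcnn and vfnet variants, yolof_r50_c5_8x8_1x_coco, mask_rcnn_swin-t-p4-w7_fpn_1x_coco). A sketch using MMDetection's high-level inference API; the config and checkpoint paths are placeholders, and the result format depends on the MMDetection version installed:

```python
from mmdet.apis import inference_detector, init_detector

# Placeholder paths: point these at the config/checkpoint pair for the model
# you want, e.g. the yolof_r50_c5_8x8_1x_coco files from the MMDetection repo.
config_file = "configs/yolof/yolof_r50_c5_8x8_1x_coco.py"
checkpoint_file = "checkpoints/yolof_r50_c5_8x8_1x_coco.pth"

model = init_detector(config_file, checkpoint_file, device="cuda:0")
result = inference_detector(model, "demo.jpg")  # placeholder image path
print(result)  # per-class detections; exact structure varies by mmdet version
```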
