compvis-stable-diffusion-v1-4

Overview

CompVis/stable-diffusion-v1-4 is a latent text-to-image diffusion model capable of generating realistic images from text prompts. It uses a fixed, pretrained text encoder (CLIP ViT-L/14), as suggested in the Imagen paper. Stable-Diffusion-v1-4 was fine-tuned from an earlier checkpoint, stable-diffusion-v1-2, on the laion-aesthetics v2 5+ dataset. The model has applications in research, art, education, and creative tools. Its use is subject to guidelines intended to prevent misuse and malicious activity: it should not be used to create harmful, offensive, or discriminatory content. The model also has limitations: it can struggle with photorealism, with rendering legible text, and with complex compositions. Its training data is drawn from the LAION-2B dataset, whose captions are primarily English, which can introduce biases and degrade results for non-English prompts. To enhance safety, using a Safety Checker with this model is recommended.

The above summary was generated using ChatGPT. Review the original-model-card to understand the data used to train the model, evaluation metrics, license, intended uses, limitations and bias before using the model.

Note: The inference script for this model is optimized for high throughput and low latency using the DeepSpeed-MII library. To run inference with the default (FP32) diffusion pipeline implementation, use version 4 of this model.

Inference samples

Inference type	Python sample (Notebook)	CLI with YAML
Real time	text-to-image-online-endpoint.ipynb	text-to-image-online-endpoint.sh
Batch	text-to-image-batch-endpoint.ipynb	text-to-image-batch-endpoint.sh

Inference with Azure AI Content Safety (AACS) samples

Inference type	Python sample (Notebook)
Real time	safe-text-to-image-online-deployment.ipynb
Batch	safe-text-to-image-batch-endpoint.ipynb

Sample inputs and outputs (for real-time inference)

Sample input

{
    "input_data": {
        "columns": ["prompt"],
        "data": ["a photograph of an astronaut riding a horse", "lion holding hunted deer in grass fields"],
        "index": [0, 1]
    }
}
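As a minimal sketch, the request body above can be assembled from a list of prompts in Python. The helper name `build_payload` is hypothetical and not part of any SDK; it only reproduces the "columns", "data", and "index" layout shown in the sample input.

```python
import json


def build_payload(prompts):
    """Wrap prompt strings in the request layout shown in the sample input.

    Hypothetical helper: the field names come from the sample input,
    everything else is an illustrative assumption.
    """
    prompts = list(prompts)
    return {
        "input_data": {
            "columns": ["prompt"],
            "data": prompts,
            # One index entry per prompt, starting at 0.
            "index": list(range(len(prompts))),
        }
    }


payload = build_payload([
    "a photograph of an astronaut riding a horse",
    "lion holding hunted deer in grass fields",
])

# Serialize to a JSON string suitable for a request body or a request file.
request_body = json.dumps(payload, indent=4)
print(request_body)
```

How the body is sent depends on the deployment; the notebooks listed under "Inference samples" show the supported paths.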

Sample output

[
    {
        "prompt": "a photograph of an astronaut riding a horse",
        "generated_image": "image1",
        "nsfw_content_detected": False
    },
    {
        "prompt": "lion holding hunted deer in grass fields",
        "generated_image": "image2",
        "nsfw_content_detected": True
    }
]

Note:

The "image1" and "image2" strings stand in for base64-encoded image data.

If "nsfw_content_detected" is True, the generated image is completely black.
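The two notes above suggest a simple way to handle a response: skip flagged entries and base64-decode the rest. The sketch below assumes the response has already been parsed into a list of dictionaries with the three fields shown in the sample output. The function name `extract_images` is hypothetical, and the demo uses stand-in bytes rather than real image data, since the image file format is not stated here.

```python
import base64


def extract_images(results):
    """Return {prompt: image_bytes} for entries the safety checker did not flag.

    Hypothetical helper; assumes each entry has the "prompt",
    "generated_image", and "nsfw_content_detected" fields from the sample output.
    """
    images = {}
    for item in results:
        if item["nsfw_content_detected"]:
            # Flagged entries carry an all-black image, so skip them.
            continue
        images[item["prompt"]] = base64.b64decode(item["generated_image"])
    return images


# Stand-in payload: the bytes below are NOT a real image, only a placeholder.
placeholder = base64.b64encode(b"not-a-real-image").decode("ascii")
results = [
    {
        "prompt": "a photograph of an astronaut riding a horse",
        "generated_image": placeholder,
        "nsfw_content_detected": False,
    },
    {
        "prompt": "lion holding hunted deer in grass fields",
        "generated_image": placeholder,
        "nsfw_content_detected": True,
    },
]

images = extract_images(results)
print(sorted(images))
```

The decoded bytes can then be written to disk or opened with an imaging library of your choice.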

Model inference: visualization for the prompt - "a photograph of an astronaut riding a horse"

compvis_stable_diffusion_v1_4 visualization

Version: 5
