Control Technical

Vladimir Mandic edited this page Jan 6, 2025 · 1 revision

Control Overview

Native control module for SD.Next for Diffusers backend
Can be used for Control generation as well as Image and Text workflows

For a guide on the options and settings, as well as explanations for the controls themselves, see the Control Guide page.

Supported Control Models

  • lllyasviel ControlNet for SD 1.5 and SD-XL models
    Includes ControlNets as well as Reference-only mode and any compatible 3rd party models
    Original ControlNets for SD15 are 1.4GB each, and for SDXL a massive 4.9GB each
  • VisLearn ControlNet XS for SD-XL models
    Lightweight ControlNet models for SDXL at only 165MB with near-identical results
  • TencentARC T2I-Adapter for SD 1.5 and SD-XL models
    T2I-Adapters provide similar functionality at much lower resource cost at only 300MB each
  • Kohya Control LLite for SD-XL models
    LLLite models for SDXL provide lightweight image control at only 46MB
  • TencentAILab IP-Adapter for SD 1.5 and SD-XL models
    IP-Adapters provide great style transfer functionality at much lower resource cost: below 100MB for SD15 and 700MB for SDXL
    IP-Adapters can be combined with ControlNet for more stable results, especially when doing batch/video processing
  • CiaraRowles TemporalNet for SD 1.5 models
    ControlNet model designed to enhance temporal consistency and reduce flickering for batch/video processing

All built-in models are downloaded upon first use and stored in:
/models/controlnet, /models/adapter, /models/xs, /models/lite, /models/processor

Listed below are all models that are supported out-of-the-box:

ControlNet

  • SD15:
    Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,
    Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
  • SDXL:
    Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL

Note: only models compatible with currently loaded base model are listed
Additional ControlNet models in safetensors can be downloaded manually and placed into corresponding folder: /models/control/controlnet

ControlNet XS

  • SDXL:
    Canny, Depth

ControlNet LLLite

  • SDXL:
    Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime

Note: control-lllite uses an unofficial implementation and is considered experimental
Additional LLLite models in safetensors can be downloaded manually and placed into corresponding folder: /models/control/lite

T2I-Adapter

'Segment': 'TencentARC/t2iadapter_seg_sd14v1',
'Zoe Depth': 'TencentARC/t2iadapter_zoedepth_sd15v1',
'OpenPose': 'TencentARC/t2iadapter_openpose_sd14v1',
'KeyPose': 'TencentARC/t2iadapter_keypose_sd14v1',
'Color': 'TencentARC/t2iadapter_color_sd14v1',
'Depth v1': 'TencentARC/t2iadapter_depth_sd14v1',
'Depth v2': 'TencentARC/t2iadapter_depth_sd15v2',
'Canny v1': 'TencentARC/t2iadapter_canny_sd14v1',
'Canny v2': 'TencentARC/t2iadapter_canny_sd15v2',
'Sketch v1': 'TencentARC/t2iadapter_sketch_sd14v1',
'Sketch v2': 'TencentARC/t2iadapter_sketch_sd15v2',
  • SD15:
    Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2
  • SDXL:
    Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL

Note: Only models compatible with currently loaded base model are listed

Processors

  • Pose style: OpenPose, DWPose, MediaPipe Face
  • Outline style: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
  • Depth style: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
  • Segmentation style: SegmentAnything
  • Other: MLSD, Shuffle

Note: Processor download sizes range from none for built-in processors to between 200MB and 4.2GB (ZoeDepth-Large) for the rest

Segmentation Models

There are 8 Auto-segmentation models available:

  • Facebook SAM ViT Base (357MB)
  • Facebook SAM ViT Large (1.16GB)
  • Facebook SAM ViT Huge (2.56GB)
  • SlimSAM Uniform (106MB)
  • SlimSAM Uniform Tiny (37MB)
  • Rembg Silueta
  • Rembg U2Net
  • Rembg ISNet

Reference

Reference mode is its own pipeline, so it cannot have multiple units or processors

Workflows

Inputs & Outputs

  • Image -> Image
  • Batch: list of images -> Gallery and/or Video
  • Folder: folder with images -> Gallery and/or Video
  • Video -> Gallery and/or Video

Notes:

  • Input/Output/Preview panels can be minimized by clicking on them
  • For video output, make sure to set video options

Unit

  • Unit is: input plus process plus control
  • Pipeline consists of any number of configured units
    If units use control modules, all control modules inside the pipeline must be of the same type
    e.g. ControlNet, ControlNet-XS, T2I-Adapter or Reference
  • Each unit can use primary input or its own override input
  • Each unit can have no processor in which case it will run control on input directly
    Use this when working with predefined input templates
  • Unit can have no control in which case it will run processor only
  • Any combination of input, processor and control is possible
    For example, two enabled units with process only will produce compound processed image but without control
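
The unit rules above can be sketched as a small data model. This is an illustration only, not SD.Next's actual code: the `Unit` class, its field names, and the `validate_pipeline` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical names for illustration; SD.Next's internal classes differ.
@dataclass
class Unit:
    processor: Optional[str] = None       # e.g. "Canny"; None runs control on the input directly
    control_type: Optional[str] = None    # e.g. "ControlNet"; None runs the processor only
    override_input: Optional[str] = None  # unit-specific input; None uses the primary input


def validate_pipeline(units: List[Unit]) -> Optional[str]:
    """Return the one control type used by the pipeline, or None if no unit has control.

    Raises ValueError when units mix control types, which the rules above disallow.
    """
    types = {u.control_type for u in units if u.control_type is not None}
    if len(types) > 1:
        raise ValueError(f"mixed control types in one pipeline: {sorted(types)}")
    return next(iter(types), None)
```

Under these assumptions, `validate_pipeline([Unit("Canny", "ControlNet"), Unit("OpenPose")])` returns `"ControlNet"`, since the second unit is process-only and does not count as a control module.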

What-if?

  • If no input is provided then pipeline will run in txt2img mode
    Can be freely used instead of standard txt2img
  • If none of units have control or adapter, pipeline will run in img2img mode using input image
    Can be freely used instead of standard img2img
  • If you have processor enabled, but no controlnet or adapter loaded,
    pipeline will run in img2img mode using processed input
  • If you have multiple processors enabled, but no controlnet or adapter loaded,
    pipeline will run in img2img mode on blended processed image
  • Output resolution defaults to input resolution
    Use resize settings to force any resolution
  • Resize operation can run before (on input image) or after processing (on output image)
  • Using video input will run pipeline on each frame unless skip frames is set
    Video output is standard list of images (gallery) and can be optionally encoded into a video file
    Video file can be interpolated using RIFE for smoother playback
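
The mode-selection rules in this list can be read as a small decision function. The sketch below is not SD.Next's implementation: `resolve_mode` and its parameters are hypothetical, and the order in which the rules are checked is an assumption, since the list does not state a precedence.

```python
def resolve_mode(has_input: bool, num_processors: int, has_control: bool) -> str:
    """Map a pipeline configuration to the mode described in the list above.

    has_control means at least one unit has a controlnet or adapter loaded.
    """
    if not has_input:
        return "txt2img"                      # no input image provided
    if has_control:
        return "control"                      # regular control workflow
    if num_processors == 0:
        return "img2img on input image"       # no control, no processor
    if num_processors == 1:
        return "img2img on processed input"   # single processor, no control
    return "img2img on blended processed image"  # multiple processors, no control
```

For example, `resolve_mode(True, 2, False)` selects img2img on the blended output of both processors.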

Overrides

  • Control can be based on main input or each individual unit can have its own override input
  • By default, control runs in control+txt2img mode
  • If init image is provided, it runs in control+img2img mode
    Init image can be same as control image or separate
  • IP adapter can be applied to any workflow
  • IP adapter can use same input as control input or separate

Inpaint

  • Inpaint workflow is triggered when input image is provided in inpaint mode
  • Inpaint mode can be used with image-to-image or controlnet workflows
  • Other unit types such as T2I, XS or Lite do not support inpaint mode

Outpaint

  • Outpaint workflow is triggered when input image is provided in outpaint mode
  • Outpaint mode can be used with image-to-image or controlnet workflows
  • Other unit types such as T2I, XS or Lite do not support outpaint mode
  • Recommendation is to increase denoising strength to at least 0.8 since the outpainted area is blank and needs to be filled with noise
  • How closely the outpaint follows the input image can be controlled with the overlap setting: with higher overlap, more of the original image becomes part of the outpaint process

Logging

To enable extra logging for troubleshooting purposes,
set environment variables before running SD.Next

  • Linux:

    export SD_CONTROL_DEBUG=true
    export SD_PROCESS_DEBUG=true
    ./webui.sh --debug

  • Windows:

    set SD_CONTROL_DEBUG=true
    set SD_PROCESS_DEBUG=true
    webui.bat --debug

Note: Starting with debug info enabled also enables Test mode in Control module
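
When SD.Next is started from a script rather than a shell, the same variables can be set programmatically. The following is a minimal sketch; `debug_launch_command` is a hypothetical helper, and it assumes the launcher scripts shown above are in the current working directory.

```python
import os
import sys
from typing import Dict, List, Tuple


def debug_launch_command() -> Tuple[List[str], Dict[str, str]]:
    """Build the launch command and environment with control debug logging enabled."""
    env = os.environ.copy()            # keep the existing environment intact
    env["SD_CONTROL_DEBUG"] = "true"   # same variables as in the shell examples above
    env["SD_PROCESS_DEBUG"] = "true"
    script = "webui.bat" if sys.platform == "win32" else "./webui.sh"
    return [script, "--debug"], env
```

The returned pair can be passed to `subprocess.run(cmd, env=env)` from the SD.Next folder.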

Limitations / TODO

Known issues

  • DWPose cannot be used when the installation of its dependencies fails
    Example error message:

    Control processor DWPose: DLL load failed while importing _ext

    Workaround: Activate the venv and run the following command to install the DWPose dependencies manually:

    pip install --upgrade --no-deps --force-reinstall openmim==0.3.9 mmengine==0.10.4 mmcv==2.1.0 mmpose==1.3.1 mmdet==3.3.0