Add support for LTX-Video model in ImageToVideo Pipeline #394
base: main
Conversation
self.ldm = StableVideoDiffusionPipeline.from_pretrained(model_id, **kwargs)
except Exception as loading_error:
    logger.error("Failed to load %s: %s." % (self.pipeline_name, loading_error))
    # Fall back to LTXImageToVideoPipeline when StableVideoDiffusionPipeline fails to load,
    # in case the model name does not match the if condition for LTX-Video
I think we should not retry the LTXImageToVideoPipeline load here.
Would switching to DiffusionPipeline make the loading generic? We are only passing model_id and kwargs, so most of this seems set up for generic loading if that's possible.
@ad-astra-video I added the extra try/except block inside the primary except block for the case where a user loads an LTX-Video model that they created in their own HF repo but whose name doesn't follow the usual naming standards. All LTX-Video models currently on the model hub pass the condition for LTXImageToVideoPipeline, and since it is good practice to name models after the base model, our if condition should be sufficient for LTX-Video model loading.
Regarding your second question: DiffusionPipeline is the base class for most, if not all, pipelines in the diffusers library. It can be used for generic downloading, loading, and inference, but using it blocks pipeline-specific features. For example, LTX-Video has two pipeline classes, LTXImageToVideoPipeline (I2V) and LTXPipeline (T2V). When a generic class like DiffusionPipeline loads a model_id such as Lightricks/LTX-Video, it reads model_index.json from the model's folder and retrieves the pipeline class from that config. The pipeline class is stored in model_index.json as the KV pair "_class_name": "LTXPipeline". So to use the LTX-Video model ID for our I2V pipeline, we would either have to create a separate model repo replicating it with a modified _class_name, or use the specific pipeline class, LTXImageToVideoPipeline. This applies not only to this model but to other tasks where a specific pipeline must be named.
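To illustrate the dispatch described above, here is a minimal sketch of how a generic loader resolves the pipeline class from model_index.json. The JSON content is a hypothetical, trimmed-down stand-in for the real file, which also lists every model component:

```python
import json

# Hypothetical, trimmed model_index.json for Lightricks/LTX-Video;
# the real file also maps each component (vae, transformer, ...) to its class.
raw = '{"_class_name": "LTXPipeline"}'

model_index = json.loads(raw)
cls_name = model_index["_class_name"]
# A generic loader resolves cls_name to LTXPipeline (T2V),
# never to LTXImageToVideoPipeline (I2V), for this model ID.
print(cls_name)
```

This is why loading Lightricks/LTX-Video through the generic class yields the T2V pipeline rather than the I2V one.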
Yeah, if this PR were for the T2V pipeline, then for sure I would have used the DiffusionPipeline class to keep everything generic.
PS: I plan to use the DiffusionPipeline class for a generic Diffusers pipeline in our offerings, as discussed with Rick.
cc @rickstaa
Thank you for the background. I was hoping we could use DiffusionPipeline for a simple upgrade path: update diffusers only and get new models. When a model supports both image and text input, this gets more complicated.
Could we default to DiffusionPipeline and, for models that do not specify the image-to-video pipeline, update the pipeline with from_pipe to the correct class?
Oh yeah, from_pipe can make DiffusionPipeline work for specific cases like LTXImageToVideoPipeline. If we want to keep DiffusionPipeline as the default, let me make the changes accordingly.
Update: pushed the necessary commit, and thanks for the from_pipe suggestion.
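The from_pipe pattern discussed above reuses the components of an already-loaded pipeline instead of reloading weights from disk. The mock below is a hypothetical sketch of that mechanism, not the real diffusers API; all class names here are invented for illustration:

```python
# Mock sketch of the from_pipe component-reuse pattern (not real diffusers code).
class BasePipeline:
    def __init__(self, **components):
        self.components = components

    @classmethod
    def from_pipe(cls, other, **overrides):
        # Reuse the already-initialized components; overrides can swap parts
        return cls(**{**other.components, **overrides})

class MockI2VPipeline(BasePipeline):
    pass

# Load generically once, then re-wrap as the task-specific pipeline
generic = BasePipeline(vae="vae-weights", transformer="transformer-weights")
i2v = MockI2VPipeline.from_pipe(generic)
```

In diffusers the same shape applies: a pipeline loaded via DiffusionPipeline.from_pretrained can be handed to the I2V class's from_pipe without downloading the weights again.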
@@ -113,6 +136,14 @@ def __call__(
    seed = kwargs.pop("seed", None)
    safety_check = kwargs.pop("safety_check", True)

    if self.pipeline_name == "LTXImageToVideoPipeline":
Can we do something more generic that looks at the pipeline class args and deletes the kwargs keys not present in the pipeline? If it's a heavy lift, we can do it in a separate PR.
Good point, I would love to make all pipelines more generic. We can do this in a separate PR if it's too heavy.
@ad-astra-video That's a nice suggestion. Let's remove this stone-age hard-coded block and do something more generic.
Pushed the changes in the latest commit in this PR itself. 👍 It now works even when we make the whole pipeline generic based on task, like an i2i generic pipeline.
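One generic way to implement the kwargs filtering suggested above is to inspect the pipeline's `__call__` signature and drop any keys it does not accept. A minimal sketch, assuming a stdlib-only approach with a hypothetical stand-in for the pipeline's call signature:

```python
import inspect

def filter_call_kwargs(pipeline_call, kwargs):
    """Drop kwargs not accepted by pipeline_call (sketch of the generic approach)."""
    params = inspect.signature(pipeline_call).parameters
    # If the callable accepts **kwargs, pass everything through unchanged
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

# Hypothetical stand-in mimicking an LTX-style call signature
def fake_ltx_call(image, prompt=None, num_frames=161):
    return {"image": image, "prompt": prompt, "num_frames": num_frames}

# motion_bucket_id is an SVD-only argument, so it gets dropped
filtered = filter_call_kwargs(fake_ltx_call, {"prompt": "a cat", "motion_bucket_id": 127})
```

This keeps the pipeline wrapper free of per-model `if self.pipeline_name == ...` branches.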
logger.error("Failed to load both LTXImageToVideoPipeline and StableVideoDiffusionPipeline: %s. Please ensure the model ID is compatible.", loading_error)
raise loading_error

self.ldm.to(get_torch_device())

sfast_enabled = os.getenv("SFAST", "").strip().lower() == "true"
Need to confirm whether SFAST works on LTXImageToVideoPipeline. If it does not, only use it when StableVideoDiffusionPipeline is loaded.
SFAST doesn't work on LTXImageToVideoPipeline because that pipeline has no unet attribute, so the model can't be compiled. I tested the pipeline separately on Colab. The same goes for DeepCacheSDHelper, for the same reason.
Also made a commit to handle this case. 👍
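Since SFAST and DeepCacheSDHelper both compile the UNet, one way to gate them is an attribute check rather than a class-name check. A minimal sketch with hypothetical mock classes standing in for the two pipelines:

```python
# Hypothetical mocks: StableVideoDiffusionPipeline is UNet-based,
# while LTXImageToVideoPipeline is transformer-based and has no unet.
class MockSVDPipeline:
    def __init__(self):
        self.unet = object()

class MockLTXPipeline:
    def __init__(self):
        self.transformer = object()

def supports_unet_compilation(pipe):
    # SFAST / DeepCacheSDHelper target pipe.unet, so require its presence
    return getattr(pipe, "unet", None) is not None

svd_ok = supports_unet_compilation(MockSVDPipeline())
ltx_ok = supports_unet_compilation(MockLTXPipeline())
```

Gating on the attribute keeps the check working if other non-UNet pipelines are added later.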
PR looks good; I have a couple of comments below and left suggestions/questions in the files:
- LTX-Video requires diffusers 0.32.0, right? Suggest we update to diffusers 0.32.1 with this PR.
- If you fix the suggested change in multipart.go, I can build and test tomorrow.
Co-authored-by: Brad | ad-astra <[email protected]>
Update: addressed all the suggested changes in the recent commits
What does this PR do?
This PR adds support for the LTX-Video model in the ImageToVideoPipeline. It introduces a mechanism to dynamically load either LTXImageToVideoPipeline or StableVideoDiffusionPipeline based on the provided modelID. This change enhances the flexibility of the ImageToVideoPipeline, allowing it to handle both LTX-Video and StableVideoDiffusion models, and with the addition of LTXImageToVideoPipeline, video generation can now be guided by a text prompt.
After building the Docker image, the setup was tested by running it locally on a Uvicorn server.
cc @rickstaa