Add support for LTX-Video model in ImageToVideo Pipeline #394

Open
wants to merge 9 commits into
base: main

Conversation

RUFFY-369
Collaborator

@RUFFY-369 RUFFY-369 commented Jan 8, 2025

What does this PR do?

This PR adds support for the LTX-Video model in the ImageToVideoPipeline. It introduces a mechanism to dynamically load either the LTXImageToVideoPipeline or the StableVideoDiffusionPipeline, based on the provided modelID. This makes the ImageToVideoPipeline more flexible: it can now handle both LTX-Video and StableVideoDiffusion models, and with the LTXImageToVideoPipeline, video generation can also be guided by a text prompt.
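As a rough illustration of the selection described above, here is a minimal sketch. It is not the PR's actual code: the helper name `select_pipeline_name` and the `"ltx"` substring rule are assumptions made for this example.

```python
def select_pipeline_name(model_id: str) -> str:
    """Pick a pipeline class name from a model ID (hypothetical helper).

    Assumption: LTX-Video repos carry "ltx" somewhere in their name;
    every other model ID falls back to Stable Video Diffusion.
    """
    if "ltx" in model_id.lower():
        return "LTXImageToVideoPipeline"
    return "StableVideoDiffusionPipeline"


print(select_pipeline_name("Lightricks/LTX-Video"))
print(select_pipeline_name("stabilityai/stable-video-diffusion-img2vid-xt"))
```

A name-based rule like this misses LTX-Video checkpoints published under unrelated repo names.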

After building the Docker image, the setup has been tested by running it locally on a Uvicorn server.

cc @rickstaa

@RUFFY-369 RUFFY-369 requested a review from rickstaa January 8, 2025 11:26
@RUFFY-369 RUFFY-369 marked this pull request as draft January 8, 2025 11:26
@RUFFY-369 RUFFY-369 marked this pull request as ready for review January 8, 2025 13:03
    self.ldm = StableVideoDiffusionPipeline.from_pretrained(model_id, **kwargs)
except Exception as loading_error:
    logger.error("Failed to load %s : %s." %(self.pipeline_name,loading_error))
    # Trying to load the LTXImageToVideoPipeline if the StableVideoDiffusionPipeline fails to load and there is a chance that model name doesn't match the if condition for LTX-Video
Collaborator

@ad-astra-video ad-astra-video Jan 10, 2025

I think we should not be retrying the LTXImageToVideoPipeline load again here.

Would switching to DiffusionPipeline make the loading generic? We are only passing model_id and kwargs, so it seems most of the code is already set up for generic loading, if that's possible.

Collaborator Author

@RUFFY-369 RUFFY-369 Jan 10, 2025

@ad-astra-video I added the extra try/except block inside the primary except block for the case where a user tries to load an LTX-Video model from their own new HF repo on the model hub whose name doesn't follow the naming convention. All the LTX-Video models currently on the model hub pass the condition for LTXImageToVideoPipeline, and it is good practice to name a repo after its model, so our code's if condition should be sufficient to handle LTX-Video model loading.

Regarding your second question: DiffusionPipeline is the base class for almost all pipelines in the diffusers library. It can be used for generic downloading, loading and inference, but using it blocks the ability to use pipeline-specific features. For example, LTX-Video has two pipeline classes, LTXImageToVideoPipeline (I2V) and LTXPipeline (T2V). When we load a pipeline through a generic class like DiffusionPipeline with a model_id such as Lightricks/LTX-Video, DiffusionPipeline reads model_index.json from the model's folder and uses that config to look up the pipeline class for that specific model. The pipeline class is stored in model_index.json as this key-value pair: "_class_name": "LTXPipeline". So if we want to use the LTX-Video model ID for our I2V pipeline, we either have to create a separate model repo that replicates it with a modified _class_name, or use the specific pipeline class, LTXImageToVideoPipeline. This applies not only to this model but also to other tasks where a specific pipeline has to be named.
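To make the lookup concrete, here is a minimal sketch of reading the pipeline class name from a model_index.json. The JSON text is a trimmed stand-in that contains only the key quoted above, and `pipeline_class_name` is a hypothetical helper, not a diffusers function.

```python
import json

# Trimmed stand-in for a model_index.json; a real file also lists the
# pipeline's components. Only "_class_name" comes from the comment above.
EXAMPLE_MODEL_INDEX = '{"_class_name": "LTXPipeline"}'


def pipeline_class_name(model_index_text: str) -> str:
    """Return the pipeline class name recorded in a model_index.json."""
    config = json.loads(model_index_text)
    return config["_class_name"]


# A generic loader that trusts this field would pick the T2V class,
# not the I2V class this PR needs.
print(pipeline_class_name(EXAMPLE_MODEL_INDEX))  # prints "LTXPipeline"
```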

If this PR were for the T2V pipeline, I would certainly have used the DiffusionPipeline class to keep everything generic.

PS: I plan to use the DiffusionPipeline class for a generic Diffusers pipeline in our offerings, as I discussed with Rick.

cc @rickstaa

Collaborator

Thank you for the background. I was hoping we could use DiffusionPipeline to get a simple upgrade path, where updating diffusers alone is enough to pick up new models. It gets more complicated when a model supports both image and text input.

Could we default to DiffusionPipeline and, for models whose config does not specify the image-to-video pipeline, convert the loaded pipeline to the correct one with from_pipe?

Collaborator Author

@RUFFY-369 RUFFY-369 Jan 10, 2025

Oh yeah, from_pipe can make DiffusionPipeline work for specific cases like LTXImageToVideoPipeline. If we want to keep DiffusionPipeline as the default, I'll make the changes accordingly.
Update: pushed the necessary commit, and thanks for the from_pipe suggestion.
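The from_pipe approach could look roughly like the sketch below. It is not the commit that was pushed: `I2V_OVERRIDES`, `resolve_i2v_class_name` and `load_i2v_pipeline` are hypothetical names, and the sketch assumes diffusers 0.32.0 or later is installed so that `LTXImageToVideoPipeline` is available.

```python
# Hypothetical mapping: class name found by the generic loader -> the
# image-to-video class to convert to. Only the LTX pair comes from this thread.
I2V_OVERRIDES = {"LTXPipeline": "LTXImageToVideoPipeline"}


def resolve_i2v_class_name(loaded_class_name: str) -> str:
    """Return the I2V class to use for a generically loaded pipeline."""
    return I2V_OVERRIDES.get(loaded_class_name, loaded_class_name)


def load_i2v_pipeline(model_id: str, **kwargs):
    """Load generically, then convert with from_pipe when needed."""
    # Imported lazily so the mapping above can be used without diffusers.
    import diffusers

    pipe = diffusers.DiffusionPipeline.from_pretrained(model_id, **kwargs)
    loaded_name = type(pipe).__name__
    target_name = resolve_i2v_class_name(loaded_name)
    if target_name != loaded_name:
        # from_pipe builds the target pipeline from the already loaded components.
        pipe = getattr(diffusers, target_name).from_pipe(pipe)
    return pipe
```

With this shape, supporting another model whose config names a text-to-video class would only need a new entry in the mapping.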

@@ -113,6 +136,14 @@ def __call__(
        seed = kwargs.pop("seed", None)
        safety_check = kwargs.pop("safety_check", True)

        if self.pipeline_name == "LTXImageToVideoPipeline":
Collaborator

Can we do something more generic that inspects the pipeline class's call arguments and deletes any kwargs keys the pipeline does not accept? If that is a heavy lift, we can do it in a separate PR.

Member

@rickstaa rickstaa Jan 10, 2025

Good point, I would love to make all pipelines more generic. We can do this in a separate PR if it is too heavy.

Collaborator Author

@RUFFY-369 RUFFY-369 Jan 10, 2025

@ad-astra-video That's a nice suggestion. Let's remove this stone-age hard-coded block and use something more generic.
Pushed the changes in the latest commit of this PR. 👍 It will keep working even when we make the whole pipeline generic per task, such as a generic i2i pipeline.
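One generic way to do this is to filter kwargs against the signature of the pipeline's call method. The sketch below is only an illustration, not the committed code: `filter_kwargs_for` is a hypothetical helper, and the two stand-in functions with their parameter lists are made up for the example.

```python
import inspect


def filter_kwargs_for(callable_obj, kwargs: dict) -> dict:
    """Keep only the kwargs that callable_obj's signature accepts."""
    params = inspect.signature(callable_obj).parameters
    # If the callable takes **kwargs itself, everything is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


# Stand-ins for two pipelines' __call__ methods (illustrative parameters only).
def svd_like_call(image, height=576, width=1024, fps=7):
    return "svd"


def ltx_like_call(image, prompt="", height=512, width=704):
    return "ltx"


request = {"image": "frame.png", "prompt": "a calm sea", "fps": 6}
print(filter_kwargs_for(svd_like_call, request))  # drops "prompt"
print(filter_kwargs_for(ltx_like_call, request))  # drops "fps"
```

For a real pipeline object, passing `pipeline.__call__` as `callable_obj` would apply the same check to whichever class was loaded.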

logger.error("Failed to load both LTXImageToVideoPipeline and StableVideoDiffusionPipeline: %s. Please ensure the model ID is compatible.", loading_error)
raise loading_error


self.ldm.to(get_torch_device())

sfast_enabled = os.getenv("SFAST", "").strip().lower() == "true"
Collaborator

Need to confirm if SFAST works on LTXImageToVideoPipeline. If it does not, only use it if StableVideoDiffusionPipeline is loaded.

Collaborator Author

@RUFFY-369 RUFFY-369 Jan 10, 2025

SFAST doesn't work on LTXImageToVideoPipeline: the pipeline has no unet attribute, so the model can't be compiled. I tested this separately, on the pipeline alone, in Colab. The same goes for DeepCacheSDHelper, for the same reason.
I also pushed a commit to handle this case. 👍
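A guard for this case might look like the sketch below, which only enables the optimization when the loaded pipeline exposes a unet. The helper names are hypothetical, the pipelines are simple stand-in objects, and the commit in this PR may handle it differently.

```python
import os
from types import SimpleNamespace


def sfast_requested(environ=os.environ) -> bool:
    """Mirror the env check from the diff: SFAST must be set to "true"."""
    return environ.get("SFAST", "").strip().lower() == "true"


def has_unet(pipeline) -> bool:
    """True when the pipeline exposes a unet to compile or cache."""
    return getattr(pipeline, "unet", None) is not None


def should_enable_sfast(pipeline, environ=os.environ) -> bool:
    """Enable SFAST only when it is requested and the pipeline has a unet."""
    return sfast_requested(environ) and has_unet(pipeline)


# Stand-ins: an SVD-like pipeline with a unet, an LTX-like one without.
svd_like = SimpleNamespace(unet=object())
ltx_like = SimpleNamespace(transformer=object())

print(should_enable_sfast(svd_like, {"SFAST": "true"}))  # True
print(should_enable_sfast(ltx_like, {"SFAST": "true"}))  # False
```

The same `has_unet` check could gate DeepCacheSDHelper, since the comment above reports the same limitation for it.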

worker/multipart.go (outdated, resolved)
Collaborator

@ad-astra-video ad-astra-video left a comment

PR looks good. I have a couple of comments below and left suggestions/questions in the files:

  • LTXVideo requires diffusers 0.32.0, right? I suggest we update to diffusers 0.32.1 with this PR.
  • If you apply the suggested multipart.go change, I can build and test tomorrow.

@RUFFY-369
Collaborator Author

RUFFY-369 commented Jan 10, 2025

> PR looks good. I have a couple of comments below and left suggestions/questions in the files:
>
>   • LTXVideo requires diffusers 0.32.0, right? I suggest we update to diffusers 0.32.1 with this PR.
>   • If you apply the suggested multipart.go change, I can build and test tomorrow.

  • Yeah, LTX-Video was released in version 0.32.0. I updated the diffusers version in the latest commits, since we need to upgrade diffusers in the end anyway.
  • I committed your suggestion already. 👍

Update: addressed all the suggested changes in the recent commits
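Since LTX-Video support depends on the diffusers version discussed above, a small version guard can make a mismatch obvious at startup. This is only a sketch: `version_tuple` and `supports_ltx_video` are hypothetical helpers, and the parsing is deliberately simple.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse "major.minor.patch" into integers, ignoring any suffix.

    Simplification: a pre-release such as "0.32.0.dev0" is treated as 0.32.0.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_ltx_video(installed: str, minimum: str = "0.32.0") -> bool:
    """True when the installed diffusers version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


print(supports_ltx_video("0.32.1"))  # True
print(supports_ltx_video("0.31.0"))  # False
```

The installed version string could be read with `importlib.metadata.version("diffusers")` from the standard library.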


3 participants