Replies: 6 comments
-
It depends :) on what you're comparing it with. If you're comparing with sd-webui-roop, then it's supposed to be faster (or at least as fast). If you're comparing it to roop itself, then no, it's slower. There are several reasons for this:
For the moment the extension is really designed to work with SD tools, and in that context it brings something to the table compared to roop (inpainting, ...). In terms of raw performance, roop will no doubt always be a little ahead, because targeting video is a very different problem and demands it. Inside SD there's a tradeoff to be found in resource consumption so as not to run out of VRAM. I have 12 GB of VRAM.
-
To give a more comprehensive view, particularly for those familiar with ONNX, here's a breakdown of the challenge.

Loading the model is typically the most time-consuming part of the process, so you certainly don't want to repeat it more than necessary. To avoid that, it makes sense to keep the model in RAM when not in use. If the same approach were taken with the GPU, the model would occupy VRAM throughout the execution of SD, potentially wasting precious resources; it's impractical to consume VRAM when the extension isn't actively being used. Ideally, we would unload the model to RAM after execution and reload it into VRAM before the next run, minimizing unnecessary VRAM consumption.

Insightface relies on onnxruntime for its operations, and I'm not aware of any straightforward way to move a model between RAM and VRAM with onnxruntime. This has led me not to focus too much on this aspect for now. Roop, by contrast, doesn't face this issue: it doesn't load a large diffusion model alongside its main process, so it can comfortably keep the model in VRAM without worrying about managing its location between RAM and VRAM.

I could also keep the model in VRAM here. But since I often have VRAM problems on my side, and many people use GPUs with little VRAM, I haven't done it for the moment. However, there may be a solution I haven't seen, or I may be wrong about onnxruntime. In that case I'd be pleased to hear any ideas you may have.
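A minimal sketch of the load/unload scheme described above, assuming we cache the raw model bytes in RAM and rebuild the GPU session lazily so VRAM is only held while the extension is running. The class and loader names are illustrative, not the extension's actual code; with onnxruntime the loader would wrap something like `ort.InferenceSession(model_bytes, providers=["CUDAExecutionProvider"])`.

```python
class SwapModelCache:
    """Keep model bytes resident in RAM; hold the GPU session only on demand."""

    def __init__(self, model_path, loader):
        self._model_path = model_path
        self._loader = loader      # hypothetical: wraps e.g. ort.InferenceSession
        self._model_bytes = None   # cheap to keep in RAM between runs
        self._session = None       # occupies VRAM only while it exists

    def acquire(self):
        """Read the model from disk once, then (re)build the session lazily."""
        if self._model_bytes is None:
            with open(self._model_path, "rb") as f:
                self._model_bytes = f.read()
        if self._session is None:
            self._session = self._loader(self._model_bytes)
        return self._session

    def release(self):
        """Drop the session so the GPU memory can be reclaimed; keep the bytes."""
        self._session = None
```

The catch the comment above points at is the `release` step: dropping an onnxruntime CUDA session and recreating it later still pays the session-construction cost on every reload, so it's a tradeoff between VRAM headroom and per-run latency rather than a free win.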
-
Thank you for taking the time to respond. It makes complete sense, and it looks like a challenge that I'm sure can be solved but may need some time and work. Not having to load the model every time will surely save time. I think something like this is implemented in sd-webui, where the ControlNet model is loaded just once during batch img2img, but I'm not sure exactly how that works.
-
I am trying to implement it as an option: #24
-
Version 1.2.0 is not as good as 1.1.2: the swapped face is very blurry, and the operation is more complex.
-
No.
Yes. If you don't care about the options in faces, don't use them; it goes back to being very simple.
Yes. I was afraid of that comment. I disabled CodeFormer by default in Global Post-processing; that's why you get poor results. Activate it and the output will be back to what you know. You can make it the default in the global settings.
-
Hey, just curious: compared to roop, which swaps up to 5-6 images per second, this took 10+ seconds per image.
Did I do something wrong, or is there scope to improve the speed?
I appreciate that this is not the same as roop, but I'm asking because it's supposed to work better than roop.