Is there a way to run with 12 GB VRAM? Is there a way to optimize it? #31
Same.
10 GB would be great ... actually, I have a virtualized Docker Ubuntu image of this, and it worked on my 10 GB card (sharing roughly 10 GB of additional system RAM).
I made a fork that runs on my 8GB GPU: it loads and unloads the different stage models on demand, so it doesn't require 16GB at all times. It can occasionally go OOM, but it's pretty random. On Windows you can enable Sysmem Fallback in the NVIDIA drivers, and it won't crash in those cases. This may even work on 6GB GPUs with these changes if you have that fallback enabled, but I can't test this myself. Note that most of that model-management code was written by AI, so there can still be issues with it, but it works for me and a few others.
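This is not the fork's actual code, but a minimal sketch of the load/unload-on-demand idea it describes, assuming each stage model is an ordinary torch module:

```python
import gc
import torch

def run_stage(model, stage_fn, *args, device="cuda"):
    """Move one stage model to the GPU, run it, then evict it again."""
    model.to(device)                  # load just this stage onto the GPU
    try:
        return stage_fn(model, *args)
    finally:
        model.to("cpu")               # keep it resident in system RAM instead
        gc.collect()
        torch.cuda.empty_cache()      # release cached allocations back to the driver
```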
I pulled your fork and tried to run it, but it still resulted in an OOM error, so I started investigating. Since I was using xformers, I switched to flash-attn to see if it would help, but the OOM persisted. At that point I hadn't enabled NVIDIA's system-memory fallback, so I made that adjustment, but it still didn't resolve the issue. For reference, my system is Windows 10 (22H2) with a 3060 (12GB VRAM) and 64GB of RAM. Perhaps I made some mistakes in my setup, and I'm continuing to troubleshoot, but so far it hasn't worked for me. Thank you for sharing your fork; I really appreciate it!
After reviewing the code, it's pretty far from ideal and has a bunch of logic issues, but it lets me run this at reasonable speed. I've since implemented a more structured way of keeping models on the CPU and only loading them to the GPU, but it doesn't actually make any difference, since the initial quick hack yields similar gains; in fact, I think a much simpler model flush just before the initial mesh generation would have the same impact here. I won't be pushing new changes until I find something more sophisticated that actually makes a difference.

The issue seems to be that the first OOM happens at initial mesh generation, and the next when you actually simplify and bake the texture; those steps use most of the GPU's resources at once, and the smaller models used in the earlier stages aren't that big, so this naive model-unloading scheme isn't going to fix everything. I've also found that the amount of detail in your input image seems to matter a lot for VRAM usage: my previous test image of a car almost never went OOM, but my new test image of a cartoon character eating spaghetti does it every time.
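For reference, the "simpler model flush" mentioned here would look roughly like this (a hedged sketch; `models` is a hypothetical dict holding the already-used stage models, with names taken from the fork's unload call further down this thread):

```python
import gc
import torch

# Evict the stages that have already run, right before mesh generation starts.
for name in ("sparse_structure_flow_model", "sparse_structure_decoder"):
    models[name].to("cpu")
gc.collect()
torch.cuda.empty_cache()  # hand the mesh stage as much free VRAM as possible
```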
Make sure the setting got applied; it really shouldn't OOM in normal use once you have it enabled. It should just make everything slower when you go over the limit. Fortunately, TRELLIS doesn't exceed the VRAM amount by a huge margin during initial mesh generation, so this doesn't slow it down that much.
I'm on Win 11 Pro (24H2) with a 2070 Super (8GB) and 64GB RAM, using xformers. There are a ton of desktop apps that each take their own small slice of GPU VRAM when GPU acceleration is enabled. I've used Microsoft's Process Explorer to identify these (on my system, even the mouse software has some hidden AI prompt system running in the background). After cleaning up most of the unneeded usage on the desktop, I'm at 0.8 GB while writing this; it used to be close to 2 GB when I first started looking into this. Every small change counts when you are this close to the limit.

Besides the VRAM issue, you can also make generation a lot faster by omitting the extra video gens (if you use the demo GUI, just keep the gaussian video if you want to preview results), by setting mesh baking to the fast option, and by using a 256 texture if you just want something to review quickly. These changes need only minor edits to app.py and postprocessing_utils.py; see the sketch below.
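For the texture side, a quick-preview variant of the stock app's postprocessing call might look like this. The parameter names follow the stock app.py call as of writing, and the fast-baking switch lives in postprocessing_utils.py; verify both against your checkout:

```python
# Hedged sketch: smaller texture and aggressive simplification for fast previews.
glb = postprocessing_utils.to_glb(
    gs, mesh,
    simplify=0.95,     # drop most triangles before baking
    texture_size=256,  # a 256x256 texture is enough for a quick look
)
```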
Could you show how to disable the video generation? I tried but failed.
I actually have a version of that webui here that directly gives the mesh output, with some extra parameters to tweak, but I've been stuck trying to optimize the gen itself, so I haven't had time to clean it up. If you just want to reduce the video gens to the gaussian alone, that's easy, since you just omit these lines: Lines 115 to 116 in ab1b84a
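In that revision, the referenced lines render the extra preview videos and look roughly like this; commenting out the mesh/normal render keeps only the gaussian preview (exact code may differ slightly in your checkout):

```python
video = render_utils.render_video(outputs['gaussian'][0], num_frames=120)['color']
# video_geo = render_utils.render_video(outputs['mesh'][0], num_frames=120)['normal']
# video = [np.concatenate([video[i], video_geo[i]], axis=1) for i in range(len(video))]
```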
this alone makes this step a lot faster. To fully bypass the video you need some further gradio edits.
@0lento I have run example.py and noticed that for me most of the time is spent on texturing (not the actual mesh generation, which is reasonably fast). It would be interesting to disable that; do you know if it's possible? In general, there is performance degradation after the first run (on the main gradio app). edit:
Note that if you skip the videos and texturing, you also don't need to create the gaussian, only the mesh, though this doesn't have a huge impact on speed. This is on my list to implement with the UI overhaul (I already have a texture-disable toggle there, but it still generates the gaussian); right now I'm looking into making things faster first. A sketch of the idea is below.
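A hedged sketch, assuming the stock pipeline's `run()` signature, which accepts a list of output formats:

```python
# Request only the mesh so the gaussian (and radiance-field) decoders
# never run; format names follow the stock TRELLIS pipeline.
outputs = pipeline.run(
    image,
    seed=1,
    formats=["mesh"],
)
mesh = outputs["mesh"][0]
```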
I made some small changes to make it work with 12 GB VRAM:
app_with_noglb.txt |
I force-pushed a cleaned-up version to this branch on my repo now: https://github.com/0lento/TRELLIS/tree/low-vram
It worked on my 4 GB card but took 23 minutes. I included @erosDiffusion's changes of not baking textures and not rendering the gaussian and normal videos, like in the text file I uploaded, with 12 steps for both.
4GB, huh? Well, you could make it slightly faster on my fork by extending the unload call as shown below. These additional models were left out of the initial unload because it's just extra shuffling on GPUs with 8+ GB (and the latent-structure-related models don't even fill the whole VRAM on an 8GB GPU). This won't make it fast with that little VRAM, but I guess every little bit helps.
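The extended call from that comment, formatted as code (the line it replaces is the corresponding `unload_models` call in the fork's pipeline code):

```python
# Unload the latent-structure models too, not just the decoders,
# before the memory-heavy stages run:
self.unload_models([
    'sparse_structure_flow_model',
    'sparse_structure_decoder',
    'slat_flow_model',
    'slat_decoder_mesh',
    'slat_decoder_gs',
    'slat_decoder_rf',
])
```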
In my case, I have a slow GPU with 24GB VRAM and a faster GPU with 10GB VRAM, and I wanted to document my findings. And so, to work around that, you have to use

In the run function, the only line that actually requires more than 10GB is this one:

The rest of the ENTIRE pipeline can be run on my 10GB card.
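A hedged sketch of that kind of two-GPU split, assuming the pipeline keeps its stage models in a `models` dict as the stock TRELLIS pipelines do; which stage is the heavy one is an assumption here, since the referenced line didn't survive in this thread:

```python
# Keep the one memory-heavy stage on the 24GB card ("cuda:0" here)
# and everything else on the faster 10GB card ("cuda:1").
HEAVY_STAGES = {"slat_decoder_mesh"}  # assumption: the >10GB stage

for name, model in pipeline.models.items():
    model.to("cuda:0" if name in HEAVY_STAGES else "cuda:1")
```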
This is why we've linked solutions here that unload the models from the GPU prior to that line. You can fit this into 10GB just fine if you unload the models that have already been used by that point (they can be loaded back again on new runs).
Yes. Because I have 2 GPUs, I don't have to offload anything to the CPU, which saves some time. This isn't exactly the same issue; I wanted to document this xformers bug somewhere for anyone else trying to switch the GPU of some pipeline's models, and I didn't want to open a new issue for it. I thought this issue was close enough to talk about it here.
For interest, https://github.com/MontagueM/TRELLIS is a fork with some simple data-type changes that makes flexicubes use fp16; I've found no significant visual difference in quality between fp16 and fp32.
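That fork's actual diff is in its repo; the general pattern it describes is just running the mesh-extraction math under half precision, roughly like this (`extract_mesh` is a hypothetical stand-in for the decoder call):

```python
import torch

# Run the flexicubes/mesh extraction under float16 autocast to roughly
# halve the memory footprint of its intermediate tensors.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    mesh = extract_mesh(latents)
```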
Thanks to all! It finally starts to work on my 12GB card. However, sometimes I get an OOM in the bake_texture routine. Does anyone have an idea how to manage the memory in that routine (other than skipping that part)?
I found that for a large number of triangles, you need the simplify slider to be very high to avoid OOM, even on 24GB cards. This happens to me too for large landscapes when
@MontagueM Your fork is the only one that got me this far, but I'm not sure how to get past this:

```
[SPARSE] Backend: spconv, Attention: flash_attn
To create a public link, set
```
@cronobjs What PyTorch version do you have? Run the updated fork to avoid hitting this anyway; it's a tiny VRAM improvement.
Yes, I also noticed that it depends on the triangle count of the non-simplified model. For now, I avoid texture baking for "large" models by using the wax material; it is really fast. I also added a parameter to switch between baking and non-baking. If I really need the texture, I bake it on a low-resolution model and transfer it to the high-resolution one in Blender.
@MontagueM Can you adapt it for multiple views?
change this line: |
I used your fork; my video card is a 2060 Super 8GB, but the following error occurs. Is there any solution? Thanks!
[SPARSE] Backend: spconv, Attention: xformers
To create a public link, set
This error isn't specific to my fork's current branch, as I bring in no further dependencies there at the moment. I do have a faint memory of having to solve this in the past too. I have torch 2.5.1+cu124 and utils3d 0.0.2 installed on my end to run this. To match that, you could run this in your virtual env:
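Something like the following should get those versions (assumed commands: the utils3d install matches the git-based route the next reply used, and the CUDA index URL should match your driver setup):

```sh
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/EasternJournalist/utils3d.git
```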
|
Thank you for your reply. I solved this problem by running:
git clone https://github.com/EasternJournalist/utils3d.git
But there is another problem:
ModuleNotFoundError: No module named 'diff_gaussian_rasterization'
Do look at what others did here to make it run on Windows: #3
thanks!
I'm not sure if you got it figured out, but here is how I managed to get it to work on my 3060. I git-cloned 0lento's 8GB-VRAM fork of TRELLIS with --recurse-submodules, then copied the Windows PowerShell scripts [*.ps1] and requirements-uv.txt from https://github.com/sdbds/TRELLIS-for-windows/, then ran install-uv-qinglong.ps1 in PowerShell. Once finished, enjoy TRELLIS!
I pushed an update to https://github.com/0lento/TRELLIS/tree/low-vram/ with Jonathan's Marching Cubes PR #89, since it seems to be notably less VRAM-hungry than flexicubes. I've also defaulted this to the fast texture-baking path, because it's easier on VRAM. If you don't want these changes, they are contained in individual commits which you can revert if needed. Do note that I want to keep this branch's changes minimal and only keep things that lower VRAM usage; I have plans to expose all of these in a better, more optional way on another branch in the near future. This low-vram branch is now up to date with the TRELLIS main branch, including multi-image and gaussian exports.
Wow, you're one step closer to being commercially viable |
All credit for that goes to jclarkk. His gsplat PR also works for the texture-baking stage, but it doesn't change the VRAM usage and it's slightly slower to run, so I haven't included it here yet. I don't want to tweak this branch's gradio etc. to expose these things properly; I'm just trying to keep everything else as stock as possible here and do bigger changes elsewhere.
I also pushed another branch that adds gsplat here: https://github.com/0lento/TRELLIS/tree/low-vram-gsplat. I wanted to put this on a separate branch because it introduces an additional dependency and in my brief testing is a bit slower, but it also has a more permissive license, so many may want to use it. If you see more artifacts with this branch, try removing the fast-bake option before drawing too many conclusions; fast bake is a compromise. If you already have TRELLIS set up and switch to this branch, you need to install this within your TRELLIS python env:
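The dependency in question is gsplat; an assumed install command (it compiles CUDA kernels, so your torch/CUDA toolchain needs to be working first):

```sh
pip install gsplat
```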