Hey, I am trying to load an 8B int8 model on my device with 16 GB of RAM. The model should only take slightly over 8 GB of memory, but it seems that during the loading process the model is being copied, which doubles the memory usage to over 16 GB and causes an OOM.
Is it not possible to stream the model instead?
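To make "streaming" concrete: roughly 8 billion int8 parameters at 1 byte each is about 8 GB, so holding the file contents and a second in-memory copy at the same time would exceed 16 GB. Below is a minimal sketch of the general idea, assuming the weights sit in a single flat file. It is not tied to any particular library's loader; `iter_chunks`, `CHUNK_SIZE`, and `demo` are hypothetical names used only for illustration. The file is memory-mapped read-only and consumed one slice at a time, so only one chunk is copied into process memory at once.

```python
import mmap
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary illustrative value


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive byte slices of a file through a read-only memory map.

    Only the slice currently being yielded is copied into process memory;
    the rest of the file stays backed by the OS page cache.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, size, chunk_size):
                yield mm[offset:offset + chunk_size]


def demo():
    """Write a small stand-in 'weights' file and stream it back in chunks."""
    payload = os.urandom(3 * CHUNK_SIZE + 5)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        total = sum(len(chunk) for chunk in iter_chunks(path))
    finally:
        os.remove(path)
    return total, len(payload)


if __name__ == "__main__":
    streamed, expected = demo()
    print(streamed == expected)
```

In a real loader each chunk would be handed to the destination buffer (for example, a preallocated tensor) before the next one is read, so peak usage stays close to the model size plus one chunk.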