No matter what I try, I can't set the context_length of a GPTQ model. It's overridden by ExLLAMA, which then sets the cache size and context_length to whatever its default is (in this case 2048).
The first problem is that it's actually using max_seq_len to set the context_length, and the Config dataclass doesn't include that field. I've tried the obvious routes, up to and including monkey-patching the Config dataclass to set that field; see the sketch below.
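A minimal sketch of the attempts, assuming a ctransformers-style AutoModelForCausalLM / Config API; the model id is a placeholder:

```python
from ctransformers import AutoConfig, AutoModelForCausalLM, Config  # assumed API

MODEL = "TheBloke/Some-Model-GPTQ"  # placeholder model id

# Attempt 1: pass context_length directly at load time.
llm = AutoModelForCausalLM.from_pretrained(MODEL, context_length=4096)

# Attempt 2: build the Config explicitly and pass it in.
cfg = AutoConfig(config=Config(context_length=4096))
llm = AutoModelForCausalLM.from_pretrained(MODEL, config=cfg)

# Attempt 3: monkey-patch max_seq_len onto the Config instance, since that
# is the field ExLLAMA actually reads and Config doesn't define it.
cfg.config.max_seq_len = 4096
llm = AutoModelForCausalLM.from_pretrained(MODEL, config=cfg)
```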
None of these will change the context_length used by the GPTQ model because it uses the ExLLAMA config instead.
If I reach in and modify the ExLLAMA config after loading the model, something like this:
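(Sketch only: the attribute path into the loaded ExLLAMA config is a guess at the wrapper's internals; max_seq_len is the ExLLAMA field named above.)

```python
llm = AutoModelForCausalLM.from_pretrained(MODEL)  # comes up with the 2048 default

# llm._model and its config attribute are hypothetical names for however
# the wrapper stores the loaded ExLLAMA state.
llm._model.config.max_seq_len = 4096
```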
It correctly sets the context_length, but the cache was already allocated at a size of 2048, so it promptly crashes whenever you ask for a long response.