Frequently Asked Questions
There are many "best" models for many situations. Which model is best for you depends on the following factors:
- How much effort you want to put into setting it up.
    - If you want it all done for you "asap":
        - Scroll through our "Add Models" list within the app. The models are pre-configured and ready to use.
    - If you want to get a custom model and configure it yourself (these are NOT pre-configured; we have a wiki explaining how to do this):
        - Download models provided by the GPT4All-Community.
        - Download using the keyword search function through our "Add Models" page to find all kinds of models from Hugging Face.
        - Sideload from some other website.
- Hardware requirements.
    - As a general rule of thumb (a rough memory estimate is sketched after this list):
        - Smaller models require less memory (RAM or VRAM) and will run faster.
        - Larger models require more memory and will run slower, but outperform in terms of capabilities and produce better output.
        - Newer models tend to outperform older models to such a degree that sometimes smaller newer models outperform larger older models.
        - Check out https://llm.extractum.io/ to find models that fit into your RAM or VRAM.
- What you need the model to do.
    - The models working with GPT4All are made for generating text.
        - Multi-lingual models are better at certain languages.
        - Coding models are better at understanding code.
        - Agentic or Function/Tool Calling models will use tools made available to them.
        - Instruct models are better at being directed for tasks.
        - Chat models are good for conversational purposes.
        - Uncensored models are good for roleplaying or story writing.
            - These come in various forms and are derived from the Chat or Instruct variants.
- Look at benchmarks to get an idea of which models do what better.
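To turn the memory guidance above into numbers: a quantized model's weights take roughly parameter count × bits per weight ÷ 8 bytes, plus some overhead for the context/KV cache and runtime buffers. The sketch below is a minimal rule-of-thumb estimate only, not an exact formula; real usage depends on the quantization format, context length, and backend.

```python
def estimate_model_memory_gib(params_billions: float,
                              bits_per_weight: float = 4.5,
                              overhead_gib: float = 1.0) -> float:
    """Rough rule-of-thumb memory estimate for a quantized GGUF model.

    weights  ~= params * bits_per_weight / 8 bytes
    overhead ~= flat allowance for KV cache / runtime buffers (grows with context).
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes / 1024**3 + overhead_gib

# Examples at ~4.5 bits per weight (Q4_0-like quantization):
print(f"3B  model: ~{estimate_model_memory_gib(3):.1f} GiB")   # ~2.6 GiB
print(f"7B  model: ~{estimate_model_memory_gib(7):.1f} GiB")   # ~4.7 GiB
print(f"13B model: ~{estimate_model_memory_gib(13):.1f} GiB")  # ~7.8 GiB
```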
Default directories for settings and models:
- Windows:
    - Settings directory: C:\Users\%USERNAME%\AppData\Roaming\nomic.ai
    - Models directory: C:\Users\%USERNAME%\AppData\Local\nomic.ai\GPT4All
- macOS:
    - Settings directory: /Users/{username}/.config/gpt4all.io
    - Models directory: /Users/{username}/Library/Application Support/nomic.ai/GPT4All
- Linux:
    - Settings directory: /home/{username}/.config/nomic.ai
    - Models directory: /home/{username}/.local/share/nomic.ai/GPT4All
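If you need these locations from a script, the defaults can be resolved per OS. This is a minimal sketch that simply mirrors the paths listed above and assumes you have not changed the download directory in the app's settings.

```python
import os
import platform
from pathlib import Path

def default_gpt4all_models_dir() -> Path:
    """Default GPT4All models directory per OS (mirrors the paths listed above)."""
    system = platform.system()
    if system == "Windows":
        return Path(os.environ["LOCALAPPDATA"]) / "nomic.ai" / "GPT4All"
    if system == "Darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / "nomic.ai" / "GPT4All"
    return Path.home() / ".local" / "share" / "nomic.ai" / "GPT4All"  # Linux

models_dir = default_gpt4all_models_dir()
print(models_dir, "exists:", models_dir.exists())
for gguf in sorted(models_dir.glob("*.gguf")):  # list any downloaded models
    print(" -", gguf.name)
```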
Sampling settings:
- Temperature: This controls the randomness of predictions; lower values make the model more deterministic, while higher values increase randomness.
- Top K: This limits the sampling pool to the most probable tokens. For example, if K=50, only the 50 most likely tokens are considered for the next word prediction.
- Top P: The model looks at all possible next tokens and picks the smallest group of tokens that together have a total probability of at least this percentage. For instance, a setting of "1" will include 100% of all probable tokens. If P=0.9, it includes the fewest tokens whose combined probability is at least 90%. The closer this value is set to 0, the fewer tokens are included in the set the model samples from.
- Min P: This sets a minimum probability threshold for individual tokens; the remaining selected tokens are renormalized to a combined probability of 100%. A setting of "1" will include only one token, with a probability of 100%. A much lower setting like P=0.05 includes the smallest number of tokens with a probability greater than 5%.
Experience how settings like Temperature, Top K, Top P, Min P change model behavior in this live example.
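As a small illustration, the same sampling knobs are exposed on `generate()` in the gpt4all Python SDK. This is a minimal sketch, not part of the app itself: the model name is just an example, and the parameter names (`temp`, `top_k`, `top_p`, `min_p`) follow the Python bindings and may differ between versions.

```python
from gpt4all import GPT4All

# Example model name; any model installed in GPT4All works here.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

with model.chat_session():
    # Low temperature + tight nucleus: focused, repeatable answers.
    print(model.generate("Name three primary colors.",
                         max_tokens=50, temp=0.2, top_k=40, top_p=0.9, min_p=0.05))

    # High temperature + wide sampling pool: more varied, creative output.
    print(model.generate("Invent a name for a fantasy tavern.",
                         max_tokens=50, temp=1.2, top_k=200, top_p=1.0, min_p=0.0))
```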
To speed up generation:
- Ensure you are using the GPU if you have one. See "Settings > Application: Device" and make sure it is set to use either Vulkan or CUDA.
- Find the right number of GPU layers in the model settings. If you have a small amount of GPU memory, start low and move up until the model won't load, then use the last known good setting.
- Make sure the model has GPU support.
    - Vulkan supports f16, Q4_0, and Q4_1 models on the GPU (some models won't have any GPU support).
    - CUDA supports all GGUF formats (some models won't have any GPU support). CUDA is also available for the LocalDocs feature.
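For reference, the same device and layer-offload choices can also be made from the gpt4all Python SDK. This is a minimal sketch under the assumption that your build exposes the `device` and `ngl` parameters; the model name is only an example, and the accepted device strings may vary with the installed version.

```python
from gpt4all import GPT4All

model = GPT4All(
    "Meta-Llama-3-8B-Instruct.Q4_0.gguf",  # example model; Q4_0 also runs on Vulkan
    device="cuda",   # or "kompute" (Vulkan), or "cpu"
    ngl=24,          # number of layers offloaded to the GPU; lower this if loading fails
)
print(model.generate("Why is the sky blue?", max_tokens=100))
```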
When we speak of training in the ML field, we usually mean pre-training (see also foundational models and unsupervised learning). GPT4All Chat does not support finetuning or pre-training. At the pre-training stage, models are often fantastic next-token predictors and usable, but a little bit unhinged and random. After pre-training, models are usually finetuned on chat or instruct datasets with some form of alignment, which aims at making them suitable for most user workflows.

Retrieval Augmented Generation (RAG), such as the LocalDocs feature, is a way to add more (relevant/specific) tokens to the context. In other words: RAG adds additional tokens to the initial prompt given by the user to trigger a response. Since all the tokens in the context need to be processed (inferenced) to yield a response, and your SSD/HDD is abysmally slow at doing that, it is usually done in RAM (still slow) or VRAM (fast). While it is technically possible to store this context (the conversation) permanently on your hard drive and load it back into RAM/VRAM later, models have a maximum context window they are trained for, and if you go beyond that limit, response quality degrades rapidly.

That means it is not possible to use RAG and prompting to "train" a model to your liking indefinitely; you will have to reset your conversation and start anew at some point. While large language models are good in-context learners, there is a technical limit to that, and if you want to go beyond it, you will have to do some additional pre-training or finetuning.
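To make the RAG point concrete: retrieval simply prepends relevant text to the prompt as extra context tokens, and the model's context window still bounds how much can be added. Below is a minimal, hypothetical sketch of that idea using the gpt4all Python SDK; LocalDocs performs the retrieval step (an embedding search over your files) for you, whereas here the `snippets` list is hard-coded.

```python
from gpt4all import GPT4All

# Hard-coded stand-in for what a retriever (e.g. LocalDocs) would return.
snippets = [
    "Invoice #1042 was issued on 2024-03-01 and is due within 30 days.",
    "Late payments accrue 2% interest per month.",
]
question = "When is invoice #1042 due, and what happens if I pay late?"

# RAG in a nutshell: retrieved text becomes extra tokens in the prompt.
prompt = (
    "Answer using only the context below.\n\n"
    + "\n".join(f"- {s}" for s in snippets)
    + f"\n\nQuestion: {question}"
)

# n_ctx is the context window; the prompt plus the response must fit inside it.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf", n_ctx=2048)
print(model.generate(prompt, max_tokens=200))
```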