Replies: 4 comments 2 replies
-
I look at it from two sides: local-only and cloud AI.

For cloud-based AI, I like Claude 3 Haiku: good, fast, and a dirt-cheap API (I've been using it daily for two months and it has cost under $2 so far). Unfortunately, in the most recent version of Perplexica at the time of writing, there seem to be issues with rate limiting.

For local-only setups I got the best results with the popular llama3.1:8b model; my hardware is restricted to a stone-age 4 GB VRAM GPU. Phi-3 was a miss, as it often fails to embed the citations correctly, along with some other minor weaknesses. I also tried the two larger Gemma 2 models with very impressive results, and for sensitive topics dolphin-llama3:8b did the trick, since censorship is a thing with the stock Llama models.

Interestingly, the results are heavily impacted by the embedding model selected, subjectively even more than by the chat model itself. By far the best results I had were with nomic-embed-text served by Ollama. I would recommend it, as performance is not heavily affected.
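For reference, this is roughly how a local setup like the one above talks to Ollama: llama3.1:8b for chat and nomic-embed-text for embeddings. The endpoints and payloads follow Ollama's documented REST API on its default port 11434, but the helper functions are only an illustrative sketch, not Perplexica's actual code, and they assume both models were already pulled with `ollama pull`.

```python
# Minimal sketch: querying a local Ollama server for chat and embeddings.
# Assumes Ollama is running on localhost:11434 and that llama3.1:8b and
# nomic-embed-text have been pulled beforehand.
import requests

OLLAMA_URL = "http://localhost:11434"

def chat(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send a single-turn chat request and return the answer text."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return the embedding vector used for reranking search results."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

if __name__ == "__main__":
    print(chat("Summarize why the embedding model matters for search reranking."))
    print(len(embed("Perplexica test query")))
```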
-
I've been using Mistral-Nemo 12B with Perplexica and the results are great.
-
I agree, mistral-nemo 12b seems to be the best, at least on my 2080 Ti; q4_K_M is OK most of the time for me.

What are you all using for num_ctx on the chat model, the stock Ollama 2048? I played around with 4096 and 8192, which seems to help but really eats VRAM. On the 2080 Ti I can set num_ctx to 8k with q4_K_M and still get moderately fast speeds; bumping num_ctx down to 4k with q4_K_M makes it fit entirely in GPU VRAM and it's pretty fast. Has anyone else tried tweaking num_ctx? Is 8k really needed? I'm not sure. I also bumped temperature to 0.7, since I noticed in the code that that is what the cloud models use.

One other thing regarding the embedding model: I also use nomic-embed-text:latest, but I bumped its num_ctx to 8k. Not sure if that's needed, but it's the maximum the model can handle, and it seems to work fine for me.

Update: I did some more testing, and q5_K_M with num_ctx 4096 seems to be the sweet spot for response quality.
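If it helps anyone experimenting with these settings: num_ctx and temperature can be passed per request through Ollama's documented `options` field instead of baking them into a Modelfile. The snippet below is only a sketch; the default model tag and parameter values mirror what I described above (substitute the exact quantization tag you pulled from the Ollama library).

```python
# Sketch: overriding num_ctx and temperature per request via Ollama's
# "options" field. Values mirror the settings discussed above; the model
# tag is whatever mistral-nemo quantization you actually pulled.
import requests

def ask(prompt: str,
        model: str = "mistral-nemo",   # e.g. a q4_K_M / q5_K_M tag if pulled
        num_ctx: int = 4096,
        temperature: float = 0.7) -> str:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "options": {"num_ctx": num_ctx, "temperature": temperature},
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]
```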
-
I'm using LiteLLM connected to OpenRouter with openai/gpt-4o-mini, and it works great. I'm running a small open embedding model alongside it.
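For anyone curious about that path: whether you run LiteLLM as a proxy server or call its SDK directly, the model naming is the same, with an `openrouter/` prefix routing the request to OpenRouter. The sketch below shows the SDK form under that assumption; the prompt and wrapper function are just illustrative.

```python
# Sketch: calling OpenRouter's openai/gpt-4o-mini through LiteLLM.
# Requires OPENROUTER_API_KEY to be set in the environment.
from litellm import completion

def ask(prompt: str) -> str:
    resp = completion(
        model="openrouter/openai/gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # LiteLLM returns an OpenAI-style response object.
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(ask("What model are you?"))
```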
-
I tried several AI models with Perplexica: Llama 3, Llama 3.1, Phi-3 Medium, and Command-R 28B.
My impression is that even Llama 3 delivers good results, which means that even modestly AI-capable machines can benefit from this project using local models.
With Phi-3, things get more interesting. The model is very good: all of my tests where Perplexica rivals Perplexity Copilot use Phi-3 Medium.
Command-R: even though this model does not rank well on leaderboards (which mainly measure raw reasoning ability), it is a truly astonishing model that excels at creative writing and RAG, and the match with Perplexica is really good.
To add to the complexity, different models perform differently with the same prompt: some deliver better results with simpler prompts, while others really seem to prefer detailed, Markdown-structured prompts.
I did some tests using GPT-4o as an arbiter (a rough sketch of that setup follows below). The results show that even the smallest model, Llama 3.1 8B, does not rank very differently from the bigger models. Two possible conclusions: either the search engine connection is a bottleneck, or this task simply does not need a really big model. The second aligns with my observation that, for basic searches that don't require reasoning, vanilla Perplexity does not perform much worse than Perplexity Copilot and is occasionally on par with it.
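The "arbiter" setup was essentially an LLM-as-judge comparison. The sketch below, using the OpenAI Python SDK, shows the general idea; the prompt wording and scoring scheme are my own illustration rather than the exact script I ran.

```python
# Hedged sketch of a GPT-4o "arbiter": give the judge the question plus two
# anonymized answers from different models and ask which one is better.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def judge(question: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        "You are grading two answers to the same web-search question.\n"
        f"Question: {question}\n\n"
        f"Answer A:\n{answer_a}\n\n"
        f"Answer B:\n{answer_b}\n\n"
        "Reply with exactly 'A', 'B', or 'TIE', then one sentence of justification."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content
```

One practical note: it's worth running each pair twice with the A/B order swapped, since judge models tend to have a position bias.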
On the whole, I have almost completely stopped using Google: I rely on Perplexica, and on Perplexity when I am not at my PC, especially when searching for something like debugging help.