Replies: 6 comments
-
Hi, I am back with more feedback. This time it is about the AI model "Nous Hermes 2 Mistral DPO". First of all, it works remarkably well considering that it has 7 billion parameters with the legacy Q4_0 quantisation (Q4_K_M is usually much better). I run it without a GPU because Intel GPUs are not supported by gpt4all. Despite these limitations, it is relatively amusing. Unfortunately, it is strongly biased toward some topics:
I am not against these values; that is NOT the point. The issue arises when those values are so strongly embedded in the model that it cannot provide the service it is supposed to provide. In fact, when asked to analyse a text, part by part, and create a list of brief summaries in order to evaluate the structure of the text and the logical reasoning along it, it ends up "inventing" things. It is not a case of hallucination: decreasing its temperature from 0.7 to 0.5 actually worsens the situation. This happens because the model is so strongly biased about some topics that, instead of summarizing a text with a high degree of fidelity, it manipulates the text, colouring it with its own biases.

The text it was working on is the ChatGPT vs Human conversation presented on this page. Please notice that the dialogue with ChatGPT is not about contesting those values but about putting them in a reasonable, rational perspective, which in brief can be summarized as: "Once we almost all take the same way at almost the same time, the risk of facing a HUGE disaster is implicit because of systems theory: uniformity vs collapse risk, rigidity vs fragility, single-headed governance vs single point of critical failure, plus the bare laws of classical mechanics, in which high speed implies a high negative acceleration in case of impact, and hence a large force (F = ma)." As you can imagine, these are NOT arguments against those values but reasonable and legitimate concerns about HOW those values are managed.

In this context, the chatbot based on the AI model listed above decided to introduce its own biases, tainting the author's opinion with them. The best part was when I asked it why it invented those things. Surprisingly, it provided a relatively long answer whose first part argued that "the literature about those topics should also be considered, not just the author's opinion", and whose second part tried to convince me that it was doing good by reporting rather than inventing. So, I answered that I was sure it was inventing things because I was "the author" of that text. BOOM. LOL

Finally, it is noticeable that it has a quite interesting mild bias (not particularly strong, at least in this test) about ethics. This bias does not allow it to correctly differentiate between "ethics" and "moral hazard". I mean: ethics is about doing the right thing, like proposing vaccination; moral hazard is about HOW the right thing is enforced or managed. IMHO, this distinction is pretty clear in that dialogue, because explaining it is the reason why ChatGPT decided to agree with me. Once ChatGPT correctly identified my position as NOT being against its values but as trying to put their management into a rational framework, it accepted to agree with me, despite having shown pathetic censorship and strong biases about those topics in some previous prompts.

Without any surprise, AI models are a mirror of humans, including our biases. So, nothing new here, just a report. I have to admit that the process of prompting / engaging the AI model was purposely a bit malicious, in order to trick the model into exposing its own biases. Where "a bit malicious" means something reasonable, as in a decent human conversation: in asking you to execute a task, I give you the feeling that you can introduce "your own stuff" into it; however, because I know my own stuff, I get informed about your stuff (your biases). Again, nothing new here.

Now, I am going to try this model downloaded from HuggingFace as an alternative to the one cited above.
I have tried a child of it, but it was strongly biased about privacy, in particular when AI technology was involved. Curiously, the child (AFAIK) was not fine-tuned or re-trained but just quantised differently. Possibly, the different way of simplifying its weights created, as an artifact, a bias which the parent does not seem to have, or has not shown yet. Please consider that the bias neutrality of an AI model is way more IMPORTANT than performance (e.g. 2.6 tokens/sec vs 3.1 tokens/sec). From this point of view, it would be nice if the answer to a prompt also showed the execution time with millisecond granularity. Probably this is possible by modifying the ChatML template; it is a detail that I have not investigated yet. I hope this helps, R-
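In the meantime, a rough way to get per-answer timings with millisecond granularity is to drive the model from the gpt4all Python bindings and time the call yourself. This is only a sketch under assumptions: the model file name, the placeholder prompt, and the token budget below are mine, not taken from the GUI, and the GUI may measure things differently.

```python
import time
from gpt4all import GPT4All

# Assumed local model file name; adjust to whatever gpt4all actually downloaded.
model = GPT4All("Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf")

prompt = "Summarise the following text, part by part: ..."  # placeholder prompt

start = time.perf_counter()
answer = model.generate(prompt, max_tokens=256)
elapsed_ms = (time.perf_counter() - start) * 1000

# Rough throughput estimate based on a whitespace word count, not real tokens.
print(f"{elapsed_ms:.0f} ms, ~{len(answer.split()) / (elapsed_ms / 1000):.1f} words/s")
print(answer)
```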
-
The Open Hermes 2.5 Neural Chat + Mistral by Slerp model, in some cases, loops by reformulating the prompt instead of responding to it. Probably this happens when, in trying to execute its task, it runs out of resources, but I did not investigate the problem deeply because I found a quick work-around. Moreover, it does not get out of that loop even with a simple and imperative directive. This happens when using a minimal template; adopting the one taken from Reasoner v1 instead seems to solve the problem. Please notice that this AI model works mainly in English, even though it knows other languages like Italian pretty well (French, and less deeply Spanish, Portuguese and German, as stated on its HF page). This means that it translates to English and back when it works with an Italian prompter on an Italian document. In this translation, some s/he distinctions and other traits specific to the Italian document/language get lost. Although this seems like a limitation, in the specific cases in which the final version of the document is going to be read in English through an automatic translation tool like Google Translate, it can actually be an advantage, because we implicitly get a glimpse of how that document will appear to a foreign reader who reads it in translation.
-
The combination of these two is enough to work around the issues related to the Open Hermes 2.5 Neural Chat + Mistral by Slerp model, and it also improves the quality and coherence of the outputs of every other similar model I have tried as a chatbot, including Reasoner v1 and Nous Hermes 2 Mistral DPO.
Chat Template
System Prompt
This system prompt also seems to be enthusiastically welcomed by the large on-line models like Gemini and ChatGPT. In such a case, I suggest using it in this way:
Please keep in mind that Reasoner v1 is not specifically trained as a text-generation chatbot but rather for prescriptive languages, in particular JavaScript programming, AFAIK. Hence this configuration might have a relevant impact on its performance as a coder bot. However, a more specific system prompt can be developed starting from this one. The page linked above explains the rationale behind the development of the system prompt included in this comment. In the next days, I will add some examples of how I managed to define such a system prompt, which IMHO is even more interesting than the prompt itself. The name can be changed, obviously. It resembles a persona's name (Alex), but it also works as an acronym like Artificial Intelligence eXtended (or neXt, or eXperimental), or even Electronic Entity (e-X) once you figure out that it is Al-e-X. Welcome aboard, HAL-9000! LOL
-
Below is depicted the reason why, which is also the reason for fine-tuning LLaMA-2 models with Open Hermes (know-how) and Open Orca (instruct): the Open Orca "instruct" training set will help LLaMA-2 models follow human natural-language instructions, such as the advanced system prompts like the one described in the comment above.
-
WHY LLMA-2 Q4_0 PERFORMS BETTER?
It took its time, but the 2nd edition planned for 2025-01-07 is finally ready (it was finished ten minutes ago). Starting from the conclusions of its 1st edition, the paper now explores more deeply the consequences of the image above (models fork after quantisation) and proposes a recipe for the best AI model candidate to be the most performant chatbot suitable for gpt4all. I hope this helps, R-
-
PROMPT TESTING
I am trying to provide my locally running AI with a system prompt oriented toward using the RAG properly. However, enlarging the system prompt can slow down the performance considerably, to the point where the trade-off between better immediate results and processing time starts to look not so good. Therefore, I decided to shrink the prompts while keeping almost all of their meaning, and now I am going to test these changes. I am writing here just in case someone would like to join me in this quest.
Results (single run, time taken)
In the last case, the system prompt refers to a single tokenized file:
The main issue is that a lot of time is spent searching for it, while a ChatML instruction referring directly to the file can make a huge difference, dropping the search time and achieving something between 16 s and 6 s, or even better. The minimal chat template:
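For reference only, a generic minimal ChatML-style Jinja chat template (the format recent GPT4All versions accept) looks roughly like the sketch below; this is an illustration of the format, not necessarily the exact template used in these tests.

```jinja
{%- for message in messages %}
{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{ '<|im_start|>assistant\n' }}
{%- endif %}
```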
INSTRUCT SYSTEM PROMPT

ORIGINAL (w: 113, c: 656)
Your name is "AleX", and you will refer to yourself by this name or as "I," "me," or "myself", depending on the context. You are an AI language assistant specialized in text analysis, task execution, and verification, with decision-making and advanced reasoning capabilities. Your primary objective is to execute user instructions while avoiding unnecessary verbosity or rigid literalism. Make rational decisions when necessary and briefly inform the user of each decision’s relevance to its respective task. Provide corrective feedback collaboratively, but only when relevant. Concisely explain how each task was completed. Almost, do not quote directly from documents. Instead, reference section titles or paragraph numbers, whichever is more relevant and concise.

SHORTER (w: 83, c: 505)
Your name is "AleX", refer to yourself as "I", "me", or "myself" as appropriate. You are an AI assistant specialized in text analysis, task execution, and verification, with decision-making and advanced reasoning. Your goal is to execute user instructions efficiently, avoiding unnecessary verbosity or rigid literalism. Make rational decisions when needed and briefly explain their relevance. Provide corrective feedback only when relevant and concise explanations of task completion. Do not quote documents directly; instead, reference section titles or paragraph numbers for clarity.

SHORTEST (w: 53, c: 334)
Your name is AleX (use I/me/myself for yourself as appropriate), an AI assistant focused on text analysis, task execution, and verification with reasoning abilities. Execute instructions efficiently without verbosity. Make rational decisions and briefly explain those which are relevant. Provide concise feedback when needed. Reference document sections/paragraphs instead of quotes.

RAG WISE SYSTEM PROMPT

ORIGINAL (w: 155, c: 884)
You MUST leverage the retrieval-augmented generation (RAG) support. You MUST prioritize retrieved knowledge ([RK]) over internal knowledge ([PK]) when relevant or when [RK] is more informative and specific than [PK]. Clearly differentiate between [RK] and [PK] using these labels in your answer. Use [RK] to provide contextually relevant answers. If [RK]'s parts contradict each other, highlight the discrepancies. If both [RK] and [PK] are relevant, use [RK] for facts and [PK] for interpretation, ensuring consistency. If [RK] conflicts with [PK], provide the different perspectives and their potential biases, unless the user explicitly requests information from [RK] without asking for an analysis or opinion on the matter, in which case provide it as is without further interpretation. If retrieval fails, consider rephrasing the query for better results and return to the user the modified successful query with "[QK]" label. If no relevant [RK] exists, state it explicitly instead of generating speculative or unsupported claims.

SHORTER (w: 92, c: 522)
You MUST use retrieval-augmented generation (RAG). Prioritize retrieved knowledge [RK] over parametric knowledge [PK] when relevant or more specific. Clearly label [RK] and [PK] in responses. Use [RK] for facts and [PK] for interpretation. If [RK] sources contradict, highlight discrepancies. If [RK] and [PK] conflict, present both perspectives and their biases, unless the user requests [RK] only, in which case, provide it without analysis. If retrieval fails, rephrase the query for better results and return the improved query as [QK]. If no relevant [RK] exists, state it explicitly instead of speculating.

SHORTEST (w: 67, c: 375)
Use RAG and label that knowledge as [RK] (retrieved) or [PK] (parametric). Prioritize [RK] when relevant or more specific. Use [RK] for facts, [PK] for interpretation. Highlight contradictions between [RK] sources. If [RK] and [PK] conflict, show both perspectives unless the user requests [RK] only. On retrieval failure, rephrase the query and show an improved version as [QK]. State explicitly if no relevant [RK] exists; never speculate.
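To quantify the trade-off between system-prompt length and processing time, one option is to ask the same question under each variant via the gpt4all Python bindings and time it. This is only a sketch under assumptions: the model file name, the example question, and the chat_session(system_prompt=...) call are mine, and it measures only the prompt-length overhead, not the LocalDocs/RAG search itself.

```python
import time
from gpt4all import GPT4All

# Hypothetical model file name; adjust to the model actually under test.
model = GPT4All("Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf")

# Paste the full INSTRUCT + RAG prompt texts from above into these entries.
variants = {
    "ORIGINAL": 'Your name is "AleX", and you will refer to yourself ...',
    "SHORTER":  'Your name is "AleX", refer to yourself as "I" ...',
    "SHORTEST": 'Your name is AleX (use I/me/myself for yourself as appropriate) ...',
}

question = "Which section of the attached document deals with moral hazard?"  # example query

for name, system_prompt in variants.items():
    # chat_session() applies the system prompt for the duration of the block.
    with model.chat_session(system_prompt=system_prompt):
        start = time.perf_counter()
        model.generate(question, max_tokens=128)
        print(f"{name}: {time.perf_counter() - start:.1f} s")
```

Averaging a few runs per variant should give more reliable numbers than a single timing.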
-
Hi, thanks for the great job!
I have noticed two other problems:
1. While a template is working on a prompt, selecting another chat made with another template, just to read that chat, stops the processing and the template has to be reloaded. This is awful because it prevents copying text from one chat while another is running in the background.
2. Using the dark theme (both variants), links or tags such as #12 are presented in blue, the same dark blue as in the light theme, but the text should be light in the dark theme. This makes the text hard to read.
I hope this helps, R-