Update Docs
svilupp authored Dec 23, 2023
2 parents b4502c6 + ff4e7fc commit b7fd28c
Showing 8 changed files with 87 additions and 4,099 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -6,13 +6,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added

### Fixed

## [0.5.0]

### Added
- Experimental sub-module `RAGTools` providing basic Retrieval-Augmented Generation functionality. See `?RAGTools` for more information. It is nested inside `PromptingTools.Experimental.RAGTools` to signal that it might change in the future. Key functions are `build_index` and `airag`, but it also provides a suite to make evaluation easier (see `?build_qa_evals` and `?run_qa_evals`, or see the example in `examples/building_RAG.jl`)

### Fixed
- Stricter code parsing in `AICode` to avoid false positives (code blocks must end with "```\n" to catch comments inside text)
- Introduced an option `skip_invalid=true` for `AICode`, which allows you to include only code blocks that parse successfully (useful when the code definition is good but the subsequent examples are not), and an option `capture_stdout=false` to avoid capturing stdout when you want to evaluate `AICode` in parallel (the `Pipe()` we use is NOT thread-safe)
- `OllamaManagedSchema` was passing an incorrect model name to the Ollama server, often serving the default llama2 model instead of the requested model. This is now fixed.
- Fixed a bug in the handling of the `model` kwarg when leveraging `PT.MODEL_REGISTRY`

## [0.4.0]

2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "PromptingTools"
uuid = "670122d1-24a8-4d70-bfce-740807c42192"
authors = ["J S @svilupp and contributors"]
version = "0.5.0-DEV"
version = "0.5.0"

[deps]
Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
1 change: 1 addition & 0 deletions docs/make.jl
@@ -28,6 +28,7 @@ makedocs(;
"Various examples" => "examples/readme_examples.md",
"Using AITemplates" => "examples/working_with_aitemplates.md",
"Local models with Ollama.ai" => "examples/working_with_ollama.md",
"Custom APIs (Mistral, Llama.cpp)" => "examples/working_with_custom_apis.md",
"Building RAG Application" => "examples/building_RAG.md",
],
"F.A.Q." => "frequently_asked_questions.md",
2 changes: 1 addition & 1 deletion docs/src/examples/building_RAG.md
@@ -217,7 +217,7 @@ We're done for today!
- Add filtering for semantic similarity (embedding distance) to make sure we don't pick up irrelevant chunks in the context
- Use multiple indices or a hybrid index (add a simple BM25 lookup from TextAnalysis.jl)
- Data processing is the most important step - properly parsed and split text can work wonders
- Add re-ranking of context (see `rerank` function, you can use Cohere ReRank API)`)
- Add re-ranking of context (see `rerank` function, you can use Cohere ReRank API)
- Improve the question embedding (eg, rephrase it, generate hypothetical answers and use them to find better context)

... and much more! See some ideas in [Anyscale RAG tutorial](https://www.anyscale.com/blog/a-comprehensive-guide-for-building-rag-based-llm-applications-part-1)
69 changes: 69 additions & 0 deletions docs/src/examples/working_with_custom_apis.md
@@ -0,0 +1,69 @@
# Custom APIs

PromptingTools allows you to use any OpenAI-compatible API (eg, MistralAI), including a locally hosted one like the server from `llama.cpp`.

````julia
using PromptingTools
const PT = PromptingTools
````

## Using MistralAI

Mistral models have long been dominating the open-source space. They are now available via their API, so you can use them with PromptingTools.jl!

```julia
msg = aigenerate("Say hi!"; model="mistral-tiny")
# [ Info: Tokens: 114 @ Cost: $0.0 in 0.9 seconds
# AIMessage("Hello there! I'm here to help answer any questions you might have, or assist you with tasks to the best of my abilities. How can I be of service to you today? If you have a specific question, feel free to ask and I'll do my best to provide accurate and helpful information. If you're looking for general assistance, I can help you find resources or information on a variety of topics. Let me know how I can help.")
```

It all just works, because we have registered the models in the `PromptingTools.MODEL_REGISTRY`! There are currently 4 models available: `mistral-tiny`, `mistral-small`, `mistral-medium`, `mistral-embed`.
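
For embeddings, the `mistral-embed` model can be used with `aiembed`. A minimal sketch (assuming `aiembed` accepts the same `model` keyword as `aigenerate`; the exact return type may differ):

```julia
# Hedged sketch: request an embedding from the MistralAI API
msg = aiembed("Say hi!"; model="mistral-embed")
# The embedding vector should be available in `msg.content`
```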

Under the hood, we use a dedicated schema `MistralOpenAISchema` that leverages most of the OpenAI-specific code base, so you can always provide that explicitly as the first argument:

```julia
const PT = PromptingTools
msg = aigenerate(PT.MistralOpenAISchema(), "Say Hi!"; model="mistral-tiny", api_key=ENV["MISTRALAI_API_KEY"])
```
As you can see, the API key can be loaded either from your environment variables (`ENV`) or via the Preferences.jl mechanism (see `?PREFERENCES` for more information).
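
If you prefer not to pass the key explicitly, here is a rough sketch of both options (assuming `PT.set_preferences!` accepts `"MISTRALAI_API_KEY"` as a key; see `?PREFERENCES` to confirm the exact names):

```julia
# Option 1: set an environment variable before (or at the start of) your session
ENV["MISTRALAI_API_KEY"] = "<your-api-key>"

# Option 2: persist the key across sessions via the Preferences.jl mechanism
PT.set_preferences!("MISTRALAI_API_KEY" => "<your-api-key>")
```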

## Using other OpenAI-compatible APIs

MistralAI is not the only provider that mimics the OpenAI API!
There are many other exciting providers, eg, [Perplexity.ai](https://docs.perplexity.ai/), [Fireworks.ai](https://app.fireworks.ai/).

As long as they are compatible with the OpenAI API (eg, sending `messages` with `role` and `content` keys), you can use them with PromptingTools.jl by using `schema = CustomOpenAISchema()`:

```julia
# Set your API key and the necessary base URL for the API
api_key = "..."
provider_url = "..." # provider API URL
prompt = "Say hi!"
msg = aigenerate(PT.CustomOpenAISchema(), prompt; model="<some-model>", api_key, api_kwargs=(; url=provider_url))
```

> [!TIP]
> If you register the model names with `PT.register_model!`, you won't have to keep providing the `schema` manually.

Note: At the moment, we only support `aigenerate` and `aiembed` functions.
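
For illustration, here is a minimal sketch of such a registration (the alias and provider URL below are hypothetical; check `?PT.register_model!` for the exact keyword arguments):

```julia
# Register an alias for an OpenAI-compatible provider once...
PT.register_model!(;
    name = "my-custom-model",          # hypothetical alias of your choosing
    schema = PT.CustomOpenAISchema(),  # reuse the OpenAI-compatible code path
    description = "Model served by an OpenAI-compatible provider")

# ...then refer to it by name; the provider URL (and key) are still passed at call time
msg = aigenerate("Say hi!"; model = "my-custom-model", api_key = "...",
    api_kwargs = (; url = "https://api.example-provider.com/v1"))
```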

## Using llama.cpp server

In line with the above, you can also use the [`llama.cpp` server](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md).

It is a bit more technically demanding because you need to "compile" `llama.cpp` first, but it will always have the latest models and it is quite fast (eg, faster than Ollama, which uses llama.cpp under the hood but has some extra overhead).

Start your server in a command line (`-m` refers to the model file, `-c` is the context length, `-ngl` is the number of layers to offload to GPU):

```bash
./server -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -c 2048 -ngl 99
```

Then simply access it via PromptingTools:

```julia
msg = aigenerate(PT.CustomOpenAISchema(), "Count to 5 and say hi!"; api_kwargs=(; url="http://localhost:8080/v1"))
```

> [!TIP]
> If you register the model names with `PT.register_model!`, you won't have to keep providing the `schema` manually. It can be any `model` name, because the model is actually selected when you start the server in the terminal.