diff --git a/README.md b/README.md index d57b3df..c988868 100644 --- a/README.md +++ b/README.md @@ -106,14 +106,6 @@ python webapp.py --api_config demo_data/api_config.yaml
-## Contributing -We welcome contributions from the community! If you'd like to contribute, please follow these steps: -1. Fork the repository. -2. Create a new branch for your feature (`git checkout -b feature/AmazingFeature`). -3. Commit your changes (`git commit -m 'Add some AmazingFeature'`). -4. Push to the branch (`git push origin feature/AmazingFeature`). -5. Open a pull request. - ## Customize Your Experience @@ -132,6 +124,10 @@ python -m factcheck --modal string --input "MBZUAI is the first AI university in python -m factcheck --modal string --input "MBZUAI is the first AI university in the world" --api_config demo_data/test_api_config.yaml --prompt demo_data/sample_prompt.yaml ``` +## Contributing to Loki + +Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. To get started, please refer to our [Contribution Guidelines](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/CONTRIBUTING.md). + ## Ready for More? **Join Our Journey to Innovation with the Supporter Edition** @@ -178,45 +174,6 @@ Don’t miss out on the latest updates, feature releases, and community insights Subscribe now at [our website](https://www.librai.tech/)! -## Change Log - -1. **API Key Handling:** Transitioned from creating key files via copying to dynamically reading all API keys from a YAML file, streamlining configuration processes. -2. **Unified Configuration Dictionary:** Replaced platform-specific dictionaries with a unified dictionary that aligns with environmental variable naming conventions, enhancing consistency and maintainability. -3. **Model Switching:** Introduced a `--model` parameter that allows switching between different models, currently supporting OpenAI and Anthropic. -4. **Modular Architecture:** Restructured the codebase into one Base class file and individual class files for each model, enhancing modularity and clarity. -5. 
**Base Class Redefinition:** Redefined the Base class to abstract asynchronous operations and other functionalities. Users customizing models need only override three functions. -6. **Prompt Switching:** Added a `--prompt` parameter for switching between predefined prompts, initially supporting prompts for OpenAI and Anthropic. -7. **Prompt Definitions via YAML and JSON:** Enabled prompt definitions using YAML and JSON, allowing prompts to be automatically read from corresponding YAML or JSON files when the prompt parameter ends with `.yaml` or `.json`. -8. **Search Engine Switching:** Introduced a `--retriever` parameter to switch between different search engines, currently supporting Serper and Google. -9. **Webapp Frontend Optimization:** Optimized the web application frontend to prevent duplicate requests during processing, including disabling the submit button after a click and displaying a timer during processing. -10. **Client Switching:** introduce a --client parameter that allows switching between different client (chat API), currently support OpenAI compatible API (for local model and official model), and Anthropic chat API client. - -## Development Plan - -As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on: - -## 1. Support for Multiple Models -- **Broader Model Compatibility:** - - Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini. - - Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM. - -## 2. Model-specific Prompt Engineering -- **Unit Testing for Prompts:** - - Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios. - -## 3. 
Expanded Search Engine Support -- **Diverse Search Engines:** - - Incorporate a variety of search engines including Bing, scraperapi to broaden search capabilities. - - Integration with [Searxng](https://github.com/searxng/searxng), an open-source metasearch engine. - - Support for specialized indexes like LlamaIndex and Langchain, and the ability to search local documents. - -## 4. Deployment and Scalability -- **Dockerization:** - - Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments. - -We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments! - - ## License This project is licensed under the [MIT license](LICENSE.md) - see the LICENSE file for details. diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md new file mode 100644 index 0000000..7c5c5a0 --- /dev/null +++ b/docs/CONTRIBUTING.md @@ -0,0 +1,42 @@ +# Contribute to Loki + +Welcome and thank you for your interest in the Loki project! We welcome contributions and feedback from the community. This document outlines the process for contributing to the project. + +## How to Contribute + +We recommend a few best practices to make your contributions or reported errors easier to assist with. + +### For Pull Requests + +* PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution. +* New features should have appropriate documentation added alongside them. +* Aim for code maintainability, and minimize code copying. + +### For Feature Requests + +* Provide a short paragraph's worth of description. What is the feature you are requesting? What is its motivation, and an example use case of it? How does this differ from what is currently supported? 
+ +### For Bug Reports + +* Provide a short description of the bug. +* Provide a reproducible example--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it? +* Provide the full traceback of the error, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context. +* Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant. + +## Code Style + +Loki uses [black](https://github.com/psf/black) and [flake8](https://pypi.org/project/flake8/) to enforce code style, via [pre-commit](https://pre-commit.com/). Before submitting a pull request, please install the pre-commit hooks with the following commands so that your code is checked and formatted on every commit: + +```bash +pip install pre-commit +pre-commit install +``` + +## How Can I Get Involved? + +There are a number of distinct ways to contribute to Loki: + +* Implement new features or fix bugs by submitting a pull request: If you want to use a new model or retriever, or if you have an idea for a new feature, we would love to see your contributions. +* Pick up a task from our [development plan](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/DEVELOPMENT_PLAN.md), which outlines the roadmap for the project. If you are interested in contributing to any of the tasks, please join our [Discord](https://discord.gg/NRge6RS7) and send a direct message to @Haonan Li. + +We hope you find this project interesting and would like to contribute to it. If you have any questions, please feel free to reach out to us on our [Discord](https://discord.gg/NRge6RS7). 
diff --git a/docs/DEVELOPMENT_PLAN.md b/docs/DEVELOPMENT_PLAN.md new file mode 100644 index 0000000..7ca0702 --- /dev/null +++ b/docs/DEVELOPMENT_PLAN.md @@ -0,0 +1,29 @@ +## Development Plan + +As Loki continues to evolve, our development plan focuses on broadening capabilities and enhancing flexibility to meet the diverse needs of our users. Here are the key areas we are working on: + +## 1. Support for Multiple Models +- **Broader Model Compatibility:** + - Integration with leading AI models besides ChatGPT and Claude to diversify fact-checking capabilities, including Command R and Gemini. + - Implementation of self-hosted model options for enhanced privacy and control, e.g., FastChat, TGI, and vLLM. + +## 2. Model-specific Prompt Engineering +- **Unit Testing for Prompts:** + - Develop robust unit tests for each step to ensure prompt reliability and accuracy across different scenarios. + +## 3. Expanded Search Engine Support +- **Diverse Search Engines:** + - Incorporate a variety of search engines, including Bing and ScraperAPI, to broaden search capabilities. + - Integration with [SearXNG](https://github.com/searxng/searxng), an open-source metasearch engine. + - Support for specialized indexes like LlamaIndex and LangChain, and the ability to search local documents. + +## 4. Deployment and Scalability +- **Dockerization:** + - Packaging Loki into Docker containers to simplify deployment and scale-up operations, ensuring Loki can be easily set up and maintained across different environments. + +## 5. Multi-language Support +- **Language Expansion:** + - Support for additional languages beyond English, such as Chinese and Arabic, to cater to a global user base. + + +We are committed to these enhancements to make Loki not just more powerful, but also more adaptable to the needs of a global user base. Stay tuned as we roll out these exciting developments! 
diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..7f2fd71 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,9 @@ +# OpenFactVerification Documentation + +Welcome to the OpenFactVerification (Loki) documentation! This repository contains the codebase for the Loki project, which is a fact-checking pipeline that leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular, allowing users to easily customize the evidence retrieval, language model, and prompt used in the fact-checking process. + +## Table of Contents + +* To learn how to use the Loki pipeline, please refer to the [User Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/user_guide.md). + +* To learn how to add support for a new language model, search engine, or prompt, please refer to the [Development Guide](https://github.com/Libr-AI/OpenFactVerification/tree/main/docs/development_guide.md). diff --git a/docs/RELEASE_LOG.md b/docs/RELEASE_LOG.md new file mode 100644 index 0000000..4342092 --- /dev/null +++ b/docs/RELEASE_LOG.md @@ -0,0 +1,22 @@ +# Release Log + + +## v0.0.2 + +### New Features +1. **API Key Handling:** Transitioned from creating key files via copying to dynamically reading all API keys from a YAML file, streamlining configuration processes. +2. **Unified Configuration Dictionary:** Replaced platform-specific dictionaries with a unified dictionary that aligns with environment variable naming conventions, enhancing consistency and maintainability. +3. **Model Switching:** Introduced a `--model` parameter that allows switching between different models, currently supporting OpenAI and Anthropic. +4. **Modular Architecture:** Restructured the codebase into one Base class file and individual class files for each model, enhancing modularity and clarity. +5. 
**Base Class Redefinition:** Redefined the Base class to abstract asynchronous operations and other functionalities. Users customizing models need only override three functions. +6. **Prompt Switching:** Added a `--prompt` parameter for switching between predefined prompts, initially supporting prompts for OpenAI and Anthropic. +7. **Prompt Definitions via YAML and JSON:** Enabled prompt definitions using YAML and JSON, allowing prompts to be automatically read from corresponding YAML or JSON files when the prompt parameter ends with `.yaml` or `.json`. +8. **Search Engine Switching:** Introduced a `--retriever` parameter to switch between different search engines, currently supporting Serper and Google. +9. **Webapp Frontend Optimization:** Optimized the web application frontend to prevent duplicate requests during processing, including disabling the submit button after a click and displaying a timer during processing. +10. **Client Switching:** Introduced a `--client` parameter that allows switching between different clients (chat APIs), currently supporting OpenAI-compatible APIs (for both local and official models) and the Anthropic chat API. + + + +## v0.0.1 + +Initial release of Loki. diff --git a/docs/development_guide.md b/docs/development_guide.md new file mode 100644 index 0000000..5f5b3d6 --- /dev/null +++ b/docs/development_guide.md @@ -0,0 +1,45 @@ +# Loki Development Guide + +This documentation page provides a guide for developers who want to contribute to the Loki project, for versions v0.0.2 and later. + +## Loki Framework Introduction + +Loki leverages state-of-the-art language models to verify the veracity of textual claims. The pipeline is designed to be modular and lives in `factcheck/core/`, which includes the following components: + +- **Decomposer:** Breaks down extensive texts into digestible, independent claims, setting the stage for detailed analysis. 
+- **Checkworthy:** Assesses each claim's potential significance, filtering out vague or ambiguous statements to focus on those that truly matter. For example, vague claims like "MBZUAI has a vast campus" are considered unworthy because of the ambiguous nature of "vast." +- **Query Generator:** Transforms check-worthy claims into precise queries, ready to navigate the vast expanse of the internet in search of truth. +- **Evidence Retriever:** Ventures into the digital realm, retrieving relevant evidence that forms the foundation of informed verification. +- **ClaimVerify:** Examines the gathered evidence, determining the veracity of each claim to uphold the integrity of information. + +To support each component's functionality, Loki relies on the following utils: +- **Language Model:** Currently, 4 out of 5 components (Decomposer, Checkworthy, Query Generator, and ClaimVerify) use large language models (LLMs) to perform their tasks. The supported LLMs are defined in `factcheck/core/utils/llmclient/` and can be easily extended to support more LLMs. +- **Prompt:** The prompt is a crucial input to an LLM and is usually optimized for each LLM to achieve the best performance. The prompts are defined in `factcheck/core/utils/prompt/` and can be easily extended to support more prompts. + + +## New LLM Support + +A new LLM should be defined in `factcheck/core/utils/llmclient/` and should be a subclass of `BaseClient` from `factcheck/core/utils/llmclient/base.py`. The LLM should implement the `_call` method, which takes a single string input and returns a string output. + +> **_Note_:** +> To keep the pipeline robust, the output of the LLM should be a string containing a Python literal that can be parsed directly by Python's `eval`. Usually, the output should be a `list` or `dict` in the form of a string. 
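The contract described above (a `_call` method that maps a string to a string, whose reply parses into a `list` or `dict`) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Loki's actual code: the `BaseClient` shown here is a hypothetical stand-in for the real class in `base.py`, `CannedClient` and `parse_structured_output` are names invented for this example, and `ast.literal_eval` with a `json.loads` fallback is swapped in for a bare `eval` because it only accepts literals.

```python
import ast
import json
from abc import ABC, abstractmethod

FENCE = "`" * 3  # Markdown code fence that some models wrap around replies


class BaseClient(ABC):
    """Hypothetical stand-in for Loki's BaseClient; the real class may differ."""

    @abstractmethod
    def _call(self, prompt: str) -> str:
        """Send one prompt string to the model and return its raw text reply."""


class CannedClient(BaseClient):
    """Toy client (assumption) that returns a fixed reply instead of calling an API."""

    def __init__(self, reply: str):
        self.reply = reply

    def _call(self, prompt: str) -> str:
        return self.reply


def parse_structured_output(raw: str):
    """Post-process a raw reply into a Python list or dict."""
    text = raw.strip()
    # Drop a surrounding code fence and its optional language tag.
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_newline = text.find("\n")
        if first_newline != -1:
            text = text[first_newline + 1:]
        text = text.strip()
    try:
        # literal_eval accepts only Python literals, unlike eval.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # JSON-style replies (true/false/null) are not Python literals.
        value = json.loads(text)
    if not isinstance(value, (list, dict)):
        raise ValueError("expected a list or dict, got " + type(value).__name__)
    return value


client = CannedClient(FENCE + "python\n['claim one', 'claim two']\n" + FENCE)
claims = parse_structured_output(client._call("Decompose this text."))
print(claims)
```

A real client would replace the canned reply with a request to its chat API; the parsing helper is the kind of post-processing a model without a structured-output mode might need.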
+ +We find that ChatGPT [json_mode](https://platform.openai.com/docs/guides/text-generation/json-mode) is a good choice for the LLM, as it can generate structured output. +To support a new LLM, you may need to implement a post-processing step to convert the output of the LLM to a structured format. + +## New Search Engine (Retriever) Support + +An evidence retriever should be defined in `factcheck/core/Retriever/` and should be a subclass of `EvidenceRetriever` from `factcheck/core/Retriever/base.py`. The retriever should implement the `retrieve_evidence` method. + +## New Language Support + +To support a new language, you need to create a new file in `factcheck/utils/prompt/` with the name `