Skip to content

Latest commit

 

History

History
156 lines (101 loc) · 8.81 KB

File metadata and controls

156 lines (101 loc) · 8.81 KB

🔎 GPT Researcher

Official Website Discord Follow GitHub Repo stars Twitter Follow

GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks.

The agent can produce detailed, factual and unbiased research reports, with customization options for focusing on relevant resources, outlines, and lessons. Inspired by AutoGPT and the recent Plan-and-Solve paper, GPT Researcher addresses issues of speed and determinism, offering a more stable performance and increased speed through parallelized agent work, as opposed to synchronous operations.

Our mission is to empower individuals and organizations with accurate, unbiased, and factual information by leveraging the power of AI.

Why GPT Researcher?

  • To form objective conclusions for manual research tasks can take time, sometimes weeks to find the right resources and information.
  • Current LLMs are trained on past and outdated information, with heavy risks of hallucinations, making them almost irrelevant for research tasks.
  • Solutions that enable web search (such as ChatGPT + Web Plugin), only consider limited resources that in some cases result in superficial conclusions or biased answers.
  • Using only a selection of resources can create bias in determining the right conclusions for research questions or tasks.

Architecture

The main idea is to run "planner" and "execution" agents, whereas the planner generates questions to research, and the execution agents seek the most related information based on each generated research question. Finally, the planner filters and aggregates all related information and creates a research report. The agents leverage both gpt3.5-turbo-16k and gpt-4 to complete a research task.

More specifcally:

  • Generate a set of research questions that together form an objective opinion on any given task.
  • For each research question, trigger a crawler agent that scrapes online resources for information relevant to the given task.
  • For each scraped resources, summarize based on relevant information and keep track of its sources.
  • Finally, filter and aggregate all summarized sources and generate a final research report.

Demo

gpt-researcher-demo.mp4

Tutorials

Features

  • 📝 Generate research, outlines, resources and lessons reports
  • 🌐 Aggregates over 20 web sources per research to form objective and factual conclusions
  • 🖥️ Includes an easy-to-use web interface (HTML/CSS/JS)
  • 🔍 Scrapes web sources with javascript support
  • 📂 Keeps track and context of visited and used web sources
  • 📄 Export research reports to PDF and more...

Quickstart

Step 0 - Install Python 3.11 or later. See here for a step-by-step guide.


Step 1 - Download the project

$ git clone https://github.com/assafelovic/gpt-researcher.git
$ cd gpt-researcher

Step 2 - Install dependencies

$ pip install -r requirements.txt

Step 3 - Create .env file with your OpenAI Key or simply export it

$ export OPENAI_API_KEY={Your API Key here}
  • By default we use OpenAI, but you can use any other LLM model (including open sources) supported by Langchain Adapter, simply change the model name in config/config.py and provider in agent/llm_utils.py file. Follow this guide to learn how to integrate LLMs with Langchain.
  • We highly recommend using GPT models for optimal performance.

Step 4 - Run the agent with FastAPI

$ uvicorn main:app --reload

Step 5 - Go to http://localhost:8000 on any browser and enjoy researching!

Try it with Docker

Step 1 - Install Docker

Follow instructions at https://docs.docker.com/engine/install/

Step 2 - Create .env file with your OpenAI Key or simply export it

$ export OPENAI_API_KEY={Your API Key here}

Step 3 - Run the application

$ docker-compose up

Step 4 - Go to http://localhost:8000 on any browser and enjoy researching!

🚀 Contributing

We highly welcome contributions! Please check out contributing if you're interested.

Please check out our roadmap page and reach out to us via our Discord community if you're interested in joining our mission.

🔧 Troubleshooting

We're constantly working to provide a more stable version. In the meantime, see here for known issues:

model: gpt-4 does not exist This relates to not having permission to use gpt-4 yet. Based on OpenAI, it will be widely available for all by end of July.

cannot load library 'gobject-2.0-0'

The issue relates to the library WeasyPrint (which is used to generate PDFs from the research report). Please follow this guide to resolve it: https://doc.courtbouillon.org/weasyprint/stable/first_steps.html

Error processing the url

We're using Selenium for site scraping. Some sites fail to be scraped. In these cases, restart and try running again.

Chrome version issues

Many users have an issue with their chromedriver because the latest chrome browser version doesn't have a compatible chrome driver yet.

To downgrade your Chrome web browser using slimjet, follow these steps. First, visit the website and scroll down to find the list of available older Chrome versions. Choose the version you wish to install making sure it's compatible with your operating system. Once you've selected the desired version, click on the corresponding link to download the installer. Before proceeding with the installation, it's crucial to uninstall your current version of Chrome to avoid conflicts.

It's important to check if the version you downgrade to, has a chromedriver available in the official chrome driver website

If none of the above work, you can try out our hosted beta

🛡 Disclaimer

This project, GPT Researcher, is an experimental application and is provided "as-is" without any warranty, express or implied. We are sharing codes for academic purposes under the MIT license. Nothing herein is academic advice, and NOT a recommendation to use in academic or research papers.

Our view on unbiased research claims:

  1. The whole point of our scraping system is to reduce incorrect fact. How? The more sites we scrape the less chances of incorrect data. We are scraping 20 per research, the chances that they are all wrong is extremely low.
  2. We do not aim to eliminate biases; we aim to reduce it as much as possible. We are here as a community to figure out the most effective human/llm interactions.
  3. In research, people also tend towards biases as most have already opinions on the topics they research about. This tool scrapes many opinions and will evenly explain diverse views that a biased person would never have read.

Please note that the use of the GPT-4 language model can be expensive due to its token usage. By utilizing this project, you acknowledge that you are responsible for monitoring and managing your own token usage and the associated costs. It is highly recommended to check your OpenAI API usage regularly and set up any necessary limits or alerts to prevent unexpected charges.