Added retriever for Tavily #1797

Open: wants to merge 1 commit into base: main

@RamXX RamXX commented Nov 14, 2024

A DSPy retriever module that uses Tavily's Search API to perform web searches and return relevant content.

This retriever supports both basic and advanced search modes, general and news topics, and can include
both synthesized answers and raw content from web pages. Results are returned with source URLs for
reference.

Args:
    api_key (Optional[str]): Tavily API key. If not provided, will look for TAVILY_API_KEY in environment 
        variables. Defaults to None.
    k (int): Maximum number of results to return (including the answer if include_answer=True). 
        Defaults to 5.
    search_depth (Literal["basic", "advanced"]): The depth of search to perform. See the Tavily
        documentation for the definition of each mode. Defaults to "basic".
    topic (Literal["general", "news"]): Type of search to perform. "general" for regular web search, 
        "news" for news articles. Defaults to "general".
    days (int): For news searches, the maximum age of articles in days. Only used when topic="news". 
        Defaults to 3.
    include_answer (bool): Whether to include Tavily's synthesized answer as the first result. The 
        answer will include references to the source URLs. Defaults to True.
    include_raw_content (bool): Whether to return the full raw content of pages instead of snippets. 
        When True, raw_content replaces regular content where available. Defaults to False.
    include_images (bool): Whether to include image results. Note: Even if images are returned by Tavily, 
        they are not included in the DSPy retriever output. Use the Tavily API directly for image results. 
        Defaults to False.
    include_image_descriptions (bool): Whether to include descriptions for returned images. Only used 
        when include_images=True. See above note about images. Defaults to False.
    include_domains (Optional[List[str]]): List of domains to restrict the search to. Defaults to None.
    exclude_domains (Optional[List[str]]): List of domains to exclude from the search. Defaults to None.
    include_urls (bool): Whether to append source URLs to content and answers. Defaults to False.
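
The conditional arguments above (the environment fallback for api_key, days applying only to news searches, image descriptions applying only when images are requested) can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: build_search_params is a hypothetical helper, and mapping k to a max_results request field is an assumption about how the retriever forwards its arguments to Tavily.

```python
import os
from typing import Any, Dict, List, Optional


def build_search_params(
    query: str,
    api_key: Optional[str] = None,
    k: int = 5,
    search_depth: str = "basic",
    topic: str = "general",
    days: int = 3,
    include_answer: bool = True,
    include_raw_content: bool = False,
    include_images: bool = False,
    include_image_descriptions: bool = False,
    include_domains: Optional[List[str]] = None,
    exclude_domains: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: resolve the API key and assemble search parameters."""
    # Fall back to the environment when no key is passed explicitly.
    key = api_key or os.environ.get("TAVILY_API_KEY")
    if not key:
        raise ValueError("Pass api_key or set TAVILY_API_KEY.")
    if search_depth not in ("basic", "advanced"):
        raise ValueError(f"Unsupported search_depth: {search_depth!r}")
    if topic not in ("general", "news"):
        raise ValueError(f"Unsupported topic: {topic!r}")

    params: Dict[str, Any] = {
        "api_key": key,
        "query": query,
        "max_results": k,  # assumption: k is forwarded as the result limit
        "search_depth": search_depth,
        "topic": topic,
        "include_answer": include_answer,
        "include_raw_content": include_raw_content,
        "include_images": include_images,
    }
    # days is only meaningful for news searches.
    if topic == "news":
        params["days"] = days
    # Image descriptions are only meaningful when images are requested.
    if include_images:
        params["include_image_descriptions"] = include_image_descriptions
    # Domain filters are omitted entirely when not set.
    if include_domains:
        params["include_domains"] = list(include_domains)
    if exclude_domains:
        params["exclude_domains"] = list(exclude_domains)
    return params
```

Keeping this step separate from the network call makes the defaults easy to check without an API key or network access.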

Returns:
    dspy.Prediction: A list of dotdict objects, each with a 'long_text' field containing one of:
        - The synthesized answer (first entry, if include_answer=True)
        - A content snippet (the default for search results)
        - Raw page content (if include_raw_content=True and raw content is available)
        Source URLs are appended to each entry when include_urls=True.
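
To make the return shape concrete, here is a minimal sketch of how a Tavily-style response could be turned into the 'long_text' entries described above. It is an illustration under stated assumptions, not the code in this PR: to_passages is a hypothetical helper, plain dicts stand in for dotdict, the "Source:" and "Sources:" suffixes are an invented format, and the sample response is made-up data that follows the "answer" and "results" layout of Tavily search responses.

```python
from typing import Any, Dict, List


def to_passages(
    response: Dict[str, Any],
    k: int = 5,
    include_answer: bool = True,
    include_raw_content: bool = False,
    include_urls: bool = False,
) -> List[Dict[str, str]]:
    """Hypothetical helper: build at most k passages from a search response."""
    results = response.get("results") or []
    passages: List[Dict[str, str]] = []

    # The synthesized answer, when requested and present, goes first
    # and counts toward the k limit.
    answer = response.get("answer")
    if include_answer and answer:
        text = answer
        if include_urls:
            urls = [r["url"] for r in results if r.get("url")]
            if urls:
                text += "\nSources: " + ", ".join(urls)
        passages.append({"long_text": text})

    for result in results:
        if len(passages) >= k:
            break
        text = result.get("content") or ""
        # Raw content replaces the snippet only where it is available.
        if include_raw_content and result.get("raw_content"):
            text = result["raw_content"]
        if include_urls and result.get("url"):
            text += "\nSource: " + result["url"]
        passages.append({"long_text": text})

    return passages[:k]


# Made-up sample response for illustration only.
sample = {
    "answer": "A short synthesized answer.",
    "results": [
        {"url": "https://a.example", "content": "snippet a", "raw_content": "raw a"},
        {"url": "https://b.example", "content": "snippet b", "raw_content": None},
    ],
}

for passage in to_passages(sample, k=3, include_urls=True):
    print(passage["long_text"])
```

Because the answer occupies one of the k slots, a call with k=2 and include_answer=True yields the answer plus a single search result.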

Example:

import dspy
from dspy.retrieve import TavilyRM

# Initialize with default settings
retriever = TavilyRM(api_key="your-api-key")

# Or customize the behavior
retriever = TavilyRM(
    api_key="your-api-key",
    k=3,
    search_depth="advanced",
    topic="news",
    days=7,
    include_answer=True,
    include_domains=["example.com", "trusteddomain.com"]
)

# Use as the default retriever
dspy.settings.configure(rm=retriever)

# Or use directly
results = retriever("What are the latest developments in AI?")

Note:
The retriever requires a valid Tavily API key.
The API key can be obtained from https://tavily.com/.
For detailed API documentation, see:
https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference

Author:
@RamXX (Ramiro Salas)
