Added retriever for Tavily #1797

RamXX · 2024-11-14T04:20:25Z

A DSPy retriever module that uses Tavily's Search API to perform web searches and return relevant content.

This retriever supports both basic and advanced search modes, general and news topics, and can include
both synthesized answers and raw content from web pages. Results are returned with source URLs for
reference.

Args:
    api_key (Optional[str]): Tavily API key. If not provided, the TAVILY_API_KEY environment variable
        is used. Defaults to None.
    k (int): Maximum number of results to return (including the answer if include_answer=True). 
        Defaults to 5.
    search_depth (Literal["basic", "advanced"]): The depth of search to perform. See the Tavily
        documentation for the exact definition of each mode. Defaults to "basic".
    topic (Literal["general", "news"]): Type of search to perform. "general" for regular web search, 
        "news" for news articles. Defaults to "general".
    days (int): For news searches, the maximum age of articles in days. Only used when topic="news". 
        Defaults to 3.
    include_answer (bool): Whether to include Tavily's synthesized answer as the first result. The 
        answer will include references to the source URLs. Defaults to True.
    include_raw_content (bool): Whether to return the full raw content of pages instead of snippets. 
        When True, raw_content replaces regular content where available. Defaults to False.
    include_images (bool): Whether to include image results. Note: Even if images are returned by Tavily, 
        they are not included in the DSPy retriever output. Use the Tavily API directly for image results. 
        Defaults to False.
    include_image_descriptions (bool): Whether to include descriptions for returned images. Only used 
        when include_images=True. See above note about images. Defaults to False.
    include_domains (Optional[List[str]]): List of domains to restrict the search to. Defaults to None.
    exclude_domains (Optional[List[str]]): List of domains to exclude from the search. Defaults to None.
    include_urls (bool): Whether to append source URLs to content and answers. Defaults to False.
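To make the dependencies between these options concrete, here is a minimal sketch of how the constructor arguments could be turned into keyword arguments for a Tavily search call. `build_search_kwargs` is a hypothetical helper, not part of this PR, and it assumes the search call accepts keyword names matching the options above, with `k` forwarded as `max_results`. It only encodes the rules stated in the Args list: `days` applies only to news searches, and image descriptions only matter when images are requested.

```python
from typing import Any, Dict, List, Literal, Optional


def build_search_kwargs(
    k: int = 5,
    search_depth: Literal["basic", "advanced"] = "basic",
    topic: Literal["general", "news"] = "general",
    days: int = 3,
    include_answer: bool = True,
    include_raw_content: bool = False,
    include_images: bool = False,
    include_image_descriptions: bool = False,
    include_domains: Optional[List[str]] = None,
    exclude_domains: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map retriever options to Tavily search kwargs."""
    if search_depth not in ("basic", "advanced"):
        raise ValueError(f"unsupported search_depth: {search_depth!r}")
    if topic not in ("general", "news"):
        raise ValueError(f"unsupported topic: {topic!r}")

    kwargs: Dict[str, Any] = {
        "max_results": k,  # assumption: k is forwarded as Tavily's max_results
        "search_depth": search_depth,
        "topic": topic,
        "include_answer": include_answer,
        "include_raw_content": include_raw_content,
        "include_images": include_images,
    }
    # `days` is only meaningful for news searches
    if topic == "news":
        kwargs["days"] = days
    # Image descriptions are only meaningful when images are requested
    if include_images:
        kwargs["include_image_descriptions"] = include_image_descriptions
    # Only send domain filters that were actually provided
    if include_domains:
        kwargs["include_domains"] = list(include_domains)
    if exclude_domains:
        kwargs["exclude_domains"] = list(exclude_domains)
    return kwargs
```

For example, `build_search_kwargs(topic="news", days=7)` includes `days=7`, while the default general search omits `days` entirely.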

Returns:
    dspy.Prediction: A list of dotdict objects, each with a 'long_text' field containing one of:
        - Tavily's synthesized answer (if include_answer=True), followed by its source URLs when include_urls=True
        - A content snippet from a search result, followed by its source URL when include_urls=True
        - The raw page content in place of the snippet (if include_raw_content=True and raw content is
          available), followed by its source URL when include_urls=True
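The following is a minimal sketch of how a Tavily response could be flattened into `long_text` passages under the rules above. It assumes a response dict shaped like `{"answer": ..., "results": [{"url", "content", "raw_content"}, ...]}`, uses plain dicts as a stand-in for DSPy's `dotdict`, and the `Sources:`/`Source:` suffix format is hypothetical; `format_passages` is not part of this PR.

```python
from typing import Any, Dict, List


def format_passages(
    response: Dict[str, Any],
    k: int = 5,
    include_answer: bool = True,
    include_raw_content: bool = False,
    include_urls: bool = False,
) -> List[Dict[str, str]]:
    """Hypothetical sketch: turn a Tavily-style response into long_text passages."""
    results = response.get("results", [])
    passages: List[Dict[str, str]] = []

    # The synthesized answer, if requested and present, comes first
    answer = response.get("answer")
    if include_answer and answer:
        text = answer
        if include_urls:
            urls = [r["url"] for r in results if r.get("url")]
            if urls:
                text += "\nSources: " + ", ".join(urls)
        passages.append({"long_text": text})

    for r in results:
        # Prefer raw page content when requested and available, else the snippet
        body = r.get("raw_content") if include_raw_content else None
        body = body or r.get("content", "")
        if include_urls and r.get("url"):
            body += f"\nSource: {r['url']}"
        passages.append({"long_text": body})

    # k caps the total number of passages, including the answer
    return passages[:k]
```

Note the design choice that `k` truncates after the answer is prepended, which matches the Args description of `k` counting the answer as one of the results.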

Example:

```python
import dspy
from dspy.retrieve import TavilyRM

# Initialize with default settings
retriever = TavilyRM(api_key="your-api-key")

# Or customize the behavior
retriever = TavilyRM(
    api_key="your-api-key",
    k=3,
    search_depth="advanced",
    topic="news",
    days=7,
    include_answer=True,
    include_domains=["example.com", "trusteddomain.com"]
)

# Use as the default retriever
dspy.settings.configure(rm=retriever)

# Or use directly
results = retriever("What are the latest developments in AI?")
```

Note:
The retriever requires a valid Tavily API key, which can be obtained from https://tavily.com/.
For detailed API documentation, see https://docs.tavily.com/docs/python-sdk/tavily-search/api-reference
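Because the key can come either from the `api_key` argument or from the `TAVILY_API_KEY` environment variable, it helps to fail early with a clear message when neither is set. This is a minimal sketch of that lookup order; `resolve_api_key` is a hypothetical helper, and the choice to raise `ValueError` is an assumption rather than the behavior of this PR.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit key wins, else fall back to TAVILY_API_KEY."""
    key = api_key or os.environ.get("TAVILY_API_KEY")
    if not key:
        # Fail before any network call if no key is configured
        raise ValueError(
            "A Tavily API key is required: pass api_key=... or set TAVILY_API_KEY."
        )
    return key
```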

Author:
@RamXX (Ramiro Salas)

Commit 1a747bc: Added retriever for Tavily
