summarize-topic: Add a tool to summarize topic. #834

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

amanagr
Member

@amanagr amanagr commented Oct 24, 2024

Copy-pasted from zulip/zulip#31897.

For testing, you need an access token from https://huggingface.co/settings/tokens (or set the correct environment variable with the access token if using a different model).

Then set `os.environ["HUGGINGFACE_API_KEY"] = "YOUR_API_KEY"` in tools/summarize-topic.

Then just run tools/summarize-topic to generate a sample summary.

$ ./tools/summarize-topic --help
usage: summarize-topic [-h] [--url URL] [--model MODEL]

options:
  -h, --help     show this help message and exit
  --url URL      The URL to fetch content from
  --model MODEL  The model name to use for summarization

NOTE: only topic links are supported right now.
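Since only topic links are supported, the tool has to turn the topic URL into a Zulip narrow before fetching messages. A minimal sketch of that parsing (the function name and exact URL handling here are assumptions for illustration; the actual script may differ — Zulip topic URLs dot-escape special characters, e.g. `.20` for a space):

```python
from urllib.parse import unquote


def narrow_from_topic_url(url: str) -> list:
    # Example shape: https://chat.zulip.org/#narrow/stream/101-design/topic/foo.20bar
    fragment = url.split("#narrow/", 1)[1]
    parts = fragment.split("/")
    # parts == ["stream", "101-design", "topic", "foo.20bar"]
    channel_id = int(parts[1].split("-", 1)[0])
    # Zulip dot-escapes bytes in topic URLs (".20" == "%20"), so convert
    # the dots to percent signs and let urllib decode them.
    topic = unquote(parts[3].replace(".", "%"))
    return [
        # Newer servers call this operator "channel"; older ones use "stream".
        {"operator": "channel", "operand": channel_id},
        {"operator": "topic", "operand": topic},
    ]
```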

@timabbott
Member

I did some quick reworking of the output and README, fixed an extra `l` in the directory names, and added a bit of error handling. Here's the diff:

diff --git a/zulip/integrations/litellm/README.md b/zulip/integrations/litellm/README.md
index 81fd645a..9d66fcb4 100644
--- a/zulip/integrations/litellm/README.md
+++ b/zulip/integrations/litellm/README.md
@@ -2,28 +2,31 @@
 
 Generate a short summary of the last 100 messages in the provided topic URL.
 
-
 ### API Keys
 
-For testing you need access token from https://huggingface.co/settings/tokens (or set the correct env variable with the access token if using a different model)
+For testing you need access token from
+https://huggingface.co/settings/tokens (or set the correct env
+variable with the access token if using a different model)
+
+In `~/.zuliprc` add a section named `litellm` and set the api key for
+the model you are trying to use.  For example:
 
-In `~/.zuliprc` add a section named `LITELLM_API_KEYS` and set the api key for the model you are trying to use.
-For example:

-[LITELLM_API_KEYS]
+[litellm]
HUGGINGFACE_API_KEY=YOUR_API_KEY


### Setup

```bash
-$ pip install -r zulip/integrations/litelllm/requirements.txt
+$ pip install -r zulip/integrations/litellm/requirements.txt

-Just run zulip/integrations/litelllm/summarize-topic to generate sample summary.
+Just run zulip/integrations/litellm/summarize-topic to generate
+sample summary.

-$ zulip/integrations/litelllm/summarize-topic --help
+$ zulip/integrations/litellm/summarize-topic --help
usage: summarize-topic [-h] [--url URL] [--model MODEL]

options:
diff --git a/zulip/integrations/litellm/summarize-topic b/zulip/integrations/litellm/summarize-topic
index 5d536cbd..901017b0 100755
--- a/zulip/integrations/litellm/summarize-topic
+++ b/zulip/integrations/litellm/summarize-topic
@@ -10,27 +10,6 @@ from litellm import completion  # type: ignore[import-not-found]

import zulip

-config_file = zulip.get_default_config_filename()
-if not config_file:
-    print("Could not find the Zulip configuration file. Please read the provided README.")
-    sys.exit()
-
-client = zulip.Client(config_file=config_file)
-
-config = ConfigParser()
-# Make config parser case sensitive otherwise API keys will be lowercased
-# which is not supported by litellm.
-# https://docs.python.org/3/library/configparser.html#configparser.ConfigParser.optionxform
-config.optionxform = str  # type: ignore[assignment, method-assign]
-
-with open(config_file) as f:
-    config.read_file(f, config_file)
-
-# Set all the keys in `LITELLM_API_KEYS` as environment variables.
-for key in config["LITELLM_API_KEYS"]:
-    print("Setting key:", key)
-    os.environ[key] = config["LITELLM_API_KEYS"][key]
-
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
@@ -45,8 +24,48 @@ if __name__ == "__main__":
        help="The model name to use for summarization",
        default="huggingface/meta-llama/Meta-Llama-3-8B-Instruct",
    )
+    parser.add_argument(
+        "--max-tokens",
+        type=int,
+        help="The maximum tokens permitted in the response",
+        default=100,
+    )
+    parser.add_argument(
+        "--max-messages",
+        type=int,
+        help="The maximum number of messages fetched from the server",
+        default=100,
+    )
+    parser.add_argument(
+        "--verbose",
+        type=bool,
+        help="Print verbose debugging output",
+        default=False,
+    )
    args = parser.parse_args()

+    config_file = zulip.get_default_config_filename()
+    if not config_file:
+        print("Could not find the Zulip configuration file. Please read the provided README.")
+        sys.exit()
+
+    client = zulip.Client(config_file=config_file)
+
+    config = ConfigParser()
+    # Make config parser case sensitive otherwise API keys will be lowercased
+    # which is not supported by litellm.
+    # https://docs.python.org/3/library/configparser.html#configparser.ConfigParser.optionxform
+    config.optionxform = str  # type: ignore[assignment, method-assign]
+
+    with open(config_file) as f:
+        config.read_file(f, config_file)
+
+    # Set all the keys in `litellm` as environment variables.
+    for key in config["litellm"]:
+        if args.verbose:
+            print("Setting key:", key)
+        os.environ[key] = config["litellm"][key]
+
    url = args.url
    model = args.model

@@ -64,33 +83,48 @@ if __name__ == "__main__":

    request = {
        "anchor": "newest",
-        "num_before": 100,
+        "num_before": args.max_messages,
        "num_after": 0,
        "narrow": narrow,
+        # Fetch raw Markdown, not HTML
        "apply_markdown": False,
    }
    result = client.get_messages(request)
+    if result["result"] == "error":
+        print("Failed fetching message history", result)
+        sys.exit(1)
    messages = result["messages"]

+    if len(messages) == 0:
+        print("No messages in conversation to summarize")
+        sys.exit(0)
+
    formatted_messages = [
        {"content": f"{message['sender_full_name']}: {message['content']}", "role": "user"}
        for message in messages
    ]

    # Provide a instruction if using an `Instruct` model.
-    # There is a 100 token output limit by hugging face.
    if "Instruct" in model:
        formatted_messages.append(
-            {"content": "Summarize the above content within 90 words.", "role": "user"}
+            {
+                "content": """
+Summarize the above content within 90 words.
+""",
+                "role": "user",
+            }
        )

    # Send formatted messages to the LLM model for summarization
    response = completion(
+        max_tokens=args.max_tokens,
        model=model,
        messages=formatted_messages,
    )

-    print("Server response:\n", response)
-    print("\n\nGenerated summary for URL:", url)
-    print("Summary:")
+    print("Summarized conversation URL:", url)
+    print(
+        f"Used {response['usage']['total_tokens']} tokens to summarize {len(formatted_messages)} Zulip messages."
+    )
+    print()
    print(response["choices"][0]["message"]["content"])
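
One subtlety worth calling out from the diff above: `ConfigParser` lowercases option names by default, which would break case-sensitive environment variable names like `HUGGINGFACE_API_KEY` — hence the `config.optionxform = str` line. A minimal standalone demonstration of the difference (the `[litellm]` section name matches the README; the key value is a placeholder):

```python
from configparser import ConfigParser

ini = "[litellm]\nHUGGINGFACE_API_KEY=YOUR_API_KEY\n"

lowercased = ConfigParser()
lowercased.read_string(ini)
# Default optionxform lowercases option names:
assert list(lowercased["litellm"]) == ["huggingface_api_key"]

case_sensitive = ConfigParser()
case_sensitive.optionxform = str  # preserve key case, as the script does
case_sensitive.read_string(ini)
assert list(case_sensitive["litellm"]) == ["HUGGINGFACE_API_KEY"]
```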
