feat: new project to alert on AK session errors #155

Merged
merged 4 commits into from
Jan 10, 2025
1 change: 1 addition & 0 deletions README.md
@@ -27,6 +27,7 @@ demonstrate basic system features, integration APIs, and best practices.
| [JIRA Assignee From Google Calendar Workflow](./jira_google_calendar/assignee_from_schedule/) | Set Assignee in Jira ticket to the person currently on-call | jira, calendar |
| [Create calendar due date event for Jira ticket](./jira_google_calendar/deadline_to_event/) | When a new Jira issue is created, the workflow automatically generates a Google Calendar event with a deadline | calendar, jira |
| [Alert on missing Jira events](./ops/alert_on_missing_events/) | Send Slack alerts when AutoKitteh doesn't receive certain Jira events in time | Jira, Slack |
| [Alert on session errors](./ops/alert_on_session_errors/) | Send Slack alerts when AutoKitteh sessions end due to errors | Slack |
| [Pull Request Review Reminder (Purrr)](./purrr/) | Streamline code reviews and cut down turnaround time to merge pull requests | GitHub, Google Sheets, Slack |
| [Quickstart](./quickstart/) | Sample for quickstart | http |
| [Monitor PR until completion in Slack](./reviewkitteh/) | Create a Slack channel for each PR, update team leads until completion | slack, github, sheets |
55 changes: 55 additions & 0 deletions ops/alert_on_session_errors/README.md
@@ -0,0 +1,55 @@
---
title: Alert on session errors
description: Send Slack alerts when AutoKitteh sessions end due to errors
integrations: ["Slack"]
categories: ["Ops"]
---

# Alert on Session Errors

Send Slack alerts when AutoKitteh sessions end due to errors.

This is a detection tool for incidents due to unexpected exceptions
in workflows that are usually stable and dependable. It can also be
used as a development and debugging tool.

It gets triggered by the AutoKitteh scheduler every minute, on the minute,
to look for sessions that ended with an error status in the previous minute.
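
The "previous minute" window described above can be sketched roughly as follows. This is a minimal illustration, not the project's code verbatim: the helper names `previous_minute_window` and `ended_in_window` and the sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def previous_minute_window(now):
    """Return (start, end) for the full minute that ended at or before `now`."""
    end = now.replace(second=0, microsecond=0)
    return end - timedelta(minutes=1), end


def ended_in_window(updated_at, start, end):
    """Check an ISO 8601 timestamp against the half-open range [start, end)."""
    return start <= datetime.fromisoformat(updated_at) < end


# Hypothetical run that starts two seconds after 12:34 UTC.
now = datetime(2025, 1, 10, 12, 34, 2, tzinfo=timezone.utc)
start, end = previous_minute_window(now)
print(start.isoformat(), end.isoformat())
print(ended_in_window("2025-01-10T12:33:59+00:00", start, end))  # True
print(ended_in_window("2025-01-10T12:34:00+00:00", start, end))  # False
```

Because the range is half-open, a session updated exactly on a minute boundary falls into one window only, so consecutive runs should not report it twice.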

## Configuration and Deployment

### Cloud Usage

1. Generate a personal API auth token in the web UI:

   - Click your user icon in the bottom-left corner of the page
   - Click the "Client Setup" menu option to go to that page
   - Click the "Generate Token" button, and copy the generated
     [JWT](https://jwt.io/)

2. Import/upload the project
3. Initialize your connections
4. Set/modify these project variables:

   - `AUTOKITTEH_API_BASE_URL` (default = `https://api.autokitteh.cloud`,
     use `http://localhost:9980` for self-hosted servers)
   - `AUTOKITTEH_UI_BASE_URL` (default = `https://app.autokitteh.cloud`,
     use `http://localhost:9982` for self-hosted servers)
   - `AUTOKITTEH_AUTH_TOKEN`: the API auth token generated in step 1 above
   - `SLACK_CHANNEL`: send alert messages to this Slack channel name/ID
     (default = `autokitteh-alerts`)

5. Deploy the project
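
As a rough sketch of how the two base URL variables in step 4 are combined with paths, the snippet below joins them with `urllib.parse.urljoin`. The API and UI paths are the ones used by this project's `program.py`. The helper names and the IDs `prj_1`, `dep_2`, and `ses_3` are hypothetical placeholders.

```python
from urllib.parse import urljoin

# Default values of the project variables (cloud deployment).
API_BASE_URL = "https://api.autokitteh.cloud"
UI_BASE_URL = "https://app.autokitteh.cloud"


def sessions_list_url(api_base):
    """Build the URL of the API endpoint that lists sessions."""
    return urljoin(api_base, "autokitteh.sessions.v1.SessionsService/List")


def session_ui_url(ui_base, project_id, deployment_id, session_id):
    """Build a web UI link to a single session."""
    path = f"/projects/{project_id}/deployments/{deployment_id}/sessions/{session_id}"
    return urljoin(ui_base, path)


print(sessions_list_url(API_BASE_URL))
print(session_ui_url(UI_BASE_URL, "prj_1", "dep_2", "ses_3"))

# Caveat: a base URL with a path prefix and no trailing slash loses
# its last path segment when joined with a relative path.
print(urljoin("http://localhost:9980/api", "x/List"))  # http://localhost:9980/x/List
```

Given how `urljoin` resolves relative paths, it is probably safest to set the base URL variables to a bare scheme and host, as in the defaults above.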

### Self-Hosted Usage

Generate a personal API auth token by running this CLI command:

```shell
ak auth create-token
```

Follow [these detailed instructions](https://docs.autokitteh.com/get_started/deployment)
to deploy the project on a self-hosted server.

Also follow the instructions in the [Cloud Usage](#cloud-usage) section above.
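
Whichever way the token is generated, it is sent to the API as a bearer token. A minimal sketch of the header construction, mirroring the logic in this project's `program.py` (the function name `build_headers` and the sample token value are assumptions for illustration):

```python
def build_headers(token=""):
    """Build request headers, adding auth only when a token is configured."""
    headers = {"Content-Type": "application/json"}
    if token:  # Servers in dev mode don't require auth.
        headers["Authorization"] = "Bearer " + token
    return headers


print(build_headers())                 # no Authorization header
print(build_headers("example-token"))  # hypothetical token value
```

Leaving `AUTOKITTEH_AUTH_TOKEN` empty therefore only makes sense against a server that does not enforce authentication.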
26 changes: 26 additions & 0 deletions ops/alert_on_session_errors/autokitteh.yaml
@@ -0,0 +1,26 @@
# This YAML file is a declarative manifest that describes the setup of an
# AutoKitteh project that sends Slack alerts when sessions end due to errors.

version: v1

project:
  name: alert_on_session_errors

  vars:
    - name: AUTOKITTEH_API_BASE_URL
      value: "https://api.autokitteh.cloud"
    - name: AUTOKITTEH_UI_BASE_URL
      value: "https://app.autokitteh.cloud"
    - name: AUTOKITTEH_AUTH_TOKEN
      value: ""
    - name: SLACK_CHANNEL
      value: autokitteh-alerts

  connections:
    - name: slack_conn
      integration: slack

  triggers:
    - name: monitor_schedule
      schedule: "@every 1m"
      call: program.py:on_monitor_schedule
64 changes: 64 additions & 0 deletions ops/alert_on_session_errors/program.py
@@ -0,0 +1,64 @@
"""Send Slack alerts when AutoKitteh sessions end due to errors.

See the configuration and deployment instructions in the README.md file.
"""

from datetime import datetime, timedelta, UTC
import json
import os
from urllib.parse import urljoin

from autokitteh.slack import slack_client
import requests
from requests import exceptions


API_BASE_URL = os.getenv("AUTOKITTEH_API_BASE_URL", "")
UI_BASE_URL = os.getenv("AUTOKITTEH_UI_BASE_URL", "")
JWT = os.getenv("AUTOKITTEH_AUTH_TOKEN", "")
SLACK_CHANNEL = os.getenv("SLACK_CHANNEL", "")

slack = slack_client("slack_conn")


def on_monitor_schedule(event):
    """Triggered at the beginning of every minute, so it covers the previous one."""
    end_time = datetime.now(UTC).replace(second=0, microsecond=0)
    start_time = end_time - timedelta(minutes=1)

    count = 0
    for session in reversed(_list_sessions_with_errors()):
        session_updated = datetime.fromisoformat(session["updatedAt"])
        if start_time <= session_updated < end_time:
            count += 1
            _log_error(session)

    print(f"Found {count} sessions with new errors")


def _list_sessions_with_errors():
    url = urljoin(API_BASE_URL, "autokitteh.sessions.v1.SessionsService/List")
    headers = {"Content-Type": "application/json"}
    if JWT:  # Servers in dev mode don't require auth.
        headers["Authorization"] = "Bearer " + JWT

    resp = requests.post(url, headers=headers, json={"stateType": 3}, timeout=10)
    print(f"API call's Round Trip Time: {resp.elapsed}")
    resp.raise_for_status()

    try:
        return resp.json().get("sessions", [])
    except exceptions.JSONDecodeError:
        print(f"Response headers: {resp.headers}")
        print(f"Response text: {resp.text}")
        raise


def _log_error(session):
    data = json.dumps(session, indent=True)
    print(data)

    pid, did = session["projectId"], session["deploymentId"]
    path = f"/projects/{pid}/deployments/{did}/sessions/{session['sessionId']}"
    msg = f"Error in AutoKitteh session: {urljoin(UI_BASE_URL, path)}\n```{data}```"
    slack.chat_postMessage(channel=SLACK_CHANNEL, text=msg)