
Design and Implementation of an Alert Scheduling System (GoAlert) and Notification Center

This article explains why alerts and on‑call scheduling are needed, outlines the core requirements of an alert scheduling system, and describes how our architecture evolved from PagerDuty to GoAlert and Notice‑Center. It then walks through the implementation with code snippets and closes with an outlook for a comprehensive operations monitoring solution.

Liulishuo Tech Team

1. Why Alerts

According to the Google SRE book, a monitoring system should produce three types of output: alerts, tickets, and logging.

Alerts are the urgent kind: they signal that a human must take action immediately to resolve an existing issue or to prevent an imminent one.

2. Why Scheduling

An alert may concern an application or its runtime environment. In the first case it should go to the application's owners; in the second, to the ops team.

Sending alerts to many people simultaneously is inefficient; a scheduling mechanism ensures a single person receives alerts for a given period, reducing operational and communication costs.

3. Core Requirements of an Alert Scheduling System

Accuracy: alerts go to people who can resolve them.

Reachability: ensure recipients receive the alert.

Aggregation: combine related alerts into one.

Timely response: guarantee alerts are handled promptly.
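Of these four properties, aggregation is the easiest to illustrate in code. The Go sketch below groups alerts that share the same values for a chosen set of labels, in the spirit of Alertmanager's group_by setting. The Alert type, the Aggregate function, and the label names are illustrative assumptions, not Alertmanager's or GoAlert's API.

```go
package main

import (
	"sort"
	"strings"
)

// Alert is a hypothetical, simplified alert carrying only its labels.
type Alert struct {
	Labels map[string]string
}

// groupKey builds a stable key from the values of the given labels,
// so alerts that agree on those labels land in the same group.
func groupKey(a Alert, by []string) string {
	parts := make([]string, 0, len(by))
	for _, l := range by {
		parts = append(parts, l+"="+a.Labels[l])
	}
	return strings.Join(parts, ",")
}

// Aggregate groups alerts by the given labels and returns the groups in
// sorted-key order; each group would become a single notification.
func Aggregate(alerts []Alert, by []string) [][]Alert {
	groups := map[string][]Alert{}
	for _, a := range alerts {
		k := groupKey(a, by)
		groups[k] = append(groups[k], a)
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([][]Alert, 0, len(keys))
	for _, k := range keys {
		out = append(out, groups[k])
	}
	return out
}
```

Labels outside the grouping set (such as a pod name) do not split a group, which is what lets many per-instance alerts collapse into one message.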

4. Architecture Evolution

We replaced PagerDuty with GoAlert (scheduling) plus Notice‑Center (notification).

GoAlert is integrated with our internal personnel system, and users log in by scanning a QR code, which reduces overhead.

Notice‑Center supports multiple notification channels to meet business needs.

The Mercury alert‑conversion service connects GoAlert to various cloud monitoring sources.

Scheduling policies are generated automatically for each business.

5. GoAlert Scheduling System

5.1 Purpose

Determine who should receive alerts for a given app or team during a time window.

5.2 Advantages

No missed alerts: if the current on‑call person does not respond, GoAlert escalates to the next layer.

Simplifies on‑call management via escalation policies.

Customizable integrations with existing monitoring and telemetry systems.

Mobile‑friendly Web UI for acknowledging and closing alerts.

5.3 Alert Flow

1. Prometheus fires alerts and sends them to Alertmanager.

2. Alertmanager groups, silences, and routes alerts to GoAlert via integration keys based on app/team labels.

3. GoAlert selects the first‑layer on‑call person using the escalation policy and forwards details to Notice‑Center.

4. Notice‑Center chooses notification channels, looks up contact info, renders the message, and sends it to the recipient.
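Step 2 hinges on mapping an alert's labels to an integration key. Alertmanager expresses this with its route tree and receivers; the Go sketch below only models the lookup order we assume (app label first, then team label, then a default ops service). The Router type and the key strings are hypothetical.

```go
package main

// Router is a hypothetical model of label-based routing to GoAlert
// integration keys; the keys used with it are placeholders.
type Router struct {
	ByApp      map[string]string // app label -> integration key
	ByTeam     map[string]string // team label -> integration key
	DefaultKey string            // fallback, e.g. the ops team's service
}

// Resolve picks an integration key for an alert's labels: an app-specific
// service wins, then the team's service, then the default (ops) service.
func (r Router) Resolve(labels map[string]string) string {
	if k, ok := r.ByApp[labels["app"]]; ok {
		return k
	}
	if k, ok := r.ByTeam[labels["team"]]; ok {
		return k
	}
	return r.DefaultKey
}
```

The default key keeps alerts without app or team labels, typically runtime-environment alerts, flowing to the ops team instead of being dropped.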

5.4 Login Method

GoAlert supports OIDC login. Dex bridges the internal LLS OAuth service to OIDC, so users can log in seamlessly by scanning a QR code.

5.5 GoAlert Resource Definitions

rotation: a shift table in which participants take turns (e.g., daily or weekly).

schedule: a precise schedule that combines multiple rotations or individual users.

escalation policy: defines which schedule or person is notified at each escalation layer.

service: binds an escalation policy to its integration keys.

integration keys: keys that Alertmanager, Mercury, and other sources use to send alerts to a service.

5.6 Custom Development

We added a snooze table that silences an alert for a configurable period after it is acknowledged and then automatically re‑activates it, so an acknowledged but unresolved alert cannot be silently forgotten.

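// _createOrUpdate writes the snooze record for an alert inside tx, inserting it or updating the existing row.
func (db *DB) _createOrUpdate(ctx context.Context, tx *sql.Tx, sz *AlertSnooze) (*AlertSnooze, error) { ... }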

6. Notice‑Center Notification System

6.1 Purpose

Deliver alerts to users via specified channels.

6.2 Main Functions

Notice‑Center consists of three microservices: information query, scheduling, and notification.

The information query service looks up users' contact details in the corporate chat systems.

The scheduling service determines the channel, renders templates, merges duplicate messages, and forwards the result to the notification service.

The notification service sends rendered messages through email, phone, or corporate chat APIs.
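Merging duplicate messages in the scheduling service could look like the Go sketch below. It assumes two messages are duplicates when recipient, channel, and title all match, and it records how many were merged; the Message type and MergeDuplicates function are hypothetical.

```go
package main

// Message is a hypothetical rendered notification awaiting delivery.
type Message struct {
	Recipient string
	Channel   string // e.g. "email", "phone", "chat"
	Title     string
	Count     int // number of identical messages merged into this one
}

// MergeDuplicates collapses messages with the same recipient, channel, and
// title into one entry, keeping first-seen order and counting the merges.
func MergeDuplicates(in []Message) []Message {
	type key struct{ recipient, channel, title string }
	index := map[key]int{}
	out := make([]Message, 0, len(in))
	for _, m := range in {
		k := key{m.Recipient, m.Channel, m.Title}
		if i, ok := index[k]; ok {
			out[i].Count++
			continue
		}
		m.Count = 1
		index[k] = len(out)
		out = append(out, m)
	}
	return out
}
```

The same alert sent over two different channels is deliberately kept as two messages, since each channel is a separate delivery.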

6.3 Notification Flow

1. The scheduling service receives an alert and parses its channel and template.

2. It queries the recipient's contact information.

3. It sends the channel, template, and contact information to the notification service.

4. The notification service dispatches the message via the chosen channel.

// Send handles a dispatch request by delivering the rendered message over the requested channel.
func (d *Dispatch) Send(ctx context.Context, request *service_v1.DispatcherSendRequest) (*service_v1.DispatcherSendResponse, error) { ... }

7. Mercury Service (DevOps)

7.1 Purpose

Provide alert conversion and automation to reduce operational cost.

7.2 Main Features

Exposes a webhook that receives third‑party alerts and converts them to GoAlert's format.

Automatically generates on‑call policies in GoAlert from the internal catalog.

Automatically generates Alertmanager configuration when services and integration keys are created.

Automatically creates contact methods for new users.

Automatically makes users follow their on‑call services.

7.3 Default Escalation Templates

Level 1: weekly rotation of the app owners and members; escalates after 15 minutes.

Level 2: all app owners; escalates after 15 minutes.

Level 3: weekly rotation of the group/team members (repeated three times).

7.4 Implementation

Snippet from Mercury's GoAlert resource generation, which starts from the app's catalog entry:

var appInfo catalog.AppInfo ...

Alert conversion logic, which maps the source alarm state to a GoAlert event type:

switch alarmDetail.NewStateValue {
case "ALARM":
	alarm.EventType = config.GoalertTriggerStatus
case "OK":
	alarm.EventType = config.GoalertCloseStatus
default:
	return fmt.Errorf("unknown alert status %s", alarmDetail.NewStateValue)
}
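A self-contained version of the conversion, including decoding the webhook body, might look like the Go sketch below. The NewStateValue field and its ALARM/OK values match AWS CloudWatch alarm notifications, so we assume that source and also read the AlarmName and NewStateReason fields those notifications carry. eventTrigger and eventClose are hypothetical stand-ins for config.GoalertTriggerStatus and config.GoalertCloseStatus, whose values the snippet does not show.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudAlarm holds the fields read from an incoming alarm; the CloudWatch
// source is an assumption based on the field names.
type cloudAlarm struct {
	AlarmName      string `json:"AlarmName"`
	NewStateValue  string `json:"NewStateValue"`
	NewStateReason string `json:"NewStateReason"`
}

// Hypothetical stand-ins for config.GoalertTriggerStatus / GoalertCloseStatus.
const (
	eventTrigger = "trigger"
	eventClose   = "close"
)

// goalertEvent is a hypothetical payload forwarded to GoAlert.
type goalertEvent struct {
	Summary   string
	Details   string
	EventType string
}

// convert maps an alarm webhook body to a GoAlert event and rejects unknown
// states (e.g. INSUFFICIENT_DATA) instead of guessing.
func convert(body []byte) (goalertEvent, error) {
	var a cloudAlarm
	if err := json.Unmarshal(body, &a); err != nil {
		return goalertEvent{}, fmt.Errorf("decode alarm: %w", err)
	}
	ev := goalertEvent{Summary: a.AlarmName, Details: a.NewStateReason}
	switch a.NewStateValue {
	case "ALARM":
		ev.EventType = eventTrigger
	case "OK":
		ev.EventType = eventClose
	default:
		return goalertEvent{}, fmt.Errorf("unknown alert status %s", a.NewStateValue)
	}
	return ev, nil
}
```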

8. Summary

8.1 How the Existing System Meets Core Alert Requirements

Aggregation: Alertmanager grouping and inhibition combine related alerts.

Accuracy: each app gets its own service with a custom escalation policy, so alerts reach people who can resolve them.

Reachability: the multi‑channel notification center uses retry mechanisms to make sure messages arrive.

Timely response: unacknowledged alerts automatically escalate to the next layer.

8.2 Outlook

We plan to route messages from other infrastructure teams (CI/CD, tracing, etc.) through the notification center, turning it into a unified message hub.
