Enterprise only. Custom alerting requires the Editor role on your project’s Grafana workspace, which is available to enterprise customers. If you would like access, contact support@goldsky.com or reach out to your account manager.
## Overview

Every enterprise project gets a dedicated Grafana workspace with the full set of Turbo pipeline metrics (Kafka lag, block lag, checkpoint duration, sink flush latency, and more). With the Editor role enabled, you can create your own alert rules on any of these metrics and route notifications to Slack, email, or any other Grafana-supported contact point. This page walks through the end-to-end setup.

## Prerequisites

Before you start, make sure:

- Editor access is enabled for your account. Contact support@goldsky.com to confirm it is set up.
- You have a notification destination ready. For Slack, create an Incoming Webhook in the target workspace and copy the webhook URL. For email, gather the recipient addresses you want to notify.
## Open the Grafana workspace

- Sign in to the Goldsky dashboard.
- Navigate to any Turbo pipeline.
- Open the Metrics tab.
- Click Advanced metrics in the top-right corner.

This opens your project's dedicated Grafana workspace. All pipeline metrics are served from the `goldsky-prometheus` datasource.
## Create a contact point

Contact points tell Grafana where to send notifications when an alert fires. Create them once and reuse them across many alert rules.

### Slack
- In the Grafana sidebar, go to Alerting → Contact points.
- Click + Add contact point.
- Fill in the form:
  - Name: a descriptive label such as `slack-pipeline-alerts`.
  - Integration: select Slack.
  - Webhook URL: paste the Slack Incoming Webhook URL from your Slack workspace.
- Click Test to send a test message. Confirm it arrives in the expected Slack channel.
- Click Save contact point.
### Email

- Go to Alerting → Contact points → + Add contact point.
- Fill in the form:
  - Name: e.g. `email-oncall`.
  - Integration: select Email.
  - Addresses: enter one or more recipients separated by semicolons, e.g. `ops@example.com;oncall@example.com`.
- Click Test, then Save contact point.
### Other supported integrations

The Goldsky workspace runs open-source Grafana, so you can pick any of the built-in Grafana Alerting contact point integrations when creating a contact point:

- Alertmanager
- AWS SNS
- Cisco Webex Teams
- DingDing
- Discord
- Google Chat
- Kafka REST Proxy
- LINE
- Microsoft Teams
- MQTT
- Opsgenie
- PagerDuty
- Pushover
- Sensu Go
- Slack
- Telegram
- Threema Gateway
- VictorOps
- Webhook
- WeCom
## Create an alert rule

Alert rules define the condition that should trigger a notification.

- Go to Alerting → Alert rules.
- Click + New alert rule.
- Under Define query and alert condition:
  - Datasource: select `goldsky-prometheus`.
  - Query: write a PromQL expression for the metric you want to alert on. See common alert queries below for starting points.
- Under Set alert evaluation behavior:
  - Evaluate every: `1m` is a reasonable default.
  - For: how long the condition must continuously hold before the alert fires. `5m` is a good starting point to avoid flapping on transient spikes.
- Under Configure labels and notifications:
  - Select the contact point you created in the previous step.
  - Add labels such as `severity=warning` or `severity=critical` to help with routing and filtering later.
- Give the rule a descriptive name and click Save rule and exit.

Once saved, you can see the rule's current state (Normal, Pending, or Firing) on the Alert rules page.
## Common alert queries

A quick reference of the five alerts we recommend setting up. Suggested conditions are starting points; tune thresholds for your workload.

| Alert on | Suggested condition |
|---|---|
| Pipeline falling behind (block lag) | above 60 for 10m |
| Kafka consumer lag growing | above your baseline for 15m |
| Checkpoint failures | above 0 for 5m |
| Sink flush latency spike (P95, ms) | above 2000 for 10m |
| Pipeline not producing output | below 1 for 10m |
## Starter PromQL queries

Paste the full PromQL expression into the Query field when you create the alert rule.

### Block lag
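A sketch of a block-lag query, assuming a gauge named `goldsky_block_lag_seconds` (a hypothetical name; browse Explore with the `goldsky-prometheus` datasource to find the actual metric for your project):

```promql
# Worst-case seconds behind the chain tip, per pipeline.
# goldsky_block_lag_seconds is a hypothetical metric name - confirm in Explore.
max by (service_instance_id) (
  goldsky_block_lag_seconds{service_instance_id=~".*my-pipeline"}
)
```

Pair this with an above 60 threshold and a For of 10m, matching the suggested condition in the table above.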
A threshold of 60–120 seconds is a common choice for real-time pipelines.

### Kafka consumer lag
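A sketch, assuming the Kafka source exposes a `kafka_consumer_records_lag_max`-style gauge (a hypothetical name; check Explore for the lag metric your pipelines actually report):

```promql
# Outstanding records summed across partitions, per pipeline.
# kafka_consumer_records_lag_max is a hypothetical metric name - confirm in Explore.
sum by (service_instance_id) (
  kafka_consumer_records_lag_max{service_instance_id=~".*my-pipeline"}
)
```

Alert when this stays above your normal baseline for 15m; sustained growth, rather than the absolute number, is usually the signal that matters.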
### Checkpoint failures
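A sketch counting failed checkpoints over the last five minutes, assuming a Flink-style counter named `flink_jobmanager_job_numberOfFailedCheckpoints` (a hypothetical name; confirm in Explore):

```promql
# Failed checkpoints in the last 5 minutes, per pipeline.
# The counter name is an assumption - confirm the exact name in Explore.
increase(
  flink_jobmanager_job_numberOfFailedCheckpoints{service_instance_id=~".*my-pipeline"}[5m]
)
```

Any value above 0 for 5m warrants a critical alert.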
### Sink flush latency P95 (ms)
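A sketch of a P95 computed from a latency histogram, assuming buckets named `sink_flush_duration_ms_bucket` (a hypothetical name; confirm the histogram name and its unit in Explore):

```promql
# P95 sink flush latency in ms over the last 5 minutes, per pipeline.
# sink_flush_duration_ms_bucket is a hypothetical metric name - confirm in Explore.
histogram_quantile(0.95,
  sum by (le, service_instance_id) (
    rate(sink_flush_duration_ms_bucket{service_instance_id=~".*my-pipeline"}[5m])
  )
)
```

Alert above 2000 for 10m, per the suggested condition in the table above.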
### Pipeline not producing output
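A sketch of an output-rate query, assuming a counter named `sink_records_written_total` (a hypothetical name; confirm in Explore):

```promql
# Records written per second, averaged over the last 10 minutes, per pipeline.
# sink_records_written_total is a hypothetical metric name - confirm in Explore.
sum by (service_instance_id) (
  rate(sink_records_written_total{service_instance_id=~".*my-pipeline"}[10m])
)
```

Alert when this drops below 1 for 10m. For pipelines that legitimately go quiet at times, lengthen the For window rather than disabling the alert.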
Each pipeline is identified by its `service_instance_id` label. Use Grafana's Explore view with the `goldsky-prometheus` datasource to browse all available metrics and labels for your project. Scope any query to a single pipeline with `{service_instance_id=~".*my-pipeline"}`.

## Recommended starter alerts

Three alerts cover most production incidents. Create these first:

- Checkpoint failures — critical; catches lost state on every pipeline.
- Block lag — catches pipelines falling behind the chain tip.
- Sink flush latency — catches database slowdowns before they cascade into lag.
## Troubleshooting

### I can't see an 'Alerting' section in the Grafana sidebar

Editor access is likely not enabled for your account yet. Contact support@goldsky.com or your account manager to have it turned on.
### Test notification works, but alerts never fire
Check that your alert rule's query returns data in Grafana's Explore view. If the query returns no series, the rule stays in Normal forever. Also confirm the threshold direction (above vs below) matches what the metric actually does when the condition you care about occurs.

### Alerts fire too often on transient spikes
Increase the For duration so the condition must hold longer before firing. `5m` or `10m` is usually enough to filter out brief spikes while still catching real incidents.