Enterprise only. Custom alerting requires the Editor role on your project’s Grafana workspace, available to enterprise customers. To get access, contact support@goldsky.com or reach out to your account manager.

Overview

Every enterprise project gets a dedicated Grafana workspace with the full set of Turbo pipeline metrics (Kafka lag, block lag, checkpoint duration, sink flush latency, and more). With the Editor role enabled, you can create your own alert rules on any of these metrics and route notifications to Slack, email, or any other Grafana-supported contact point. This page walks through the end-to-end setup.

Prerequisites

Before you start, make sure:
  1. Editor access is enabled for your account. Contact support@goldsky.com to confirm it is set up.
  2. You have a notification destination ready. For Slack, create an Incoming Webhook in the target workspace and copy the webhook URL. For email, gather the recipient addresses you want to notify.

Open the Grafana workspace

  1. Sign in to the Goldsky dashboard.
  2. Navigate to any Turbo pipeline.
  3. Open the Metrics tab.
  4. Click Advanced metrics in the top-right corner.
This opens your project’s Grafana workspace in a new tab, pre-authenticated and scoped to your project. All of your pipelines’ metrics are available through the goldsky-prometheus datasource.

Create a contact point

Contact points tell Grafana where to send notifications when an alert fires. Create them once and reuse them across many alert rules.

Slack

  1. In the Grafana sidebar, go to Alerting → Contact points.
  2. Click + Add contact point.
  3. Fill in the form:
    • Name: a descriptive label such as slack-pipeline-alerts.
    • Integration: select Slack.
    • Webhook URL: paste the Slack Incoming Webhook URL from your Slack workspace.
  4. Click Test to send a test message. Confirm it arrives in the expected Slack channel.
  5. Click Save contact point.
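If you want to sanity-check the webhook outside Grafana first, the same test message can be sent from a short script. A minimal Python sketch using only the standard library (the webhook URL is a placeholder; replace it with your own):

```python
import json
import urllib.request

# Placeholder: replace with your real Slack Incoming Webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXXXXXX"

def build_payload(text: str) -> bytes:
    # Slack Incoming Webhooks accept a JSON body with a "text" field.
    return json.dumps({"text": text}).encode("utf-8")

def send_test_message(url: str = WEBHOOK_URL) -> None:
    # Slack replies with the plain-text body "ok" on success.
    req = urllib.request.Request(
        url,
        data=build_payload("Test notification from Goldsky alerting setup"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Inspect the payload without sending; call send_test_message() to post.
print(build_payload("Test notification from Goldsky alerting setup").decode())
```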

Email

  1. Go to Alerting → Contact points and click + Add contact point.
  2. Fill in the form:
    • Name: e.g. email-oncall.
    • Integration: select Email.
    • Addresses: enter one or more recipients separated by semicolons, e.g. ops@example.com;oncall@example.com.
  3. Click Test, then Save contact point.

Other supported integrations

The Goldsky workspace runs open-source Grafana, so any of its built-in Alerting contact point integrations can be used when creating a contact point:
  • Alertmanager
  • AWS SNS
  • Cisco Webex Teams
  • DingDing
  • Discord
  • Email
  • Google Chat
  • Kafka REST Proxy
  • LINE
  • Microsoft Teams
  • MQTT
  • Opsgenie
  • PagerDuty
  • Pushover
  • Sensu Go
  • Slack
  • Telegram
  • Threema Gateway
  • VictorOps
  • Webhook
  • WeCom
See the Grafana contact points reference for the full list of fields each integration requires.
You can create multiple contact points and pick different ones per alert rule — for example, a low-severity Slack channel for warnings and a PagerDuty or email group for critical alerts.
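Conceptually, the severity labels let Grafana's notification policies route each alert to a different contact point. A toy sketch of that first-match routing (illustrative only; the route table and contact point names are made up, not Grafana's actual configuration):

```python
# Toy model of label-based notification routing, in the spirit of
# Grafana's notification policies: the first matching route wins.

ROUTES = [
    ({"severity": "critical"}, "pagerduty-oncall"),      # hypothetical contact points
    ({"severity": "warning"}, "slack-pipeline-alerts"),
]
DEFAULT_CONTACT_POINT = "email-oncall"

def route(alert_labels: dict) -> str:
    """Return the contact point for the first route whose labels all match."""
    for required, contact_point in ROUTES:
        if all(alert_labels.get(k) == v for k, v in required.items()):
            return contact_point
    return DEFAULT_CONTACT_POINT

print(route({"severity": "critical", "pipeline": "my-pipeline"}))  # pagerduty-oncall
```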

Create an alert rule

Alert rules define the condition that should trigger a notification.
  1. Go to Alerting → Alert rules.
  2. Click + New alert rule.
  3. Under Define query and alert condition:
    • Datasource: select goldsky-prometheus.
    • Query: write a PromQL expression for the metric you want to alert on. See common alert queries below for starting points.
  4. Under Set alert evaluation behavior:
    • Evaluate every: 1m is a reasonable default.
    • For: how long the condition must continuously hold before the alert fires. 5m is a good starting point to avoid flapping on transient spikes.
  5. Under Configure labels and notifications:
    • Select the contact point you created in the previous step.
    • Add labels such as severity=warning or severity=critical to help with routing and filtering later.
  6. Give the rule a descriptive name and click Save rule and exit.
The rule will begin evaluating immediately. You can see its current state — Normal, Pending, or Firing — on the Alert rules page.
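The interplay of Evaluate every and For can be sketched as a small state machine: each evaluation checks the condition, and the rule only moves from Pending to Firing once the condition has held for the full For window. A simplified model (1m evaluations, 5m For, made-up sample values):

```python
def alert_states(samples, threshold, for_minutes, eval_minutes=1):
    """Simplified model of Grafana's Normal -> Pending -> Firing transitions.

    samples: the metric value at each evaluation, one per interval.
    The alert fires only once the condition has held continuously
    for `for_minutes`.
    """
    states = []
    held = 0  # minutes the condition has held continuously
    for value in samples:
        if value > threshold:
            held += eval_minutes
            states.append("Firing" if held >= for_minutes else "Pending")
        else:
            held = 0
            states.append("Normal")
    return states

# A transient 2-minute spike never fires; a sustained breach does.
spike     = [10, 90, 95, 10, 10, 10, 10, 10]
sustained = [10, 90, 95, 92, 91, 96, 93, 90]
print(alert_states(spike, threshold=60, for_minutes=5))
print(alert_states(sustained, threshold=60, for_minutes=5))
```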

Common alert queries

A quick reference of the five alerts we recommend setting up. Suggested conditions are starting points — tune thresholds for your workload.
  • Pipeline falling behind (block lag): above 60 for 10m
  • Kafka consumer lag growing: above your baseline for 15m
  • Checkpoint failures: above 0 for 5m
  • Sink flush latency spike (P95, ms): above 2000 for 10m
  • Pipeline not producing output: below 1 for 10m

Starter PromQL queries

The full PromQL expression for each alert is listed below. Paste it into the Query field when you create the alert rule.

Block lag
max by (service_instance_id) (streamling_block_lag_max_seconds)
Alert when end-to-end block lag exceeds a business-acceptable threshold. 60–120 seconds is a common choice for real-time pipelines.

Kafka consumer lag
max by (service_instance_id) (streamling_kafka_consumer_messages_lag)
The right threshold depends on steady-state volume. Establish a baseline for each pipeline first, then alert on multiples of it.

Checkpoint failures
sum by (service_instance_id) (increase(streamling_checkpoint_epochs_failed_total[10m]))
Any non-zero value is a critical signal — the pipeline isn’t durably saving its position.

Sink flush latency (P95)
histogram_quantile(0.95, sum by (service_instance_id, le) (rate(streamling_checkpoint_sink_flush_milliseconds_bucket[5m])))
Useful for catching database slowdowns before they cause pipeline lag. The threshold is in milliseconds.

Output rate
sum by (service_instance_id) (rate(streamling_output_rows_total{topology_node_type="sink"}[5m]))
Fires when a pipeline that should be emitting data goes silent.
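For context on the P95 flush-latency query: histogram_quantile estimates a percentile from cumulative le buckets by linearly interpolating within the bucket where the target rank falls. A simplified sketch of that calculation (the bucket counts are hypothetical; real Prometheus also handles +Inf buckets and other edge cases):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound_le, cumulative_count), sorted by bound.
    Linearly interpolates within the bucket that crosses the target rank,
    mirroring the core of PromQL's histogram_quantile.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # Assume values are spread uniformly within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical flush-latency buckets in milliseconds.
buckets = [(100, 40), (500, 70), (1000, 90), (5000, 100)]
print(histogram_quantile(0.95, buckets))  # rank 95 falls in the 1000..5000 ms bucket
```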
Each pipeline is identified by its service_instance_id label. Use Grafana’s Explore view with the goldsky-prometheus datasource to browse all available metrics and labels for your project. Scope any query to a single pipeline with {service_instance_id=~".*my-pipeline"}.
Three alerts cover most production incidents. Create these first:
  • Checkpoint failures — critical; catches lost state on every pipeline.
  • Block lag — catches pipelines falling behind the chain tip.
  • Sink flush latency — catches database slowdowns before they cascade into lag.
For context on what each metric means, see the health dashboard guide.

Troubleshooting

Alerting options don’t appear in the sidebar: Alerting requires the Editor role. Contact support@goldsky.com to request Editor access for your project, and reload the Grafana tab once it is confirmed.
An alert never fires: Check that the rule’s query returns data in Grafana’s Explore view — if the query returns no series, the rule will stay in Normal forever. Also confirm the threshold direction (above vs. below) matches what the metric actually does when the condition you care about occurs.
An alert fires too often: Increase the For duration so the condition must hold longer before firing. 5m or 10m is usually enough to filter out brief spikes while still catching real incidents.
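If you prefer to script the "does this query return data" check, Prometheus exposes a standard instant-query HTTP endpoint. A hedged sketch (the base URL is a placeholder, and whether your managed workspace exposes the API directly is an assumption to confirm with support):

```python
import json
import urllib.parse
import urllib.request

# Placeholder: the Prometheus base URL for your workspace, if exposed.
PROM_URL = "https://prometheus.example.com"

def build_query_url(base, promql):
    """Prometheus instant-query endpoint: GET /api/v1/query?query=<expr>."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def has_series(base, promql):
    """True if the query returns at least one series. Requires network access."""
    with urllib.request.urlopen(build_query_url(base, promql)) as resp:
        body = json.load(resp)
    return body["status"] == "success" and len(body["data"]["result"]) > 0

print(build_query_url(PROM_URL, "max(streamling_block_lag_max_seconds)"))
```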