Enterprise only. Custom alerting requires the Editor role on your project’s Grafana workspace, available to enterprise customers. To get access, contact support@goldsky.com or reach out to your account manager.

Overview

Every enterprise project gets a dedicated Grafana workspace with the full set of Turbo pipeline metrics (Kafka lag, block lag, checkpoint duration, sink flush latency, and more). With the Editor role enabled, you can create your own alert rules on any of these metrics and route notifications to Slack, email, or any other Grafana-supported contact point. This page walks through the end-to-end setup.

Prerequisites

Before you start, make sure:
  1. Editor access is enabled for your account. Contact support@goldsky.com to confirm it is set up.
  2. You have a notification destination ready. For Slack, create an Incoming Webhook in the target workspace and copy the webhook URL. For email, gather the recipient addresses you want to notify.

Open the Grafana workspace

  1. Sign in to the Goldsky dashboard.
  2. Navigate to any Turbo pipeline.
  3. Open the Metrics tab.
  4. Click Advanced metrics in the top-right corner.
This opens your project’s Grafana workspace in a new tab, pre-authenticated and scoped to your project. All of your pipelines’ metrics are available through the goldsky-prometheus datasource.

Create a contact point

Contact points tell Grafana where to send notifications when an alert fires. Create them once and reuse them across many alert rules.

Slack

  1. In the Grafana sidebar, go to Alerting → Contact points.
  2. Click + Add contact point.
  3. Fill in the form:
    • Name: a descriptive label such as slack-pipeline-alerts.
    • Integration: select Slack.
    • Webhook URL: paste the Slack Incoming Webhook URL from your Slack workspace.
  4. Click Test to send a test message. Confirm it arrives in the expected Slack channel.
  5. Click Save contact point.
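If you want to sanity-check the webhook outside Grafana first, the same test message can be sent from a short script. A minimal Python sketch using only the standard library (the webhook URL is a placeholder; replace it with your own):

```python
import json
import urllib.request

# Placeholder: replace with your real Slack Incoming Webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXXXXXX"

def build_payload(text: str) -> bytes:
    # Slack Incoming Webhooks accept a JSON body with a "text" field.
    return json.dumps({"text": text}).encode("utf-8")

def send_test_message(url: str = WEBHOOK_URL) -> None:
    # Slack replies with the plain-text body "ok" on success.
    req = urllib.request.Request(
        url,
        data=build_payload("Test notification from Goldsky alerting setup"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Inspect the payload without sending; call send_test_message() to post.
print(build_payload("Test notification from Goldsky alerting setup").decode())
```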

Email

  1. Go to Alerting → Contact points and click + Add contact point.
  2. Fill in the form:
    • Name: e.g. email-oncall.
    • Integration: select Email.
    • Addresses: enter one or more recipients separated by semicolons, e.g. ops@example.com;oncall@example.com.
  3. Click Test, then Save contact point.

Other supported integrations

The Goldsky workspace runs open-source Grafana, so any of its built-in Alerting contact point integrations can be used when creating a contact point:
  • Alertmanager
  • AWS SNS
  • Cisco Webex Teams
  • DingDing
  • Discord
  • Email
  • Google Chat
  • Kafka REST Proxy
  • LINE
  • Microsoft Teams
  • MQTT
  • Opsgenie
  • PagerDuty
  • Pushover
  • Sensu Go
  • Slack
  • Telegram
  • Threema Gateway
  • VictorOps
  • Webhook
  • WeCom
See the Grafana contact points reference for the full list of fields each integration requires.
You can create multiple contact points and pick different ones per alert rule — for example, a low-severity Slack channel for warnings and a PagerDuty or email group for critical alerts.
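Conceptually, the severity labels let Grafana's notification policies route each alert to a different contact point. A toy sketch of that first-match routing (illustrative only; the route table and contact point names are made up, not Grafana's actual configuration):

```python
# Toy model of label-based notification routing, in the spirit of
# Grafana's notification policies: the first matching route wins.

ROUTES = [
    ({"severity": "critical"}, "pagerduty-oncall"),      # hypothetical contact points
    ({"severity": "warning"}, "slack-pipeline-alerts"),
]
DEFAULT_CONTACT_POINT = "email-oncall"

def route(alert_labels: dict) -> str:
    """Return the contact point for the first route whose labels all match."""
    for required, contact_point in ROUTES:
        if all(alert_labels.get(k) == v for k, v in required.items()):
            return contact_point
    return DEFAULT_CONTACT_POINT

print(route({"severity": "critical", "pipeline": "my-pipeline"}))  # pagerduty-oncall
```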

Create an alert rule

Alert rules define the condition that should trigger a notification.
  1. Go to Alerting → Alert rules.
  2. Click + New alert rule.
  3. Under Define query and alert condition:
    • Datasource: select goldsky-prometheus.
    • Query: write a PromQL expression for the metric you want to alert on. See common alert queries below for starting points.
  4. Under Set alert evaluation behavior:
    • Evaluate every: 1m is a reasonable default.
    • For: how long the condition must continuously hold before the alert fires. 5m is a good starting point to avoid flapping on transient spikes.
  5. Under Configure labels and notifications:
    • Select the contact point you created in the previous step.
    • Add labels such as severity=warning or severity=critical to help with routing and filtering later.
  6. Give the rule a descriptive name and click Save rule and exit.
The rule will begin evaluating immediately. You can see its current state — Normal, Pending, or Firing — on the Alert rules page.
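The interplay of Evaluate every and For can be sketched as a small state machine: each evaluation checks the condition, and the rule only moves from Pending to Firing once the condition has held for the full For window. A simplified model (1m evaluations, 5m For, made-up sample values):

```python
def alert_states(samples, threshold, for_minutes, eval_minutes=1):
    """Simplified model of Grafana's Normal -> Pending -> Firing transitions.

    samples: the metric value at each evaluation, one per interval.
    The alert fires only once the condition has held continuously
    for `for_minutes`.
    """
    states = []
    held = 0  # minutes the condition has held continuously
    for value in samples:
        if value > threshold:
            held += eval_minutes
            states.append("Firing" if held >= for_minutes else "Pending")
        else:
            held = 0
            states.append("Normal")
    return states

# A transient 2-minute spike never fires; a sustained breach does.
spike     = [10, 90, 95, 10, 10, 10, 10, 10]
sustained = [10, 90, 95, 92, 91, 96, 93, 90]
print(alert_states(spike, threshold=60, for_minutes=5))
print(alert_states(sustained, threshold=60, for_minutes=5))
```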

Common alert queries

A quick reference of the five alerts we recommend setting up. Suggested conditions are starting points — tune thresholds for your workload.
  • Pipeline falling behind (block lag): above 60 for 10m
  • Kafka consumer lag growing: above your baseline for 15m
  • Checkpoint failures: above 0 for 5m
  • Sink flush latency spike (P95, ms): above 2000 for 10m
  • Pipeline not producing output: below 1 for 10m

Starter PromQL queries

The full PromQL expression for each alert is listed below. Paste it into the Query field when you create the alert rule.

Block lag
max by (service_instance_id) (streamling_block_lag_max_seconds)
Alert when end-to-end block lag exceeds a business-acceptable threshold. 60–120 seconds is a common choice for real-time pipelines.

Kafka consumer lag
max by (service_instance_id) (streamling_kafka_consumer_messages_lag)
The right threshold depends on steady-state volume. Establish a baseline for each pipeline first, then alert on multiples of it.

Checkpoint failures
sum by (service_instance_id) (increase(streamling_checkpoint_epochs_failed_total[10m]))
Any non-zero value is a critical signal — the pipeline isn’t durably saving its position.

Sink flush latency (P95)
histogram_quantile(0.95, sum by (service_instance_id, le) (rate(streamling_checkpoint_sink_flush_milliseconds_bucket[5m])))
Useful for catching database slowdowns before they cause pipeline lag. The threshold is in milliseconds.

Output rate
sum by (service_instance_id) (rate(streamling_output_rows_total{topology_node_type="sink"}[5m]))
Fires when a pipeline that should be emitting data goes silent.
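For context on the P95 flush-latency query: histogram_quantile estimates a percentile from cumulative le buckets by linearly interpolating within the bucket where the target rank falls. A simplified sketch of that calculation (the bucket counts are hypothetical; real Prometheus also handles +Inf buckets and other edge cases):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound_le, cumulative_count), sorted by bound.
    Linearly interpolates within the bucket that crosses the target rank,
    mirroring the core of PromQL's histogram_quantile.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # Assume values are spread uniformly within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Hypothetical flush-latency buckets in milliseconds.
buckets = [(100, 40), (500, 70), (1000, 90), (5000, 100)]
print(histogram_quantile(0.95, buckets))  # rank 95 falls in the 1000..5000 ms bucket
```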
Each pipeline is identified by its service_instance_id label. Use Grafana’s Explore view with the goldsky-prometheus datasource to browse all available metrics and labels for your project. Scope any query to a single pipeline with {service_instance_id=~".*my-pipeline"}.
Three alerts cover most production incidents. Create these first:
  • Checkpoint failures — critical; catches lost state on every pipeline.
  • Block lag — catches pipelines falling behind the chain tip.
  • Sink flush latency — catches database slowdowns before they cascade into lag.
For context on what each metric means, see the health dashboard guide.

Troubleshooting

Alerting options don’t appear in the sidebar: Alerting requires the Editor role. Contact support@goldsky.com to request Editor access for your project, and reload the Grafana tab once it is confirmed.
An alert never fires: Check that the rule’s query returns data in Grafana’s Explore view — if the query returns no series, the rule will stay in Normal forever. Also confirm the threshold direction (above vs. below) matches what the metric actually does when the condition you care about occurs.
An alert fires too often: Increase the For duration so the condition must hold longer before firing. 5m or 10m is usually enough to filter out brief spikes while still catching real incidents.
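If you prefer to script the "does this query return data" check, Prometheus exposes a standard instant-query HTTP endpoint. A hedged sketch (the base URL is a placeholder, and whether your managed workspace exposes the API directly is an assumption to confirm with support):

```python
import json
import urllib.parse
import urllib.request

# Placeholder: the Prometheus base URL for your workspace, if exposed.
PROM_URL = "https://prometheus.example.com"

def build_query_url(base, promql):
    """Prometheus instant-query endpoint: GET /api/v1/query?query=<expr>."""
    return base + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def has_series(base, promql):
    """True if the query returns at least one series. Requires network access."""
    with urllib.request.urlopen(build_query_url(base, promql)) as resp:
        body = json.load(resp)
    return body["status"] == "success" and len(body["data"]["result"]) > 0

print(build_query_url(PROM_URL, "max(streamling_block_lag_max_seconds)"))
```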