Overview

The Turbo health dashboard is a Grafana-based view that aggregates the most important signals across all of your Turbo pipelines in a single place. Use it to quickly spot pipelines that are falling behind, catch checkpoint failures, and isolate whether latency is coming from the source, the pipeline, or a specific sink. Open it from the Metrics tab of any Turbo pipeline in the Goldsky dashboard by clicking Advanced metrics in the top-right corner.
Use the Pipeline filter at the top of the dashboard to narrow the view to a single pipeline, and the Rate Window selector to change the interval used for throughput and rate calculations.

Where to look first

When something looks off, work through these checks in order — each one rules out a class of problem before you dig deeper.
1. Is Checkpoint Failures non-zero? Open pipeline logs immediately: a failed checkpoint means the pipeline is not durably saving its position. See Summary.
2. Is Block Lag growing? Check Sink Flush Latency next — growing block lag is usually a downstream sink causing backpressure, not a slow source. See Block lag.
3. Are sinks fast but Checkpoint Duration high? Tune batch settings: raise batch_size or lower the batch interval. See Checkpoint duration P95.
4. Is one sink's flush latency high while others are fine? That sink is the bottleneck. See Sink flush latency P95.

Summary

The Summary row at the top gives you a health check across every active pipeline in the project.
[Image: Turbo health dashboard — summary and per-pipeline status]
  • Active Pipelines — how many pipelines are currently running.
  • Avg Kafka Lag — average Kafka consumer lag across all pipelines. Kafka lag is a good proxy for sink lag — if your sinks (e.g. Postgres, ClickHouse) start falling behind, Kafka lag will grow.
  • Total Throughput — combined rate of inserts and updates being sent to sinks, in operations per second.
  • Checkpoint Failures — count of pipelines that failed to save checkpoint state in the window. Should almost always be No data or zero.
Any non-zero Checkpoint Failures value warrants immediate investigation. A failed checkpoint means the pipeline is not durably saving its position — on restart it may re-process data or fall further behind. Check pipeline logs and, if the failures persist, contact support.
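Kafka consumer lag is simply the gap between the broker's latest (log-end) offset and the consumer group's committed offset, summed across partitions. A minimal sketch of the arithmetic — the partition offsets below are hypothetical, not real dashboard data:

```python
def consumer_lag(end_offsets, committed_offsets):
    """Total lag = sum over partitions of (log-end offset - committed offset)."""
    return sum(
        end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    )

# Hypothetical offsets for a three-partition topic: partition 1 is 280 records behind.
end = {0: 1_500, 1: 1_480, 2: 1_510}
committed = {0: 1_500, 1: 1_200, 2: 1_510}

print(consumer_lag(end, committed))  # 280
```

This is why Kafka lag tracks sink lag: when a sink slows down, commits slow down while the log-end offset keeps advancing.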

Pipeline status

Below the summary, Pipeline Status breaks the same signals down per pipeline — Kafka-based EVM pipelines on the left, Solana pipelines on the right. Each row shows current lag, source output rate, sink output rate, and checkpoint failure status.
High Kafka lag on its own does not mean a pipeline is falling behind — a pipeline that is intentionally processing a lot of historical data will show high lag while catching up. Watch for spikes or steadily increasing lag on a pipeline that was previously steady. That usually means the pipeline needs tuning: larger batch sizes, longer batch intervals, or increased sink parallelism.
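One way to separate "steadily increasing" from "high but steady" is to look at the slope of the lag time series rather than its absolute value. A rough sketch, assuming lag samples taken at a fixed interval (the numbers are hypothetical):

```python
def lag_slope(samples):
    """Least-squares slope of (time, lag) samples: a sustained positive
    slope means lag is climbing, regardless of its absolute level."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_l = sum(l for _, l in samples) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Hypothetical lag samples taken once a minute.
steady   = [(0, 100), (1, 102), (2, 99), (3, 101)]   # high-ish but flat
climbing = [(0, 100), (1, 400), (2, 900), (3, 1600)]  # falling behind

print(lag_slope(steady))    # ~0: pipeline is keeping up
print(lag_slope(climbing))  # large positive: needs tuning
```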

Block lag

Block lag measures how far the most recently processed block trails the current chain tip.
[Image: Turbo health dashboard — block lag over time and per-pipeline gauges]
  • Block Lag (Max) by Pipeline plots block lag over the selected time window. Values are reported in seconds because block times are measured in seconds.
  • Block Lag Gauge by Pipeline shows the current block lag for each pipeline as an at-a-glance gauge.
Block lag is an end-to-end metric. If a downstream sink is slow, the pipeline will deliberately slow down how fast it pulls from its source dataset — this is backpressure, and it exists to prevent the pipeline from running out of memory while a sink catches up. In other words, a growing block lag is often a symptom of a slow sink, not a slow source.
Block lag is measured using the reported block time for each chain. Some chains only propagate block headers to indexers a few seconds after the block is produced, so a steady baseline of a few seconds of lag on those chains is normal.
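In other words, block lag in seconds is the gap between now and the chain-reported timestamp of the last processed block. A minimal sketch of that calculation (the timestamps are hypothetical):

```python
import time

def block_lag_seconds(last_processed_block_time, now=None):
    """Seconds between the chain-reported timestamp of the most recently
    processed block and the current time. Clamped at zero for clock skew."""
    now = time.time() if now is None else now
    return max(0.0, now - last_processed_block_time)

# Hypothetical: the last processed block was stamped 14 s ago.
now = 1_700_000_014.0
print(block_lag_seconds(1_700_000_000.0, now))  # 14.0
```

On chains that propagate headers a few seconds after block production, a few seconds of this value is the normal baseline, not a problem.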

Performance

The performance panels help you isolate where latency is coming from when you need to tune a pipeline.
[Image: Turbo health dashboard — checkpoint duration and sink flush latency]

Checkpoint duration P95

Checkpoint duration is a signal for how long data stays inside the pipeline before it is confirmed as delivered. A checkpoint is only confirmed (flushed) when every record in the batch has been fully sent to every sink. Long checkpoint durations usually mean one of two things:
  • Your batch flush interval is high, so records are collected for longer before being sent.
  • One or more sinks are slow, so batches take longer to drain.
If you see consistently high checkpoint durations on a pipeline where low latency matters, reduce the batch interval or investigate the sinks using the Sink Flush Latency panel below.
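Both causes fall directly out of how size-or-interval batching works: a batch flushes when it fills up or when the interval expires, whichever comes first. A simplified sketch of that rule — the real pipeline's internals may differ, and the class and parameter names here are illustrative:

```python
class Batcher:
    """Flush when batch_size records are collected OR batch_interval_s
    seconds elapse, whichever comes first. A long interval plus a slow
    record rate means records sit in the batch, which shows up as high
    checkpoint duration."""

    def __init__(self, batch_size, batch_interval_s):
        self.batch_size = batch_size
        self.batch_interval_s = batch_interval_s
        self.records = []
        self.opened_at = None

    def add(self, record, now):
        if not self.records:
            self.opened_at = now  # batch "opens" with its first record
        self.records.append(record)
        return self.should_flush(now)

    def should_flush(self, now):
        if len(self.records) >= self.batch_size:
            return True  # size threshold reached
        return self.opened_at is not None and now - self.opened_at >= self.batch_interval_s

b = Batcher(batch_size=3, batch_interval_s=5.0)
print(b.add("r1", now=0.0))  # False: 1 record, 0 s elapsed
print(b.add("r2", now=6.0))  # True: interval exceeded before the batch filled
```

Lowering batch_interval_s caps how long a half-full batch can sit, at the cost of smaller, more frequent writes to the sinks.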

Sink flush latency P95

Unlike checkpoint duration, sink flush latency is a per-sink metric, not end-to-end. It measures how long a specific sink takes to accept a batch when the pipeline flushes to it. This is the panel to look at when you need to isolate a specific slow sink. If checkpoint duration is high but only one sink in the pipeline shows elevated flush latency, that sink is the bottleneck — not the pipeline engine. Typical causes:
  • Database is undersized for the write volume.
  • Missing indexes on the target table causing slow upserts.
  • Network latency between the sink and the pipeline.
  • Sink-side back-pressure (e.g. Kafka broker slow to ack).
Pair Checkpoint Duration and Sink Flush Latency when tuning. High checkpoint duration plus one sink with high flush latency → fix that sink. High checkpoint duration and all sinks fast → increase batch size or decrease batch interval.
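That pairing rule can be written down as a tiny decision function. This is only a sketch of the triage logic from this page — the function name and the 5-second threshold are illustrative, not part of the dashboard:

```python
def diagnose(checkpoint_p95_s, sink_flush_p95_s, threshold_s=5.0):
    """sink_flush_p95_s maps sink name -> P95 flush latency in seconds."""
    if checkpoint_p95_s < threshold_s:
        return "healthy"
    slow = [name for name, p95 in sink_flush_p95_s.items() if p95 >= threshold_s]
    if slow:
        # One or more sinks are slow: the bottleneck is the sink, not the engine.
        return f"fix slow sink(s): {', '.join(sorted(slow))}"
    # All sinks fast but checkpoints slow: batches are waiting to fill.
    return "all sinks fast: increase batch size or decrease batch interval"

print(diagnose(12.0, {"postgres": 11.5, "clickhouse": 0.3}))
print(diagnose(12.0, {"postgres": 0.4, "clickhouse": 0.3}))
```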

Kafka consumer lag

The Kafka Consumer Lag row shows the same lag as the Summary row, but broken out per pipeline over time. Reach for it when you need to answer “when did this pipeline’s lag start climbing, and did any others climb with it?” — the per-pipeline time series makes it easy to correlate a lag spike with a deploy, a chain reorg, or a downstream slowdown.

Throughput

The Throughput row shows stacked input (source) and output (sink) record rates per pipeline. Use it to see the total write volume for the project and identify which pipelines dominate it — useful for capacity planning and spotting runaway load.

Checkpoints

The Checkpoints row shows checkpoint success rate and failure rate per pipeline over time. The failure-rate panel is the time-series companion to the Summary’s single Checkpoint Failures number — pair it with an alert on checkpoint failures so you catch them in real time instead of on a dashboard check.

Solana source

The Solana Source row is specific to Solana pipelines and exposes source-internal signals that don’t apply to EVM chains:
  • Solana Blocks/sec — how fast the pipeline is pulling Solana blocks from its source.
  • Solana Buffer Size — internal buffer depth. Rising buffer length means the downstream pipeline can’t consume as fast as the source is producing.
  • Solana Fetch Duration P95 — how long individual source fetches are taking.
Together, these tell you whether a slow Solana pipeline is bottlenecked at the source fetch itself or downstream of it.
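The buffer signal behaves like any bounded producer/consumer queue: when the source enqueues blocks faster than the pipeline dequeues them, depth rises. A toy illustration with hypothetical rates:

```python
def buffer_depth_over_time(produce_rate, consume_rate, seconds):
    """Simulate buffer depth per second. A rising depth means the
    bottleneck is downstream of the source fetch; a flat or draining
    depth means the source fetch itself is the limiting factor."""
    depth, history = 0, []
    for _ in range(seconds):
        depth = max(0, depth + produce_rate - consume_rate)
        history.append(depth)
    return history

print(buffer_depth_over_time(produce_rate=10, consume_rate=7, seconds=5))   # rising: [3, 6, 9, 12, 15]
print(buffer_depth_over_time(produce_rate=10, consume_rate=12, seconds=5))  # drains: [0, 0, 0, 0, 0]
```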

Next steps

  • Set up custom alerts on these metrics to get notified in Slack or email before lag becomes a problem.
  • Pipe the same metrics into your own observability stack with the Prometheus integration.
  • Drill into a specific pipeline with Live Inspect to see the actual records flowing through.
  • Tune pipeline throughput via resource_size, batch settings, and sink parallelism — see the pipeline configuration reference.