How to throttle streaming data throughput in Turbo pipelines

Overview

The throttle transform caps the throughput of a stream by buffering records into batches and emitting each batch on a fixed minimum interval. Use it to:

Stay under rate limits of downstream sinks or external APIs
Smooth out bursty sources into a steady, predictable rate
Test sink behavior at a controlled records-per-second rate
Reduce pressure on small resource sizes during development

Throttle does not modify the data: every input record passes through unchanged. It only controls when records are emitted.

Configuration

transforms:
  my_throttle:
    type: throttle
    from: <source-or-transform>
    max_batch_size: 100
    min_batch_interval: 10s

Parameters

type (string, required): Must be throttle.
from (string, required): The source or transform to read data from.
max_batch_size (integer, required): Maximum number of records to emit per batch. The throttle accumulates up to this many records before flushing.
min_batch_interval (duration, required): Minimum time to wait between batches (e.g., 10s, 500ms, 1m). The next batch is not emitted until this interval has elapsed since the previous batch.

How throttling works

Records are buffered as they arrive from the upstream source or transform. A batch is flushed downstream when both conditions are met:

max_batch_size records have accumulated, and
min_batch_interval has elapsed since the last batch was emitted

The effective maximum throughput is approximately:

max_batch_size / min_batch_interval = records per second

Examples:

max_batch_size: 100, min_batch_interval: 10s → ~10 rps
max_batch_size: 1000, min_batch_interval: 5s → ~200 rps
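
The arithmetic above can be checked with a few lines of Python. This is only a sketch for sanity-checking a configuration before deploying it: the helper names parse_duration and max_records_per_second are hypothetical (they are not part of Turbo), and the parser assumes only the duration suffixes shown in this document (ms, s, m).

```python
import re

# Milliseconds per unit for the suffixes that appear in this document.
# Assumption: other suffixes may be valid in Turbo but are not handled here.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000}


def parse_duration(text: str) -> float:
    """Convert a duration such as '10s', '500ms', or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_MS[unit] / 1000


def max_records_per_second(max_batch_size: int, min_batch_interval: str) -> float:
    """Approximate throughput cap: max_batch_size / min_batch_interval."""
    return max_batch_size / parse_duration(min_batch_interval)


print(max_records_per_second(100, "10s"))   # 10.0
print(max_records_per_second(1000, "5s"))   # 200.0
```

The two printed values match the examples listed above.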

“Records per second” here means the average throughput the downstream system needs to handle, not literal requests or messages per second. The throttle emits at most one batch per interval; each sink consumes that batch in whatever way is natural for it. For example, an S3 sink writes one file per interval at the configured batch size.

Throttle limits the maximum rate, not the minimum. If the upstream is slow, batches will be smaller and arrive less frequently.
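
One way to read this: the rate a sink sees is the smaller of the upstream rate and the throttle cap. The sketch below models that as a simple minimum. It is an approximation for reasoning about averages, not a description of Turbo's internal scheduling, and effective_rate is a hypothetical helper name.

```python
def effective_rate(upstream_rps: float, max_batch_size: int, min_batch_interval_s: float) -> float:
    """Average downstream rate: the throttle cap or the upstream rate, whichever is lower."""
    cap = max_batch_size / min_batch_interval_s
    return float(min(upstream_rps, cap))


# Fast upstream: the throttle cap (100 records / 10 s) wins.
print(effective_rate(500.0, 100, 10.0))  # 10.0
# Slow upstream: the throttle never becomes the limiting factor.
print(effective_rate(3.0, 100, 10.0))    # 3.0
```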

Example

Throttle a high-volume ERC-20 transfer stream down to ~10 rps before sending it to a sink:

name: throttle_example
resource_size: s
use_dedicated_ip: false
job: false

sources:
  erc20s:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  throttled_erc20s:
    type: throttle
    from: erc20s
    max_batch_size: 100 # ~10 rps with a 10s interval
    min_batch_interval: 10s

sinks:
  sink_1:
    type: blackhole
    from: throttled_erc20s

When to use throttle

Rate-limited sinks: Stay under per-second write quotas on downstream APIs or databases.
External handler protection: Pace records into an HTTP handler so the receiving service is not overwhelmed.
Cost control during development: Slow down processing while iterating on a pipeline against a live source.
Testing: Reproduce sink behavior under a known, fixed input rate.

Best Practices

Place throttle close to the bottleneck

Throttle the stream just before the rate-limited sink or handler so upstream transforms still process at full speed.

Tune batch size to your sink

A larger max_batch_size reduces per-batch overhead but increases per-record latency, because records wait in the buffer until the batch is flushed. Pick a size that matches your sink’s preferred batch size.
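
Once a batch size is chosen, the matching interval follows from the throughput formula: min_batch_interval = max_batch_size / target rate. The sketch below rounds up to whole milliseconds so the resulting cap never exceeds the target. The function name interval_for_target is hypothetical, and expressing the result in the ms form is an assumption based on the 500ms example in the parameter description.

```python
import math


def interval_for_target(target_rps: float, max_batch_size: int) -> str:
    """Smallest whole-millisecond interval that keeps throughput at or below target_rps."""
    if target_rps <= 0 or max_batch_size <= 0:
        raise ValueError("target_rps and max_batch_size must be positive")
    # Round up so the effective cap never exceeds the requested rate.
    interval_ms = math.ceil(max_batch_size / target_rps * 1000)
    return f"{interval_ms}ms"


# A sink that prefers 500-record batches and tolerates about 50 records per second.
print(interval_for_target(50, 500))  # 10000ms
```

Here a 500-record batch at a 50 rps target gives a 10-second interval, so a record may wait in the buffer for up to roughly that long before it is emitted.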

Remove throttle in production where possible

Throttle caps throughput by design. Once rate-limit concerns are addressed, remove the transform to let the pipeline run at full speed.
