Overview

Sinks are the final destination for data in your Turbo pipelines. They write processed data to external systems like databases, data warehouses, or HTTP endpoints.

Available Sinks

Turbo supports several sink types, including PostgreSQL, ClickHouse, Webhook, Kafka, and S2.

Common Parameters

All sinks share these common parameters:
  • type (string, required): The sink type (postgres, clickhouse, webhook, etc.)
  • from (string, required): The transform or source to read data from
  • secret_name (string): Name of the secret containing connection credentials (required for database sinks)
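
As a minimal sketch (the sink name, transform name, and connection details below are placeholders), a sink entry combines these common parameters with type-specific settings such as schema and table for postgres, url for webhook, or topic for kafka:
sinks:
  my_sink:
    type: postgres          # sink type
    from: my_transform      # transform or source to read from
    secret_name: MY_SECRET  # connection credentials (required for database sinks)
    schema: public          # postgres-specific
    table: my_table         # postgres-specific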

Multiple Sinks

You can write the same data to multiple destinations:
transforms:
  processed_data:
    type: sql
    primary_key: id
    sql: SELECT * FROM source

sinks:
  # Write to PostgreSQL
  postgres_archive:
    type: postgres
    from: processed_data
    schema: public
    table: archive
    secret_name: MY_POSTGRES

  # Send to webhook
  webhook_notification:
    type: webhook
    from: processed_data
    url: https://api.example.com/notify

  # Publish to Kafka
  kafka_downstream:
    type: kafka
    from: processed_data
    topic: processed.events

Each sink operates independently; failures in one don’t affect the others.

Sink Behavior

Checkpointing

All sinks participate in Turbo’s checkpointing system:
  • Data is buffered until a checkpoint completes
  • Only acknowledged data is committed to the sink
  • Together, these ensure exactly-once delivery semantics

Backpressure

Sinks apply backpressure to the pipeline:
  • If a sink can’t keep up, the entire pipeline slows down
  • Prevents data loss and memory overflow
  • Monitor sink performance to identify bottlenecks

Error Handling

Sink errors are handled with retries:
  • Transient errors (network issues) are retried with exponential backoff
  • Permanent errors (invalid data, schema mismatches) fail the pipeline
  • Check logs for detailed error messages
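
To inspect these errors, view the pipeline logs with the Turbo CLI (the pipeline name is a placeholder):
goldsky turbo logs my-pipeline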

Best Practices

Choose the right sink

  • PostgreSQL: Transactional data, updates, relational queries
  • ClickHouse: High-volume analytics, aggregations, time-series
  • Webhook: Real-time notifications, integrations with external systems
  • Kafka: Downstream processing, event sourcing, decoupling systems
  • S2: Decoupled processing, large number of readers, serverless architectures

Use stable primary keys

For upsert behavior in databases, choose a stable primary key:
sinks:
  postgres_sink:
    primary_key: id  # Use a unique, stable identifier
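
Putting it together, an upsert-style PostgreSQL sink might look like the sketch below; it reuses the parameters from the multiple-sinks example, and all names and values are placeholders:
sinks:
  postgres_archive:
    type: postgres
    from: processed_data
    schema: public
    table: archive
    secret_name: MY_POSTGRES
    primary_key: id  # unique, stable identifier used for upserts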

Monitor sink performance

Use logs and metrics to track:
  • Write throughput
  • Error rates
  • Latency
For example:
goldsky turbo logs my-pipeline

Secure credentials

Always use secrets for database credentials:
# Create secret
goldsky secret create MY_DB_SECRET

# Reference in pipeline
sinks:
  my_sink:
    secret_name: MY_DB_SECRET