Overview
Sinks are the final destination for data in your Turbo pipelines. They write processed data to external systems like databases, data warehouses, or HTTP endpoints.

Available Sinks
- PostgreSQL: Write to PostgreSQL databases
- ClickHouse: Write to ClickHouse for analytics
- Webhook: Send data to HTTP endpoints
- Kafka: Publish to Kafka topics
- S3: Write to S3-compatible storage
- Print: Output to stdout (debugging)
- Blackhole: Discard data (testing)
Common Parameters
All sinks share these common parameters:
- The sink type (postgres, clickhouse, webhook, etc.)
- The transform or source to read data from
- The name of the secret containing connection credentials (required for database sinks)
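As a rough sketch, a sink definition might look like the following. YAML is shown for illustration, and the key names type, from, and secret_name are assumptions based on the descriptions above, not confirmed schema.

```yaml
sinks:
  my_sink:
    type: postgres        # the sink type: postgres, clickhouse, webhook, etc.
    from: my_transform    # assumed key name: the transform or source to read from
    secret_name: DB_CREDS # assumed key name: secret holding connection credentials
```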
PostgreSQL Sink
Write data to PostgreSQL databases with automatic table creation and upsert support.

Configuration
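A minimal sketch, assuming YAML configuration; only primary_key and the postgres type are confirmed on this page, the other key names are assumptions:

```yaml
sinks:
  transfers_to_postgres:
    type: postgres
    from: decoded_transfers  # assumed key name: upstream transform or source
    secret_name: PG_CREDS    # assumed key name
    schema: public           # assumed key name: target PostgreSQL schema
    table: erc20_transfers   # assumed key name: created automatically if missing
```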
Parameters
- PostgreSQL schema name (e.g., public, analytics)
- Table name to write to. Will be created automatically if it doesn’t exist.
- primary_key (optional): Column to use for upserts. If specified, existing rows will be updated instead of inserted.
Secret Format
The secret should contain a PostgreSQL connection string:
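For example, a standard PostgreSQL connection URI (all values are placeholders):

```
postgres://username:password@host:5432/database
```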
Features
- Auto Table Creation: Tables are created automatically based on your data schema
- Upsert Support: Use primary_key to update existing rows
- Type Handling: Automatic type conversion from Arrow to PostgreSQL types
- Large Numbers: U256/I256 types are stored as NUMERIC(78,0)
Example
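A hypothetical end-to-end snippet; only primary_key and the postgres type come from this page, everything else is illustrative:

```yaml
sinks:
  transfers_to_postgres:
    type: postgres
    from: decoded_transfers   # assumed key name
    secret_name: PG_CREDS     # assumed key name
    schema: analytics         # assumed key name
    table: erc20_transfers    # assumed key name: created automatically from the data schema
    primary_key: id           # rows with an existing id are updated rather than inserted
```

U256/I256 columns in the incoming data would land as NUMERIC(78,0) in the created table.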
ClickHouse Sink
Write data to ClickHouse for high-performance analytical queries.

Configuration
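A minimal sketch, with the same caveat that the key names are assumptions:

```yaml
sinks:
  blocks_to_clickhouse:
    type: clickhouse
    from: decoded_blocks      # assumed key name
    secret_name: CH_CREDS     # assumed key name
    table: blocks             # assumed key name: ClickHouse table name
    primary_key: block_number # primary key column for the table
```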
Parameters
- ClickHouse table name
- Primary key column for the table
Secret Format
The secret should contain the ClickHouse connection details (for example, host, port, username, password, and database name).

Example: Solana Blocks to ClickHouse
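A sketch of what such a pipeline might look like; the source definition and key names are illustrative assumptions, and only the clickhouse sink type comes from this page:

```yaml
sources:
  solana_blocks:
    type: solana_blocks     # hypothetical source, for illustration only
sinks:
  blocks_to_clickhouse:
    type: clickhouse
    from: solana_blocks     # assumed key name
    secret_name: CH_CREDS   # assumed key name
    table: solana_blocks    # assumed key name
    primary_key: slot       # Solana blocks are naturally keyed by slot
```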
Webhook Sink
Send data to HTTP endpoints as JSON payloads.

Configuration
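A minimal sketch; one_row_per_request is documented below, while url and the other key names are assumptions:

```yaml
sinks:
  events_webhook:
    type: webhook
    from: decoded_events                 # assumed key name
    url: https://api.example.com/ingest  # assumed key name: target HTTP endpoint
    one_row_per_request: false           # send batches of rows per request
```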
Parameters
- The HTTP endpoint URL to send data to
- one_row_per_request: If true, sends each row individually. If false, sends batches of rows.

Request Format
Data is sent as JSON:
- Single row (one_row_per_request: true): each request body contains a single row.
- Batch (one_row_per_request: false): each request body contains a batch of rows.
Example: Send High-Value Transfers to API
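A hedged sketch of such a pipeline; the transform definition and key names are illustrative assumptions:

```yaml
transforms:
  high_value_transfers:
    sql: >                      # hypothetical transform, for illustration only
      SELECT * FROM erc20_transfers
      WHERE value > 1000000
sinks:
  notify_api:
    type: webhook
    from: high_value_transfers           # assumed key name
    url: https://api.example.com/alerts  # assumed key name
    one_row_per_request: true            # one request per high-value transfer
```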
The webhook sink includes retry logic with exponential backoff for transient failures. Configure timeouts on your endpoint accordingly.
Kafka Sink
Publish processed data back to Kafka topics with Avro serialization.

Configuration
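A minimal sketch; the kafka type string and all key names except the avro format value are assumptions:

```yaml
sinks:
  events_to_kafka:
    type: kafka               # assumed type name
    from: enriched_events     # assumed key name
    secret_name: KAFKA_CREDS  # assumed key name
    topic: enriched-events    # assumed key name: created if it doesn’t exist
    partitions: 6             # assumed key name
    format: avro              # currently the only supported format
```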
Parameters
- Kafka topic name to publish to
- Number of partitions for the topic (created if it doesn’t exist)
- Serialization format. Currently only avro is supported.

Features
- Auto Schema Registration: Schemas are automatically registered with Schema Registry
- Avro Encoding: Efficient binary serialization
- Operation Headers: The _gs_op column is included as a message header (dbz.op)
Example
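For example, publishing decoded transfers for downstream consumers (same assumptions about key names as above):

```yaml
sinks:
  transfers_to_kafka:
    type: kafka               # assumed type name
    from: decoded_transfers   # assumed key name
    secret_name: KAFKA_CREDS  # assumed key name
    topic: erc20-transfers    # assumed key name
    partitions: 12            # assumed key name
    format: avro
```

The Avro schema is registered automatically, and consumers can read the operation type from the dbz.op message header.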
S3 Sink
Write data to S3-compatible object storage services.

Configuration
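A minimal sketch; all key names are assumptions based on the parameters listed below:

```yaml
sinks:
  archive_to_s3:
    type: s3                            # assumed type name
    from: decoded_logs                  # assumed key name
    endpoint: https://s3.amazonaws.com  # assumed key name
    access_key_id: YOUR_ACCESS_KEY_ID   # assumed key name; placeholder value
    secret_access_key: YOUR_SECRET_KEY  # assumed key name; placeholder value
    region: auto                        # AWS region, or auto for S3-compatible services
    bucket: my-pipeline-output          # assumed key name
    path_prefix: decoded_logs/          # assumed key name: optional object prefix
```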
Parameters
- S3-compatible endpoint URL (e.g., https://s3.amazonaws.com, https://t3.storage.dev)
- Access key ID for authentication
- Secret access key for authentication
- AWS region, or auto for S3-compatible services
- Target bucket name
- Optional path prefix for objects within the bucket
Print Sink
Output data to stdout in JSON format. Useful for debugging and development.

Configuration
Output Format
Each row is printed as JSON with a row kind prefix.

Example
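A minimal sketch (key names are assumptions):

```yaml
sinks:
  debug_output:
    type: print              # assumed type name
    from: decoded_transfers  # assumed key name: print whatever this step emits
```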
Blackhole Sink
Discard all data without performing any operation. Useful for testing pipeline performance without I/O overhead.

Configuration
Use Cases
- Performance Testing: Measure transform performance without sink overhead
- Pipeline Validation: Test that data flows correctly without writing anywhere
- Development: Temporarily disable output while working on transforms
Example
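A minimal sketch (key names are assumptions):

```yaml
sinks:
  discard_output:
    type: blackhole        # assumed type name
    from: heavy_transform  # assumed key name: run the transform, discard its output
```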
Multiple Sinks
You can write the same data to multiple destinations:
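For example, feeding one transform into both PostgreSQL and ClickHouse (a sketch with assumed key names):

```yaml
sinks:
  transfers_to_postgres:
    type: postgres
    from: decoded_transfers  # the same upstream step feeds both sinks
    secret_name: PG_CREDS    # assumed key name
    schema: public           # assumed key name
    table: erc20_transfers   # assumed key name
  transfers_to_clickhouse:
    type: clickhouse
    from: decoded_transfers
    secret_name: CH_CREDS
    table: erc20_transfers
    primary_key: id
```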
Sink Behavior

Checkpointing
All sinks participate in Turbo’s checkpointing system:
- Data is buffered until a checkpoint completes
- Only acknowledged data is committed to the sink
- Ensures exactly-once delivery semantics
Backpressure
Sinks apply backpressure to the pipeline:
- If a sink can’t keep up, the entire pipeline slows down
- Prevents data loss and memory overflow
- Monitor sink performance to identify bottlenecks
Error Handling
Sink errors are handled with retries:
- Transient errors (network issues) are retried with exponential backoff
- Permanent errors (invalid data, schema mismatches) fail the pipeline
- Check logs for detailed error messages
Best Practices
Choose the right sink for your use case
- PostgreSQL: Transactional data, updates, relational queries
- ClickHouse: High-volume analytics, aggregations, time-series
- Webhook: Real-time notifications, integrations with external systems
- Kafka: Downstream processing, event sourcing, decoupling systems
Use appropriate primary keys
For upsert behavior in databases, choose a stable primary key:
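For example (a sketch; primary_key is documented above, the other key names are assumptions):

```yaml
sinks:
  balances_to_postgres:
    type: postgres
    from: token_balances     # assumed key name
    secret_name: PG_CREDS    # assumed key name
    table: token_balances    # assumed key name
    # A stable key: one row per account, re-processing updates rows in place instead of duplicating them
    primary_key: account_address
```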
Monitor sink performance
Use logs and metrics to track:
- Write throughput
- Error rates
- Latency
Secure credentials
Always use secrets for database credentials:
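For instance, reference a named secret instead of inlining a connection string in the pipeline definition (key names are assumptions):

```yaml
sinks:
  transfers_to_postgres:
    type: postgres
    from: decoded_transfers
    # Good: reference a stored secret by name
    secret_name: PG_CREDS  # assumed key name
    # Avoid: embedding credentials such as postgres://user:password@host/db in the config
```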