Overview
Publish processed data to Kafka topics with Avro or JSON serialization. Connection and Schema Registry credentials are supplied through a Goldsky secret, not the pipeline YAML.
Configuration
Parameters
| Parameter | Description |
| --- | --- |
| `type` | Must be `kafka`. |
| `from` | The transform or source to read data from. |
| `topic` | Kafka topic name to publish to. |
| `data_format` | Serialization format. Supported values: `avro` or `json`. |
| `secret_name` | Name of the Goldsky secret (type `kafka`) containing the broker address, SASL credentials, and Schema Registry URL. Required when publishing to any broker that is not pre-configured for your deployment. |
| `partitions` | Number of partitions to use if Goldsky creates the topic. Ignored for topics that already exist (the broker's existing partition count is used). |
| `primary_key` | Column (or comma-separated list of columns) used as the Kafka message key. Values are joined with `:` to form the key (e.g. `enriched_transaction_v2:0x6a7b...789d:1`). When set and the topic is created by Goldsky, the topic is created with `cleanup.policy=compact`. When omitted, the sink falls back to the upstream transform or source's primary key if one is defined. |
| `parallelism` | Number of parallel Kafka producers for increased throughput. Messages are routed to producers by hashing the message key, so per-key ordering is preserved across producers. |
| `batch_size` | Maximum number of messages to batch before sending (maps to librdkafka's `batch.num.messages`). |
| `batch_flush_interval` | Maximum time to wait before flushing a batch (maps to librdkafka's `linger.ms`). Accepts humantime durations such as `100ms`, `1s`, or `500ms`. |
| `message_max_bytes` | Maximum size in bytes for a single Kafka protocol request (maps to librdkafka's `message.max.bytes`). Default is 10 MiB. |
Secret structure
Create a Kafka secret with the Goldsky CLI:
- `securityProtocol`: `PLAINTEXT`, `SASL_PLAINTEXT`, or `SASL_SSL`.
- `saslMechanism`: `PLAIN`, `SCRAM-SHA-256`, or `SCRAM-SHA-512` (only required for SASL protocols).
- `schemaRegistryUrl`: required when `data_format: avro`, optional for `data_format: json`.
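As a sketch of what that can look like, assuming the `goldsky secret create --value` form of the CLI: only `securityProtocol`, `saslMechanism`, and `schemaRegistryUrl` are documented above, so the `brokers`, `username`, and `password` field names (and the secret name) are illustrative assumptions:

```bash
# Sketch: create a kafka-type secret. Field names other than securityProtocol,
# saslMechanism, and schemaRegistryUrl are assumed, not documented above.
goldsky secret create MY_KAFKA_SECRET --value '{
  "securityProtocol": "SASL_SSL",
  "saslMechanism": "SCRAM-SHA-512",
  "brokers": "broker-1.example.com:9092,broker-2.example.com:9092",
  "username": "pipeline-writer",
  "password": "********",
  "schemaRegistryUrl": "https://schema-registry.example.com"
}'
```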
Features
- Multiple Formats: Choose between Avro (binary) or JSON serialization
- Auto Schema Registration: Schemas are automatically registered with Schema Registry (Avro only)
- Avro Encoding: Efficient binary serialization with schema evolution support
- JSON Encoding: Human-readable format without Schema Registry dependency
- Operation Headers: The `_gs_op` value for each row is published as the Kafka header `dbz.op` (`c` for insert, `u` for update, `d` for delete)
- Topic Auto-Creation: If the topic does not exist, the sink creates it with `retention.ms=-1`, the broker's default replication factor, and `cleanup.policy=compact` when `primary_key` is set
Delivery and ordering
- Producer acks: Producers are configured with `acks=all` and a 10-minute `message.timeout.ms`.
- Delivery semantics: At-least-once. Idempotent produce is not enabled, so retries after a timeout can produce duplicates.
- Backpressure: When librdkafka’s internal queue fills, the sink polls and retries the send rather than dropping messages.
- Ordering: Per-key ordering is preserved; global ordering across keys is not guaranteed. With `parallelism > 1`, all rows sharing a key are routed to the same producer (and therefore the same partition).
Partitioning
The sink sets the Kafka message key to the joined primary-key values (see `primary_key` above). Partition assignment is performed by librdkafka's default partitioner.
Implications of changing partition count:
- Records with the same key always go to the same partition at a given point in time, ensuring ordering per key.
- Increasing partitions causes key redistribution — existing keys may map to different partitions after the change.
- Global ordering is not guaranteed; only per-key ordering is maintained.
Examples
Avro format
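A minimal sketch of what an Avro sink can look like, assuming the map-style pipeline layout and the parameter names from the table above; `my_transform` and `KAFKA_SECRET` are illustrative names:

```yaml
sinks:
  kafka_avro_sink:
    type: kafka
    from: my_transform                 # upstream transform or source
    topic: enriched-transactions       # auto-created if it does not exist
    data_format: avro                  # schema auto-registered with Schema Registry
    secret_name: KAFKA_SECRET          # secret must include schemaRegistryUrl
    primary_key: address,block_number  # joined with ':' to form the message key
```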
JSON format (no Schema Registry required)
With `data_format: json`, the secret may omit `schemaRegistryUrl`.
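A sketch of the JSON variant under the same assumptions:

```yaml
sinks:
  kafka_json_sink:
    type: kafka
    from: my_transform
    topic: enriched-transactions-json
    data_format: json          # human-readable; no Schema Registry dependency
    secret_name: KAFKA_SECRET  # schemaRegistryUrl may be omitted from the secret
```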
High-throughput configuration
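A sketch of a throughput-oriented sink under the same assumptions; the values are illustrative starting points, not recommendations:

```yaml
sinks:
  kafka_fast_sink:
    type: kafka
    from: my_transform
    topic: enriched-transactions
    data_format: avro
    secret_name: KAFKA_SECRET
    primary_key: address
    parallelism: 4              # extra producers; per-key ordering still holds
    batch_size: 10000           # larger batches (librdkafka batch.num.messages)
    batch_flush_interval: 500ms # wait longer to fill batches (linger.ms)
```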
Performance tuning
If your Kafka sink is not keeping up with pipeline throughput, tuning `parallelism` is the most effective first step. Each unit of parallelism runs a separate Kafka producer, allowing concurrent writes to the target cluster.
When to tune
Check the Kafka flush sink p95 metric in your pipeline's advanced metrics. If flush latency is consistently high, your sink is likely a bottleneck and increasing `parallelism` can help.
Choosing a parallelism value
Set `parallelism` to a value that evenly divides the number of Kafka partitions in the target topic. This ensures producers are balanced across partitions and avoids uneven load distribution.
For example, with a 12-partition topic, good parallelism values are 2, 3, 4, or 6. Avoid values like 5 or 7 that don’t divide evenly into 12.
Tuning batching
After adjusting parallelism, you can further optimize throughput by tuning batch settings (a combined sketch follows the list):
- `batch_size`: Increase if you have high message volume and want fewer, larger writes. Larger batches improve throughput but add a small amount of latency.
- `batch_flush_interval`: Decrease for lower latency at the cost of smaller batches. Increase if you want to accumulate more messages before flushing. Accepts humantime durations like `50ms` or `1s`.
- `message_max_bytes`: Increase if you see errors related to message size limits, especially with large payloads.
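Continuing the illustrative sink from the examples above, the three knobs sit alongside `parallelism`; the values are placeholders to tune against your own metrics:

```yaml
sinks:
  kafka_tuned_sink:
    # ...type/from/topic/data_format/secret_name as in the examples above...
    parallelism: 4               # evenly divides a 12-partition topic
    batch_size: 20000            # raise for fewer, larger writes (throughput)
    batch_flush_interval: 100ms  # lower for latency, raise to fill bigger batches
    message_max_bytes: 20971520  # 20 MiB; raise only if payloads hit size limits
```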