Overview
Publish processed data back to Kafka topics with Avro or JSON serialization.

Configuration
Parameters
- `type`: Must be `kafka`.
- `from`: The transform or source to read data from.
- `topic`: Kafka topic name to publish to.
- `partitions`: Number of partitions for the topic (created if it doesn’t exist).
- `format`: Serialization format. Supported values: `avro` or `json`.
- `schema_registry_url`: URL of the Schema Registry. Required when using the `avro` format, optional for the `json` format.
- `parallelism`: Number of parallel Kafka producers for increased throughput. Messages are routed to producers based on key hash to maintain per-key ordering.
- `batch_size`: Number of messages to batch before sending (maps to Kafka’s `batch.num.messages`).
- `batch_flush_interval`: Maximum time in milliseconds to wait before flushing a batch (maps to Kafka’s `linger.ms`).
- `message_max_bytes`: Maximum size in bytes for a Kafka request (maps to Kafka’s `message.max.bytes`).

Features
- Multiple Formats: Choose between Avro (binary) or JSON serialization
- Auto Schema Registration: Schemas are automatically registered with Schema Registry (Avro only)
- Avro Encoding: Efficient binary serialization with schema evolution support
- JSON Encoding: Human-readable format without Schema Registry dependency
- Operation Headers: The `_gs_op` column is included as a message header (`dbz.op`)
Partitioning
Goldsky uses Kafka’s default partitioning strategy based on message key hashes. The message key is constructed from the primary key column(s) of your data.

Key behavior:
- Key format: Primary key values joined with `_` (e.g., `enriched_transaction_v2_0x6a7b...789d_1`)
- Partitioner: Kafka’s DefaultPartitioner (murmur2 hash)
- Partition assignment: `murmur2(keyBytes) % numPartitions`
- Records with the same key always go to the same partition, ensuring ordering per key
- Increasing partitions will cause key redistribution — existing keys may map to different partitions
- Global ordering is not guaranteed; only per-key ordering is maintained
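The key construction and partition assignment described above can be sketched in Python. This is an illustrative port of Kafka’s murmur2 hash and DefaultPartitioner logic, useful for predicting which partition a record will land on; the actual hashing happens inside the producer client, so you never need to run this yourself:

```python
def murmur2(data: bytes) -> int:
    """Python port of Kafka's 32-bit murmur2 hash (seed 0x9747b28c)."""
    length = len(data)
    m = 0x5BD1E995
    mask = 0xFFFFFFFF
    h = (0x9747B28C ^ length) & mask
    i = 0
    # Process the input four bytes at a time.
    while length - i >= 4:
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
        i += 4
    # Handle the remaining 1-3 bytes (same fall-through order as Kafka's Java code).
    extra = length - i
    if extra == 3:
        h ^= data[i + 2] << 16
    if extra >= 2:
        h ^= data[i + 1] << 8
    if extra >= 1:
        h ^= data[i]
        h = (h * m) & mask
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(primary_key_values: list, num_partitions: int) -> int:
    """Join primary key values with "_" and assign a partition the way Kafka's
    DefaultPartitioner does: toPositive(murmur2(keyBytes)) % numPartitions."""
    key = "_".join(str(v) for v in primary_key_values)
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Because the partition depends only on the key bytes and the partition count, records with the same primary key always land on the same partition, and changing the partition count redistributes keys.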
Examples
Avro format (with Schema Registry)
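A minimal Avro sink definition might look like the sketch below. The sink name, source name, topic, and registry URL are placeholders, and the surrounding `sinks:` structure is assumed from Goldsky’s pipeline configuration format:

```yaml
sinks:
  kafka_avro_sink:
    type: kafka
    from: my_transform                  # transform or source to read from
    topic: enriched_transactions
    format: avro
    schema_registry_url: https://registry.example.com  # required for avro
```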
JSON format (no Schema Registry required)
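For JSON, the Schema Registry URL can be omitted. The names and values below are placeholders:

```yaml
sinks:
  kafka_json_sink:
    type: kafka
    from: my_transform
    topic: enriched_transactions
    format: json   # human-readable; no Schema Registry dependency
```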
High-throughput configuration
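A high-throughput variant combines `parallelism` with larger batches. The values below are illustrative; pick a `parallelism` that evenly divides your topic’s partition count:

```yaml
sinks:
  kafka_fast_sink:
    type: kafka
    from: my_transform
    topic: enriched_transactions   # e.g., a 12-partition topic
    format: avro
    schema_registry_url: https://registry.example.com
    parallelism: 4                 # 4 divides 12 evenly
    batch_size: 16000              # fewer, larger writes
    batch_flush_interval: 1000     # wait at most 1 s before flushing a batch
```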
Performance tuning
If your Kafka sink is not keeping up with pipeline throughput, tuning `parallelism` is the most effective first step. Each unit of parallelism runs a separate Kafka producer, allowing concurrent writes to the target cluster.
When to tune
Check the Kafka flush sink p95 metric in your pipeline’s advanced metrics. If flush latency is consistently high, your sink is likely a bottleneck and increasing parallelism can help.

Choosing a parallelism value
Set `parallelism` to a value that evenly divides the number of Kafka partitions in the target topic. This ensures producers are balanced across partitions and avoids uneven load distribution.
For example, with a 12-partition topic, good parallelism values are 2, 3, 4, or 6. Avoid values like 5 or 7 that don’t divide evenly into 12.
Tuning batching
After adjusting parallelism, you can further optimize throughput by tuning batch settings:
- `batch_size`: Increase if you have high message volume and want fewer, larger writes. Larger batches improve throughput but add a small amount of latency.
- `batch_flush_interval`: Decrease for lower latency at the cost of smaller batches. Increase if you want to accumulate more messages before flushing.
- `message_max_bytes`: Increase if you see errors related to message size limits, especially with large payloads.
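As a sketch, the three settings might be tuned together like this; the values are illustrative and the fragment sits inside a sink definition as shown in the Examples section above:

```yaml
    batch_size: 32000            # higher throughput, slightly higher latency
    batch_flush_interval: 500    # wait at most 500 ms before flushing
    message_max_bytes: 10485760  # allow requests up to 10 MiB for large payloads
```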