Overview

Publish processed data back to Kafka topics with Avro or JSON serialization.

Configuration

sinks:
  my_kafka_sink:
    type: kafka
    from: my_transform
    topic: my_output_topic
    topic_partitions: 10
    data_format: avro

Parameters

  • type (string, required): Must be kafka
  • from (string, required): The transform or source to read data from
  • topic (string, required): Kafka topic name to publish to
  • topic_partitions (number): Number of partitions for the topic (created if the topic doesn't exist)
  • data_format (string, default: avro): Serialization format. Supported values: avro or json.
  • schema_registry_url (string): URL of the Schema Registry. Required for avro format; optional for json.
  • parallelism (number, default: 1): Number of parallel Kafka producers for increased throughput. Messages are routed to producers by key hash to preserve per-key ordering.
  • batch_size (number, default: 10000): Number of messages to batch before sending (maps to Kafka's batch.num.messages).
  • batch_flush_interval (number, default: 100): Maximum time in milliseconds to wait before flushing a batch (maps to Kafka's linger.ms).
  • message_max_bytes (number, default: 1000000): Maximum size in bytes for a Kafka request (maps to Kafka's message.max.bytes).

Features

  • Multiple Formats: Choose between Avro (binary) or JSON serialization
  • Auto Schema Registration: Schemas are automatically registered with Schema Registry (Avro only)
  • Avro Encoding: Efficient binary serialization with schema evolution support
  • JSON Encoding: Human-readable format without Schema Registry dependency
  • Operation Headers: _gs_op column is included as a message header (dbz.op)

Partitioning

Goldsky uses Kafka’s default partitioning strategy based on message key hashes. The message key is constructed from the primary key column(s) of your data. Key behavior:
  • Key format: Primary key values joined with _ (e.g., enriched_transaction_v2_0x6a7b...789d_1)
  • Partitioner: Kafka’s DefaultPartitioner (murmur2 hash)
  • Partition assignment: murmur2(keyBytes) % numPartitions
Implications for increasing partitions:
  • Records with the same key always go to the same partition, ensuring ordering per key
  • Increasing partitions will cause key redistribution — existing keys may map to different partitions
  • Global ordering is not guaranteed; only per-key ordering is maintained
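The assignment above can be sketched in a few lines. This is a port of the murmur2 hash used by Kafka's Java client plus the positive-mask modulo step; the key construction mirrors the "_"-joined primary key format described above. The function names are illustrative, not part of Goldsky's actual implementation.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, as implemented in the Apache Kafka Java client."""
    length = len(data)
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF  # emulate 32-bit integer overflow

    h = (seed ^ length) & mask
    # Process the input four bytes at a time (little-endian).
    for i in range(0, length - (length % 4), 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Handle the remaining 1-3 bytes (fall-through, as in the Java switch).
    tail = length & ~3
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & mask
    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h

def partition_for(primary_key_values: list, num_partitions: int) -> int:
    """Join primary key values with '_' and apply murmur2(key) % numPartitions."""
    key = "_".join(str(v) for v in primary_key_values).encode("utf-8")
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions  # mask forces a positive hash
```

Running `partition_for(["enriched_transaction_v2", "0xabc", 1], 12)` with a different partition count makes the redistribution point concrete: the same key can land on a different partition once `num_partitions` changes.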

Examples

Avro format (with Schema Registry)

sinks:
  kafka_output:
    type: kafka
    from: enriched_events
    topic: processed.events
    topic_partitions: 10
    data_format: avro
    schema_registry_url: http://schema-registry:8081

JSON format (no Schema Registry required)

sinks:
  kafka_output:
    type: kafka
    from: enriched_events
    topic: processed.events
    topic_partitions: 10
    data_format: json

High-throughput configuration

sinks:
  kafka_output:
    type: kafka
    from: high_volume_events
    topic: events.stream
    topic_partitions: 16
    data_format: avro
    schema_registry_url: http://schema-registry:8081
    parallelism: 4
    batch_size: 5000
    batch_flush_interval: 50
    message_max_bytes: 2000000

Performance tuning

If your Kafka sink is not keeping up with pipeline throughput, tuning parallelism is the most effective first step. Each unit of parallelism runs a separate Kafka producer, allowing concurrent writes to the target cluster.

When to tune

Check the Kafka flush sink p95 metric in your pipeline’s advanced metrics. If flush latency is consistently high, your sink is likely a bottleneck and increasing parallelism can help.

Choosing a parallelism value

Set parallelism to a value that evenly divides the number of Kafka partitions in the target topic. This ensures producers are balanced across partitions and avoids uneven load distribution. For example, with a 12-partition topic, good parallelism values are 2, 3, 4, or 6. Avoid values like 5 or 7 that don’t divide evenly into 12.
Make sure your Kafka cluster can sustain the additional connections before increasing parallelism. Each parallel producer opens its own connection to the cluster.
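The divisibility check above is simple enough to script. A hypothetical helper (not part of any Goldsky tooling) that lists parallelism values dividing the partition count evenly:

```python
def balanced_parallelism_options(num_partitions: int, max_parallelism: int = 0) -> list[int]:
    """Return parallelism values that divide num_partitions evenly,
    so producers spread uniformly across partitions."""
    limit = max_parallelism if max_parallelism > 0 else num_partitions
    return [p for p in range(1, limit + 1) if num_partitions % p == 0]

# For a 12-partition topic: [1, 2, 3, 4, 6, 12]
```

For the 12-partition example above, this yields 1, 2, 3, 4, 6, and 12, and correctly excludes values like 5 and 7.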

Tuning batching

After adjusting parallelism, you can further optimize throughput by tuning batch settings:
  • batch_size: Increase if you have high message volume and want fewer, larger writes. Larger batches improve throughput but add a small amount of latency.
  • batch_flush_interval: Decrease for lower latency at the cost of smaller batches. Increase if you want to accumulate more messages before flushing.
  • message_max_bytes: Increase if you see errors related to message size limits, especially with large payloads.
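To make the mapping concrete, the three batch settings correspond one-to-one to Kafka producer properties, as noted in the parameter descriptions. A small illustrative sketch (the helper name and structure are assumptions, not the sink's actual implementation):

```python
def producer_config(batch_size: int = 10000,
                    batch_flush_interval_ms: int = 100,
                    message_max_bytes: int = 1000000) -> dict:
    """Map the sink's batching parameters to the Kafka producer
    properties they correspond to."""
    return {
        "batch.num.messages": batch_size,        # messages accumulated per batch
        "linger.ms": batch_flush_interval_ms,    # max wait before flushing a batch
        "message.max.bytes": message_max_bytes,  # max size in bytes of a request
    }
```

With the sink defaults, this is equivalent to a producer configured with `batch.num.messages=10000`, `linger.ms=100`, and `message.max.bytes=1000000`.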

Example: tuning a 12-partition topic

sinks:
  kafka_output:
    type: kafka
    from: high_volume_events
    topic: events.stream
    topic_partitions: 12
    data_format: avro
    schema_registry_url: http://schema-registry:8081
    parallelism: 4       # Divides evenly into 12 partitions
    batch_size: 5000
    batch_flush_interval: 50