Kafka is a distributed streaming platform that is used to build real-time data pipelines and streaming applications. It is designed to be fast, scalable, and durable.
You can use Kafka to integrate deeply with your existing data ecosystem. Goldsky supplies a message format that lets your downstream data pipelines handle blockchain forks and reorganizations.
Kafka has a rich ecosystem of SDKs and connectors that you can use for advanced data processing.
The Kafka integration is less end-to-end than the database sinks: Goldsky handles topic partitioning, balancing, and other details, but consuming from Kafka is more involved than having data mirrored directly into a database.
Full configuration details for the Kafka sink are available in the reference page.
Configuration options
Parallelism
The parallelism option controls the number of parallel Kafka producers used to write data. Increasing parallelism can significantly improve throughput for high-volume pipelines.
```yaml
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    parallelism: 4
```
Key routing with parallelism: When using multiple producers, messages are routed to specific producers based on the message key hash. This ensures that all messages with the same key are handled by the same producer, maintaining per-key ordering guarantees.
Batching
Configure batching behavior to optimize throughput and latency:
- batch_size: Number of messages to batch before sending (maps to Kafka’s batch.num.messages). Default: 10000
- batch_flush_interval: Maximum time in milliseconds to wait before flushing a batch (maps to Kafka’s linger.ms). Default: 100
```yaml
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    batch_size: 5000
    batch_flush_interval: 50
```
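As a rough mental model of how these two settings interact, here is a toy buffer (not Goldsky's implementation) that flushes when either `batch_size` messages have accumulated or `batch_flush_interval` milliseconds have elapsed since the last flush:

```python
import time

class BatchBuffer:
    """Toy model of producer batching: flush on batch_size OR flush_interval (ms)."""

    def __init__(self, batch_size: int, flush_interval_ms: int):
        self.batch_size = batch_size
        self.flush_interval = flush_interval_ms / 1000.0
        self.buffer = []
        self.flushed = []  # list of sent batches, for inspection
        self.last_flush = time.monotonic()

    def add(self, msg):
        self.buffer.append(msg)
        # Flush when the batch is full or the linger window has elapsed.
        if (len(self.buffer) >= self.batch_size
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed.append(list(self.buffer))
            self.buffer.clear()
        self.last_flush = time.monotonic()
```

Larger batches improve throughput at the cost of latency; a shorter flush interval bounds how long a message can sit in the buffer.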
Message size
- message_max_bytes: Maximum size in bytes for a Kafka request (maps to Kafka’s message.max.bytes). Default: 1000000 (1MB)
```yaml
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    message_max_bytes: 2000000
```
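If your rows can be large, it can help to sanity-check payload sizes against this limit before raising it. A minimal sketch (the 2 MB constant and JSON encoding are assumptions for illustration; real Kafka requests add protocol overhead, so leave headroom):

```python
import json

MESSAGE_MAX_BYTES = 2_000_000  # keep in sync with the sink's message_max_bytes

def fits_in_limit(record: dict) -> bool:
    # Rough heuristic: the serialized payload must stay under the request limit.
    payload = json.dumps(record).encode("utf-8")
    return len(payload) < MESSAGE_MAX_BYTES
```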
Secrets
```shell
goldsky secret create --name A_KAFKA_SECRET --value '{
  "type": "kafka",
  "bootstrapServers": "Type.String()",
  "securityProtocol": "Type.Enum(SecurityProtocol)", // PLAINTEXT | SASL_PLAINTEXT | SASL_SSL
  "saslMechanism": "Type.Optional(Type.Enum(SaslMechanism))", // PLAIN | SCRAM-SHA-256 | SCRAM-SHA-512
  "saslJaasUsername": "Type.Optional(Type.String())",
  "saslJaasPassword": "Type.Optional(Type.String())",
  "schemaRegistryUrl": "Type.Optional(Type.String())",
  "schemaRegistryUsername": "Type.Optional(Type.String())",
  "schemaRegistryPassword": "Type.Optional(Type.String())"
}'
```
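For example, a secret for a SASL_SSL cluster with SCRAM authentication might look like the following (every value below is an illustrative placeholder, not a real broker or credential):

```shell
goldsky secret create --name A_KAFKA_SECRET --value '{
  "type": "kafka",
  "bootstrapServers": "broker-1.example.com:9092,broker-2.example.com:9092",
  "securityProtocol": "SASL_SSL",
  "saslMechanism": "SCRAM-SHA-256",
  "saslJaasUsername": "my-pipeline-user",
  "saslJaasPassword": "my-pipeline-password"
}'
```

The `schemaRegistry*` fields are optional and only needed if your consumers rely on a schema registry.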
Partitioning
Goldsky uses Kafka’s default partitioning strategy based on message key hashes. The message key is constructed from the primary key column(s) of your data.
Key behavior:
- Key format: Primary key values joined with _ (e.g., enriched_transaction_v2_0x6a7b...789d_1)
- Partitioner: Kafka’s DefaultPartitioner (murmur2 hash)
- Partition assignment: murmur2(keyBytes) % numPartitions
Implications for increasing partitions:
- Records with the same key always go to the same partition, ensuring ordering per key
- Increasing partitions will cause key redistribution — existing keys may map to different partitions
- Global ordering is not guaranteed; only per-key ordering is maintained
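To predict where a given key will land, you can reproduce the hash locally. Below is a Python port of the murmur2 variant used by Kafka's Java client, including the sign-masking step (`toPositive`) that the DefaultPartitioner applies before taking the modulus; treat it as an illustrative sketch rather than a supported API:

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 as implemented in the Apache Kafka Java client."""
    length = len(data)
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    mask = 0xFFFFFFFF

    h = (seed ^ length) & mask
    # Process the input four bytes at a time (little-endian).
    for i in range(0, length - length % 4, 4):
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Handle the remaining 1-3 bytes (mirrors the Java switch fall-through).
    extra = length % 4
    tail = length - extra
    if extra >= 3:
        h ^= data[tail + 2] << 16
    if extra >= 2:
        h ^= data[tail + 1] << 8
    if extra >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h

def partition_for_key(key: str, num_partitions: int) -> int:
    # DefaultPartitioner: mask the hash to a non-negative value, then mod.
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

For a Goldsky message key, join the primary key values with `_` first, e.g. `partition_for_key("_".join([table, tx_hash, log_index]), num_partitions)` (the variable names here are hypothetical). Note that the result changes if `num_partitions` changes, which is exactly the key-redistribution caveat above.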