Kafka is a distributed streaming platform used to build real-time data pipelines and streaming applications. It is designed to be fast, scalable, and durable, and it integrates deeply with your existing data ecosystem. Goldsky supplies a message format that lets your downstream pipelines handle blockchain forks and reorganizations, and Kafka's rich ecosystem of SDKs and connectors is available for advanced data processing.
The Kafka integration is less end-to-end than the database sinks: Goldsky handles topic partitioning, balancing, and other operational details, but consuming from Kafka is more involved than having data mirrored directly into a database.
Full configuration details for the Kafka sink are available on the reference page.

Configuration options

Parallelism

The parallelism option controls the number of parallel Kafka producers used to write data. Increasing parallelism can significantly improve throughput for high-volume pipelines.
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    parallelism: 4
Key routing with parallelism: When using multiple producers, messages are routed to specific producers based on the message key hash. This ensures that all messages with the same key are handled by the same producer, maintaining per-key ordering guarantees.
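The routing described above amounts to a stable hash-mod over the message key. The sketch below is illustrative only (Goldsky's actual producer-selection hash is internal); MD5 is used here purely for determinism:

```python
import hashlib

def producer_for(key: str, parallelism: int) -> int:
    """Route a message key to one of `parallelism` producers.

    Illustrative sketch: any stable hash works. The point is that the
    same key always maps to the same producer, which is what preserves
    per-key ordering when parallelism > 1.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % parallelism

# All messages for one key are handled by the same producer:
key = "enriched_transaction_v2_0x6a7b_1"
assert producer_for(key, 4) == producer_for(key, 4)
```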

Batching

Configure batching behavior to optimize throughput and latency:
  • batch_size: Number of messages to batch before sending (maps to Kafka’s batch.num.messages). Default: 10000
  • batch_flush_interval: Maximum time in milliseconds to wait before flushing a batch (maps to Kafka’s linger.ms). Default: 100
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    batch_size: 5000
    batch_flush_interval: 50
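The interaction between the two settings can be modeled as: a batch is sent as soon as it holds batch_size messages, or once batch_flush_interval milliseconds have passed since its first message, whichever comes first. A minimal sketch of that policy (a toy model, not Goldsky's implementation):

```python
import time

class BatchModel:
    """Toy model of Kafka producer batching: flush on batch_size
    (batch.num.messages) or on flush_interval_ms (linger.ms),
    whichever triggers first."""

    def __init__(self, batch_size=10000, flush_interval_ms=100,
                 clock=time.monotonic):
        self.batch_size = batch_size
        self.flush_interval = flush_interval_ms / 1000.0
        self.clock = clock
        self.buffer = []
        self.first_at = None  # arrival time of the oldest buffered message
        self.sent = []        # batches that have been "sent"

    def _flush(self):
        if self.buffer:
            self.sent.append(self.buffer)
            self.buffer = []
            self.first_at = None

    def add(self, msg):
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:  # size trigger
            self._flush()

    def poll(self):
        # Time trigger: called periodically by the producer loop.
        if self.buffer and self.clock() - self.first_at >= self.flush_interval:
            self._flush()
```

With batch_size: 5000 and batch_flush_interval: 50 as above, a burst of 5000 messages flushes immediately on the size trigger, while a trickle of messages flushes after at most 50 ms of waiting.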

Message size

  • message_max_bytes: Maximum size in bytes for a Kafka request (maps to Kafka’s message.max.bytes). Default: 1000000 (1MB)
sinks:
  kafka_sink:
    type: kafka
    from: my_source
    topic: my_topic
    secret_name: KAFKA_SECRET
    message_max_bytes: 2000000
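Since message_max_bytes caps the size of a serialized Kafka request, records must fit under the limit after serialization. One way to sanity-check a record's serialized size (the record shape here is hypothetical; real mirror records depend on your source schema):

```python
import json

MESSAGE_MAX_BYTES = 2_000_000  # mirrors the sink config above

# Hypothetical record for illustration.
record = {"id": "tx_1", "block_number": 19_000_000, "payload": "ab" * 1000}
size = len(json.dumps(record).encode("utf-8"))

assert size <= MESSAGE_MAX_BYTES, f"record too large: {size} bytes"
```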

Secrets

Create a Kafka secret with the Goldsky CLI. securityProtocol must be one of PLAINTEXT, SASL_PLAINTEXT, or SASL_SSL; saslMechanism, if set, must be one of PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512. The sasl* and schemaRegistry* fields are optional; omit any your cluster does not need. The values below are placeholders; substitute your own:
goldsky secret create --name A_KAFKA_SECRET --value '{
  "type": "kafka",
  "bootstrapServers": "broker-1.example.com:9092,broker-2.example.com:9092",
  "securityProtocol": "SASL_SSL",
  "saslMechanism": "SCRAM-SHA-512",
  "saslJaasUsername": "my-username",
  "saslJaasPassword": "my-password",
  "schemaRegistryUrl": "https://schema-registry.example.com",
  "schemaRegistryUsername": "my-sr-username",
  "schemaRegistryPassword": "my-sr-password"
}'

Partitioning

Goldsky uses Kafka’s default partitioning strategy based on message key hashes. The message key is constructed from the primary key column(s) of your data. Key behavior:
  • Key format: Primary key values joined with _ (e.g., enriched_transaction_v2_0x6a7b...789d_1)
  • Partitioner: Kafka’s DefaultPartitioner (murmur2 hash)
  • Partition assignment: murmur2(keyBytes) % numPartitions
Implications for increasing partitions:
  • Records with the same key always go to the same partition, ensuring ordering per key
  • Increasing partitions will cause key redistribution — existing keys may map to different partitions
  • Global ordering is not guaranteed; only per-key ordering is maintained
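The partition assignment above can be reproduced with a port of Kafka's murmur2 hash, the algorithm used by DefaultPartitioner. This sketch follows the reference Java implementation (little-endian 4-byte chunks, seed 0x9747b28c, with results masked to 32 bits):

```python
def murmur2(data: bytes) -> int:
    """Python port of Kafka's murmur2 (as used by DefaultPartitioner)."""
    seed, m, r, mask = 0x9747B28C, 0x5BD1E995, 24, 0xFFFFFFFF
    h = (seed ^ len(data)) & mask
    n4 = len(data) // 4
    for i in range(n4):
        # Read a 4-byte chunk as a little-endian 32-bit int.
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Handle the trailing 1-3 bytes (mirrors the Java switch fallthrough).
    base, rem = n4 * 4, len(data) % 4
    if rem == 3:
        h ^= data[base + 2] << 16
    if rem >= 2:
        h ^= data[base + 1] << 8
    if rem >= 1:
        h ^= data[base]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h

def partition_for(key: str, num_partitions: int) -> int:
    # DefaultPartitioner: toPositive(murmur2(keyBytes)) % numPartitions
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions
```

Running keys through partition_for with two different partition counts shows the redistribution effect: the same key is stable for a fixed count, but raising the count moves some keys to new partitions.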