Overview

Goldsky is a high-performance data indexing provider for Stellar that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases via Mirror (real-time data replication pipelines).

Scope

Goldsky has partnered with Stellar to make our product available to the ecosystem and provide dedicated support for Stellar data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
Mirror
  Enablement: Yes
  Benefit: 10% discount on Pipeline workers and events written
  Availability: All developers
Where perks are available only to projects specified by the chain ecosystem, please reach out to that ecosystem's developer relations team for an access code and a link to the private signup form. Where perks are openly available to all developers, please contact sales@goldsky.com to apply the applicable partnership perks to your project.

Getting started

To use Goldsky, you’ll need to create an account, install the CLI, and log in.
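If you're starting from scratch, the typical setup looks like the sketch below (assuming Node.js is installed; see Goldsky's installation docs for the authoritative steps):

npm install -g @goldskycom/cli   # install the Goldsky CLI globally
goldsky login                    # authenticate with your Goldsky API key
goldsky --help                   # confirm the CLI is working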

Mirror

Mirror pipelines allow users to replicate data into their own infrastructure (any of the supported sinks) in real time. The supported data sources are the direct indexing datasets described below: ledgers, transactions, events, and balances. These datasets have been created from ledger sequence 57837496 onwards. For a complete overview of how to deploy Mirror pipelines, including a video walkthrough, see the Create a Pipeline guide. Below, we first look at the data model for Stellar primitives and then walk through some example pipelines that you can use as a reference.

Working with the Stellar V2 Datasets

After multiple iterations and feedback from the Stellar ecosystem, we've released V2 of our Stellar datasets. Unlike V1, which had a latency of 2-5 minutes, this new version is real-time (under 5 seconds). It also has a different data model, designed for greater flexibility and convenience across a wide range of use cases. To understand these datasets, it's important to first revisit the Ledger primitive, one of the core building blocks of the Stellar blockchain and the highest level of abstraction in our datasets.
Need a refresher on Stellar data structures? Check out Stellar’s official documentation.
As an example, let’s look at Ledger 57255191. A ledger contains multiple transactions, and each transaction contains one or more operations. We’ve modeled these inner structures accordingly in the V2 datasets. At the core is the ledgers_v2 dataset, which acts as the canonical dataset from which the other datasets derive. You can:
  1. Stream complete ledger data in a fully normalized way using the ledgers_v2 dataset, or
  2. Work directly with specialized datasets that target specific structures:
    • transactions_v2
    • events_v2
    • balances_v2
In essence, ledgers_v2 functions as a “mega-schema,” allowing you to query all ledger data holistically or customize exactly which subsets of data you want.
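To make the nesting concrete, here is an illustrative (not exhaustive) outline of a ledgers_v2 record, limited to the fields referenced in the SQL examples later in this guide. Note that entry_data is shown expanded here; in the SQL transforms it is accessed as JSON via JSON_VALUE:

{
  "sequence": 57255191,
  "transactions": [
    {
      "transaction_hash": "...",
      "account": "...",
      "transaction_result_code": "...",
      "operations": [
        {
          "type": "payment",
          "ledger_entry_changes": [
            {
              "id": "...",
              "change_type": "updated",
              "ledger_entry_type": "account",
              "entry_data": { "accountID": "...", "balance": "..." }
            }
          ]
        }
      ]
    }
  ]
}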
You can view an excerpt of an example record of the ledgers_v2 dataset here. Let's now look at some example pipelines using these datasets:

1. Deploying Stellar Transactions Using Goldsky Flow

As explained in Create a Pipeline, Goldsky Flow is the visual editor from which we can build and deploy Mirror pipelines. In this video, we create a pipeline using Flow to stream the transactions_v2 dataset into a ClickHouse instance.

2. Deploying Stellar Transactions Using the Goldsky CLI

An alternative way of deploying pipelines is to use a pipeline definition file along with the command goldsky pipeline apply. The example below demonstrates how you can stream the same transactions data exposed by transactions_v2 in the previous example, but this time starting from the master ledgers_v2 dataset and 'exploding' its nested data with the CROSS JOIN UNNEST SQL command:
name: stellar-ledger-transactions
resource_size: s
apiVersion: 3
sources:
  ledgers:
    type: dataset
    dataset_name: stellar.ledgers_v2
    version: 1.0.0
    start_at: earliest # -> starting at ledger sequence 57837496.
transforms:
  sql_1:
    type: sql
    sql: |-
      SELECT
        ledgers.sequence AS ledger_sequence,
        transaction.transaction_hash AS transaction_hash,
        transaction.account AS account,
        transaction.transaction_result_code AS transaction_result_code
      FROM
        ledgers
      CROSS JOIN UNNEST(transactions) AS transaction
    primary_key: transaction_hash
sinks:
  sink_1:
    type: clickhouse
    secret_name: <YOUR_SECRET_NAME>
    from: sql_1
    table: stellar_transactions
You can deploy this pipeline by configuring your sink and running goldsky pipeline apply stellar-ledger-transactions.yaml --status ACTIVE.
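After deployment, you can check on the pipeline from the CLI; for example (run goldsky pipeline --help to confirm the exact commands available in your CLI version):

goldsky pipeline list                                 # list your pipelines and their statuses
goldsky pipeline monitor stellar-ledger-transactions  # follow this pipeline's live progress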

3. Deploying Stellar Balances Using the Goldsky CLI

Although balances are also directly exposed in the balances_v2 dataset, the pipeline below shows how they are actually computed from the master ledgers_v2 dataset, in case you'd like to build your own variation:
name: stellar-balances
version: 1
resource_size: s
apiVersion: 3
sources:
  ledgers:
    type: dataset
    dataset_name: stellar.ledgers_v2
    version: 1.0.0
    start_at: earliest # -> starting at ledger sequence 57837496.
transforms:
  # Ledger entry changes for payments:
  #   * Change types "updated" or "created" (for post-change)
  #   * Entry types "account" (XLM) or "trustline" (token)
  payment_changes:
    type: sql
    primary_key: change_id
    sql: |
      SELECT
        ledgers.sequence AS ledger_sequence,
        change.id AS change_id,
        change.ledger_entry_type,
        change.entry_data
      FROM
        ledgers
      -- explode nested arrays
      CROSS JOIN UNNEST(transactions) AS transaction
      CROSS JOIN UNNEST(transaction.operations) AS operation
      CROSS JOIN UNNEST(operation.ledger_entry_changes) AS change
      WHERE
        -- 1) Only "payment" operations:
        operation.type = 'payment'
        -- 2) Only post-change records (new balance):
        AND change.change_type IN ('updated', 'created')
        -- 3) Only XLM ("account") or token ("trustline") ledger entries:
        AND change.ledger_entry_type IN ('account', 'trustline')
      ;

  # Normalized balances. Take each change row (from payment_changes) and extract:
  #   * account_id
  #   * asset_code  (XLM if ledger_entry_type='account', else trustline code)
  #   * asset_issuer (NULL if XLM, else trustline issuer)
  #   * balance_stroops (new balance, in stroops)
  normalized_balances:
    type: sql
    primary_key: change_id
    sql: |
      SELECT
        pc.ledger_sequence,
        pc.change_id,
        pc.ledger_entry_type,

        -- Extract account_id:
        CASE
          WHEN pc.ledger_entry_type = 'account'
            THEN JSON_VALUE(pc.entry_data, '$.accountID')
          ELSE JSON_VALUE(pc.entry_data, '$.trustline.account_id')
        END AS account_id,

        -- Determine asset_code:
        CASE
          WHEN pc.ledger_entry_type = 'account'
            THEN 'XLM'
          ELSE COALESCE(
            JSON_VALUE(pc.entry_data, '$.trustline.asset.credit_alphanum4.asset_code'),
            JSON_VALUE(pc.entry_data, '$.trustline.asset.credit_alphanum12.asset_code')
          )
        END AS asset_code,

        -- Determine asset_issuer:
        CASE
          WHEN pc.ledger_entry_type = 'account'
            THEN CAST(NULL AS STRING)
          ELSE COALESCE(
            JSON_VALUE(pc.entry_data, '$.trustline.asset.credit_alphanum4.issuer'),
            JSON_VALUE(pc.entry_data, '$.trustline.asset.credit_alphanum12.issuer')
          )
        END AS asset_issuer,

        -- New balance in stroops:
        CAST(
          CASE
            WHEN pc.ledger_entry_type = 'account'
              THEN JSON_VALUE(pc.entry_data, '$.balance')
            ELSE JSON_VALUE(pc.entry_data, '$.trustline.balance')
          END AS BIGINT
        ) AS balance_stroops

      FROM payment_changes AS pc
      ;
  # Build the final balances table with a new primary key
  balances_final:
    type: sql
    primary_key: balance_key
    sql: |
      SELECT
        nb.ledger_sequence,
        nb.account_id,
        nb.asset_code,
        nb.asset_issuer,
        nb.balance_stroops,

        -- Build a composite upsert key. If asset_issuer is NULL, COALESCE falls back to ''.
        CONCAT(
          COALESCE(nb.account_id, ''),
          '-',
          COALESCE(nb.asset_code, ''),
          '-',
          COALESCE(nb.asset_issuer, '')
        ) AS balance_key

      FROM normalized_balances AS nb
      WHERE
        -- Ensure account_id is present:
        nb.account_id IS NOT NULL
        -- Ensure asset_code is present:
        AND nb.asset_code IS NOT NULL
      ;

sinks:
  sink_1:
    type: clickhouse
    table: stellar_balances
    secret_name: <YOUR_SECRET_NAME>
    from: balances_final

use_dedicated_ip: false
You can deploy this pipeline by configuring your sink and running goldsky pipeline apply stellar-balances.yaml --status ACTIVE.
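Once the pipeline is running, you can query the sink directly. As a quick sanity check against the stellar_balances ClickHouse table defined above (keep in mind that balances are denominated in stroops, where 1 XLM = 10,000,000 stroops):

SELECT
  account_id,
  asset_code,
  asset_issuer,
  balance_stroops / 10000000 AS balance  -- convert stroops to whole asset units
FROM stellar_balances
ORDER BY ledger_sequence DESC
LIMIT 10;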

Getting support

Can't find what you're looking for? Reach out to us at support@goldsky.com for help.