Goldsky provides curated Solana datasets with full historical data, making it easy to build pipelines for blocks, transactions, instructions, and token activity. All datasets are pre-processed and optimized for common use cases.
Turbo only: Solana datasets are exclusively available on Turbo. They are not supported in Mirror v1 pipelines.
Solana configuration differs from EVM chains:
- Solana uses start_block (slot number) instead of start_at. Omit start_block to start from the latest slot.
- Use end_block (not SQL WHERE clauses) to bound a Solana job-mode pipeline, as sketched below.
- in_order mode is available for Solana sources (it is not available for EVM).
- Batch settings are not available for Solana sources.
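For illustration, a bounded job-mode source might look like the sketch below. The dataset name and slot numbers are illustrative, and the exact placement of the in_order flag is an assumption, not confirmed syntax:

```yaml
sources:
  bounded_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000   # starting slot
    end_block: 312100000     # bounds the job-mode run
    in_order: true           # assumed placement; in_order is Solana-only
```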
Solana sources use start_block to specify a starting slot number. This differs from EVM chains, which use start_at: latest or start_at: earliest.
```yaml
sources:
  # Start from a specific slot
  solana_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

  # Start from the latest slot (omit start_block)
  solana_live:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    # No start_block = start from latest
```
To start from the latest slot on Solana, simply omit the start_block parameter. This is different from EVM chains where you would use start_at: latest.
Solana sources accept an optional block_ranges field so a single pipeline can process several disjoint slot windows. This is useful for backfilling specific historical ranges without replaying everything in between, or for splitting a large backfill into several sharded pipelines.

Syntax: block_ranges takes a JSON-encoded string (not a YAML list) of [start, end] pairs. Both bounds are inclusive.
- Ranges must be non-empty, non-overlapping, and strictly increasing. The engine panics at startup if these invariants are violated.
- A range’s start must be at or after the network’s earliest available block.
- block_ranges takes precedence over start_block / end_block. If you set both, the legacy fields are ignored and a warning is logged.
- After a checkpoint restore, the engine skips ahead to the next slot that falls inside any remaining range.
Use with job: true to run a bounded backfill that terminates cleanly once every range is processed; the pipeline exits after the epoch covering the final range’s end slot finalizes:
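A sketch of such a pipeline follows. The pipeline name, dataset, slot ranges, and sink settings are illustrative, and the top-level placement of job: true is an assumption:

```yaml
name: solana-sharded-backfill
resource_size: s
job: true   # assumed top-level placement; the run exits once all ranges finish
sources:
  transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    # JSON-encoded string (not a YAML list) of inclusive [start, end] pairs
    block_ranges: "[[250000000, 255000000], [300000000, 305000000]]"
sinks:
  postgres_transfers:
    type: postgres
    from: transfers
    schema: public
    table: token_transfer_backfill
    secret_name: MY_POSTGRES
    primary_key: id
```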
Decode Solana program instructions using IDL (Interface Definition Language):
```yaml
name: raydium-swap-tracker
resource_size: m
sources:
  instructions:
    type: dataset
    dataset_name: solana.instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Decode Raydium instructions using IDL
  decoded_instructions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        program_id,
        _gs_decode_instruction_data(
          _gs_fetch_abi('https://gist.githubusercontent.com/jeffling/a5fbae53f47570c0e66980f9229fc83d/raw/02f3bd30b742fb1b1af0fbb40897aeeb77c7b941/raydium-swap-idl.json', 'raw'),
          data
        ) as decoded,
        accounts,
        block_slot,
        block_timestamp,
        signature
      FROM instructions
      WHERE program_id = 'CPMMoo8L3F4NbTegBCKVNunggL7H1ZpdTHKxQB5qKP1C'
  # Extract decoded fields from the name and result JSON
  parsed_swaps:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        decoded.name as instruction_name,
        decoded.value as instruction_value,
        accounts[5] as token_in_account,
        accounts[6] as token_out_account
      FROM decoded_instructions
      WHERE decoded.name = 'swap_base_input'
         OR decoded.name = 'swap_base_output'
sinks:
  postgres_swaps:
    type: postgres
    from: parsed_swaps
    schema: public
    table: raydium_swaps
    secret_name: MY_POSTGRES
    primary_key: id
```
Decoding functions:
- _gs_decode_instruction_data(idl, data) - Decode instruction data using an IDL
- _gs_decode_log_message(idl, log_messages) - Decode program log messages using an IDL
- _gs_fetch_abi(url, 'raw') - Fetch an IDL from a URL
The decoded result includes the instruction/event name and parameters.
Decode Solana program log messages to extract structured event data. This is useful for tracking program events like swaps, liquidations, or other on-chain actions that emit logs.
```yaml
name: drift-decoded-logs
resource_size: m
sources:
  transactions_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 380000000
transforms:
  # Decode Drift protocol log messages
  decoded_logs:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        block_slot,
        block_timestamp,
        signature,
        fee,
        log_messages,
        _gs_decode_log_message(
          _gs_fetch_abi('https://raw.githubusercontent.com/drift-labs/protocol-v2/master/sdk/src/idl/drift.json', 'raw'),
          log_messages
        ) as log_messages_decoded,
        _gs_op
      FROM transactions_with_instructions
      WHERE
        -- Filter to transactions with Drift program instructions
        array_length(
          array_filter(instructions, 'program_id', 'dRiftyHA39MWEi3m9aunc5MzRF1JYuBsbn6VPcn33UH')
        ) > 0
        AND array_length(log_messages) > 0
        AND status = 1
sinks:
  postgres_logs:
    type: postgres
    from: decoded_logs
    schema: public
    table: drift_decoded_logs
    secret_name: MY_POSTGRES
    primary_key: id
```
Log message decoding works best with Anchor-based programs that emit structured events. The IDL must match the program version to decode correctly.
Track transaction patterns and success rates using SQL:
```yaml
name: transaction-analytics
resource_size: m
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Categorize transactions by fee and success
  tx_analysis:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status = 1 as is_successful,
        fee,
        compute_units_consumed,
        CASE
          WHEN fee < 5000 THEN 'low'
          WHEN fee < 50000 THEN 'medium'
          ELSE 'high'
        END as fee_category,
        CAST(compute_units_consumed AS DOUBLE) / CAST(fee AS DOUBLE) as compute_efficiency,
        _gs_op
      FROM transactions
sinks:
  postgres_analytics:
    type: postgres
    from: tx_analysis
    schema: public
    table: solana_tx_analytics
    secret_name: MY_POSTGRES
    primary_key: id
```
Guide: Working with Transactions and Instructions Together
The transactions_with_instructions dataset provides a transaction-centric view with all instructions nested in an array. This is ideal when you need both transaction-level data and instruction details without joining separate datasets.
Analyze transaction success rates grouped by the programs involved:
```yaml
name: program-success-analysis
resource_size: m
sources:
  tx_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Extract program involvement
  program_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_timestamp,
        status = 1 as is_successful,
        -- Get first instruction's program (typically the main program)
        instructions[1]['program_id'] as primary_program,
        -- Count instructions in transaction
        array_length(instructions) as instruction_count,
        fee,
        compute_units_consumed,
        _gs_op
      FROM tx_with_instructions
      WHERE array_length(instructions) > 0
sinks:
  postgres_analysis:
    type: postgres
    from: program_transactions
    schema: public
    table: program_transaction_analysis
    secret_name: MY_POSTGRES
    primary_key: id
```
To access individual instructions from the instructions array:

Array indexing (1-based):
```sql
instructions[1]             -- First instruction
instructions[1].program_id  -- First instruction's program
instructions[1].data        -- First instruction's data
```
Filtering arrays:
```sql
-- Get only Jupiter instructions
array_filter(instructions, 'program_id', 'JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4')

-- Get only top-level instructions
array_filter(instructions, 'parent_index', null)

-- Get first token program instruction
array_filter_first(instructions, 'program_id', 'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA')
```
Counting:
```sql
-- Total instructions in transaction
array_length(instructions)

-- Instructions for specific program
array_length(array_filter(instructions, 'program_id', 'JUP6...'))
```
Use the array_filter() and array_filter_first() SQL functions to work efficiently with the nested instruction arrays. See the SQL Functions Reference for more details.
Stream every Solana transaction in real time to power a live counter, similar to the Total transactions to date counter on Solana’s homepage. This example includes all transactions, including failed ones.
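A sketch of such a pipeline follows (the pipeline, transform, and table names are illustrative). Omitting start_block starts the stream at the latest slot, and no status filter is applied, so failed transactions flow through as well:

```yaml
name: solana-tx-counter
resource_size: s
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    # No start_block: stream from the latest slot
transforms:
  all_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status,  -- kept, not filtered: failed transactions count too
        _gs_op
      FROM transactions
sinks:
  postgres_counter:
    type: postgres
    from: all_transactions
    schema: public
    table: solana_all_transactions
    secret_name: MY_POSTGRES
    primary_key: id
```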
Turbo pipelines use checkpoints to track processing progress. Understanding how checkpoints work helps you avoid unintended rewinds when updating pipeline configurations.
Changing start_block, end_block, or block_ranges triggers a deliberate rewind. The pipeline will restart from the new start_block, reprocessing all data.
Step 1: Fetch the current configuration before re-applying
Always check the current pipeline definition before making changes.

Using the CLI:
```bash
goldsky turbo get <pipeline-name> --output yaml
```
Using the UI:
Navigate to your pipeline in the dashboard to view the current configuration.
Step 2: Keep start_block consistent
When updating other pipeline settings (transforms, sinks, resource size), keep the start_block the same as the running configuration to preserve your checkpoint.
If you intentionally want to reprocess data from a specific block, changing the start_block is the correct approach. The rewind behavior is by design to give you control over reprocessing.
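For example, scaling up the earlier transaction-analytics pipeline is safe as long as start_block stays the same (only the relevant fields are shown; values are illustrative):

```yaml
name: transaction-analytics
resource_size: l   # changed from m: safe, does not rewind
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000   # unchanged from the running pipeline: checkpoint preserved
```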
When you only need data for specific accounts or programs, add a filter to your source configuration. This enables fast scan mode, which skips irrelevant slots during backfills by querying an index of which accounts and programs are active in each slot. This can dramatically speed up historical data processing.

We index all accounts and programs referenced in Solana transactions, instructions, and cross-program invocations (CPIs). Program addresses should be specified in the program_ids field, while account addresses, including wallet public keys, Program Derived Addresses (PDAs), and token mint addresses, should be specified in the account_ids field.
The filter parameter is a YAML mapping with two optional fields:
| Field | Description |
| --- | --- |
| account_ids | Comma-separated list of account addresses to filter by |
| program_ids | Comma-separated list of program addresses to filter by |
You can specify one or both fields. When both are provided, slots containing activity from either the specified accounts or the specified programs are processed (the conditions are OR’d together).
```yaml
# Filter by accounts only
filter:
  account_ids: "addr1,addr2,addr3"

# Filter by program only
filter:
  program_ids: "progAddr"

# Filter by both accounts and programs
filter:
  account_ids: "addr1,addr2"
  program_ids: "progAddr1,progAddr2"
```
The following addresses are not indexed and should be excluded from account_ids and program_ids when using fast scan:
The filter parameter speeds up backfills only, that is, when the pipeline is processing historical data from a start_block. During real-time processing (once the pipeline has caught up to the chain tip), all slots are processed regardless of the filter.
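Putting this together, a backfill source that fast-scans for a single program could look like the sketch below. The source name and starting slot are illustrative; the Jupiter program address is reused from the filtering examples above:

```yaml
sources:
  jupiter_instructions:
    type: dataset
    dataset_name: solana.instructions
    version: 1.0.0
    start_block: 250000000   # historical backfill start
    filter:
      # Fast scan: slots with no Jupiter activity are skipped during backfill
      program_ids: "JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4"
```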