Goldsky provides curated Solana datasets with full historical data, making it easy to build pipelines for blocks, transactions, instructions, and token activity. All datasets are pre-processed and optimized for common use cases.
Turbo only: Solana datasets are exclusively available on Turbo. They are not supported in Mirror v1 pipelines.
Solana configuration differs from EVM chains:
- Solana uses start_block (slot number) instead of start_at. Omit start_block to start from the latest slot.
- Use end_block (not SQL WHERE clauses) to bound a Solana job-mode pipeline, as sketched below.
- in_order mode is available for Solana sources (it is not available for EVM).
- Batch settings are not available for Solana sources.
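For illustration, a bounded job-mode source might look like the sketch below. The dataset name and slot numbers are illustrative, and the exact placement of the in_order flag is an assumption, not confirmed syntax:

```yaml
sources:
  bounded_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000   # starting slot
    end_block: 312100000     # bounds the job-mode run
    in_order: true           # assumed placement; in_order is Solana-only
```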
Solana sources use start_block to specify a starting slot number. This differs from EVM chains, which use start_at: latest or start_at: earliest.
```yaml
sources:
  # Start from a specific slot
  solana_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

  # Start from the latest slot (omit start_block)
  solana_live:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    # No start_block = start from latest
```
To start from the latest slot on Solana, simply omit the start_block parameter. This is different from EVM chains where you would use start_at: latest.
Solana sources accept an optional block_ranges field so a single pipeline can process several disjoint slot windows. This is useful for backfilling specific historical ranges without replaying everything in between, or for splitting a large backfill into several sharded pipelines.

Syntax: block_ranges takes a JSON-encoded string (not a YAML list) of [start, end] pairs. Both bounds are inclusive.
- Ranges must be non-empty, non-overlapping, and strictly increasing. The engine panics at startup if these invariants are violated.
- A range’s start must be at or after the network’s earliest available block.
- block_ranges takes precedence over start_block / end_block. If you set both, the legacy fields are ignored and a warning is logged.
- After a checkpoint restore, the engine skips ahead to the next slot that falls inside any remaining range.
Use with job: true to run a bounded backfill that terminates cleanly once every range is processed; the pipeline exits after the epoch covering the final range’s end slot finalizes:
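A sketch of such a pipeline follows. The pipeline name, dataset, slot ranges, and sink settings are illustrative, and the top-level placement of job: true is an assumption:

```yaml
name: solana-sharded-backfill
resource_size: s
job: true   # assumed top-level placement; the run exits once all ranges finish
sources:
  transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    # JSON-encoded string (not a YAML list) of inclusive [start, end] pairs
    block_ranges: "[[250000000, 255000000], [300000000, 305000000]]"
sinks:
  postgres_transfers:
    type: postgres
    from: transfers
    schema: public
    table: token_transfer_backfill
    secret_name: MY_POSTGRES
    primary_key: id
```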
Decode Solana program instructions using IDL (Interface Definition Language):
```yaml
name: raydium-swap-tracker
resource_size: m
sources:
  instructions:
    type: dataset
    dataset_name: solana.instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Decode Raydium instructions using IDL
  decoded_instructions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        program_id,
        _gs_decode_instruction_data(
          _gs_fetch_abi('https://gist.githubusercontent.com/jeffling/a5fbae53f47570c0e66980f9229fc83d/raw/02f3bd30b742fb1b1af0fbb40897aeeb77c7b941/raydium-swap-idl.json', 'raw'),
          data
        ) as decoded,
        accounts,
        block_slot,
        block_timestamp,
        signature
      FROM instructions
      WHERE program_id = 'CPMMoo8L3F4NbTegBCKVNunggL7H1ZpdTHKxQB5qKP1C'
  # Extract decoded fields from the name and result JSON
  parsed_swaps:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        decoded.name as instruction_name,
        decoded.value as instruction_value,
        accounts[5] as token_in_account,
        accounts[6] as token_out_account
      FROM decoded_instructions
      WHERE decoded.name = 'swap_base_input'
         OR decoded.name = 'swap_base_output'
sinks:
  postgres_swaps:
    type: postgres
    from: parsed_swaps
    schema: public
    table: raydium_swaps
    secret_name: MY_POSTGRES
    primary_key: id
```
Decoding functions:
- _gs_decode_instruction_data(idl, data) - Decode instruction data using an IDL
- _gs_decode_log_message(idl, log_messages) - Decode program log messages using an IDL
- _gs_fetch_abi(url, 'raw') - Fetch an IDL from a URL
The decoded result includes the instruction/event name and parameters.
Decode Solana program log messages to extract structured event data. This is useful for tracking program events like swaps, liquidations, or other on-chain actions that emit logs.
```yaml
name: drift-decoded-logs
resource_size: m
sources:
  transactions_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 380000000
transforms:
  # Decode Drift protocol log messages
  decoded_logs:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        block_slot,
        block_timestamp,
        signature,
        fee,
        log_messages,
        _gs_decode_log_message(
          _gs_fetch_abi('https://raw.githubusercontent.com/drift-labs/protocol-v2/master/sdk/src/idl/drift.json', 'raw'),
          log_messages
        ) as log_messages_decoded,
        _gs_op
      FROM transactions_with_instructions
      WHERE
        -- Filter to transactions with Drift program instructions
        array_length(
          array_filter(instructions, 'program_id', 'dRiftyHA39MWEi3m9aunc5MzRF1JYuBsbn6VPcn33UH')
        ) > 0
        AND array_length(log_messages) > 0
        AND status = 1
sinks:
  postgres_logs:
    type: postgres
    from: decoded_logs
    schema: public
    table: drift_decoded_logs
    secret_name: MY_POSTGRES
    primary_key: id
```
Log message decoding works best with Anchor-based programs that emit structured events. The IDL must match the program version to decode correctly.
Track transaction patterns and success rates using SQL:
```yaml
name: transaction-analytics
resource_size: m
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Categorize transactions by fee and success
  tx_analysis:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status = 1 as is_successful,
        fee,
        compute_units_consumed,
        CASE
          WHEN fee < 5000 THEN 'low'
          WHEN fee < 50000 THEN 'medium'
          ELSE 'high'
        END as fee_category,
        CAST(compute_units_consumed AS DOUBLE) / CAST(fee AS DOUBLE) as compute_efficiency,
        _gs_op
      FROM transactions
sinks:
  postgres_analytics:
    type: postgres
    from: tx_analysis
    schema: public
    table: solana_tx_analytics
    secret_name: MY_POSTGRES
    primary_key: id
```
Guide: Working with Transactions and Instructions Together
The transactions_with_instructions dataset provides a transaction-centric view with all instructions nested in an array. This is ideal when you need both transaction-level data and instruction details without joining separate datasets.
Analyze transaction success rates grouped by the programs involved:
```yaml
name: program-success-analysis
resource_size: m
sources:
  tx_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Extract program involvement
  program_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_timestamp,
        status = 1 as is_successful,
        -- Get first instruction's program (typically the main program)
        instructions[1]['program_id'] as primary_program,
        -- Count instructions in transaction
        array_length(instructions) as instruction_count,
        fee,
        compute_units_consumed,
        _gs_op
      FROM tx_with_instructions
      WHERE array_length(instructions) > 0
sinks:
  postgres_analysis:
    type: postgres
    from: program_transactions
    schema: public
    table: program_transaction_analysis
    secret_name: MY_POSTGRES
    primary_key: id
```
To access individual instructions from the instructions array:

Array indexing (1-based):
```sql
instructions[1]             -- First instruction
instructions[1].program_id  -- First instruction's program
instructions[1].data        -- First instruction's data
```
Filtering arrays:
```sql
-- Get only Jupiter instructions
array_filter(instructions, 'program_id', 'JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4')

-- Get only top-level instructions
array_filter(instructions, 'parent_index', null)

-- Get first token program instruction
array_filter_first(instructions, 'program_id', 'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA')
```
Counting:
```sql
-- Total instructions in transaction
array_length(instructions)

-- Instructions for specific program
array_length(array_filter(instructions, 'program_id', 'JUP6...'))
```
Use the array_filter() and array_filter_first() SQL functions to work efficiently with the nested instruction arrays. See the SQL Functions Reference for more details.
Stream every Solana transaction in real time to power a live counter, similar to the Total transactions to date counter on Solana’s homepage. This example includes all transactions, including failed ones.
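A sketch of such a pipeline follows (the pipeline, transform, and table names are illustrative). Omitting start_block starts the stream at the latest slot, and no status filter is applied, so failed transactions flow through as well:

```yaml
name: solana-tx-counter
resource_size: s
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    # No start_block: stream from the latest slot
transforms:
  all_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status,  -- kept, not filtered: failed transactions count too
        _gs_op
      FROM transactions
sinks:
  postgres_counter:
    type: postgres
    from: all_transactions
    schema: public
    table: solana_all_transactions
    secret_name: MY_POSTGRES
    primary_key: id
```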
Turbo pipelines use checkpoints to track processing progress. Understanding how checkpoints work helps you avoid unintended rewinds when updating pipeline configurations.
Changing start_block, end_block, or block_ranges triggers a deliberate rewind. The pipeline will restart from the new start_block, reprocessing all data.
Step 1: Fetch the current configuration before re-applying
Always check the current pipeline definition before making changes.

Using the CLI:
```bash
goldsky turbo get <pipeline-name> --output yaml
```
Using the UI:
Navigate to your pipeline in the dashboard to view the current configuration.
Step 2: Keep start_block consistent
When updating other pipeline settings (transforms, sinks, resource size), keep the start_block the same as the running configuration to preserve your checkpoint.
If you intentionally want to reprocess data from a specific block, changing the start_block is the correct approach. The rewind behavior is by design to give you control over reprocessing.
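For example, scaling up the earlier transaction-analytics pipeline is safe as long as start_block stays the same (only the relevant fields are shown; values are illustrative):

```yaml
name: transaction-analytics
resource_size: l   # changed from m: safe, does not rewind
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000   # unchanged from the running pipeline: checkpoint preserved
```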
When you only need data for specific accounts or programs, add a filter to your source configuration. This enables fast scan mode, which skips irrelevant slots during backfills by querying an index of which accounts and programs are active in each slot. This can dramatically speed up historical data processing.

We index all accounts and programs referenced in Solana transactions, instructions, and cross-program invocations (CPIs). Program addresses should be specified in the program_ids field, while account addresses, including wallet public keys, Program Derived Addresses (PDAs), and token mint addresses, should be specified in the account_ids field.
The filter parameter is a YAML mapping with two optional fields:
| Field | Description |
| --- | --- |
| account_ids | Comma-separated list of account addresses to filter by |
| program_ids | Comma-separated list of program addresses to filter by |
You can specify one or both fields. When both are provided, slots containing activity from either the specified accounts or the specified programs are processed (the conditions are OR’d together).
```yaml
# Filter by accounts only
filter:
  account_ids: "addr1,addr2,addr3"

# Filter by program only
filter:
  program_ids: "progAddr"

# Filter by both accounts and programs
filter:
  account_ids: "addr1,addr2"
  program_ids: "progAddr1,progAddr2"
```
The following addresses are not indexed and should be excluded from account_ids and program_ids when using fast scan:
The filter parameter speeds up backfills only, that is, when the pipeline is processing historical data from a start_block. During real-time processing (once the pipeline has caught up to the chain tip), all slots are processed regardless of the filter.
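Putting this together, a backfill source that fast-scans for a single program could look like the sketch below. The source name and starting slot are illustrative; the Jupiter program address is reused from the filtering examples above:

```yaml
sources:
  jupiter_instructions:
    type: dataset
    dataset_name: solana.instructions
    version: 1.0.0
    start_block: 250000000   # historical backfill start
    filter:
      # Fast scan: slots with no Jupiter activity are skipped during backfill
      program_ids: "JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4"
```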