Goldsky provides curated Solana datasets with full historical data, making it easy to build pipelines for blocks, transactions, instructions, and token activity. All datasets are pre-processed and optimized for common use cases.
Solana configuration differs from EVM chains in a few ways:
- Solana sources use `start_block` (a slot number) to specify the starting point, instead of the EVM `start_at: latest` or `start_at: earliest`. Omit `start_block` to start from the latest slot.
- `in_order` mode is available for Solana sources (it is not available for EVM).
- Batch settings are not available for Solana sources.
```yaml
sources:
  # Start from a specific slot
  solana_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

  # Start from the latest slot (omit start_block)
  solana_live:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    # No start_block = start from latest
```
To start from the latest slot on Solana, simply omit the start_block parameter. This is different from EVM chains where you would use start_at: latest.
Decode Solana program instructions using IDL (Interface Definition Language):
```yaml
name: raydium-swap-tracker
resource_size: m
sources:
  instructions:
    type: dataset
    dataset_name: solana.instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Decode Raydium instructions using IDL
  decoded_instructions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        program_id,
        _gs_decode_instruction_data(
          _gs_fetch_abi('https://gist.githubusercontent.com/jeffling/a5fbae53f47570c0e66980f9229fc83d/raw/02f3bd30b742fb1b1af0fbb40897aeeb77c7b941/raydium-swap-idl.json', 'raw'),
          data
        ) as decoded,
        accounts,
        block_slot,
        block_timestamp,
        signature
      FROM instructions
      WHERE program_id = 'CPMMoo8L3F4NbTegBCKVNunggL7H1ZpdTHKxQB5qKP1C'

  # Extract decoded fields from the name and result JSON
  parsed_swaps:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        decoded.name as instruction_name,
        decoded.value as instruction_value,
        accounts[5] as token_in_account,
        accounts[6] as token_out_account
      FROM decoded_instructions
      WHERE decoded.name = 'swap_base_input'
         OR decoded.name = 'swap_base_output'
sinks:
  postgres_swaps:
    type: postgres
    from: parsed_swaps
    schema: public
    table: raydium_swaps
    secret_name: MY_POSTGRES
    primary_key: id
```
Decoding functions:
_gs_decode_instruction_data(idl, data) - Decode instruction data using an IDL
_gs_decode_log_message(idl, log_messages) - Decode program log messages using an IDL
_gs_fetch_abi(url, 'raw') - Fetch IDL from a URL
The decoded result includes the instruction/event name and parameters.
Decode Solana program log messages to extract structured event data. This is useful for tracking program events like swaps, liquidations, or other on-chain actions that emit logs.
```yaml
name: drift-decoded-logs
resource_size: m
sources:
  transactions_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 380000000
transforms:
  # Decode Drift protocol log messages
  decoded_logs:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        block_slot,
        block_timestamp,
        signature,
        fee,
        log_messages,
        _gs_decode_log_message(
          _gs_fetch_abi('https://raw.githubusercontent.com/drift-labs/protocol-v2/master/sdk/src/idl/drift.json', 'raw'),
          log_messages
        ) as log_messages_decoded,
        _gs_op
      FROM transactions_with_instructions
      WHERE
        -- Filter to transactions with Drift program instructions
        array_length(
          array_filter(instructions, 'program_id', 'dRiftyHA39MWEi3m9aunc5MzRF1JYuBsbn6VPcn33UH')
        ) > 0
        AND array_length(log_messages) > 0
        AND status = 1
sinks:
  postgres_logs:
    type: postgres
    from: decoded_logs
    schema: public
    table: drift_decoded_logs
    secret_name: MY_POSTGRES
    primary_key: id
```
Log message decoding works best with Anchor-based programs that emit structured events. The IDL must match the program version to decode correctly.
Track transaction patterns and success rates using SQL:
```yaml
name: transaction-analytics
resource_size: m
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Categorize transactions by fee and success
  tx_analysis:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status = 1 as is_successful,
        fee,
        compute_units_consumed,
        CASE
          WHEN fee < 5000 THEN 'low'
          WHEN fee < 50000 THEN 'medium'
          ELSE 'high'
        END as fee_category,
        CAST(compute_units_consumed AS DOUBLE) / CAST(fee AS DOUBLE) as compute_efficiency,
        _gs_op
      FROM transactions
sinks:
  postgres_analytics:
    type: postgres
    from: tx_analysis
    schema: public
    table: solana_tx_analytics
    secret_name: MY_POSTGRES
    primary_key: id
```
Guide: Working with Transactions and Instructions Together
The transactions_with_instructions dataset provides a transaction-centric view with all instructions nested in an array. This is ideal when you need both transaction-level data and instruction details without joining separate datasets.
Analyze transaction success rates grouped by the programs involved:
```yaml
name: program-success-analysis
resource_size: m
sources:
  tx_with_instructions:
    type: dataset
    dataset_name: solana.transactions_with_instructions
    version: 1.0.0
    start_block: 312000000
transforms:
  # Extract program involvement
  program_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_timestamp,
        status = 1 as is_successful,
        -- Get first instruction's program (typically the main program)
        instructions[1]['program_id'] as primary_program,
        -- Count instructions in the transaction
        array_length(instructions) as instruction_count,
        fee,
        compute_units_consumed,
        _gs_op
      FROM tx_with_instructions
      WHERE array_length(instructions) > 0
sinks:
  postgres_analysis:
    type: postgres
    from: program_transactions
    schema: public
    table: program_transaction_analysis
    secret_name: MY_POSTGRES
    primary_key: id
```
To access individual instructions from the instructions array:
Array indexing (1-based):
```sql
instructions[1]             -- First instruction
instructions[1].program_id  -- First instruction's program
instructions[1].data        -- First instruction's data
```
Filtering arrays:
```sql
-- Get only Jupiter instructions
array_filter(instructions, 'program_id', 'JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4')

-- Get only top-level instructions
array_filter(instructions, 'parent_index', null)

-- Get first token program instruction
array_filter_first(instructions, 'program_id', 'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA')
```
Counting:
```sql
-- Total instructions in transaction
array_length(instructions)

-- Instructions for a specific program
array_length(array_filter(instructions, 'program_id', 'JUP6...'))
```
Use the `array_filter()` and `array_filter_first()` SQL functions to work efficiently with the nested instruction arrays. See the SQL Functions Reference for more details.
Stream every Solana transaction in real-time to power a live counter, similar to the Total transactions to date counter on Solana’s homepage. This example includes any and all transactions, including failed ones.
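A minimal pipeline sketch for this counter, assuming a Postgres secret named MY_POSTGRES; the transform, sink, and table names here are illustrative, not prescribed:

```yaml
name: solana-tx-counter
resource_size: s
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    # No start_block: begin streaming from the latest slot
transforms:
  all_transactions:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        block_slot,
        block_timestamp,
        status,   -- no status filter, so failed transactions are included
        _gs_op
      FROM transactions
sinks:
  postgres_counter:
    type: postgres
    from: all_transactions
    schema: public
    table: solana_all_transactions
    secret_name: MY_POSTGRES
    primary_key: id
```

A live counter can then be driven by a simple `SELECT count(*)` against the sink table.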
Turbo pipelines use checkpoints to track processing progress. Understanding how checkpoints work helps you avoid unintended rewinds when updating pipeline configurations.
Changing start_block, end_block, or block_range triggers a deliberate rewind. The pipeline will restart from the new start_block, reprocessing all data.
1. Fetch the current configuration before re-applying
Always check the current pipeline definition before making changes.
Using the CLI:
```shell
goldsky turbo get <pipeline-name> --output yaml
```
Using the UI:
Navigate to your pipeline in the dashboard to view the current configuration.
2. Keep start_block consistent
When updating other pipeline settings (transforms, sinks, resource size), keep the start_block the same as the running configuration to preserve your checkpoint.
If you intentionally want to reprocess data from a specific block, changing the start_block is the correct approach. The rewind behavior is by design to give you control over reprocessing.
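For example, given a running source pinned at slot 312000000, the first update below preserves the checkpoint while the second deliberately triggers a rewind (the slot numbers are illustrative):

```yaml
# Safe update: start_block matches the running configuration,
# so the checkpoint is preserved even if transforms or sinks change
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000   # unchanged

# Rewind: a new start_block restarts processing from that slot,
# reprocessing all data from there
sources:
  transactions:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 320000000   # changed: deliberate rewind
```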
When you only need data for specific accounts or programs, add a filter to your source configuration. This enables fast scan mode, which skips irrelevant slots during backfills by querying an index of which accounts and programs are active in each slot. This can dramatically speed up historical data processing.
The filter parameter is a YAML mapping with two optional fields:
| Field | Description |
| --- | --- |
| `account_ids` | Comma-separated list of account addresses to filter by |
| `program_ids` | Comma-separated list of program addresses to filter by |
You can specify one or both fields. When both are provided, only slots containing activity from the specified accounts and programs are processed.
```yaml
# Filter by accounts only
filter:
  account_ids: "addr1,addr2,addr3"

# Filter by program only
filter:
  program_ids: "progAddr"

# Filter by both accounts and programs
filter:
  account_ids: "addr1,addr2"
  program_ids: "progAddr1,progAddr2"
```
The filter parameter speeds up backfills only — when processing historical data from a start_block. During real-time processing (when the pipeline has caught up to the chain tip), all slots are processed regardless of the filter.