Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how it should be consumed, whether that’s Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use dataset sources to give you clean, versioned access to Goldsky’s curated blockchain datasets with consistent schemas across chains. Unlike Mirror, Turbo does not currently support subgraphs as a source.
EVM Sources
All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.
Solana Sources
Turbo-exclusive Solana source with data from genesis (the Mirror version only has data from Summer 2024 onward).
Stellar Sources
Stellar data is also available on Mirror, but the “wide-row” format is best managed via Turbo.
NEAR Sources
NEAR receipts, transactions, and execution outcomes for indexing.
Bitcoin Sources
Bitcoin block and transaction data available for indexing.
Basic Configuration
Starting-point semantics by chain family:
- EVM, NEAR, Bitcoin: use `start_at: latest` or `start_at: earliest`.
- Stellar: use `start_at` with `latest`, `earliest`, or a specific ledger sequence (e.g. `start_at: 60000000`). See the Stellar Sources page for `end_at`, `fetch_batch_size`, and `fetch_parallelism`.
- Solana: use `start_block` with a specific slot number. Omit `start_block` to start from the latest slot. See the Solana Sources page for `end_block`, `block_ranges`, `in_order`, and `batch_size` options.
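The per-chain starting points above might be sketched as follows. This is illustrative only: the `type` and `dataset_name` keys and the dataset names are assumptions, not the exact Turbo schema — see the chain-specific source pages for the real keys.

```yaml
sources:
  # EVM / NEAR / Bitcoin: start_at accepts latest or earliest
  ethereum_blocks:
    type: dataset                      # assumed key
    dataset_name: ethereum.blocks      # assumed dataset name
    start_at: latest

  # Stellar: start_at also accepts a specific ledger sequence
  stellar_ledgers:
    type: dataset
    dataset_name: stellar.ledgers      # assumed dataset name
    start_at: 60000000

  # Solana: start_block takes a slot number; omit it to start at the latest slot
  solana_transactions:
    type: dataset
    dataset_name: solana.transactions  # assumed dataset name
    start_block: 250000000             # hypothetical slot number
```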
Quick Examples
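As a quick illustration, a minimal pipeline with a single EVM source might look like the sketch below. The top-level `name`, the `type`/`dataset_name` keys, the dataset name, and the sink shape are all assumptions here, not the exact Turbo schema.

```yaml
name: my-first-pipeline            # assumed top-level key
sources:
  ethereum_blocks:                 # reference name used by transforms and sinks
    type: dataset                  # assumed key
    dataset_name: ethereum.blocks  # assumed dataset name
    start_at: latest
sinks:
  my_sink:
    type: postgres                 # assumed sink type and keys
    from: ethereum_blocks
```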
Source Naming
The reference name you give to a source is how you’ll refer to it in transforms and sinks:
- Use descriptive, lowercase names with underscores or hyphens
- Avoid special characters except `_` and `-`
- Examples: `ethereum_blocks`, `filtered-transfers`, `enriched_data`
Multiple Sources
You can define multiple sources in a single pipeline. Each source runs independently and can be processed at different rates. Use SQL transforms to combine data from multiple sources.
Common Patterns
Single Chain Processing
Process data from one blockchain:
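A single-source sketch, with the `type`/`dataset_name` keys and the dataset name assumed rather than taken from the Turbo schema:

```yaml
sources:
  base_transfers:
    type: dataset                        # assumed key
    dataset_name: base.erc20_transfers   # assumed dataset name
    start_at: latest
```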
Multi-Chain Processing
Combine data from multiple chains:
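Two EVM sources side by side might look like the following sketch (keys and dataset names are assumptions). Each source progresses at its own rate; a SQL transform can then UNION the two streams.

```yaml
sources:
  ethereum_transfers:
    type: dataset                           # assumed key
    dataset_name: ethereum.erc20_transfers  # assumed dataset name
    start_at: latest
  polygon_transfers:
    type: dataset
    dataset_name: polygon.erc20_transfers   # assumed dataset name
    start_at: latest
# A SQL transform (not shown) can combine the two sources.
```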
Historical Data Processing
Process all historical data from the beginning:
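A full-backfill sketch using `start_at: earliest` (the surrounding keys and dataset name are assumptions):

```yaml
sources:
  ethereum_blocks:
    type: dataset                  # assumed key
    dataset_name: ethereum.blocks  # assumed dataset name
    start_at: earliest             # backfill from genesis; can take hours or days on mainnet
```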
Cross-Chain Analytics
Combine EVM and Solana data:
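A cross-chain sketch (keys and dataset names assumed). Note the family-specific starting points: the EVM source uses `start_at`, while the Solana source uses `start_block`.

```yaml
sources:
  ethereum_transfers:
    type: dataset                           # assumed key
    dataset_name: ethereum.erc20_transfers  # assumed dataset name
    start_at: latest                        # EVM uses start_at
  solana_transfers:
    type: dataset
    dataset_name: solana.token_transfers    # assumed dataset name
    start_block: 250000000                  # hypothetical slot; omit to start at the latest slot
# A SQL transform (not shown) can normalize and combine the two schemas.
```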
Best Practices
Start with latest for new pipelines
Use `start_at: latest` for new pipelines to avoid processing large amounts of historical data initially.

Common Questions
When should I use 'start_at: earliest' vs 'latest'?
Use `latest` when you only need new data going forward; this is recommended for most use cases and avoids processing historical data. Use `earliest` for backfills or when you need complete historical data from genesis.

Performance note: starting from `earliest` on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use `latest` or a recent block number.

Can I process multiple chains in one pipeline?
Yes! Each source runs independently and can be processed at different rates.
You can define multiple sources and use SQL transforms to combine them. See
the Multi-Chain Processing example in the Common Patterns section above.
What's the difference between start_at and start_block?
These parameters control where your pipeline starts processing data, but they work differently for each chain family.

EVM, NEAR, and Bitcoin use `start_at`:
- `start_at: latest` - start from the current block (default for new pipelines)
- `start_at: earliest` - start from genesis (full historical backfill)

Stellar uses `start_at` with an extra option:
- `start_at: latest` or `start_at: earliest`
- `start_at: <ledger_sequence>` - start from a specific ledger (e.g. `60000000`)

Solana uses `start_block`:
- `start_block: <slot_number>` - start from a specific slot
- Omit `start_block` entirely to start from the latest slot (equivalent to `start_at: latest` on other chains)
How do I process only a specific block range?
For one-time historical processing of a specific range, use Job Mode. On EVM, set `start_at: earliest` and put both the fast-scan filter and an upper `block_number` bound inside the `filter:` expression; `end_block` is not supported on EVM dataset sources and is silently ignored. On Solana, use `start_block` with `end_block`, or `block_ranges` for multiple ranges. On Stellar, pair `start_at` with `end_at`.

What dataset version should I use?
Always use the latest stable version for best performance and newest features.
Check available versions with `goldsky dataset list`, or browse them in the Datasource explorer.