Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how it should be consumed, whether that’s Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use dataset sources to give you clean, versioned access to Goldsky’s curated blockchain datasets with consistent schemas across chains. Unlike Mirror, Turbo does not currently support subgraphs as a source.

EVM Sources

All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.

Solana Sources

Turbo-exclusive Solana source with data from genesis (the Mirror version only has data from Summer 2024 onward).

Stellar Sources

Stellar data is also available on Mirror, but the “wide-row” format is best managed via Turbo.

NEAR Sources

NEAR receipts, transactions, and execution outcomes for indexing.

Bitcoin Sources

Bitcoin block and transaction data available for indexing.

Basic Configuration

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest  # EVM, NEAR, Bitcoin, Stellar
    # Stellar also accepts a ledger sequence number (e.g. start_at: 60000000)
    # OR, for Solana:
    start_block: <slot_number>   # omit to start at latest
Starting-point semantics by chain family:
  • EVM, NEAR, Bitcoin: use start_at: latest or start_at: earliest.
  • Stellar: use start_at with latest, earliest, or a specific ledger sequence (e.g. start_at: 60000000). See the Stellar Sources page for end_at, fetch_batch_size, and fetch_parallelism.
  • Solana: use start_block with a specific slot number. Omit start_block to start from the latest slot. See the Solana Sources page for end_block, block_ranges, in_order, and batch_size options.
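As a rough side-by-side sketch of the three starting-point styles listed above: the EVM and Solana entries reuse dataset names, versions, and the slot number that appear elsewhere on this page, while the Stellar dataset_name and version are left as placeholders because this page does not name a specific Stellar dataset. The reference names are hypothetical.

```yaml
sources:
  # EVM (NEAR and Bitcoin work the same way): start_at takes latest or earliest
  eth_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: earliest

  # Stellar: start_at also accepts a ledger sequence number
  stellar_source:                          # hypothetical reference name
    type: dataset
    dataset_name: <chain>.<dataset_type>   # placeholder, fill in a real Stellar dataset
    version: <version>                     # placeholder
    start_at: 60000000

  # Solana: start_block takes a slot number; omit it to start at the latest slot
  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000
```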

Quick Examples

sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

Source Naming

The reference name you give to a source is how you’ll refer to it in transforms and sinks:
sources:
  my_custom_name: # This is the reference name
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0

transforms:
  filtered_data:
    type: sql
    sql: SELECT * FROM my_custom_name # Use the reference name here
Naming Guidelines:
  • Use descriptive, lowercase names with underscores or hyphens
  • Avoid special characters except _ and -
  • Examples: ethereum_blocks, filtered-transfers, enriched_data

Multiple Sources

You can define multiple sources in a single pipeline:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: latest

  polygon_blocks:
    type: dataset
    dataset_name: matic.blocks
    version: 1.0.0
    start_at: latest

transforms:
  combined_blocks:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM ethereum_blocks
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_blocks
Each source runs independently and can be processed at different rates. Use SQL transforms to combine data from multiple sources.

Common Patterns

Process data from one blockchain:
sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest
Combine data from multiple chains:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  all_transfers:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM eth_transfers
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_transfers
Process all historical data from the beginning:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: earliest # Process from genesis
Combine EVM and Solana data:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

# Process each chain's data separately, then combine in your application
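One way to read the comment above is to give each source its own SQL transform instead of a UNION ALL, since this page does not state that the EVM and Solana datasets share a schema. The fragment below is a minimal sketch under that assumption: the transform names are hypothetical, and SELECT * stands in for whatever columns your application needs.

```yaml
transforms:
  # Hypothetical names; each transform reads from exactly one source
  eth_transfers_out:
    type: sql
    sql: SELECT * FROM eth_transfers

  solana_transfers_out:
    type: sql
    sql: SELECT * FROM solana_token_transfers
```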

Best Practices

1. Start with latest for new pipelines

Use start_at: latest for new pipelines to avoid processing large amounts of historical data initially:
start_at: latest  # Only process new data

2. Choose descriptive source names

Name sources clearly to indicate what they contain:
sources:
  polygon_usdc_transfers: # Clear and descriptive
    type: dataset
    # ...

3. Check dataset versions

Use the latest stable version of datasets for best performance and features:
version: 1.2.0  # Use latest stable version

Common Questions

Use latest when you only need new data going forward. This is recommended for most use cases and avoids processing historical data. Use earliest for backfills or when you need complete historical data from genesis.
Performance note: Starting from earliest on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use latest or, on chains that accept a numeric starting point (a Stellar ledger sequence or a Solana slot), a recent value.
Yes, a single pipeline can define multiple sources. Each source runs independently and can be processed at different rates, and SQL transforms can combine them. See the Multi-Chain Processing example in the Common Patterns section above.
The start_at and start_block parameters control where your pipeline starts processing data, but they work differently for each chain family:
EVM, NEAR, and Bitcoin use start_at:
  • start_at: latest - Start from the current block (default for new pipelines)
  • start_at: earliest - Start from genesis (full historical backfill)
Stellar uses start_at with an extra option:
  • start_at: latest or start_at: earliest
  • start_at: <ledger_sequence> - Start from a specific ledger (e.g. 60000000)
Solana uses start_block:
  • start_block: <slot_number> - Start from a specific slot
  • Omit start_block entirely to start from the latest slot (equivalent to start_at: latest on other chains)
For one-time historical processing of a specific range, use Job Mode:
  • EVM: set start_at: earliest and put both the fast-scan filter and an upper block_number bound inside the filter: expression. end_block is not supported on EVM dataset sources and is silently ignored.
  • Solana: use start_block with end_block, or block_ranges for multiple ranges.
  • Stellar: pair start_at with end_at.
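As a hedged sketch of the bounded-range options named above, the fragment below pairs start_block with end_block on Solana and start_at with end_at on Stellar. The slot and ledger values are arbitrary illustrations, the reference names are hypothetical, and the Stellar dataset_name and version are placeholders. It assumes end_at accepts a ledger sequence the same way start_at does. The Job Mode settings and the EVM filter: expression are left out because this page does not spell out their syntax.

```yaml
sources:
  solana_range:                            # hypothetical reference name
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000
    end_block: 312100000                   # illustrative upper slot bound

  stellar_range:                           # hypothetical reference name
    type: dataset
    dataset_name: <chain>.<dataset_type>   # placeholder, fill in a real Stellar dataset
    version: <version>                     # placeholder
    start_at: 60000000
    end_at: 60100000                       # illustrative upper ledger bound (assumed format)
```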
Always use the latest stable version for best performance and newest features. Check available versions with goldsky dataset list, or browse them in the Datasource explorer.