Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how it should be consumed, whether that's Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use dataset sources to give you clean, versioned access to Goldsky's curated blockchain datasets with consistent schemas across chains. Unlike Mirror, Turbo does not currently support subgraphs as a source.

EVM Sources

All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.
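As a sketch, an EVM dataset source in Turbo uses the same shape as in Mirror; this example reuses the ethereum.erc20_transfers dataset and version shown later on this page:

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest # or earliest for a full backfill
```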

Solana Sources

Turbo offers an exclusive Solana source with data from genesis (the Mirror version only has data from Summer 2024).
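A minimal Solana source sketch, reusing the solana.token_transfers dataset and slot number from the example further down this page; note that Solana uses start_block with a slot number rather than start_at:

```yaml
sources:
  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000 # omit to start at the latest slot
```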

Stellar Sources

Stellar data is also available on Mirror, but the “wide-row” format is best managed via Turbo.
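Specific Stellar dataset names are not listed on this page, so the sketch below keeps the same <chain>.<dataset_type> placeholder used in the Basic Configuration template; substitute a real dataset name and version:

```yaml
sources:
  stellar_source:
    type: dataset
    dataset_name: stellar.<dataset_type> # placeholder, not a real dataset name
    version: <version>
    start_at: latest
```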

Basic Configuration

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest # EVM chains
    # OR
    start_block: <slot_number> # Solana, omit to start at latest

Quick Examples

sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

Source Naming

The reference name you give to a source is how you’ll refer to it in transforms and sinks:
sources:
  my_custom_name: # This is the reference name
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0

transforms:
  filtered_data:
    type: sql
    sql: SELECT * FROM my_custom_name # Use the reference name here
Naming Guidelines:
  • Use descriptive, lowercase names with underscores or hyphens
  • Avoid special characters except _ and -
  • Examples: ethereum_blocks, filtered-transfers, enriched_data
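Because transforms refer to sources purely by reference name, a mismatched name is an easy mistake to make. The sketch below is a hypothetical helper (not part of Turbo) that checks each transform's FROM clauses against the defined source and transform names, using the configuration from the example above expressed as a Python dict:

```python
import re

# Hypothetical helper (not part of Turbo): returns any table referenced in a
# transform's SQL that is not a defined source or transform reference name.
def undefined_references(pipeline: dict) -> list[str]:
    defined = set(pipeline.get("sources", {})) | set(pipeline.get("transforms", {}))
    missing = []
    for name, transform in pipeline.get("transforms", {}).items():
        # Find every identifier that follows a FROM keyword in the SQL text.
        for ref in re.findall(r"\bFROM\s+(\w+)", transform.get("sql", ""), re.IGNORECASE):
            if ref not in defined:
                missing.append(f"{name} -> {ref}")
    return missing

pipeline = {
    "sources": {
        "my_custom_name": {
            "type": "dataset",
            "dataset_name": "matic.erc20_transfers",
            "version": "1.2.0",
        }
    },
    "transforms": {
        "filtered_data": {
            "type": "sql",
            "sql": "SELECT * FROM my_custom_name",
        }
    },
}

print(undefined_references(pipeline))  # → []
```

A simple regex scan like this will not handle every SQL construct (subqueries, quoted identifiers), but it catches the most common typo: a transform reading from a source name that was never defined.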

Multiple Sources

You can define multiple sources in a single pipeline:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: latest

  polygon_blocks:
    type: dataset
    dataset_name: matic.blocks
    version: 1.0.0
    start_at: latest

transforms:
  combined_blocks:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM ethereum_blocks
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_blocks
Each source runs independently and can be processed at different rates. Use SQL transforms to combine data from multiple sources.

Common Patterns

Process data from one blockchain:
sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest
Combine data from multiple chains:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  all_transfers:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM eth_transfers
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_transfers
Process all historical data from the beginning:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: earliest # Process from genesis
Combine EVM and Solana data:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

# Process each chain's data separately, then combine in your application

Best Practices

1. Start with latest for new pipelines

Use start_at: latest for new pipelines to avoid processing large amounts of historical data initially:
start_at: latest  # Only process new data

2. Choose descriptive source names

Name sources clearly to indicate what they contain:
sources:
  polygon_usdc_transfers: # Clear and descriptive
    type: dataset
    # ...

3. Check dataset versions

Use the latest stable version of datasets for best performance and features:
version: 1.2.0  # Use latest stable version

Common Questions

Should I start from latest or earliest?
Use latest when you only need new data going forward; this is recommended for most use cases and avoids processing historical data. Use earliest for backfills or when you need complete historical data from genesis. Performance note: starting from earliest on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use latest or a recent block number.
Can I use multiple sources in one pipeline?
Yes! Each source runs independently and can be processed at different rates. You can define multiple sources and use SQL transforms to combine them. See the Multiple Sources and Common Patterns sections above.
What is the difference between start_at and start_block?
start_at is used for EVM chains and accepts latest or earliest. start_block is used for Solana and requires a specific slot number. This difference reflects how each blockchain handles block numbering. Omit start_block if you want to start from the latest block on Solana.
How do I process a specific historical block range?
For one-time historical processing of a specific range, use Job Mode with a start and end block. For continuous processing starting from a specific point, use start_at: earliest on EVM or specify a start_block on Solana.
Which dataset version should I use?
Always use the latest stable version for best performance and newest features. Check available versions with goldsky datasets list.