Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how it should be consumed, whether that's Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use dataset sources to give you clean, versioned access to Goldsky's curated blockchain datasets with consistent schemas across chains. Unlike Mirror, Turbo does not currently support subgraphs as a source.

EVM Sources

All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.
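As a sketch, an EVM dataset source in Turbo uses the same shape as in Mirror; this example reuses the ethereum.erc20_transfers dataset and version shown later on this page:

```yaml
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest # or earliest for a full backfill
```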

Solana Sources

Turbo offers an exclusive Solana source with data from genesis (the Mirror version only has data from Summer 2024).
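A minimal Solana source sketch, reusing the solana.token_transfers dataset and slot number from the example further down this page; note that Solana uses start_block with a slot number rather than start_at:

```yaml
sources:
  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000 # omit to start at the latest slot
```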

Stellar Sources

Stellar data is also available on Mirror, but the “wide-row” format is best managed via Turbo.
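Specific Stellar dataset names are not listed on this page, so the sketch below keeps the same <chain>.<dataset_type> placeholder used in the Basic Configuration template; substitute a real dataset name and version:

```yaml
sources:
  stellar_source:
    type: dataset
    dataset_name: stellar.<dataset_type> # placeholder, not a real dataset name
    version: <version>
    start_at: latest
```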

Basic Configuration

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest # EVM chains
    # OR
    start_block: <slot_number> # Solana, omit to start at latest

Quick Examples

sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

Source Naming

The reference name you give to a source is how you’ll refer to it in transforms and sinks:
sources:
  my_custom_name: # This is the reference name
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0

transforms:
  filtered_data:
    type: sql
    sql: SELECT * FROM my_custom_name # Use the reference name here
Naming Guidelines:
  • Use descriptive, lowercase names with underscores or hyphens
  • Avoid special characters except _ and -
  • Examples: ethereum_blocks, filtered-transfers, enriched_data
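Because transforms refer to sources purely by reference name, a mismatched name is an easy mistake to make. The sketch below is a hypothetical helper (not part of Turbo) that checks each transform's FROM clauses against the defined source and transform names, using the configuration from the example above expressed as a Python dict:

```python
import re

# Hypothetical helper (not part of Turbo): returns any table referenced in a
# transform's SQL that is not a defined source or transform reference name.
def undefined_references(pipeline: dict) -> list[str]:
    defined = set(pipeline.get("sources", {})) | set(pipeline.get("transforms", {}))
    missing = []
    for name, transform in pipeline.get("transforms", {}).items():
        # Find every identifier that follows a FROM keyword in the SQL text.
        for ref in re.findall(r"\bFROM\s+(\w+)", transform.get("sql", ""), re.IGNORECASE):
            if ref not in defined:
                missing.append(f"{name} -> {ref}")
    return missing

pipeline = {
    "sources": {
        "my_custom_name": {
            "type": "dataset",
            "dataset_name": "matic.erc20_transfers",
            "version": "1.2.0",
        }
    },
    "transforms": {
        "filtered_data": {
            "type": "sql",
            "sql": "SELECT * FROM my_custom_name",
        }
    },
}

print(undefined_references(pipeline))  # → []
```

A simple regex scan like this will not handle every SQL construct (subqueries, quoted identifiers), but it catches the most common typo: a transform reading from a source name that was never defined.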

Multiple Sources

You can define multiple sources in a single pipeline:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: latest

  polygon_blocks:
    type: dataset
    dataset_name: matic.blocks
    version: 1.0.0
    start_at: latest

transforms:
  combined_blocks:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM ethereum_blocks
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_blocks
Each source runs independently and can be processed at different rates. Use SQL transforms to combine data from multiple sources.

Common Patterns

Process data from one blockchain:
sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest
Combine data from multiple chains:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  all_transfers:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM eth_transfers
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_transfers
Process all historical data from the beginning:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: earliest # Process from genesis
Combine EVM and Solana data:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

# Process each chain's data separately, then combine in your application

Best Practices

1. Start with latest for new pipelines

Use start_at: latest for new pipelines to avoid processing large amounts of historical data initially:
start_at: latest  # Only process new data

2. Choose descriptive source names

Name sources clearly to indicate what they contain:
sources:
  polygon_usdc_transfers: # Clear and descriptive
    type: dataset
    # ...

3. Check dataset versions

Use the latest stable version of datasets for best performance and features:
version: 1.2.0  # Use latest stable version

Common Questions

Should I start from latest or earliest?
Use latest when you only need new data going forward; this is recommended for most use cases and avoids processing historical data. Use earliest for backfills or when you need complete historical data from genesis. Performance note: starting from earliest on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use latest or a recent block number.
Can I use multiple sources in one pipeline?
Yes! Each source runs independently and can be processed at different rates. You can define multiple sources and use SQL transforms to combine them. See the Multiple Sources and Common Patterns sections above.
What is the difference between start_at and start_block?
start_at is used for EVM chains and accepts latest or earliest. start_block is used for Solana and requires a specific slot number. This difference reflects how each blockchain handles block numbering. Omit start_block if you want to start from the latest block on Solana.
How do I process a specific historical block range?
For one-time historical processing of a specific range, use Job Mode with a start and end block. For continuous processing starting from a specific point, use start_at: earliest on EVM or specify a start_block on Solana.
Which dataset version should I use?
Always use the latest stable version for best performance and newest features. Check available versions with goldsky datasets list.