
Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how it should be consumed - whether that’s Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use dataset sources to give you clean, versioned access to Goldsky’s curated blockchain datasets with consistent schemas across chains. Unlike Mirror, Turbo does not currently support subgraphs as a source.

EVM Sources

All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.

Solana Sources

A Turbo-exclusive Solana source with data from genesis (the Mirror version only has data from Summer 2024).
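As a sketch, a Solana source might look like the following; the dataset name matches the solana.token_transfers example later in this guide, but the version number is illustrative, and the optional range parameters are documented on the Solana Sources page:

```yaml
sources:
  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0          # illustrative; check available versions
    start_block: 312000000  # a specific slot; omit to start at latest
    # end_block: <slot>     # optional range bound (see Solana Sources)
```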

Stellar Sources

Stellar data is also available on Mirror, but the “wide-row” format is best managed via Turbo.
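A hedged sketch of a Stellar source; the dataset type and version are placeholders (see the Stellar Sources page for available datasets and the end_at, fetch_batch_size, and fetch_parallelism options):

```yaml
sources:
  stellar_source:
    type: dataset
    dataset_name: stellar.<dataset_type>  # placeholder; see Stellar Sources
    version: <version>
    start_at: 60000000  # latest | earliest | a specific ledger sequence
```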

NEAR Sources

NEAR receipts, transactions, and execution outcomes for indexing.
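A minimal sketch of a NEAR source; the dataset type and version are placeholders (see the NEAR Sources page for the available datasets):

```yaml
sources:
  near_receipts:
    type: dataset
    dataset_name: near.<dataset_type>  # placeholder; see NEAR Sources
    version: <version>
    start_at: latest  # or earliest for a full historical backfill
```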

Bitcoin Sources

Bitcoin blocks and transactions data available for indexing.
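A minimal sketch of a Bitcoin source; the dataset type and version are placeholders (see the Bitcoin Sources page for the available datasets):

```yaml
sources:
  bitcoin_blocks:
    type: dataset
    dataset_name: bitcoin.<dataset_type>  # placeholder; see Bitcoin Sources
    version: <version>
    start_at: earliest  # process from the genesis block
```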

Basic Configuration

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest  # EVM, NEAR, Bitcoin, Stellar
    # Stellar also accepts a ledger sequence number (e.g. start_at: 60000000)
    # OR, for Solana:
    start_block: <slot_number>   # omit to start at latest
Starting-point semantics by chain family:
  • EVM, NEAR, Bitcoin: use start_at: latest or start_at: earliest.
  • Stellar: use start_at with latest, earliest, or a specific ledger sequence (e.g. start_at: 60000000). See the Stellar Sources page for end_at, fetch_batch_size, and fetch_parallelism.
  • Solana: use start_block with a specific slot number. Omit start_block to start from the latest slot. See the Solana Sources page for end_block, block_ranges, in_order, and batch_size options.

Quick Examples

sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

Source Naming

The reference name you give to a source is how you’ll refer to it in transforms and sinks:
sources:
  my_custom_name: # This is the reference name
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0

transforms:
  filtered_data:
    type: sql
    sql: SELECT * FROM my_custom_name # Use the reference name here
Naming Guidelines:
  • Use descriptive, lowercase names with underscores or hyphens
  • Avoid special characters except _ and -
  • Examples: ethereum_blocks, filtered-transfers, enriched_data

Multiple Sources

You can define multiple sources in a single pipeline:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: latest

  polygon_blocks:
    type: dataset
    dataset_name: matic.blocks
    version: 1.0.0
    start_at: latest

transforms:
  combined_blocks:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM ethereum_blocks
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_blocks
Each source runs independently and can be processed at different rates. Use SQL transforms to combine data from multiple sources.

Common Patterns

Process data from one blockchain:
sources:
  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest
Combine data from multiple chains:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  polygon_transfers:
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0
    start_at: latest

transforms:
  all_transfers:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM eth_transfers
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_transfers
Process all historical data from the beginning:
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: earliest # Process from genesis
Combine EVM and Solana data:
sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest

  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000

# Process each chain's data separately, then combine in your application

Best Practices

1. Start with latest for new pipelines

Use start_at: latest for new pipelines to avoid processing large amounts of historical data initially:
start_at: latest  # Only process new data
2. Choose descriptive source names

Name sources clearly to indicate what they contain:
sources:
  polygon_usdc_transfers: # Clear and descriptive
    type: dataset
    # ...
3. Check dataset versions

Use the latest stable version of datasets for best performance and features:
version: 1.2.0  # Use latest stable version

Common Questions

When should I use start_at: latest vs. start_at: earliest?
Use latest when you only need new data going forward; this is recommended for most use cases and avoids processing historical data. Use earliest for backfills or when you need complete historical data from genesis.

Performance note: Starting from earliest on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use latest or a recent block number.
Can I use multiple sources in one pipeline?
Yes! Each source runs independently and can be processed at different rates. You can define multiple sources and use SQL transforms to combine them. See the Multi-Chain Processing example in the Common Patterns section above.
What is the difference between start_at and start_block?
These parameters control where your pipeline starts processing data, but they work differently for each chain family.

EVM, NEAR, and Bitcoin use start_at:
  • start_at: latest - Start from the current block (default for new pipelines)
  • start_at: earliest - Start from genesis (full historical backfill)
Stellar uses start_at with an extra option:
  • start_at: latest or start_at: earliest
  • start_at: <ledger_sequence> - Start from a specific ledger (e.g. 60000000)
Solana uses start_block:
  • start_block: <slot_number> - Start from a specific slot
  • Omit start_block entirely to start from the latest slot (equivalent to start_at: latest on other chains)
How do I process a specific historical block range?
For one-time historical processing of a specific range, use Job Mode. On EVM, set start_at: earliest and put both the fast-scan filter and an upper block_number bound inside the filter: expression - end_block is not supported on EVM dataset sources and is silently ignored. On Solana, use start_block with end_block, or block_ranges for multiple ranges. On Stellar, pair start_at with end_at.
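The per-chain range options can be sketched as follows; the filter expression, block numbers, and Stellar dataset type here are illustrative assumptions, not tested values:

```yaml
sources:
  # EVM: bound the range inside filter: (end_block is ignored on EVM dataset sources)
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: earliest
    filter: block_number <= 19000000  # illustrative upper bound

  # Solana: pair start_block with end_block (or use block_ranges)
  solana_token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 312000000
    end_block: 313000000

  # Stellar: pair start_at with end_at (dataset type is a placeholder)
  stellar_source:
    type: dataset
    dataset_name: stellar.<dataset_type>
    version: <version>
    start_at: 60000000
    end_at: 61000000
```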
Which dataset version should I use?
Always use the latest stable version for best performance and newest features. Check available versions with goldsky dataset list, or browse them in the Datasource explorer.