# Turbo - Supported sources

> Ingest data from EVM chains, Solana, and more

Sources are the entry points for data in your Turbo pipelines. They define where your pipeline reads blockchain data from and how that data is consumed, whether that's Ethereum transfers, Solana transactions, or any other on-chain activity. Turbo pipelines use **dataset sources** to give you clean, versioned access to Goldsky's curated blockchain datasets with consistent schemas across chains. ***Unlike Mirror, Turbo does not currently support subgraphs as a source***.

<Card title="EVM Sources" icon="ethereum" href="/turbo-pipelines/sources/evm">
  All EVM networks that are supported on Mirror are also supported in Turbo, with the same schemas.
</Card>

<Card title="Solana Sources" icon="circle-dollar-sign" href="/turbo-pipelines/sources/solana">
  Turbo-exclusive Solana source with data from genesis (the Mirror version only has data from summer 2024 onward).
</Card>

<Card title="Stellar Sources" icon="sun" href="/turbo-pipelines/sources/stellar">
  Stellar data is also available on Mirror, but the "wide-row" format is best managed via Turbo.
</Card>

<Card title="NEAR Sources" icon="circle-nodes" href="/turbo-pipelines/sources/near">
  NEAR receipts, transactions, and execution outcomes for indexing.
</Card>

<Card title="Bitcoin Sources" icon="bitcoin" href="/turbo-pipelines/sources/bitcoin">
  Bitcoin block and transaction data for indexing.
</Card>

## Basic Configuration

```yaml theme={null}
sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset_type>
    version: <version>
    start_at: latest | earliest  # EVM, NEAR, Bitcoin, Stellar
    # Stellar also accepts a ledger sequence number (e.g. start_at: 60000000)
    # Solana uses start_block instead of start_at:
    start_block: <slot_number>   # omit to start at latest
```

<Note>
  **Starting-point semantics by chain family:**

  * **EVM, NEAR, Bitcoin**: use `start_at: latest` or `start_at: earliest`.
  * **Stellar**: use `start_at` with `latest`, `earliest`, or a specific ledger sequence (e.g. `start_at: 60000000`). See the [Stellar Sources](/turbo-pipelines/sources/stellar) page for `end_at`, `fetch_batch_size`, and `fetch_parallelism`.
  * **Solana**: use `start_block` with a specific slot number. Omit `start_block` to start from the latest slot. See the [Solana Sources](/turbo-pipelines/sources/solana) page for `end_block`, `block_ranges`, `in_order`, and `batch_size` options.
</Note>
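
As a rough illustration of the two less obvious cases in the note above, the sketch below starts a Stellar source at a specific ledger sequence and a Solana source at the latest slot. The reference names are hypothetical, and the Stellar `dataset_name` and `version` are deliberately left as placeholders; check the [Stellar Sources](/turbo-pipelines/sources/stellar) page for real values.

```yaml theme={null}
sources:
  stellar_from_ledger:                     # hypothetical reference name
    type: dataset
    dataset_name: <chain>.<dataset_type>   # placeholder: choose a Stellar dataset
    version: <version>                     # placeholder
    start_at: 60000000                     # ledger sequence instead of latest/earliest

  solana_blocks_live:                      # hypothetical reference name
    type: dataset
    dataset_name: solana.blocks
    version: 1.0.0
    # start_block omitted: the source starts from the latest slot
```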

## Quick Examples

<CodeGroup>
  ```yaml EVM (Polygon ERC-20) theme={null}
  sources:
    polygon_transfers:
      type: dataset
      dataset_name: matic.erc20_transfers
      version: 1.2.0
      start_at: latest
  ```

  ```yaml Solana (Blocks) theme={null}
  sources:
    solana_blocks:
      type: dataset
      dataset_name: solana.blocks
      version: 1.0.0
      start_block: 312229952
  ```
</CodeGroup>

## Source Naming

The reference name you give to a source is how you'll refer to it in transforms and sinks:

```yaml theme={null}
sources:
  my_custom_name: # This is the reference name
    type: dataset
    dataset_name: matic.erc20_transfers
    version: 1.2.0

transforms:
  filtered_data:
    type: sql
    sql: SELECT * FROM my_custom_name # Use the reference name here
```

**Naming Guidelines:**

* Use descriptive, lowercase names with underscores or hyphens
* Avoid special characters except `_` and `-`
* Examples: `ethereum_blocks`, `filtered-transfers`, `enriched_data`

## Multiple Sources

You can define multiple sources in a single pipeline:

```yaml theme={null}
sources:
  ethereum_blocks:
    type: dataset
    dataset_name: ethereum.blocks
    version: 1.0.0
    start_at: latest

  polygon_blocks:
    type: dataset
    dataset_name: matic.blocks
    version: 1.0.0
    start_at: latest

transforms:
  combined_blocks:
    type: sql
    sql: |
      SELECT *, 'ethereum' as chain FROM ethereum_blocks
      UNION ALL
      SELECT *, 'polygon' as chain FROM polygon_blocks
```

<Info>
  Each source runs independently and can be processed at different rates. Use
  SQL transforms to combine data from multiple sources.
</Info>

## Common Patterns

<AccordionGroup>
  <Accordion title="Single Chain Processing">
    Process data from one blockchain:

    ```yaml theme={null}
    sources:
      polygon_transfers:
        type: dataset
        dataset_name: matic.erc20_transfers
        version: 1.2.0
        start_at: latest
    ```
  </Accordion>

  <Accordion title="Multi-Chain Processing">
    Combine data from multiple chains:

    ```yaml theme={null}
    sources:
      eth_transfers:
        type: dataset
        dataset_name: ethereum.erc20_transfers
        version: 1.0.0
        start_at: latest

      polygon_transfers:
        type: dataset
        dataset_name: matic.erc20_transfers
        version: 1.2.0
        start_at: latest

    transforms:
      all_transfers:
        type: sql
        sql: |
          SELECT *, 'ethereum' as chain FROM eth_transfers
          UNION ALL
          SELECT *, 'polygon' as chain FROM polygon_transfers
    ```
  </Accordion>

  <Accordion title="Historical Data Processing">
    Process all historical data from the beginning:

    ```yaml theme={null}
    sources:
      ethereum_blocks:
        type: dataset
        dataset_name: ethereum.blocks
        version: 1.0.0
        start_at: earliest # Process from genesis
    ```
  </Accordion>

  <Accordion title="Cross-Chain Analytics">
    Combine EVM and Solana data:

    ```yaml theme={null}
    sources:
      eth_transfers:
        type: dataset
        dataset_name: ethereum.erc20_transfers
        version: 1.0.0
        start_at: latest

      solana_token_transfers:
        type: dataset
        dataset_name: solana.token_transfers
        version: 1.0.0
        start_block: 312000000

    # Process each chain's data separately, then combine in your application
    ```
  </Accordion>
</AccordionGroup>

## Best Practices

<Steps>
  <Step title="Start with latest for new pipelines">
    Use `start_at: latest` for new pipelines to avoid processing large amounts of historical data initially:

    ```yaml theme={null}
    start_at: latest  # Only process new data
    ```
  </Step>

  <Step title="Choose descriptive source names">
    Name sources clearly to indicate what they contain:

    ```yaml theme={null}
    sources:
      polygon_usdc_transfers: # Clear and descriptive
        type: dataset
        # ...
    ```
  </Step>

  <Step title="Check dataset versions">
    Use the latest stable version of datasets for best performance and features:

    ```yaml theme={null}
    version: 1.2.0  # Use latest stable version
    ```
  </Step>
</Steps>

## Common Questions

<AccordionGroup>
  <Accordion title="When should I use 'start_at: earliest' vs 'latest'?">
    Use `latest` when you only need new data going forward. This is recommended for most use cases and avoids processing historical data. Use `earliest` for backfills or when you need complete historical data from genesis.

    **Performance note:** Starting from `earliest` on Ethereum mainnet means processing millions of historical blocks, which can take hours or days depending on your pipeline complexity. For testing, use `latest`. On Solana or Stellar, you can also start from a recent slot (`start_block`) or ledger sequence (`start_at`).
  </Accordion>

  <Accordion title="Can I process multiple chains in one pipeline?">
    Yes! Each source runs independently and can be processed at different rates.
    You can define multiple sources and use SQL transforms to combine them. See
    the Multi-Chain Processing example in the Common Patterns section above.
  </Accordion>

  <Accordion title="What's the difference between start_at and start_block?">
    These parameters control where your pipeline starts processing data, but they work differently for each chain family:

    **EVM, NEAR, Bitcoin** use `start_at`:

    * `start_at: latest` - Start from the current block (default for new pipelines)
    * `start_at: earliest` - Start from genesis (full historical backfill)

    **Stellar** uses `start_at` with an extra option:

    * `start_at: latest` or `start_at: earliest`
    * `start_at: <ledger_sequence>` - Start from a specific ledger (e.g. `60000000`)

    **Solana** uses `start_block`:

    * `start_block: <slot_number>` - Start from a specific slot
    * Omit `start_block` entirely to start from the latest slot (equivalent to `start_at: latest` on other chains)
  </Accordion>

  <Accordion title="How do I process only a specific block range?">
    For one-time historical processing of a specific range, use [Job Mode](/turbo-pipelines/job-mode). How you bound the range depends on the chain family:

    * **EVM**: set `start_at: earliest` and put both the [fast-scan](/turbo-pipelines/sources/evm#fast-scan) filter and an upper `block_number` bound inside the `filter:` expression. `end_block` is not supported on EVM dataset sources and is silently ignored.
    * **Solana**: use `start_block` with `end_block`, or `block_ranges` for multiple ranges.
    * **Stellar**: pair `start_at` with `end_at`.
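
    As a minimal sketch of the Solana case, the source below reads a bounded slot range by pairing `start_block` with `end_block`. The reference name and slot numbers are illustrative assumptions, and this page does not say whether the end bound is inclusive, so confirm the exact semantics on the [Solana Sources](/turbo-pipelines/sources/solana) page.

    ```yaml theme={null}
    sources:
      solana_blocks_range:        # hypothetical reference name
        type: dataset
        dataset_name: solana.blocks
        version: 1.0.0
        start_block: 312000000    # first slot to read (illustrative value)
        end_block: 312100000      # upper bound of the range (illustrative value)
    ```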
  </Accordion>

  <Accordion title="What dataset version should I use?">
    Always use the latest stable version for best performance and newest features.
    Check available versions with `goldsky dataset list`, or browse them in the [Datasource explorer](https://app.goldsky.com/data-sources).
  </Accordion>
</AccordionGroup>
