This guide explains how to use Raw Traces Direct Indexing sources to create a Mirror pipeline that streams all native transfers on a chain into your database. In the example below we focus on ETH transfers on the Ethereum network, but the same logic applies to any EVM-compatible chain for which this source is available.

This guide is part of a series of tutorials on exporting transfer data into your data warehouse. Here we focus on native ETH transfers; visit the companion guides for other types of transfers.

What you’ll need

  1. A basic understanding of the Mirror product and its basic ETL use case.
  2. A basic understanding of SQL, though we use the syntax and functionality of Flink v1.17.
  3. A destination sink to write your data to.

Preface

When it comes to identifying native transfers on a chain, it’s important to highlight that there are two types of accounts that can take part in transactions:

  • Externally Owned Accounts (EOA): controlled by an actual user.
  • Contract Accounts: controlled by code.

Currently, transactions in a block can only be initiated by EOAs (this could change in the future with the introduction of Account Abstraction). For instance, take block 16240000; you will see that every transaction in it was initiated by an EOA.

A transaction initiated by an EOA can send value to another EOA as in this transaction. Alternatively, this EOA can call a smart contract’s method and optionally send value with it as in this transaction.

Smart contracts can in turn call other smart contracts, or send value directly to an EOA. These internal transactions initiated by smart contracts can optionally carry native value along, so it is important to consider them. In most chain explorers you can identify these internal transactions and their corresponding value transfers by accessing the Advanced view mode.

All of these types of transactions (EOA-initiated and internal) are available in our Raw Traces dataset, so we will use it as the source for our Mirror pipeline. You can see its data schema here.

Pipeline YAML

There is one transform in this configuration and we’ll explain how it works. If you copy and use this configuration file, make sure to update:

  1. Your secret_name (v2: secretName). If you already created a secret, you can find it via the CLI command goldsky secret list.
  2. The schema and table you want the data written to, by default it writes to public.eth_transfers.

name: native-transfers
apiVersion: 3
sources:
  my_ethereum_raw_traces:
    dataset_name: ethereum.raw_traces
    version: 1.1.0
    type: dataset
    start_at: earliest
transforms:
  ethereum_eth_transfers_transform:
    primary_key: id
    description: ETH Transfers transform
    sql: >
      SELECT
        id,
        block_number,
        block_hash,
        block_timestamp,
        transaction_hash,
        transaction_index,
        from_address,
        to_address,
        CASE
            WHEN trace_address <> '' THEN 'Internal TX'
            ELSE 'EOA TX'
        END AS tx_type,
        CASE
            WHEN block_number <= 16239999 THEN COALESCE(TRY_CAST(`value` AS DECIMAL(100)) / 1e9, 0)
            ELSE COALESCE(TRY_CAST(`value` AS DECIMAL(100)), 0)
        END AS `value`,
        call_type,
        trace_address,
        status
      FROM
        my_ethereum_raw_traces
      WHERE
        call_type <> 'delegatecall' and `value` > 0 and status = 1;
sinks:
  postgres_ethereum.eth_transfers:
    type: postgres
    table: eth_transfers
    schema: public
    secret_name: <YOUR_SECRET>
    description: "Postgres sink for ethereum.eth_transfers"
    from: ethereum_eth_transfers_transform

Native Transfers Transform

We’ll start at the top.

Traces context columns

SELECT
    id,
    block_number,
    block_hash,
    block_timestamp,
    transaction_hash,
    transaction_index,
    from_address,
    to_address,

These are optional columns from the dataset that we include to provide some context around the actual transfer.

Transaction Type

CASE
    WHEN trace_address <> '' THEN 'Internal TX'
    ELSE 'EOA TX'
END AS tx_type,

Here we look at the trace_address column to identify whether this is an initial EOA transaction or an internal one. Including this column is also optional.
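
As an illustration, here is a minimal sketch of how the CASE expression classifies some hypothetical trace_address values (in Ethereum-ETL-style traces, trace_address is a comma-separated path of call indices, empty for the top-level call; verify against the schema of the dataset version you use):

-- Sketch with hypothetical values: an empty trace_address marks the top-level
-- (EOA-initiated) call; a non-empty path like '0' or '0,2' marks internal calls.
SELECT
  trace_address,
  CASE
      WHEN trace_address <> '' THEN 'Internal TX'
      ELSE 'EOA TX'
  END AS tx_type
FROM (VALUES (''), ('0'), ('0,2')) AS t(trace_address);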

Token Value

CASE
    WHEN block_number <= 16239999 THEN COALESCE(TRY_CAST(`value` AS DECIMAL(100)) / 1e9, 0)
    ELSE COALESCE(TRY_CAST(`value` AS DECIMAL(100)), 0)
END AS `value`,

Due to the nature of the dataset, we need to make this conversion: values in blocks up to and including 16239999 were wrongly multiplied by 1e9.
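
To make the correction concrete, here is a minimal sketch with two hypothetical rows, one on each side of the cutoff:

-- Sketch: block 16000000 stores 1 ETH as 1e27 (inflated by 1e9), so we divide;
-- block 16300000 stores 1 ETH as 1e18 (already in wei), so we keep it as-is.
SELECT
  block_number,
  CASE
      WHEN block_number <= 16239999 THEN TRY_CAST(raw_value AS DECIMAL(100)) / 1e9
      ELSE TRY_CAST(raw_value AS DECIMAL(100))
  END AS corrected_wei
FROM (VALUES
  (16000000, '1000000000000000000000000000'),
  (16300000, '1000000000000000000')
) AS t(block_number, raw_value);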

Some columns are surrounded by backticks because they are reserved words in Flink SQL. Common columns that need backticks are data, output, and value; a full list can be found here.
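
For instance, selecting two of those columns requires backticks (a sketch; it assumes an output column exists in the dataset version you are using):

-- Sketch: without the backticks, the parser would reject these reserved words.
SELECT `value`, `output`
FROM my_ethereum_raw_traces;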

Filter

call_type,
trace_address,
status

We include these columns in the SELECT statement as we will be making use of them in the filter explained below:

WHERE
    call_type <> 'delegatecall' and `value` > 0 and status = 1;

Here we filter based on:

  • call_type <> 'delegatecall': delegatecall is a type of call where the called contract’s code is executed in the context of the calling contract, including its storage and balance. In some cases it can mistakenly carry over the value transfer of the original calling contract, which would compromise data quality by duplicating value transfers. Since delegatecalls can never send value themselves, we can safely leave them out of the resulting dataset.
  • value > 0: we want to make sure we track transactions with actual native value.
  • status = 1: the Raw Traces dataset can contain traces which got reverted. With this filter, we make sure to consider only successful transactions.
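
Before deploying, you can sanity-check these assumptions with a quick exploratory transform (a sketch, not part of the pipeline) that shows how value-carrying traces break down by call type and status:

-- Sketch: distribution of value-carrying traces across call types and statuses.
SELECT
  call_type,
  status,
  COUNT(*) AS trace_count
FROM my_ethereum_raw_traces
WHERE `value` > 0
GROUP BY call_type, status;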

Deploying the pipeline

To deploy this pipeline and start sinking native transfer data into your database, simply execute:

goldsky pipeline create <pipeline_name> --definition-path <yaml_file>
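
Once the pipeline is running, you can confirm rows are landing with a quick check against the sink (a sketch in Postgres syntax, assuming the default public.eth_transfers table and that block_timestamp arrives as a timestamp column):

-- Sketch: daily transfer counts and total wei moved, split by transaction type.
SELECT
  block_timestamp::date AS day,
  tx_type,
  COUNT(*)   AS transfers,
  SUM(value) AS total_wei
FROM public.eth_transfers
GROUP BY 1, 2
ORDER BY day DESC;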

Can't find what you're looking for? Reach out to us at support@goldsky.com for help.