This guide is part of a series of tutorials on how you can export transfer data of different ERC standards into your data warehouse. Here we will be focusing on ERC-20 Transfers; visit the following two other guides for other types of Transfers:

In other guides we focus on the ELT use case; here we will cover more advanced topics in transforming our decoded datasets and some of the functionality of the Flink SQL backend.

What you’ll need

  1. A basic understanding of the Mirror product’s basic ETL use case.
  2. A basic understanding of SQL, though we use the syntax and functionality of Flink v1.17.
  3. A destination sink to write your data to.

Preface

In this example we’ll provide a YAML file that includes the transforms needed to sync token transfers to your database.

We need to differentiate ERC-20 token transfers from ERC-721 (NFT) transfers since they have the same event signature in decoded data: Transfer(address,address,uint256)

In this example, we’ve included a number of columns that you may or may not need; the main columns needed for most purposes are: id, address (if you are syncing multiple contract addresses), sender, recipient, token_id, and value.

Pipeline YAML

There is one transform in this configuration and we’ll explain how it works. If you copy and use this configuration file, make sure to update:

  1. Your secretName. If you already created a secret, you can find it via the CLI command goldsky secret list.
  2. The schema and table you want the data written to, by default it writes to mirror.transfers.
  3. Optionally, the contract address or addresses you want to sync.
sources:
  - referenceName: ethereum.decoded_logs
    version: 1.0.0
    type: dataset
    startAt: earliest
    description: Decoded logs for events emitted from contracts. Contains the
      decoded event signature and event parameters, contract address, data,
      topics, and metadata for the block and transaction.
transforms:
  - type: sql
    referenceName: ethereum_20_transfers
    primaryKey: id
    description: ERC20 Transfers
    sql: >-
      SELECT
              lower(address) AS token_id,
              lower(event_params[1]) AS sender,
              lower(event_params[2]) AS recipient,
              lower(event_params[3]) AS `value`,
              raw_log.block_number       AS block_number,
              raw_log.block_hash         AS block_hash,
              raw_log.log_index          AS log_index,
              raw_log.transaction_hash   AS transaction_hash,
              raw_log.transaction_index  AS transaction_index,
              id
              FROM ethereum.decoded_logs WHERE raw_log.topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
              AND SPLIT_INDEX(raw_log.topics, ',', 3) IS NULL
sinks:
  - type: postgres
    table: erc20_transfers
    schema: mirror
    secretName: <YOUR SECRET>
    description: Postgres sink for ERC20 transfers
    referenceName: ethereum_20_sink
    sourceStreamName: ethereum_20_transfers

ERC-20 Transfers

The main thing we need to do here is to make sure we’re pulling the sender, recipient, and value from the event_params array, and only getting ERC-20 transfers rather than other transfers that may share the same event signature. This isn’t usually a problem when filtering for a specific contract address, but can become one when looking at all contract addresses, or contracts that may implement multiple types of tokens.

If you would like to filter for specific contract addresses, simply add a condition to the existing WHERE clause like: AND lower(address) IN ('0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d', '0xdac17f958d2ee523a2206206994597c13d831ec7'). Note that because the address column is lower-cased, the literals in the IN list must be lower-case too, or they will never match.

We’ll start at the top.

Token ID

SELECT lower(address) AS token_id,

We use the lower function here to lower-case the address to make using this data simpler downstream; we also rename the column to token_id to make it more explicit.

Sender

lower(event_params[1]) AS sender,

Here we continue to lower-case values for consistency. In this case we’re using the first element of the event_params array (using a 1-based index), and renaming it to sender. Each event parameter maps to an argument of the event signature; for ERC-20, the first two are the sender and recipient, respectively.

Recipient

lower(event_params[2]) AS recipient,

Like the previous column, we’re pulling the second element in the event_params array and renaming it to recipient.

Token Value

lower(event_params[3]) AS `value`

We’re pulling the third element in the event_params array and renaming it to value to represent the amount of the token_id sent in the transfer.
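Putting the three parameter columns together, here is a small Python sketch of how the 1-based event_params positions in Flink SQL map to the output columns (Python lists are 0-based, so the indices shift by one). The addresses and amount below are made up for illustration:

```python
# Hypothetical decoded ERC-20 Transfer event params (illustrative values).
event_params = [
    "0xF977814e90dA44bFA03b6295A0616a897441aceC",  # event_params[1] in Flink SQL: sender
    "0x28C6c06298d514Db089934071355E5743bf21d60",  # event_params[2] in Flink SQL: recipient
    "1000000",                                     # event_params[3] in Flink SQL: value
]

# Mirror the transform's SELECT: lower-case each element and rename it.
row = {
    "sender": event_params[0].lower(),
    "recipient": event_params[1].lower(),
    "value": event_params[2].lower(),
}
print(row["sender"])  # 0xf977814e90da44bfa03b6295a0616a897441acec
```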

Some columns are surrounded by backticks because they are reserved words in Flink SQL. Common columns that need backticks are: data, output, and value; a full list can be found here.

Block Metadata

raw_log.block_number       AS block_number,
raw_log.block_hash         AS block_hash,
raw_log.log_index          AS log_index,
raw_log.transaction_hash   AS transaction_hash,
raw_log.transaction_index  AS transaction_index,

These columns aren’t necessarily needed for this example, but they’re included so you’re aware of them. A complete list of block metadata columns is available here. They are often useful for looking up the transfers in another tool, such as a block explorer like Etherscan.

ID Primary Key

id

This is the Goldsky-provided id. It is a string composed of the dataset name, block hash, and log index, and is unique per event. Here’s an example: decoded_log_0x60eaf5a2ab37c73cf1f3bbd32fc17f2709953192b530d75aadc521111f476d6c_18.

You may be able to save some space when storing the ID by using md5(id) AS id in your transform. One reason to keep the existing id format is that it makes it easier to order events within the same block without also syncing the block hash and log index.
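As a sketch of the trade-off, the id from the example above can be decomposed into its parts, or hashed down to a fixed-size key, like this in Python:

```python
import hashlib

goldsky_id = "decoded_log_0x60eaf5a2ab37c73cf1f3bbd32fc17f2709953192b530d75aadc521111f476d6c_18"

# The last two underscore-separated fields are the block hash and log index;
# everything before them is the dataset name (which itself contains underscores).
dataset, block_hash, log_index = goldsky_id.rsplit("_", 2)
print(dataset, int(log_index))  # decoded_log 18

# md5(id) yields a fixed 32-hex-character key, at the cost of losing the
# block hash and log index embedded in the original id.
compact_id = hashlib.md5(goldsky_id.encode()).hexdigest()
print(len(compact_id))  # 32
```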

Topic Filter and Length Check

AND raw_log.topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(raw_log.topics, ',', 3) IS NULL

As mentioned in the preface, ERC-20 transfers share the same event signature as ERC-721 transfers. What differentiates them is the number of topics associated with the event: ERC-721 transfers have four topics, and ERC-20 transfers have three. If you want to get into the nitty-gritty you may enjoy the Solidity developer documentation for events, but for now know that in Mirror, raw_log.topics is a comma-separated string. Each value in the string is a 32-byte hex value. The first is the keccak-256 hash of the event signature, in our case Transfer(address,address,uint256) for ERC-20, which hashes to 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef as seen in our WHERE clause.

We use LIKE with a % at the end, which acts as a wildcard, so that only the first topic (the signature hash) needs to match.

We could also use a filter such as event_signature = 'Transfer(address,address,uint256)', but we wanted to introduce the idea of topics as they can be a useful filter for some older contracts that may not completely follow the specification for ERC-20 contracts.

SPLIT_INDEX splits the first argument by the second argument, and then extracts the element at the given 0-based index, in this case 3, which would be the fourth element. Here’s an example topic string to consider:

0x0d0707963952f2fba59dd06f2b425ace40b492fe,0x2d4e6c4023d5b21f32aab665a4439496ba73b0c4,34220000000

We check that the fourth element after splitting is NULL to make sure this is an ERC-20 transfer. An NFT transfer would have four elements when the topics are split, so SPLIT_INDEX would return a non-NULL value.
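The topic-count check can be sketched in Python. The split_index helper below mimics the behavior of Flink's SPLIT_INDEX (returning None where Flink returns NULL), and the non-signature topic values are shortened placeholders, not real 32-byte topics:

```python
def split_index(s, sep, index):
    """Mimic Flink SQL's SPLIT_INDEX: return the element at the
    0-based index after splitting, or None if it doesn't exist."""
    parts = s.split(sep)
    return parts[index] if index < len(parts) else None

SIG = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

# Illustrative topic strings: an ERC-20 Transfer has 3 topics, while an
# ERC-721 Transfer has 4 (its indexed token id becomes an extra topic).
erc20_topics = SIG + ",0xaaa,0xbbb"
erc721_topics = SIG + ",0xaaa,0xbbb,0xccc"

print(split_index(erc20_topics, ",", 3))   # None  -> kept by the ERC-20 filter
print(split_index(erc721_topics, ",", 3))  # 0xccc -> filtered out
```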

Deploying the pipeline

To deploy this pipeline and start sinking ERC-20 transfer data into your database simply execute:

goldsky pipeline create <pipeline_name> --definition-path <yaml_file>

Can't find what you're looking for? Reach out to us at support@goldsky.com or send us a DM on Twitter at @goldskyio.