description: 'Postgres sink for: sui_transactions'
from: subset_transform
```
* Add your corresponding secret name and run `goldsky pipeline apply sui-transactions.yaml --status ACTIVE` to deploy the pipeline.
## Getting support
# Supported networks
Source: https://docs.goldsky.com/chains/supported-networks
## Subgraphs
Goldsky currently supports the following chains on Subgraphs.
## Mirror
Goldsky currently supports the following chains on Mirror.
# Indexing Swellchain with Goldsky
Source: https://docs.goldsky.com/chains/swellchain
## Overview
Goldsky is a high-performance data indexing provider for Swellchain that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Swellchain to make our product available to the ecosystem and provide dedicated support for Swellchain data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Swellchain subgraphs can be deployed on Goldsky in 2 ways:
Both Swellchain mainnet and testnet are available at the chain slugs `swell` and `swell-testnet` respectively.
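As an illustration (a minimal, partial sketch assuming a standard subgraph manifest; the data source name, contract address, and start block below are placeholders), the chain slug is the value you'd use for the `network` field in your `subgraph.yaml`:

```yaml subgraph.yaml
# Partial manifest sketch: only the fields relevant to the chain slug are shown.
dataSources:
  - kind: ethereum
    name: MyContract        # hypothetical data source name
    network: swell          # Goldsky chain slug for Swellchain mainnet ("swell-testnet" for testnet)
    source:
      address: "0x0000000000000000000000000000000000000000"  # replace with your contract address
      abi: MyContract
      startBlock: 0
```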
### Mirror
## Getting support
# Indexing TAC with Goldsky
Source: https://docs.goldsky.com/chains/tac
## Overview
Goldsky is a high-performance data indexing provider for TAC that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with TAC to make our product available to the ecosystem and provide dedicated support for TAC data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
TAC subgraphs can be deployed on Goldsky in 2 ways:
TAC Turin Testnet is currently supported at the chain slug `tac-turin`.
### Mirror
## Getting support
# Indexing Taiko with Goldsky
Source: https://docs.goldsky.com/chains/taiko
## Overview
Goldsky is a high-performance data indexing provider for Taiko that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Taiko to make our product available to the ecosystem and provide dedicated support for Taiko data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Taiko subgraphs can be deployed on Goldsky in 2 ways:
Both Taiko's latest testnet (Hekla) and mainnet are currently supported at the chain slugs `taiko-hekla-testnet` and `taiko` respectively.
### Mirror
Support for Goldsky Mirror for Taiko is currently in progress. If you'd like to be notified when support is launched publicly, contact us at [sales@goldsky.com](mailto:sales@goldsky.com).
## Getting support
# Indexing Telos with Goldsky
Source: https://docs.goldsky.com/chains/telos
## Overview
Goldsky is a high-performance data indexing provider for Telos that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Telos to make our product available to the ecosystem and provide dedicated support for Telos data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ---------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 10% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | All developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Telos subgraphs can be deployed on Goldsky in 2 ways:
Both Telos mainnet and testnet are available at the chain slugs `telos` and `telos-testnet` respectively.
### Mirror
Support for Goldsky Mirror for Telos is currently in progress. If you'd like to be notified when support is launched publicly, contact us at [sales@goldsky.com](mailto:sales@goldsky.com).
## Getting support
# Indexing Treasure with Goldsky
Source: https://docs.goldsky.com/chains/treasure
## Overview
Goldsky is a high-performance data indexing provider for Treasure that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Treasure to make our product available to the ecosystem and provide dedicated support for Treasure data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ---------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 10% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Treasure Subgraphs can be deployed on Goldsky in 2 ways:
Treasure Mainnet and Topaz Testnet are currently supported at the chain slugs `treasure` and `treasure-topaz` respectively.
### Mirror
## Getting support
# Indexing Unichain with Goldsky
Source: https://docs.goldsky.com/chains/unichain
## Overview
Goldsky is a high-performance data indexing provider for Unichain that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Unichain to make our product available to the ecosystem and provide dedicated support for Unichain data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ---------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 10% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Unichain subgraphs can be deployed on Goldsky in 2 ways:
Unichain's Mainnet and Sepolia Testnet are currently supported at the chain slugs `unichain` and `unichain-sepolia` respectively.
### Mirror
## Getting support
# Indexing Viction with Goldsky
Source: https://docs.goldsky.com/chains/viction
## Overview
Goldsky is a high-performance data indexing provider for Viction that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Viction to make our product available to the ecosystem and provide dedicated support for Viction data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Viction subgraphs can be deployed on Goldsky in 2 ways:
Viction Mainnet and Testnet are currently supported at the chain slugs `viction` and `viction-testnet` respectively.
### Mirror
## Getting support
# Indexing World Chain with Goldsky
Source: https://docs.goldsky.com/chains/worldchain
## Overview
Goldsky is a high-performance data indexing provider for World Chain that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with World Chain to make our product available to the ecosystem and provide dedicated support for World Chain data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ---------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 10% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
World Chain subgraphs can be deployed on Goldsky in 2 ways:
Both World Chain Mainnet and Sepolia Testnet are currently supported at the chain slugs `worldchain` and `worldchain-sepolia` respectively.
### Mirror
## Getting support
# Indexing Xai with Goldsky
Source: https://docs.goldsky.com/chains/xai
## Overview
Goldsky is a high-performance data indexing provider for Xai that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with Xai to make our product available to the ecosystem and provide dedicated support for Xai data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
Xai subgraphs can be deployed on Goldsky in 2 ways:
Both Xai mainnet and testnet are available at the chain slugs `xai` and `xai-testnet` respectively.
### Mirror
## Getting support
# Indexing ZERϴ with Goldsky
Source: https://docs.goldsky.com/chains/zero
## Overview
Goldsky is a high-performance data indexing provider for ZERϴ that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with ZERϴ to make our product available to the ecosystem and provide dedicated support for ZERϴ data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
ZERϴ subgraphs can be deployed on Goldsky in 2 ways:
ZERϴ Mainnet and Sepolia Testnet are currently supported at the chain slugs `zero` and `zero-sepolia` respectively.
### Mirror
## Getting support
# Indexing ZetaChain with Goldsky
Source: https://docs.goldsky.com/chains/zetachain
## Overview
Goldsky is a high-performance data indexing provider for ZetaChain that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with ZetaChain to make our product available to the ecosystem and provide dedicated support for ZetaChain data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
ZetaChain subgraphs can be deployed on Goldsky in 2 ways:
Both ZetaChain mainnet and testnet are available at the chain slugs `zetachain-mainnet` and `zetachain-testnet` respectively.
### Mirror
Support for Goldsky Mirror for ZetaChain is currently in progress. If you'd like to be notified when support is launched publicly, contact us at [sales@goldsky.com](mailto:sales@goldsky.com).
## Getting support
# Indexing zkSync Era with Goldsky
Source: https://docs.goldsky.com/chains/zksync
## Overview
Goldsky is a high-performance data indexing provider for zkSync Era that makes it easy to extract, transform, and load on-chain data to power both application and analytics use cases. Goldsky offers two primary approaches to indexing and accessing blockchain data: [Subgraphs](/subgraphs) (high-performance subgraphs) and [Mirror](/mirror) (real-time data replication pipelines).
### Scope
Goldsky has partnered with zkSync Era to make our product available to the ecosystem and provide dedicated support for zkSync Era data indexing. The full scope of our partnership (which products are enabled, what partner-exclusive benefit is available, and who this benefit is available to) is outlined below.
| | Subgraphs | Mirror |
| ----------------------------- | ----------------------------------------------------- | --------------------------------------------------- |
| Enablement | Yes | Yes |
| Benefit | 100% discount on Subgraph workers and entities stored | 10% discount on Pipeline workers and events written |
| Availability | Select developers | All developers |
## Getting started
To use Goldsky, you'll need to create an account, install the CLI, and log in.
### Subgraphs
zkSync Era subgraphs can be deployed on Goldsky in 2 ways:
Both zkSync Era mainnet and testnet are available at the chain slugs `zksync-era` and `zksync-era-testnet` respectively.
### Mirror
Subgraphs indexed on zkSync Era by Goldsky can be "mirrored" into another database, as flat files, or as a Kafka topic. To learn more about mirroring subgraph data into your own infrastructure, visit the [dedicated page on subgraph-based Mirror pipelines](/mirror/sources/subgraphs).
## Getting support
# Indexing Zora with Goldsky
Source: https://docs.goldsky.com/chains/zora
Coming soon. If you're running into issues building on Zora, please contact [support@goldsky.com](mailto:support@goldsky.com) and we'd be happy to help.
# Frequently asked questions
Source: https://docs.goldsky.com/faq
Collection of frequently (and not-so-frequently) asked questions.
## Subgraphs
Endpoints are publicly accessible by default, but you can make your endpoints private so that they're only accessible by authenticated users; see [private endpoints](./subgraphs/graphql-endpoints).
Regardless of the access type, endpoints are typically rate-limited to prevent abuse, and are not publicly indexed or searchable.
As a best practice, you may want to proxy your requests to prevent leaking your endpoint URL from your front-end.
No! If Goldsky has already indexed that subgraph (unique subgraphs identified by their IPFS hash), it will sync instantly, though you will be provided your own endpoint with your own rate limits applied. Query away.
By default, the Scale plan is restricted to 50 requests every 10 seconds. However, our Enterprise plans scale horizontally and our highest-use endpoints are seamlessly handling thousands of requests a second at peak. If you need a higher rate limit than what you have enabled on your account, please contact us!
Not at the moment, though similar functionality to "live queries" can be accomplished by polling our query endpoints. We also support webhooks, which can be similarly useful for certain push-based use cases.
Deployments with a lot of metadata can sometimes time out the IPFS server. You can try again (right away, and if that isn't working, a bit later) and eventually one attempt should work. This is a limitation of the IPFS server, but we're exploring options to work around this. If you continue to face issues, contact our support team at [support@goldsky.com](mailto:support@goldsky.com) and we'll help manually port it over.
You may get `store error: column "x" specified more than once` when using Goldsky's [Instant Subgraphs functionality](/subgraphs/guides/create-a-no-code-subgraph). Multiple ABIs might be causing name conflicts due to conflicting fields or event names in the ABI. You can try splitting multiple ABIs into multiple subgraphs. There will be a mitigation for this in a future version. If you run into issues deploying or with the subgraph separately, contact our support team at [support@goldsky.com](mailto:support@goldsky.com).
## Mirror
Mirror pipelines write data from `us-west-2` on AWS from a dynamic range of IP addresses. If you need VPC peering / static IPs for your allow list, contact us at [support@goldsky.com](mailto:support@goldsky.com) to discuss your use case.
Yes! Add `--resource-size` to your `goldsky pipeline create` command, and the resource size will be set prior to deployment of the pipeline, preventing the need for a pipeline update (which restarts the resource).
Yes - if the primary key is the same (which is the default), the pipeline will upsert and not rewrite data. If it's already there (based on the primary key) it will skip and move to the next record until it identifies data that isn't already in the destination sink. It's important to note that this only applies for databases that support upserts, such as Postgres, MySQL, and Elasticsearch. This does not work on S3, and duplicate data is written.
Your destination sink and indexes kept will vastly influence how much storage you need for your data. We are working on publishing a record count for raw data tables to serve as a starting point for approximation, but in the meantime feel free to contact support for a better estimate for your specific use case!
## Platform
API keys are only kept hashed, meaning that after a key is displayed for the first time, you need to copy and save it locally in order to access it; we won't be able to restore it for you! If your API key is lost, you can reset / generate a new one from the settings page in the web app.
Goldsky can support any EVM-compatible chain. If we don't support it in our shared indexing infrastructure, contact us to get set up with a dedicated indexer. Once set up, we can add new chains to your instance with a \~1 hour turnaround time or less.
Yes, every version of a subgraph incurs a separate worker fee and storage (in terms of entities) is also counted separately. Be sure to delete old versions of a subgraph you no longer need to query to minimize wasteful spend.
## Other
For help with anything else not answered on this documentation page, feel free to try the doc-wide search in the top bar, and if that doesn't help you find what you're looking for, don't hesitate to contact our support team at [support@goldsky.com](mailto:support@goldsky.com).
# Support
Source: https://docs.goldsky.com/getting-support
Our team is on standby to help you get the most out of our products.
## Starter + Scale
You can reach out to us any time with any questions, issues, concerns, or product ideas & feedback. Here are a few ways to do so:
* Tweet at us or send us a DM at [@goldskyio](https://x.com/goldskyio)
* Email us at [support@goldsky.com](mailto:support@goldsky.com)
For Starter plan users, we do not provide any response time estimates. For Scale plan users, we target a response time of 24-48 hours on a best-effort basis.
## Enterprise
If you're an Enterprise user, you have additional options for getting help, including:
* Directly to your named Customer Success Manager via email
* Via your dedicated Slack support channel
* Via our Telegram support bot
Response times are defined on a company-by-company basis in your Support SLA.
# GitHub
Source: https://docs.goldsky.com/github-repo
# Introduction to Goldsky
Source: https://docs.goldsky.com/introduction
Goldsky is the go-to data indexer for web3 builders, offering high-performance subgraph hosting and realtime data replication pipelines.
```bash Install
curl https://goldsky.com | sh
```
```bash Migrate subgraph
goldsky subgraph deploy <name>/<version> --from-url <existing-subgraph-url>
```
```bash Deploy pipeline
goldsky pipeline create <pipeline-name>
```
Goldsky offers two core self-serve products that can be used independently or in conjunction to power your data stack.
## Subgraphs
**Flexible indexing with typescript, with support for webhooks and more.**
Get started guide to index and query on-chain data
Dive into detailed documentation on Goldsky Subgraphs
## Mirror
**Get live blockchain data in your database or message queues with a single yaml config.**
Get started guide to stream data into your own infrastructure
Dive into detailed documentation on Goldsky Mirror
Mirror can use Goldsky-hosted subgraphs as a data source, allowing you to get your data into any of our sinks without any data lag.
# About Mirror Pipelines
Source: https://docs.goldsky.com/mirror/about-pipeline
We recently released v3 of pipeline configurations, which uses a more intuitive and user-friendly format to define and configure pipelines using a yaml file. For backward compatibility purposes, we still support the previous v2 format, which is why you will find references to each format in the yaml files presented across the documentation. Feel free to use whichever is more comfortable for you, but we encourage you to start migrating to the v3 format.
## Overview
A Mirror Pipeline defines the flow of data from `sources -> transforms -> sinks`. It is configured in a `yaml` file which adheres to Goldsky's pipeline schema.
The core logic of the pipeline is defined in `sources`, `transforms` and `sinks` attributes.
* `sources` represent the origin of the data coming into the pipeline.
* `transforms` represent data transformation/filter logic applied to a source and/or another transform in the pipeline.
* `sinks` represent the destination for source and/or transform data leaving the pipeline.
Each `source` and `transform` has a unique name which can be referenced by other `transforms` and/or `sinks`, determining the data flow within the pipeline.
While the pipeline is configured in yaml, [goldsky pipeline CLI commands](/reference/cli#pipeline) are used to take actions on the pipeline such as `start`, `stop`, `get`, `delete`, and `monitor`.
Below is an example pipeline configuration which sources from the `base.logs` Goldsky dataset, filters the data using `sql`, and sinks to a `postgresql` table:
```yaml base-logs.yaml
apiVersion: 3
name: base-logs-pipeline
resource_size: s
sources:
base.logs:
dataset_name: base.logs
version: 1.0.0
type: dataset
description: Enriched logs for events emitted from contracts. Contains the
contract address, data, topics, decoded event and metadata for blocks and
transactions.
display_name: Logs
transforms:
filter_logs_by_block_number:
sql: SELECT * FROM base.logs WHERE block_number > 5000
primary_key: id
sinks:
postgres_base_logs:
type: postgres
table: base_logs
schema: public
secret_name: GOLDSKY_SECRET
description: "Postgres sink for: base.logs"
from: filter_logs_by_block_number
```
Keys in the v3 format for sources, transforms, and sinks are user-provided values. In the above example, the source reference name `base.logs` matches the actual dataset name. This is the convention that you'll typically see across examples and autogenerated configurations.
However, you can use a custom name as the key.
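As a minimal sketch of that (reusing the same dataset, transform, and sink as the example above), the source below is keyed with the arbitrary name `my_base_logs`, and the transform's SQL references that key instead of the dataset name:

```yaml
apiVersion: 3
name: base-logs-pipeline
resource_size: s
sources:
  my_base_logs:              # custom key: any unique name works here
    dataset_name: base.logs  # the actual dataset being read
    version: 1.0.0
    type: dataset
transforms:
  filter_logs_by_block_number:
    sql: SELECT * FROM my_base_logs WHERE block_number > 5000
    primary_key: id
sinks:
  postgres_base_logs:
    type: postgres
    table: base_logs
    schema: public
    secret_name: GOLDSKY_SECRET
    description: "Postgres sink for: base.logs"
    from: filter_logs_by_block_number
```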
```yaml base-logs.yaml
name: base-logs-pipeline
resource_size: s
apiVersion: 3
definition:
sources:
- referenceName: base.logs
type: dataset
version: 1.0.0
transforms: []
sinks:
- type: postgres
table: base_logs
schema: public
secretName: GOLDSKY_SECRET
description: 'Postgres sink for: base.logs'
sourceStreamName: base.logs
referenceName: postgres_base_logs
```
You can find the complete Pipeline configuration schema in the [reference](/reference/config-file/pipeline) page.
## Development workflow
Similar to the software development workflow of `edit -> compile -> run`, there's an implicit iterative workflow of `configure -> apply -> monitor` for developing pipelines.
1. `configure`: Create/edit the configuration yaml file.
2. `apply`: Apply the configuration aka run the pipeline.
3. `monitor`: Monitor how the pipeline behaves. The resulting insights feed back into the first step.
Eventually, you'll end up with a configuration that works for your use case.
Creating a Pipeline configuration from scratch is challenging. However, there are tools, guides, and examples that make it easier to [get started](/mirror/create-a-pipeline).
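As a rough sketch of that loop (with `my-pipeline.yaml` and `my-pipeline` as placeholder file and pipeline names), the CLI side typically looks like this:

```shell
# 1. configure: edit my-pipeline.yaml in your editor of choice

# 2. apply: validate the config and run (or update) the pipeline
goldsky pipeline apply my-pipeline.yaml --status ACTIVE

# 3. monitor: watch runtime status, errors, and throughput
goldsky pipeline monitor my-pipeline
```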
## Understanding Pipeline Runtime Lifecycle
The `status` attribute represents the desired status of the pipeline and is provided by the user. Applicable values are:
* `ACTIVE` means the user wants to start the pipeline.
* `INACTIVE` means the user wants to stop the pipeline.
* `PAUSED` means the user wants to save the progress made by the pipeline so far and stop it.
A pipeline with status `ACTIVE` has a runtime status as well. Runtime represents the execution of the pipeline. Applicable runtime status values are:
* `STARTING` means the pipeline is being set up.
* `RUNNING` means the pipeline has been set up and is processing records.
* `FAILING` means the pipeline has encountered errors that prevent it from running successfully.
* `TERMINATED` means the pipeline has failed and the execution has been terminated.
There are several [goldsky pipeline CLI commands](/reference/config-file/pipeline#pipeline-runtime-commands) that help with pipeline execution.
For now, let's see how these states play out on successful and unsuccessful scenarios.
### Successful pipeline lifecycle
In this scenario the pipeline is successfully set up and processes data without encountering any issues.
We consider the pipeline to be in a healthy state which translates into the following statuses:
* Desired `status` in the pipeline configuration is `ACTIVE`
* Runtime Status goes from `STARTING` to `RUNNING`
```mermaid
stateDiagram-v2
state ACTIVE {
[*] --> STARTING
STARTING --> RUNNING
}
```
Let's look at a simple example below where we configure a pipeline that consumes Logs from Base chain and streams them into a Postgres database:
```yaml base-logs.yaml
name: base-logs-pipeline
resource_size: s
apiVersion: 3
sources:
base.logs:
dataset_name: base.logs
version: 1.0.0
type: dataset
description: Enriched logs for events emitted from contracts. Contains the
contract address, data, topics, decoded event and metadata for blocks and
transactions.
display_name: Logs
transforms: {}
sinks:
postgres_base_logs:
type: postgres
table: base_logs
schema: public
secret_name: GOLDSKY_SECRET
description: "Postgres sink for: base.logs"
from: base.logs
```
```yaml base-logs.yaml
name: base-logs-pipeline
definition:
sources:
- referenceName: base.logs
type: dataset
version: 1.0.0
transforms: []
sinks:
- type: postgres
table: base_logs
schema: public
secretName: GOLDSKY_SECRET
description: 'Postgres sink for: base.logs'
sourceStreamName: base.logs
referenceName: postgres_base_logs
```
Let's attempt to run it using the command `goldsky pipeline apply base-logs.yaml --status ACTIVE` or `goldsky pipeline start base-logs.yaml`
```
❯ goldsky pipeline apply base-logs.yaml --status ACTIVE
✓ Successfully validated config file
✓ Successfully applied config to pipeline: base-logs-pipeline
To monitor the status of your pipeline:
Using the CLI: `goldsky pipeline monitor base-logs-pipeline`
Using the dashboard: https://app.goldsky.com/dashboard/pipelines/stream/base-logs-pipeline/1
```
At this point we have set the desired status to `ACTIVE`. We can confirm this using `goldsky pipeline list`:
```
❯ goldsky pipeline list
✓ Listing pipelines
┌────────────────────┬─────────┬────────┬───────────────┐
│ Name               │ Version │ Status │ Resource Size │
├────────────────────┼─────────┼────────┼───────────────┤
│ base-logs-pipeline │ 1       │ ACTIVE │ s             │
└────────────────────┴─────────┴────────┴───────────────┘
```
We can then check the runtime status of this pipeline using the `goldsky pipeline monitor base-logs-pipeline` command:
We can see how the pipeline starts in `STARTING` status and becomes `RUNNING` as it starts processing data successfully into our Postgres sink.
This pipeline will start processing the historical data of the source dataset, reach its edge, and continue streaming data in real time until we either stop it or it encounters errors that interrupt its execution.
### Unsuccessful pipeline lifecycle
Let's now consider the scenario where the pipeline encounters errors during its lifetime and ends up failing.
There can be a multitude of reasons for a pipeline to encounter errors, such as:
* secrets not being correctly configured
* sink availability issues
* policy rules on the sink preventing the pipeline from writing records
* resource size incompatibility
* and many more
These failure scenarios prevent a pipeline from getting into, or staying in, a `RUNNING` runtime status.
```mermaid
---
title: Healthy pipeline becomes unhealthy
---
stateDiagram-v2
state status:ACTIVE {
[*] --> STARTING
STARTING --> RUNNING
RUNNING --> FAILING
FAILING --> TERMINATED
}
```
```mermaid
---
title: Pipeline cannot start
---
stateDiagram-v2
state status:ACTIVE {
[*] --> STARTING
STARTING --> FAILING
FAILING --> TERMINATED
}
```
A Pipeline can be in an `ACTIVE` desired status but a `TERMINATED` runtime status in scenarios that lead to terminal failure.
Let's see an example where we'll use the same configuration as above but set a `secret_name` that does not exist.
```yaml bad-base-logs.yaml
name: bad-base-logs-pipeline
resource_size: s
apiVersion: 3
sources:
base.logs:
dataset_name: base.logs
version: 1.0.0
type: dataset
description: Enriched logs for events emitted from contracts. Contains the
contract address, data, topics, decoded event and metadata for blocks and
transactions.
display_name: Logs
transforms: {}
sinks:
postgres_base_logs:
type: postgres
table: base_logs
schema: public
secret_name: YOUR_DATABASE_SECRET
description: "Postgres sink for: base.logs"
from: base.logs
```
```yaml bad-base-logs.yaml
name: bad-base-logs-pipeline
definition:
sources:
- referenceName: base.logs
type: dataset
version: 1.0.0
transforms: []
sinks:
- type: postgres
table: base_logs
schema: public
secretName: YOUR_DATABASE_SECRET
description: 'Postgres sink for: base.logs'
sourceStreamName: base.logs
referenceName: postgres_base_logs
```
Let's start it using the command `goldsky pipeline apply bad-base-logs.yaml`.
```
❯ goldsky pipeline apply bad-base-logs.yaml
✓ Successfully validated config file
✓ Successfully applied config to pipeline: bad-base-logs-pipeline
To monitor the status of your pipeline:
Using the CLI: `goldsky pipeline monitor bad-base-logs-pipeline`
Using the dashboard: https://app.goldsky.com/dashboard/pipelines/stream/bad-base-logs-pipeline/1
```
The pipeline configuration is valid; however, the pipeline runtime will encounter an error since the secret that contains the credentials to communicate with the sink does not exist.
Running `goldsky pipeline monitor bad-base-logs-pipeline` we see:
As expected, the pipeline has encountered a terminal error. Please note that the desired status is still `ACTIVE` even though the pipeline runtime status is `TERMINATED`.
```
❯ goldsky pipeline list
✓ Listing pipelines
┌────────────────────────┬─────────┬────────┬───────────────┐
│ Name                   │ Version │ Status │ Resource Size │
├────────────────────────┼─────────┼────────┼───────────────┤
│ bad-base-logs-pipeline │ 1       │ ACTIVE │ s             │
└────────────────────────┴─────────┴────────┴───────────────┘
```
## Runtime visibility
Pipeline runtime visibility is an important part of the pipeline development workflow. Mirror pipelines expose:
1. Runtime status and error messages
2. Logs emitted by the pipeline
3. Metrics on `Records received`, which counts all the records the pipeline has received from its source(s), and `Records written`, which counts all the records the pipeline has written to its sink(s).
4. [Email notifications](/mirror/about-pipeline#email-notifications)
Runtime status, error messages and metrics can be seen via two methods:
1. The pipeline dashboard at `https://app.goldsky.com/dashboard/pipelines/stream/<pipeline-name>/<version>`
2. The `goldsky pipeline monitor <pipeline-name>` CLI command
Logs can only be seen in the pipeline dashboard.
Mirror attempts to surface appropriate and actionable error messages and statuses for users; however, there is always room for improvement. Please [reach out](/getting-support) if you think the experience can be improved.
### Email notifications
If a pipeline fails terminally, project members will be notified via email.
You can configure this notification in the [Notifications section](https://app.goldsky.com/dashboard/settings#notifications) of your project.
## Error handling
There are two broad categories of errors.
**Pipeline configuration schema error**
This means the schema of the pipeline configuration is not valid. These errors are usually caught before pipeline execution. Some possible scenarios:
* a required attribute is missing
* transform SQL has syntax errors
* pipeline name is invalid
**Pipeline runtime error**
This means the pipeline encountered error during execution at runtime.
Some possible scenarios:
* credentials stored in the secret are incorrect or do not have the needed access privileges
* sink availability issues
* poison-pill record that breaks the business logic in the transforms
* `resource_size` limitation
Transient errors are automatically retried as per the retry policy (for up to 6 hours), whereas non-transient ones immediately terminate the pipeline.
While many errors can be resolved by user intervention, there is a possibility of platform errors as well. Please [reach out to support](/getting-support) for investigation.
## Resource sizing
`resource_size` represents the compute (vCPUs and RAM) available to the pipeline. There are several options for pipeline sizes: `s, m, l, xl, xxl`. This attribute influences [pricing](/pricing/summary#mirror) as well.
Resource sizing depends on a few different factors such as:
* number of sources, transforms, and sinks
* expected amount of data to be processed
* whether the transform SQL involves joining multiple sources and/or transforms
Here's some general information that you can use as reference:
* A `small` resource size is usually enough for most use cases: it can handle a full backfill of small chain datasets and write at speeds of up to 300K records per second. For pipelines using subgraphs as a source, it can reliably handle up to 8 subgraphs.
* Larger resource sizes are usually needed when backfilling large chains or when doing large JOINs (for example, a JOIN between the accounts and transactions datasets in Solana).
* It's recommended to always follow a defensive approach: start small and scale up if needed.
## Snapshots
A Pipeline snapshot captures a point-in-time state of a `RUNNING` pipeline, allowing users to resume from it in the future.
It can be useful in various scenarios:
* evolving your `RUNNING` pipeline (e.g. adding a new source or sink) without losing progress made so far.
* recovering from newly introduced bugs: fix the bug and resume from an earlier snapshot to reprocess the data.
Please note that a snapshot only contains information about the progress made in reading the source(s) and the state of the SQL transforms. It isn't representative of the state of the source/sink. For example, if all data in the sink database table is deleted, resuming the pipeline from a snapshot does not recover it.
Currently, a pipeline can only be resumed from the latest available snapshot. If you need to resume from older snapshots, please [reach out to support](/getting-support).
Snapshots are closely tied to pipeline runtime in that all [commands](/reference/config-file/pipeline#pipeline-runtime-commands) that change the pipeline runtime have options to trigger a new snapshot and/or resume from the latest one.
```mermaid
%%{init: { 'gitGraph': {'mainBranchName': 'myPipeline-v1'}, 'theme': 'default' , 'themeVariables': { 'git0': '#ffbf60' }}}%%
gitGraph
commit id: " " type: REVERSE tag:"start"
commit id: "snapshot1"
commit id: "snapshot2"
commit id: "snapshot3"
commit id: "snapshot4" tag:"stop" type: HIGHLIGHT
branch myPipeline-v2
commit id: "snapshot4 " type: REVERSE tag:"start"
```
### When are snapshots taken?
1. When updating a `RUNNING` pipeline, a snapshot is created before applying the update. This is to ensure that there's an up-to-date snapshot in case the update introduces issues.
2. When pausing a pipeline.
3. Automatically on regular intervals. For `RUNNING` pipelines in healthy state, automatic snapshots are taken every 4 hours to ensure minimal data loss in case of errors.
4. Users can request snapshot creation via the following CLI command:
* `goldsky pipeline snapshot create <pipeline-name>`
* `goldsky pipeline apply --from-snapshot new`
* `goldsky pipeline apply --save-progress true` (CLI version \< `11.0.0`)
5. Users can list all snapshots in a pipeline via the following CLI command:
* `goldsky pipeline snapshot list <pipeline-name>` (a combined example follows this list)
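Putting the commands above together, a typical save-and-resume flow looks roughly like this, where `<pipeline-name>` and `<path-to-config-file>` are placeholders for your own pipeline name and configuration path:

```shell
# take an on-demand snapshot of a running pipeline
goldsky pipeline snapshot create <pipeline-name>

# list the snapshots available for the pipeline
goldsky pipeline snapshot list <pipeline-name>

# apply an updated config, requesting a fresh snapshot as part of the update
goldsky pipeline apply <path-to-config-file> --from-snapshot new
```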
### How long does it take to create a snapshot?
The amount of time it takes for a snapshot to be created depends largely on two factors: first, the amount of state accumulated during pipeline execution; second, how fast records are being processed end-to-end in the pipeline.
In the case of a long-running snapshot that was triggered as part of an update to the pipeline, any future updates are blocked until the snapshot is completed. Users do have the option to cancel the update request.
There is a scenario where the pipeline was healthy at the time the snapshot started but became unhealthy later, preventing snapshot creation. In that case, the pipeline will attempt to recover; however, it may need user intervention that involves restarting from the last successful snapshot.
### Scenarios and Snapshot Behavior
Happy Scenario:
* Suppose a pipeline is at 50% progress, and an automatic snapshot is taken.
* The pipeline then progresses to 60% and is in a healthy state. If you pause the pipeline at this point, a new snapshot is taken.
* You can later start the pipeline from the 60% snapshot, ensuring continuity from the last known healthy state.
Bad Scenario:
* The pipeline reaches 50%, and an automatic snapshot is taken.
* It then progresses to 60% but enters a bad state. Attempting to pause the pipeline in this state will fail.
* If you restart the pipeline, it will resume from the last successful snapshot at 50%, since no snapshot was created at 60%.
# Getting started
Source: https://docs.goldsky.com/mirror/create-a-pipeline
Step by step instructions on how to create a Goldsky Mirror pipeline.
You have two options to create a Goldsky Mirror pipeline:
1. **[Goldsky Flow](/mirror/create-a-pipeline#goldsky-flow)**: With a guided web experience in the dashboard
2. **[CLI](/mirror/create-a-pipeline#creating-mirror-pipelines-with-the-cli)**: interactively or by providing a pipeline configuration
## Goldsky Flow
Flow allows you to deploy pipelines by simply dragging and dropping their components onto a canvas. You can open up Flow by going to the [Pipelines page](https://app.goldsky.com/dashboard/pipelines) on the dashboard and clicking on the `New pipeline` button.
You'll be redirected to Goldsky Flow, which starts with an empty canvas representing the initial state.
The draggable components that will make up the pipeline are located in the left-side menu.
Let's now look at how to deploy a simple pipeline: in the following section we'll walk through the steps needed to stream Ethereum raw logs into a ClickHouse database. Since the steps are the same for any pipeline, feel free to adapt the components to fit your specific use case.
1. **Select the Data Source**
Start by dragging and dropping a `Data Source` card onto the canvas. Once you do that, you'll need to select the chain you are interested in. We currently support [100+ chains](/chains/supported-networks). For this example we are going to choose `Ethereum`.
Next, we need to define the type of data source we want to use:
* Onchain datasets: these are [Direct Indexing datasets](/mirror/sources/direct-indexing) representing both raw data (e.g. Raw Blocks) as well as curated datasets (e.g. ERC-20 Transfers)
* [Subgraphs](/mirror/sources/subgraphs): these can be community subgraphs or existing subgraphs in your project for the chosen network
For this example, we are going to choose `Raw Logs`.
After selecting the data source, you have some optional configuration fields available; in the case of `Onchain Datasets` you can configure:
* `Start indexing at`: here you can define whether you want to do a full backfill (`Beginning`) or read from edge (`Current`)
* `Filter by contract address`: optional contract address (in lowercase) to filter from
* `Filter by topics`: optional list of topics (in lowercase) to filter from, separated by commas.
* `View Schema`: view the data schema to get a better idea of the shape of the data as well as see some sample records.
2. **(Optional) Select a Transform**
Optionally select a Transform for your data by clicking on the `+` button at the top-right edge of the `Data Source` card; you'll then have the option to add a Transform or a Sink.
Transforms are optional intermediate compute processors that allow you to modify the original data (you can find more information on the supported Transform types [here](/mirror/transforms/transforms-overview)). For this example, we are going to create a simple SQL transform to select a subset of the available data in the source. To do that, select `Custom SQL`.
Click on the `Query` field of the card to bring up the SQL editor.
In this inline editor you can define the logic of your transformation and run the SQL code at the top-right corner to experiment with the data and see the result of your queries. For this example we are adding `SELECT id, block_number, transaction_hash, data FROM source_1`.
If you click on the `Run` button at the top-right corner you'll see a preview of the final shape of the data. Once satisfied with the results of your Transform, press `Save` to add it to the pipeline.
3. **Select the Sink**
The last pipeline component to define is the [Sink](/mirror/sinks/supported-sinks), that is, the destination of our data. Click on the `+` button at the top-right edge of the `Transform Card` and select a Sink.
If you have already configured any sinks previously (for more information, see [Mirror Secrets](/mirror/manage-secrets)) you'll be able to choose one from the list. Alternatively, you'll need to create a new sink by creating its corresponding secret. In our example, we'll use an existing sink to a ClickHouse database.
Once you select the sink, you'll have some configuration options available to define how the data will be written into your database, as well as another `Preview Output` button to see what the final shape of the data will be; this is a very convenient utility in cases where you might have multiple sources and transforms in your pipeline and you want to iterate on their logic without having to redeploy the actual pipeline every time.
4. **Confirm and deploy**
Last but not least, we need to define a name for the pipeline. You can do that at the top-center input of your screen. For this example, we are going to call it `ethereum-raw-logs`.
Up to this point, your canvas should look similar to this:
Click on the `Deploy` button on the top right corner and specify the [resource size](/mirror/about-pipeline#resource-sizing); for this example you can choose the default `Small`.
You should now be redirected to the pipeline's details page.
Congratulations, you just deployed your first pipeline using Goldsky Flow!
Assuming the sink is properly configured you should start seeing data flowing into your database after a few seconds.
If you would like to update the components of your pipeline and deploy a newer version (more on this topic [here](/mirror/about-pipeline)), you can click on the `Update Pipeline` button at the top-right corner of the page; it will take you back to the Flow canvas so you can make your updates.
There are a couple of things about Flow worth highlighting:
* Pipelines are formally defined using [configuration files](/reference/config-file/pipeline) in YAML. Goldsky Flow abstracts that complexity away so that we can create the pipeline just by dragging and dropping its components. You can at any time see the current configuration definition of the pipeline by switching the view to `YAML` on the top left corner. This is quite useful in cases where you'd like to version control your pipeline logic and/or automate its deployment via CI/CD using the CLI (as explained in the next section).
* Pipeline components are interconnected via reference names: in our example, the source has a default reference name of `source_1`; the transform (`sql_1`) reads from `source_1` in its SQL query; and the sink (`sink_1`) reads the result from the transform (see its `Input source` value) to finally emit the data into the destination. You can modify the reference names of every component of the pipeline on the canvas, just bear in mind the connecting role these names play (see the illustrative YAML sketch below).
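To make the reference-name wiring concrete, here's a rough YAML sketch of a pipeline shaped like the one built above. The dataset name is an assumption for illustration, and the sink is shown as Postgres (whose fields appear earlier in these docs) rather than the ClickHouse sink from the walkthrough; the `YAML` view in Flow shows the authoritative configuration for your own pipeline:

```yaml
apiVersion: 3
name: ethereum-raw-logs
resource_size: s
sources:
  source_1:                          # the Data Source card
    type: dataset
    dataset_name: ethereum.raw_logs  # assumed dataset name; check the dataset reference
    version: 1.0.0
transforms:
  sql_1:                             # the Custom SQL card, reading from source_1
    sql: SELECT id, block_number, transaction_hash, data FROM source_1
    primary_key: id
sinks:
  sink_1:                            # the sink card, reading from the transform
    type: postgres                   # the walkthrough used ClickHouse; Postgres shown for brevity
    table: ethereum_raw_logs
    schema: public
    secret_name: YOUR_SINK_SECRET
    from: sql_1
```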
Read on to the following sections if you would like to know how to deploy pipelines using the CLI.
## Goldsky CLI
There are two ways in which you can create pipelines with the CLI:
* Interactive
* Non-Interactive
### Guided CLI experience
This is a simple and guided way to create pipelines via the CLI.
Run `goldsky pipeline create <pipeline-name>` in your terminal and follow the prompts.
In short, the CLI guides you through the following process:
1. Select one or more source(s)
2. Depending on the selected source(s), define transforms
3. Configure one or more sink(s)
### Custom Pipeline Configuration File
This is an advanced way to create a new pipeline. Instead of using the guided CLI experience (see above), you create the pipeline configuration on your own.
A pipeline configuration is a YAML structure with the following top-level properties:
```yaml
name:
apiVersion: 3
sources: {}
transforms: {}
sinks: {}
```
Both `sources` and `sinks` are required with a minimum of one entry each. `transforms` is optional and an empty object (`{}`) can be used if no transforms are needed.
Full configuration details for Pipelines is available in the [reference](/reference/config-file/pipeline) page.
As an example, see below a pipeline configuration which uses the Ethereum Decoded Logs dataset as source, uses a transform to select specific data fields and sinks that data into a Postgres database whose connection details are stored within the `A_POSTGRESQL_SECRET` secret:
```yaml pipeline.yaml
name: ethereum-decoded-logs
apiVersion: 3
sources:
ethereum_decoded_logs:
dataset_name: ethereum.decoded_logs
version: 1.0.0
type: dataset
start_at: latest
transforms:
select_relevant_fields:
sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
ethereum_decoded_logs
primary_key: id
sinks:
postgres:
type: postgres
table: eth_logs
schema: goldsky
secret_name: A_POSTGRESQL_SECRET
from: select_relevant_fields
```
Note that to create a pipeline from configuration that sinks to your datastore, you need to have a [secret](/mirror/manage-secrets) already configured on your Goldsky project and reference it in the sink configuration.
Run `goldsky pipeline apply <path-to-config-file>` in your terminal to create a pipeline.
Once your pipeline is created, run `goldsky pipeline start <pipeline-name>` to start your pipeline.
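For example, with the `pipeline.yaml` configuration above (whose pipeline name is `ethereum-decoded-logs`), those two steps look like this:

```shell
# create or update the pipeline from the config file
goldsky pipeline apply pipeline.yaml

# start the pipeline once it has been created
goldsky pipeline start ethereum-decoded-logs
```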
## Monitor a pipeline
When you create a new pipeline, the CLI automatically starts to monitor the status and outputs it in a table format.
If you want to monitor an existing pipeline at a later time, use the `goldsky pipeline monitor <pipeline-name>` CLI command. It refreshes every ten seconds and gives you insights into how your pipeline performs.
Or you may monitor it in the Pipeline Dashboard page at `https://app.goldsky.com/dashboard/pipelines/stream/<pipeline-name>/<version>`, where you can see the pipeline's `status`, `logs`, and `metrics`.
# CryptoHouse - Free Blockchain Analytics powered by ClickHouse and Goldsky
Source: https://docs.goldsky.com/mirror/cryptohouse
We have partnered with [ClickHouse](https://clickhouse.com/) to allow you to interact with some onchain datasets for free at [crypto.clickhouse.com](https://crypto.clickhouse.com/).
As of today, users of CryptoHouse can query Solana blocks, transactions, token\_transfers, block\_rewards, accounts, and tokens for free. Similar datasets are available for Ethereum and Base. We plan to expand the data available and expose more blockchains in the coming months!
Remember that you can see more information about each dataset in our [reference](/chains/supported-networks#mirror) and [schemas](/reference/schema/non-EVM-schemas#solana) pages.
If you want to learn more about this initiative, head to this [ClickHouse blog post](https://clickhouse.com/blog/announcing-cryptohouse-free-blockchain-analytics) for more information.
# Data Quality at Goldsky
Source: https://docs.goldsky.com/mirror/data-quality
## Mirror Indexing
Goldsky Mirror datasets are populated through various indexers, which write the data into a data stream.
The data stream is then accessible directly by users through Mirror pipelines. Internally, we copy the data to a data lake which is then used to power various features and also used for Data QA.
Data quality is managed during ingestion, and also through periodic checks.
Emitted data quality is managed through various database guarantees, depending on the destination of the data.
## Ingestion-level Consistency
### Chain Continuity
When first ingesting a block, we check for a continuous block hash chain. If the chain is not valid (i.e. the parent hash does not match the hash we have for the preceding block number), we issue deletes and updates into our dataset and walk backwards until we reach a consistent chain again.
All deletes and updates are propagated through to downstream sinks. This means if you have a Mirror pipeline writing chain data into a database, and that chain goes through a reorg or a rollback, **all the changes will automatically propagate to your database as well.**
### Write Guarantees
During ingestion, we ensure we have the full set of data for a block before emitting it into the various datasets. When emitting, we acquire full consistency acknowledgement from our various data sinks before marking the block as ingested.
### Schema Strictness
Our datasets follow strict typed schemas, causing writes that don't fit into said schemas to fail completely.
## Dataset Validation Checks
In rare cases, RPC nodes can give us invalid data that may be missed during ingestion checks. For every dataset, we run checks on a daily basis and repair the data if any issues are seen.
These checks validate:
1. Missing blocks (EVM) - We record the minimum and maximum block numbers for each date and look for gaps in the data.
2. Missing transactions (EVM) - We count unique transaction hashes per block and compare the count with the `transaction_count` for the block.
3. Missing logs (EVM) - We compare the maximum log index per block with the number of logs per block.
This framework will allow us to proactively address data quality issues in a structured and efficient manner. Much like unit tests in a software codebase, these checks will help prevent future regressions. Once a check is implemented for one chain, it can be seamlessly applied across others, ensuring consistency and scalability.
## Destination Level Consistency
To prevent missing data when writing, Mirror pipelines are built with an **at-least-once guarantee**.
### Snapshots
We perform automatic fault-tolerance checkpointing every minute, with snapshot recovery every 4 hours. When a pipeline is updated or is forced to terminate, a snapshot is persisted and used for the next incarnation of the pipeline. This allows for continuity of the data being sent.
### Database Consistency
For every row of data the pipeline needs to send, we make sure we have full acknowledgement from the database before moving on to the next set of data to be sent. If it's not acknowledged, the snapshots will not record that set of data as sent, and if any restarts or errors happen with the pipeline, the snapshot will be pessimistic and risk resending data over missing data.
### Sink Downtime Handling
If a write into a sink (database or channel) errors for whatever reason, the pipeline will automatically restart just that batch for that sink. If it continues to error, the pipeline will restart the writers. Finally, if all fails for a prolonged period of time, the pipeline will fail, and when the user restarts it, it will resume from the last saved snapshot.
# Object Storage (S3/GCS/R2)
Source: https://docs.goldsky.com/mirror/extensions/channels/aws-s3
The sub-second real-time and reorg-aware advantages of Mirror are greatly diminished when using our S3 connector due to the constraints of file-based storage. If possible, it's highly recommended to use one of the other channels or sinks instead!
The files are created in [Parquet](https://parquet.apache.org/) format.
Files will be emitted on an interval, essentially mimicking a mini-batch system.
Data will also be append-only, so if there is a reorg, data with the same id will be emitted. It's up to the downstream consumers of this data to deduplicate the data.
Full configuration details for this sink are available in the [reference](/reference/config-file/pipeline#file) page.
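For example, if you expose the Parquet files as a table in a query engine such as Athena or DuckDB, a common deduplication pattern keeps a single row per `id`. This is a sketch only: the `transfers` table and the `ingested_at` ordering column are hypothetical; use whatever marker your layout provides to prefer the most recent copy of a row.

```sql
-- Keep one (most recent) row per id from the append-only output.
SELECT *
FROM (
  SELECT
    t.*,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY ingested_at DESC) AS rn
  FROM transfers t
) deduped
WHERE rn = 1;
```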
## Secrets
Create an AWS S3 secret with the following CLI command:
```shell
goldsky secret create --name AN_AWS_S3_SECRET --value '{
"accessKeyId": "Type.String()",
"secretAccessKey": "Type.String()",
"region": "Type.String()",
"type": "s3"
}'
```
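The `Type.String()` placeholders above indicate the expected value types; in practice you pass real values. A filled-in example (all values below are placeholders, not working credentials):

```shell
goldsky secret create --name AN_AWS_S3_SECRET --value '{
  "accessKeyId": "AKIAXXXXXXXXXXXXXXXX",
  "secretAccessKey": "your-secret-access-key",
  "region": "us-east-1",
  "type": "s3"
}'
```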
## Partitioning
This sink supports folder-based partitioning through the `partition_columns` option.
In the example below, files are stored in a separate partition for each day, based on the `block_timestamp` of each Base transfer, under:
`s3://test-bucket/base/transfers/erc20/`
```yaml
name: example-partition
apiVersion: 3
sources:
base.transfers:
dataset_name: base.erc20_transfers
version: 1.2.0
type: dataset
start_at: latest
transforms:
transform_transactions:
type: sql
primary_key: id
sql: |-
select *, from_unixtime(block_timestamp/1000, 'yyyy-MM-dd') as dt
from base.transfers
sinks:
filesink_transform_transactions:
secret_name: S3_SECRET
path: s3://test-bucket/base/transfers/erc20/
type: file
format: parquet
partition_columns: dt
from: transform_transactions
```
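Assuming Hive-style partition directories (the typical layout for Flink-based file sinks), the resulting structure would look roughly like:

```
s3://test-bucket/base/transfers/erc20/dt=2024-05-01/<part-file>.parquet
s3://test-bucket/base/transfers/erc20/dt=2024-05-02/<part-file>.parquet
```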
# AWS SQS
Source: https://docs.goldsky.com/mirror/extensions/channels/aws-sqs
When you need to react to every new event coming from the blockchain or subgraph, SQS can be a simple and resilient way to get started. SQS works with any Mirror source, including subgraph updates and on-chain events.
Mirror Pipelines will send events to an SQS queue of your choosing. You can then use AWS's SDK to process events, or even create a lambda to do serverless processing of the events.
SQS is append-only, so events are sent with the metadata needed to handle mutations downstream.
Full configuration details for the SQS sink are available in the [reference](/reference/config-file/pipeline#sqs) page.
## Secrets
Create an AWS SQS secret with the following CLI command:
```shell
goldsky secret create --name AN_AWS_SQS_SECRET --value '{
"accessKey": "Type.String()",
"secretAccessKey": "Type.String()",
"region": "Type.String()",
"type": "sqs"
}'
```
The credentials in the secret require the `sqs:SendMessage` permission. Refer to the [AWS SQS permissions documentation](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-api-permissions-reference.html) for more information.
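For reference, a minimal IAM policy granting that permission might look like the following (the queue ARN is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sqs:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-mirror-queue"
    }
  ]
}
```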
# Kafka
Source: https://docs.goldsky.com/mirror/extensions/channels/kafka
[Kafka](https://kafka.apache.org/) is a distributed streaming platform that is used to build real-time data pipelines and streaming applications. It is designed to be fast, scalable, and durable.
You can use Kafka to deeply integrate into your existing data ecosystem. Goldsky supplies a message format that allows you to handle blockchain forks and reorganizations with your downstream data pipelines.
Kafka has a rich ecosystem of SDKs and connectors you can make use of to do advanced data processing.
**Less Magic Here**
The Kafka integration is less end-to-end: while Goldsky handles much of the topic partitioning, balancing, and other details, using Kafka is a bit more involved compared to getting data mirrored directly into a database.
Full configuration details for the Kafka sink are available in the [reference](/reference/config-file/pipeline#kafka) page.
## Secrets
Create a Kafka secret with the following CLI command:
```shell
goldsky secret create --name A_KAFKA_SECRET --value '{
"type": "kafka",
"bootstrapServers": "Type.String()",
"securityProtocol": "Type.Enum(SecurityProtocol)",
"saslMechanism": "Type.Optional(Type.Enum(SaslMechanism))",
"saslJaasUsername": "Type.Optional(Type.String())",
"saslJaasPassword": "Type.Optional(Type.String())",
"schemaRegistryUrl": "Type.Optional(Type.String())",
"schemaRegistryUsername": "Type.Optional(Type.String())",
"schemaRegistryPassword": "Type.Optional(Type.String())"
}'
```
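Once a pipeline is writing to your cluster, you can sanity-check the stream with standard Kafka tooling. For example, using the console consumer that ships with Kafka (the broker address and topic name below are placeholders for whatever your sink is configured with):

```shell
kafka-console-consumer.sh \
  --bootstrap-server broker-1.example.com:9092 \
  --topic goldsky.base-logs \
  --from-beginning
```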
# Overview
Source: https://docs.goldsky.com/mirror/extensions/channels/overview
Use channels to integrate Mirror into your existing data stack.
## What are channels?
Channels are a special type of [Sink](/mirror/sinks/supported-sinks) that represent intermediate storage layers designed to absorb the Goldsky firehose. They aren't designed to be queryable on their own; instead, you should plan to connect them to your existing data stack or to a sink that's not currently supported by Goldsky.
* **AWS S3** offers unparalleled scalability and durability for storing vast amounts of data.
* **AWS SQS** provides reliable message queuing for decoupling distributed systems with ease.
* **Kafka** excels in handling high-throughput, real-time data streams with strong fault tolerance.
## What should I use?
### For durable storage
Goldsky supports export of raw blockchain or custom data to [AWS S3](/mirror/extensions/channels/aws-s3) or GCS, either in Iceberg format or in plain Parquet format. GCS support is currently available upon request, please reach out to us at [support@goldsky.com](mailto:support@goldsky.com).
Keep in mind that the blockchain is eventually consistent, so reorgs are portrayed differently in append-only mode.
Once data is in S3, you can use a solution like AWS Athena to query it, or merge it with existing Spark pipelines.
### For processing events as they come
If your backend needs to process and receive data as it appears on the blockchain, you can consider using our SQS or Kafka channel sinks. These sinks are append-only, so chain reorgs are not handled for you, but you do receive all the metadata required to handle reorgs yourself.
An example architecture would have:
1. An SQS queue
2. An AWS Lambda function processing said queue
Mirror will send each event as a message into the SQS queue, and the Lambda function can process it however you need: making additional enrichments, calling Discord/Telegram bots, or inserting into another datastore that we don't yet support.
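A minimal Lambda handler sketch for that architecture, assuming each SQS message body is a JSON document describing one row emitted by the pipeline (the exact shape depends on your source and transforms):

```python
import json

def handler(event, context):
    """Process Mirror events delivered via an SQS trigger."""
    for record in event["Records"]:
        # Assumption: the message body is a JSON-encoded row from the pipeline.
        row = json.loads(record["body"])
        # React however you need: enrich the row, call a Discord/Telegram bot,
        # or insert it into another datastore.
        print(row.get("id"), row.get("block_number"))
```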
### For more processing
In addition to AWS S3, we support direct emits to [Kafka](/mirror/extensions/channels/kafka). Kafka can store messages at scale, making it a great choice as an initial holding place for data before you do further processing.
Our team can work with many different strategies and can give guidance on how to integrate with our data format inside Kafka. Reach out to our support team at [support@goldsky.com](mailto:support@goldsky.com) if you'd like to learn more.
# Extensions
Source: https://docs.goldsky.com/mirror/extensions/overview
Goldsky Mirror's primary function is to simplify the ingestion of blockchain data into your data warehouse for querying and analytics, providing the data as-is. **Extensions** enhance Mirror's capabilities beyond this core use case, enabling new destinations for your data and the ability to transform it.
Types of Extensions:
* Channels: Intermediate storage layers that facilitate further integration into your data stack. They handle high-throughput data streams and enable flexible data processing.
* Transforms: Tools for filtering and aggregating data in-stream, allowing you to shape the data to meet your specific needs.
By leveraging Extensions, you can customize and extend your Mirror pipelines to better fit your unique data workflows and integration requirements.
For more details on Extensions, visit our [Channels documentation](/mirror/extensions/channels/overview) and [Transforms documentation](/mirror/transforms/transforms-overview).
# Decode contract events
Source: https://docs.goldsky.com/mirror/guides/decoding-contract-events
Sync contract events to a database with the contract ABI using Mirror.
This guide explains how to decode raw contract events on-the-fly using [Mirror Decoding Functions](/reference/mirror-functions/decoding-functions) within transforms in Mirror pipelines.
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use the [PostgreSQL Sink](/mirror/sinks/postgres)
## Preface
To get decoded contract data on EVM chains in a Mirror pipeline, there are three options:
1. Decode data with a subgraph, then use a [subgraph entity source](/mirror/sources/subgraphs).
2. Use the `decoded_logs` and `decoded_traces` [direct indexing](/mirror/sources/direct-indexing) datasets. These are pre-decoded datasets, with coverage for common contracts, events, and functions.
3. Use the `raw_logs` dataset and decode inside a pipeline [transform](/reference/config-file/pipeline).
In this guide we are going to focus on the third method. We will use the [Friendtech contract](https://basescan.org/address/0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4) deployed on Base as an example, but the same logic applies to any other contract and chain for which there's an available Raw Logs Direct Indexing dataset as per [this list](/mirror/sources/direct-indexing).
## Pipeline definition
In the `_gs_fetch_abi` function call below, we pull the ABI from a gist. You can also pull it from Basescan directly with an API key, for example:
`_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan')`
```yaml event-decoding-pipeline.yaml
name: decoding-contract-events
apiVersion: 3
sources:
my_base_raw_logs:
type: dataset
dataset_name: base.raw_logs
version: 1.0.0
transforms:
friendtech_decoded:
primary_key: id
# Fetch the ABI from a gist (raw)
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/jeffling/0320808b7f3cc0e8d9cc6c3b113e8156/raw/99bde70acecd4dc339b5a81aae39954973f5d178/gistfile1.txt', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM my_base_raw_logs
WHERE address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
friendtech_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object.
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM friendtech_decoded
WHERE decoded IS NOT NULL
sinks:
friendtech_events:
secret_name: EXAMPLE_SECRET
type: postgres
from: friendtech_clean
schema: decoded_events
table: friendtech
```
```yaml
sources:
- type: dataset
referenceName: base.raw_logs
version: 1.0.0
transforms:
- referenceName: friendtech_decoded
type: sql
primaryKey: id
# Fetch the ABI from basescan, then use it to decode from the friendtech address.
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.raw_logs
WHERE address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
- referenceName: friendtech_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object.
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM friendtech_decoded
WHERE decoded IS NOT NULL
sinks:
- referenceName: friendtech_events
secretName: EXAMPLE_SECRET
type: postgres
sourceStreamName: friendtech_clean
schema: decoded_events
table: friendtech
```
There are two important transforms in this pipeline definition which are responsible for decoding the contract; we'll explain how they work in detail. If you copy and use this configuration file, make sure to update:
1. Your `secret_name` (v2: `secretName`). If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `decoded_events.friendtech`.
### Decoding transforms
Let's start analyzing the first transform:
```sql Transform: friendtech_decoded
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.raw_logs
WHERE address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
```
Looking at the [Raw Logs schema](/reference/schema/EVM-schemas#raw-logs) we see there are standard log columns such as `id`, `block_number` and `transaction_hash`. Since the columns
`topics` and `data` are encoded, we need to make use of the [\_gs\_log\_decode](/reference/mirror-functions/decoding-functions#gs-log-decode) function to decode the data. This function takes the following parameters:
1. The contract ABI: rather than specifying the ABI directly in the SQL query, which would make the code considerably less legible, we make use of the [\_gs\_fetch\_abi](/reference/mirror-functions/decoding-functions#gs_fetch_abi) function
to fetch the ABI from the BaseScan API, but you could also fetch it from an external public repository like a GitHub Gist if you preferred.
2. `topics`: as a second argument to the decode function, we pass in the name of the column in our dataset that contains the topics as a comma-separated string.
3. `data`: as a third argument to the decode function we pass in the name of the column in our dataset that contains the encoded data.
Some columns are surrounded by backticks because they are reserved words in Flink SQL. Common columns that need backticks are `data`, `output`, and `value`; a full list can be found [here](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/overview/#reserved-keywords).
We are storing the decoding result in a new column called `decoded`, which is a [nested ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) with the properties `event_params::TEXT[]` and `event_signature::TEXT`. We create a second transform that reads from the resulting dataset of this first SELECT query to access the decoded data:
```sql Transform: friendtech_clean
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM friendtech_decoded
WHERE decoded IS NOT NULL
```
Notice how we add a filter for `decoded IS NOT NULL` as a safety measure to discard rows with potential issues from the decoding phase.
## Deploying the pipeline
As a last step, to deploy this pipeline and start sinking decoded data into your database, simply execute:
`goldsky pipeline apply event-decoding-pipeline.yaml`
## Conclusion
In this guide we have explored an example implementation of how we can use [Mirror Decoding Functions](/reference/mirror-functions/decoding-functions) to decode raw contract events and stream them into our PostgreSQL database.
This same methodology can be applied to any contract of interest for any chain with `raw_logs` and `raw_traces` Direct Indexing datasets available ([see list](/mirror/sources/direct-indexing)).
Goldsky also provides alternative decoding methods:
* Decoded datasets: `decoded_logs` and `decoded_traces`
* [Subgraphs entity sources](/mirror/sources/subgraphs) to your pipelines.
Decoding contract data on the fly is a very powerful way of understanding onchain data and making it usable for your users.
# Decode traces
Source: https://docs.goldsky.com/mirror/guides/decoding-traces
Sync traces to a database with the contract ABI using Mirror.
This guide explains how to decode traces on-the-fly using [Mirror Decoding Functions](/reference/mirror-functions/decoding-functions) within transforms in Mirror pipelines.
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use the [PostgreSQL Sink](/mirror/sinks/postgres)
## Preface
In this guide we are going to show how to decode traces of a contract with Goldsky Mirror. We will use the [Friendtech contract](https://basescan.org/address/0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4) deployed on Base as an example, but the same logic applies to any other contract and chain for which there's an available Raw Traces Direct Indexing dataset as per [this list](/mirror/sources/direct-indexing).
## Pipeline definition
```yaml traces-decoding-pipeline.yaml
name: decoding-traces
apiVersion: 3
sources:
my_base_raw_traces:
type: dataset
dataset_name: base.raw_traces
version: 1.0.0
transforms:
friendtech_decoded:
primary_key: id
# Fetch the ABI from a gist (raw), then use it to decode traces of the friendtech address.
sql: >
SELECT
`id`,
_gs_tx_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/jeffling/0320808b7f3cc0e8d9cc6c3b113e8156/raw/99bde70acecd4dc339b5a81aae39954973f5d178/gistfile1.txt', 'raw'),
`input`,
`output`
) AS `decoded`,
block_number,
transaction_hash
FROM my_base_raw_traces
WHERE address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
friendtech_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object.
sql: >
SELECT
`id`,
decoded.`function` AS `function_name`,
decoded.decoded_inputs AS `decoded_inputs`,
decoded.decoded_outputs AS `decoded_outputs`,
block_number,
transaction_hash
FROM friendtech_decoded
WHERE decoded IS NOT NULL
sinks:
friendtech_logs:
secret_name: EXAMPLE_SECRET
type: postgres
from: friendtech_clean
schema: decoded_logs
table: friendtech
```
There are two important transforms in this pipeline definition which are responsible for decoding the contract; we'll explain how they work in detail. If you copy and use this configuration file, make sure to update:
1. Your `secret_name`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `decoded_logs.friendtech`.
### Decoding transforms
Let's start analyzing the first transform:
```sql Transform: friendtech_decoded
SELECT
`id`,
_gs_tx_decode(
_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
`input`,
`output`
) AS `decoded`,
block_number,
transaction_hash
FROM base.raw_traces
WHERE address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
```
Looking at the [Raw Traces schema](/reference/schema/EVM-schemas#raw-traces) we see there are standard trace columns such as `id`, `block_number` and `transaction_hash`. Since the columns
`input` and `output` are encoded, we need to make use of the [\_gs\_tx\_decode](/reference/mirror-functions/decoding-functions#gs-tx-decode) function to decode the data. This function takes the following parameters:
1. The contract ABI: rather than specifying the ABI directly in the SQL query, which would make the code considerably less legible, we make use of the [\_gs\_fetch\_abi](/reference/mirror-functions/decoding-functions#gs_fetch_abi) function
to fetch the ABI from the BaseScan API, but you could also fetch it from an external public repository like a GitHub Gist if you preferred.
2. `input`: the second argument, which refers to the data sent along with the message call.
3. `output`: the third argument, which refers to the data returned by the message call.
We are storing the decoding result in a new column called `decoded`, which is a [nested ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) with properties including `function`, `decoded_inputs`, and `decoded_outputs`. We create a second transform that reads from the resulting dataset of this first SELECT query to access the decoded data:
```sql Transform: friendtech_clean
SELECT
`id`,
decoded.`function` AS `function_name`,
decoded.decoded_inputs AS `decoded_inputs`,
decoded.decoded_outputs AS `decoded_outputs`,
block_number,
transaction_hash
FROM friendtech_decoded
WHERE decoded IS NOT NULL
```
Notice how we add a filter for `decoded IS NOT NULL` as a safety measure to discard rows with potential issues from the decoding phase.
## Deploying the pipeline
As a last step, to deploy this pipeline and start sinking decoded data into your database, simply execute:
`goldsky pipeline apply traces-decoding-pipeline.yaml`
## Conclusion
In this guide we have explored an example implementation of how we can use [Mirror Decoding Functions](/reference/mirror-functions/decoding-functions) to decode raw traces and stream them into our PostgreSQL database.
This same methodology can be applied to any contract of interest for any chain with `raw_logs` and `raw_traces` Direct Indexing datasets available ([see list](/mirror/sources/direct-indexing)).
Goldsky also provides alternative decoding methods:
* Decoded datasets: `decoded_logs` and `decoded_traces`
* [Subgraphs entity sources](/mirror/sources/subgraphs) to your pipelines.
Decoding contract data on the fly is a very powerful way of understanding onchain data and making it usable for your users.
# Export contract events to Postgres
Source: https://docs.goldsky.com/mirror/guides/export-events-to-database
The `goldsky` CLI provides a wizard to create pipelines. Based on the input you provide, the CLI generates a pipeline definition for you behind the scenes.
To create a new pipeline with the wizard, use the following command:
```shell
goldsky pipeline create
```
## What you'll need
1. An idea of the data you're interested in indexing (e.g. a contract address)
2. A destination sink to write your data to
## Walkthrough
In this example, we will create a pipeline that indexes Bored Ape Yacht Club contract events to a NeonDB (Postgres) database. This will include all transfers and other auxiliary events associated with that address, with our Ethereum decoded logs dataset as the source.
Initiate the wizard:
```shell
goldsky pipeline create bored-ape-transfers
```
1. **Select a Data Source**: Choose *Direct Indexing*.
2. **Choose Data Type**: Opt for *Ethereum - Decoded Logs*.
3. **Data Processing Time**: Pick *Process data from the time this pipeline is created*.
4. **Additional Sources**: Select *No* when asked to add more sources.
5. **Data Filtering**: Choose *Yes, filter the indexed on-chain data*.
6. **Contract Address**: Enter the [Bored Ape Yacht Club](https://boredapeyachtclub.com) contract address, `0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d`, when prompted.
7. **Transformation**: Choose *No* when asked to add another transform.
8. **Set Up Sink**: Choose *Postgres*. Remember to have a Postgres [Neon DB](https://neon.tech) instance to connect to.
9. **Set Up Secret**: Connect your sink by following the prompts or selecting an existing one. This information is stored in your Goldsky account.
10. **Choose Schema**: Choose *Yes* to select default 'public' schema or choose your preferred alternative schema.
11. **Map Data to Sink Tables**: Select *Yes* when asked to automatically map data to sink tables. Choose *No* if you wish to customize the table name.
12. **Additional Sinks**: Select *No* when asked to add another sink.
Upon successful completion of these steps, an active pipeline is created and data should start appearing in your database shortly. Monitor the status table that is displayed; a "RUNNING" status should appear after a minute or two. To monitor progress at any time, use:
```shell
goldsky pipeline monitor bored-ape-transfers
```
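Once the pipeline is RUNNING, you can also sanity-check the data directly in Postgres. A quick example query, assuming the wizard mapped the data to a hypothetical table named `bored_ape_transfers` in the `public` schema (your table name may differ):

```sql
-- Count decoded events per signature to confirm data is flowing.
SELECT event_signature, COUNT(*) AS events
FROM public.bored_ape_transfers
GROUP BY event_signature
ORDER BY events DESC;
```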
You can get the generated pipeline definition using:
```shell
goldsky pipeline get-definition bored-ape-transfers
```
For a full list of all available commands, use:
```shell
goldsky pipeline --help
```
# Merging cross chain subgraphs
Source: https://docs.goldsky.com/mirror/guides/merging-crosschain-subgraphs
This pipeline is named `poap-extended-1`. It pulls data from two `subgraph_entity` sources, does not perform any transformations, and stores the result into two separate PostgreSQL sinks.
```yaml cross-chain-pipeline.yaml
name: poap-extended-1
apiVersion: 3
sources:
hashflow_cross_chain.pool_created:
type: subgraph_entity
name: pool_created
subgraphs:
- name: polymarket
version: 1.0.0
- name: hashflow
version: 1.0.0
hashflow_cross_chain.update_router_permissions:
type: subgraph_entity
name: update_router_permissions
subgraphs:
- name: polymarket
version: 1.0.0
- name: hashflow
version: 1.0.0
transforms: {}
sinks:
pool_created_sink:
type: postgres
from: hashflow_cross_chain.pool_created
table: test_pool_created
schema: public
secret_name: API_POSTGRES_CREDENTIALS
update_router_permissions_sink:
type: postgres
from: hashflow_cross_chain.update_router_permissions
table: test_update_router_permissions
schema: public
secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sources:
- type: subgraphEntity
deployments:
- id: QmbsFSmqsWFFcbxnGedXifyeTbKBSypczRcwPrBxdQdyXE
- id: QmNSwC6QjZSFcSm2Tmoy6Van7g6zSEqD3yz4tDWRFdZiKh
- id: QmZUh5Rp3edMhYj3wCH58zSNvZvrPSQyeM6AN5HTmyw2Ch
referenceName: hashflow_cross_chain.pool_created
entity:
name: pool_created
- type: subgraphEntity
deployments:
- id: QmbsFSmqsWFFcbxnGedXifyeTbKBSypczRcwPrBxdQdyXE
- id: QmNSwC6QjZSFcSm2Tmoy6Van7g6zSEqD3yz4tDWRFdZiKh
- id: QmZUh5Rp3edMhYj3wCH58zSNvZvrPSQyeM6AN5HTmyw2Ch
referenceName: hashflow_cross_chain.update_router_permissions
entity:
name: update_router_permissions
transforms: []
sinks:
- type: postgres
sourceStreamName: hashflow_cross_chain.pool_created
table: test_pool_created
schema: public
secretName: API_POSTGRES_CREDENTIALS
referenceName: pool_created_sink
- type: postgres
sourceStreamName: hashflow_cross_chain.update_router_permissions
table: test_update_router_permissions
schema: public
secretName: API_POSTGRES_CREDENTIALS
referenceName: update_router_permissions_sink
```
You can run the above example by copying the file into a local yaml file and running the following Goldsky CLI command:
```bash
goldsky pipeline apply cross-chain-pipeline.yaml --status ACTIVE
```
# Operating pipelines
Source: https://docs.goldsky.com/mirror/guides/operating-pipelines
Guide to common pipeline operations
### Deploying a pipeline
There are two main ways by which you can deploy a pipeline: in the web app or by using the CLI.
If you prefer to deploy pipelines using a web interface instead, check the [Pipeline Builder](/mirror/create-a-pipeline#creating-mirror-pipelines-with-the-pipeline-builder).
#### `apply` command + pipeline configuration
The [goldsky pipeline apply](/reference/cli#pipeline-apply) command expects a pipeline configuration file. For example:
```yaml base-logs.yaml
name: base-logs-pipeline
resource_size: s
apiVersion: 3
sources:
base.logs:
dataset_name: base.logs
version: 1.0.0
type: dataset
description: Enriched logs for events emitted from contracts. Contains the
contract address, data, topics, decoded event and metadata for blocks and
transactions.
display_name: Logs
transforms: {}
sinks:
postgres_base_logs:
type: postgres
table: base_logs
schema: public
secret_name: GOLDSKY_SECRET
description: "Postgres sink for: base.logs"
from: base.logs
```
```yaml base-logs.yaml
name: base-logs-pipeline
definition:
sources:
- referenceName: base.logs
type: dataset
version: 1.0.0
transforms: []
sinks:
- type: postgres
table: base_logs
schema: public
secretName: GOLDSKY_SECRET
description: 'Postgres sink for: base.logs'
sourceStreamName: base.logs
referenceName: postgres_base_logs
```
Please save the configuration in a file (e.g. `base-logs.yaml`) and run `goldsky pipeline apply base-logs.yaml --status ACTIVE` to deploy the pipeline.
### Pausing a pipeline
There are several ways by which you can pause a pipeline:
#### 1. `pause` command
`goldsky pipeline pause <pipeline_name>` will attempt to take a snapshot before pausing the pipeline. The snapshot is successfully taken only if the
pipeline is in a healthy state. After the snapshot completes, the pipeline's desired status is set to `PAUSED` and its runtime status to `TERMINATED`.
Example:
```
> goldsky pipeline pause base-logs-pipeline
✓ Successfully paused pipeline: base-logs-pipeline
Pipeline paused and progress saved. You can restart it with "goldsky pipeline start base-logs-pipeline".
```
#### 2. `stop` command
You can stop a pipeline using the command `goldsky pipeline stop <pipeline_name>`. Unlike the `pause` command, stopping a pipeline doesn't try to take a snapshot. Mirror will directly set the pipeline's desired status to `INACTIVE` and its runtime status to `TERMINATED`.
Example:
```
> goldsky pipeline stop base-logs-pipeline
✓ Pipeline stopped. You can restart it with "goldsky pipeline start base-logs-pipeline".
```
#### 3. `apply` command + `INACTIVE` or `PAUSED` status
We can replicate the behaviour of the `pause` and `stop` commands using `pipeline apply` and setting the `--status` flag to `INACTIVE` or `PAUSED`.
Following up with our previous example, we could stop our deployed pipeline with `goldsky pipeline apply base-logs.yaml --status INACTIVE`:
```
goldsky pipeline apply base-logs.yaml --status INACTIVE
✓ Successfully validated config file
✓ Successfully applied config to pipeline: base-logs-pipeline
### Restarting a pipeline
There are two ways to restart an already deployed pipeline:
#### 1. `restart` command
As in: `goldsky pipeline restart <pipeline_name> --from-snapshot last|none`
Example:
```
goldsky pipeline restart base-logs-pipeline --from-snapshot last
✓ Successfully restarted pipeline: base-logs-pipeline
Pipeline restarted. It's safe to exit now (press Ctrl-C). Or you can keep this terminal open to monitor the pipeline progress, it'll take a moment.
✓ Validating request
✓ Fetching pipeline
✓ Validating pipeline status
✓ Fetching runtime details
┌─────────────┬──────────┬────────────────────────┬───────────────────────┬────────┐
│ Timestamp   │ Status   │ Total records received │ Total records written │ Errors │
├─────────────┼──────────┼────────────────────────┼───────────────────────┼────────┤
│ 02:54:44 PM │ STARTING │ 0                      │ 0                     │ []     │
└─────────────┴──────────┴────────────────────────┴───────────────────────┴────────┘
```
This command will open up a monitor for your pipeline after deploying.
#### 2. `apply` command + `ACTIVE` status
Just as you can stop a pipeline by changing its status to `INACTIVE`, you can also restart it by setting it to `ACTIVE`.
Following up with our previous example, we could restart our stopped pipeline with `goldsky pipeline apply base-logs.yaml --status ACTIVE`:
```
goldsky pipeline apply base-logs.yaml --status ACTIVE
✓ Successfully validated config file
✓ Successfully applied config to pipeline: base-logs-pipeline
To monitor the status of your pipeline:
Using the CLI: `goldsky pipeline monitor base-logs`
Using the dashboard: https://app.goldsky.com/dashboard/pipelines/stream/base-logs-pipeline/9
```
Unlike the `start` command, this method won't open up the monitor automatically.
### Applying updates to pipeline configuration
To update a deployed pipeline, modify its configuration file and re-apply it. For example:
```yaml base-logs.yaml
name: base-logs-pipeline
description: a new description for my pipeline
resource_size: xxl
```
```
goldsky pipeline apply base-logs.yaml --from-snapshot last
✓ Successfully validated config file
✓ Successfully applied config to pipeline: base-logs-pipeline
```
```
goldsky pipeline apply base-logs.yaml --use-latest-snapshot
✓ Successfully validated config file
✓ Successfully applied config to pipeline: base-logs-pipeline
```
In this example we are changing the pipeline's `description` and `resource_size` using its latest successful snapshot and telling Mirror not to take a new snapshot before applying the update. This is a common configuration to apply when you have found issues with your pipeline and would like to restart from the last
healthy checkpoint.
For a more complete reference on the configuration attributes you can apply check [this reference](/reference/config-file/pipeline).
### Deleting a pipeline
Although pipelines with desired status `INACTIVE` don't consume any resources (and thus do not imply a billing cost on your side), it's always nice to keep your project
clean and remove pipelines which you aren't going to use any longer.
You can delete pipelines with the command `goldsky pipeline delete`:
```
> goldsky pipeline delete base-logs-pipeline
✓ Deleted pipeline with name: base-logs-pipeline
```
### In-flight requests
Sometimes you may not be able to perform a specific action on your pipeline because an in-flight request is currently being processed.
This means a previous operation on your pipeline hasn't finished yet and needs to be either completed or discarded before you can apply
your operation. A common scenario is a pipeline that is busy taking a snapshot.
Consider the following example where we recently paused a pipeline (thus triggering a snapshot) and we immediately try to delete it:
```
> goldsky pipeline delete base-logs-pipeline
✗ Cannot process request, found existing request in-flight.
* To monitor run 'goldsky pipeline monitor base-logs-pipeline --update-request'
* To cancel run 'goldsky pipeline cancel-update base-logs-pipeline'
```
Let's look at what request is still being processed:
```
> goldsky pipeline monitor base-logs-pipeline --update-request
✓ Monitoring update progress
✓ You may cancel the update request by running goldsky pipeline cancel-update base-logs-pipeline
Snapshot creation in progress: 33%
```
We can see that the snapshot is still taking place. Since we want to delete the pipeline we can go ahead and stop this snapshot creation:
```
> goldsky pipeline cancel-update base-logs-pipeline
✓ Successfully cancelled the in-flight update request for pipeline base-logs-pipeline
```
We can now successfully remove the pipeline:
```
> goldsky pipeline delete base-logs-pipeline
✓ Deleted pipeline with name: base-logs-pipeline
```
As you saw in this example, Mirror provides you with commands to see the current in-flight requests in your pipeline and decide whether you want to discard them or wait for them to be processed.
# Stream DEX trades
Source: https://docs.goldsky.com/mirror/guides/stream-DEX-trades
Stream Decentralized Exchange trade events to your database
Welcome to our guide on using Mirror to create pipelines for decoding and streaming data from decentralized exchanges (DEXs) into your data warehouse.
Mirror is a powerful tool designed to facilitate the seamless integration of blockchain data into different [sinks](/mirror/sinks/supported-sinks), enabling you to leverage the vast amounts of information generated by DEXs for analytics, reporting, and more.
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use [the PostgreSQL Sink](/mirror/sinks/postgres)
## Introduction
Most decentralized exchanges these days are based entirely on the Uniswap protocol or have strong similarities with it.
If you need a high-level overview of how Uniswap works, you can check out [this reference page](https://docs.uniswap.org/contracts/v2/concepts/protocol-overview/how-uniswap-works).
With that in mind, we can narrow our focus to identifying events emitted by Uniswap contracts and use them to identify similar events emitted by all DEXs on the chain.
There are a number of different events we could track. In this guide we will track the `Swap` and `PoolCreated` events as they are arguably two of the most important events to track when wanting to make sense of trading activity in a DEX.
For this example implementation, we will choose the [Raw Log Direct Indexing](https://docs.goldsky.com/mirror/sources/direct-indexing) dataset for the Base chain as the source of our pipeline, but you could choose any other chain for which the Raw Logs dataset is available.
Raw logs need to be decoded for us to be able to identify the events we want to track. For that purpose, we will use [Decoding Transform Functions](/reference/mirror-functions/decoding-functions) to dynamically fetch the ABIs of both the [UniswapV3Factory](https://basescan.org/address/0x33128a8fc17869897dce68ed026d694621f6fdfd) and [UniswapV3Pool](https://basescan.org/address/0xcccc03b23cd798c06828c377466f267e59bb9739) contracts (fetched here from GitHub gists, though you could pull them from the Basescan API instead), since they contain
the actual definitions of the PoolCreated and Swap events.
It's worth mentioning that Uniswap has different versions and it's possible that some event definitions might differ. In this example we'll focus on UniswapV3. Depending on the events you are interested in tracking you might want to refine this example accordingly but the principles explained will stay the same.
Let's now see all these concepts applied in an example pipeline definition:
## Pipeline Definition
```yaml base-dex-trades.yaml
name: base-dex-trades
apiVersion: 3
sources:
base_logs:
type: dataset
dataset_name: base.logs
version: 1.0.0
filter: topics like '0x783cca1c0412dd0d695e784568c96da2e9c22ff989357a2e8b1d9b2b4e6b7118%' OR topics like '0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67%'
transforms:
factory_decoded:
primary_key: id
# Fetch the ABI of UniswapV3Factory in Base and use it to decode PoolCreated events
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/7df78272e689bf102cbe97ae86607d94/raw/9733aaa132a2c3e82cccbe5b0681d3270d696c83/UniswapV3Factory-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base_logs
pool_decoded:
primary_key: id
# Fetch the ABI of a UniswapV3Pool in Base and use it to decode Swap events
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/d3d2d80fbfd3415dd8e11aa498bd0909/raw/b8df8303e51ac7ad9ac921f25bfa84936bb4bc63/UniswapV3Pool-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base_logs
factory_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object to get PoolCreated event data
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM factory_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'PoolCreated'
pool_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object to get Swap event data
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM pool_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Swap'
sinks:
poolcreated_events_sink:
secret_name:
type: postgres
schema: decoded_events
table: poolcreated
from: factory_clean
swaps_event_sink:
secret_name:
type: postgres
schema: decoded_events
table: swaps
from: pool_clean
```
```yaml base-dex-trades.yaml
sources:
- type: dataset
referenceName: base.logs
version: 1.0.0
filter: topics like '0x783cca1c0412dd0d695e784568c96da2e9c22ff989357a2e8b1d9b2b4e6b7118%' OR topics like '0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67%'
transforms:
- referenceName: factory_decoded
type: sql
primaryKey: id
# Fetch the ABI of UniswapV3Factory in Base and use it to decode PoolCreated events
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/7df78272e689bf102cbe97ae86607d94/raw/9733aaa132a2c3e82cccbe5b0681d3270d696c83/UniswapV3Factory-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.logs
- referenceName: pool_decoded
type: sql
primaryKey: id
# Fetch the ABI of a UniswapV3Pool in Base and use it to decode Swap events
sql: >
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/d3d2d80fbfd3415dd8e11aa498bd0909/raw/b8df8303e51ac7ad9ac921f25bfa84936bb4bc63/UniswapV3Pool-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.logs
- referenceName: factory_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object to get PoolCreated event data
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM factory_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'PoolCreated'
- referenceName: pool_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object to get Swap event data
sql: >
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM pool_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Swap'
sinks:
- referenceName: poolcreated_events_sink
secretName:
type: postgres
sourceStreamName: factory_clean
schema: decoded_events
table: poolcreated
- referenceName: swaps_event_sink
secretName:
type: postgres
sourceStreamName: pool_clean
schema: decoded_events
table: swaps
```
If you copy and use this configuration file, make sure to update:
1. Your `secret_name` (v2: `secretName`). If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to; by default it writes to the `decoded_events` schema.
Let's deconstruct this pipeline starting at the top:
### Fast Scan Source
```sql source
sources:
base_logs:
type: dataset
dataset_name: base.logs
version: 1.0.0
filter: topics like '0x783cca1c0412dd0d695e784568c96da2e9c22ff989357a2e8b1d9b2b4e6b7118%' OR topics like '0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67%'
```
We start the pipeline by defining a filter on the data source. This uses a feature called Quick Ingestion, which lets us filter the original dataset at a much faster rate than normal filtering with transforms. Not all datasets currently have this feature, but `base_logs` does, so we make use of it in this example.
We add the event signature hashes of the PoolCreated and Swap events as source filters:
* `PoolCreated (index_topic_1 address token0, index_topic_2 address token1, index_topic_3 uint24 fee, int24 tickSpacing, address pool)` maps to `0x783cca1c0412dd0d695e784568c96da2e9c22ff989357a2e8b1d9b2b4e6b7118`
* `Swap (index_topic_1 address sender, index_topic_2 address recipient, int256 amount0, int256 amount1, uint160 sqrtPriceX96, uint128 liquidity, int24 tick)` maps to `0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67`
It's worth highlighting that we add a `%` to the end of each filter, as `topics` contains more than just the event signature.
If the dataset we are using supports Quick Ingestion, this filter will be pre-applied and it will speed up ingestion dramatically. Because not all datasets have this option enabled yet, we'll add some redundancy in the transforms with extra filters on these events, to make sure that we are targeting these two events regardless of whether this feature is available.
Next, there are 4 transforms in this pipeline definition; we'll explain how each works, starting from the top:
### Decoding Transforms
```sql Transform: factory_decoded
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/7df78272e689bf102cbe97ae86607d94/raw/9733aaa132a2c3e82cccbe5b0681d3270d696c83/UniswapV3Factory-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.logs
```
```sql Transform: pool_decoded
SELECT
`id`,
_gs_log_decode(
_gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/d3d2d80fbfd3415dd8e11aa498bd0909/raw/b8df8303e51ac7ad9ac921f25bfa84936bb4bc63/UniswapV3Pool-ABI.json', 'raw'),
`topics`,
`data`
) AS `decoded`,
block_number,
transaction_hash
FROM base.logs
```
The first two transforms fetch the ABIs for UniswapV3Factory and a UniswapV3Pool, which allow us to decode DEX events and filter for `PoolCreated` and `Swap` events in the following transforms.
As explained in the [Decoding Contract Events guide](/mirror/guides/decoding-contract-events), we first make use of the `_gs_fetch_abi` function to fetch each ABI (here from a GitHub gist) and pass it as the first argument
to the `_gs_log_decode` function, which decodes the log's topics and data. We store the result in a `decoded` [ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) which we unnest in the next transform.
### Event Filtering Transforms
```sql Transform: factory_clean
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_signature`,
block_number,
transaction_hash
FROM factory_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'PoolCreated'
```
```sql Transform: pool_clean
SELECT
`id`,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`,
block_number,
transaction_hash
FROM pool_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Swap'
```
In the next two transforms we take the result of the previous decoding for each contract and filter by the PoolCreated and Swap events:
* `id`: This is the Goldsky provided `id`, it is a string composed of the dataset name, block hash, and log index, which is unique per event, here's an example: `log_0x60eaf5a2ab37c73cf1f3bbd32fc17f2709953192b530d75aadc521111f476d6c_18`
* `decoded.event_params AS 'event_params'`: event\_params is an array containing the params associated with each event. For instance, in the case of Swap events, `event_params[1]` is the sender. You could use this for further analysis in downstream processing.
* `decoded.event_signature AS 'event_name'`: the decoder will output the event name as event\_signature, excluding its arguments.
* `WHERE decoded IS NOT NULL`: to leave out potential null results from the decoder
* `AND decoded.event_signature = 'PoolCreated'`: we use this value to filter only for 'PoolCreated' or 'Swap' events. As explained before, this filter will be redundant for datasets with Quick Ingestion enabled but we add it here in this example in case you would like to try out with a different dataset which doesn't have that option enabled.
If you would like to filter by other events like `Mint` you could easily add them to these queries; for example: `WHERE decoded.event_signature IN ('Swap', 'Mint')`
Both resulting datasets will be used as sources for two different tables at our sink: `decoded_events.poolcreated` & `decoded_events.swaps`.
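If you want the Swap parameters as their own columns, you could append one more transform that reads from `pool_clean`. This is a sketch only: the transform name and casts are illustrative, and the parameter indices follow the Swap signature listed above (`event_params[1]` is the sender, `[2]` the recipient, `[3]` and `[4]` the amounts):

```sql Transform: swaps_flat (sketch)
SELECT
  `id`,
  lower(event_params[1]) AS sender,
  lower(event_params[2]) AS recipient,
  COALESCE(TRY_CAST(event_params[3] AS DECIMAL(78)), 0) AS amount0,
  COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0) AS amount1,
  block_number,
  transaction_hash
FROM pool_clean
```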
## Deploying the pipeline
Assuming we are using the same filename for the pipeline configuration as in this example, we can deploy this pipeline with the [CLI pipeline apply command](/reference/cli#pipeline-apply):
`goldsky pipeline apply base-dex-trades.yaml --status ACTIVE`
Here's an example Swap record from our sink:
| id | event\_params | event\_name | block\_number | transaction\_hash |
| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | ------------- | ------------------------------------------------------------------ |
| log\_0x18db9278e431b3bb65c151857448227a649d9f8fe3fd0cdf2b9835eb8c71d8ae\_4 | 0x508fdf90951c1a31faa5dcd119f3b60e0e0e87fb,0x508fdf90951c1a31faa5dcd119f3b60e0e0e87fb,-256654458505550,500000000000000000,3523108129873998835611448265535,631691941157619701,75899 | Swap | 1472162 | 0xd8a1b2c1296479f31f048aaf753e16f3d7d908fd17e6697b8850fdf209f080f6 |
We can see that it corresponds to the Swap event of the transaction referenced in the `transaction_hash` column.
This concludes our successful deployment of a Mirror pipeline streaming DEX trade events from the Base chain into our database using inline decoders. Congrats!
## Conclusion
In this guide, we've walked through the process of using Mirror to decode and stream DEX events, specifically focusing on Swap and PoolCreated events, into a PostgreSQL database.
Along the way, we have seen an example implementation of how to do inline decoding using contract ABIs with the [Decoding Transform Functions](/reference/mirror-functions/decoding-functions).
By understanding and leveraging these events, you can harness the power of real-time blockchain data to enhance your trading strategies, optimize liquidity management, and perform detailed market analysis.
Experience the transformative power of Mirror today and redefine your approach to blockchain data integration.
# Sync dataset to Postgres
Source: https://docs.goldsky.com/mirror/guides/sync-dataset-to-postgres
This pipeline is named `decoded-logs-pipeline`. It pulls data from a curated Goldsky dataset, applies a simple SQL transform to select the relevant columns, and stores the result into a PostgreSQL sink, in a table called `eth_logs` in the `goldsky` schema.
```yaml decoded-logs-pipeline.yaml
name: decoded-logs-pipeline
apiVersion: 3
sources:
my_ethereum_decoded_logs:
dataset_name: ethereum.decoded_logs
version: 1.0.0
type: dataset
start_at: latest
transforms:
logs:
sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
my_ethereum_decoded_logs
primary_key: id
sinks:
logs_sink:
type: postgres
table: eth_logs
schema: goldsky
secret_name: API_POSTGRES_CREDENTIALS
from: logs
```
```yaml
sources:
- referenceName: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: latest
transforms:
- sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
ethereum.decoded_logs
referenceName: logs
type: sql
primaryKey: id
sinks:
- type: postgres
table: eth_logs
schema: goldsky
secretName: API_POSTGRES_CREDENTIALS
sourceStreamName: logs
referenceName: logs_sink
```
You can start the above pipeline by running:
```bash
goldsky pipeline start pipeline.yaml
```
Or
```bash
goldsky pipeline apply pipeline.yaml --status ACTIVE
```
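Once the pipeline is running, rows should begin landing in the configured table. A quick sanity check you might run in Postgres (not part of the pipeline itself):

```sql
-- Inspect the most recently written decoded logs.
SELECT id, address, event_signature, block_number
FROM goldsky.eth_logs
ORDER BY block_number DESC
LIMIT 10;
```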
# Sync subgraph to postgres
Source: https://docs.goldsky.com/mirror/guides/sync-subgraph-to-postgres
This pipeline pulls data from a single `subgraph_entity` source, processes the data with a single SQL transformation, and stores the result into a PostgreSQL sink.
You will need to have the existing subgraph with the name/version combo of `polymarket/1.0.0` as a prerequisite to running this pipeline.
```yaml sync-subgraphs-postgres-pipeline.yaml
name: syncing-a-subgraph-into-postgres
apiVersion: 3
sources:
polygon.fixed_product_market_maker:
type: subgraph_entity
name: fixed_product_market_maker
subgraphs:
- name: polymarket
version: 1.0.0
transforms:
negative_fpmm_scaled_liquidity_parameter:
sql: SELECT id FROM polygon.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primary_key: id
sinks:
postgres_polygon_sink:
type: postgres
from: negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sources:
- type: subgraphEntity
deployments:
- id: QmVcgRByfiFSzZfi7RZ21gkJoGKG2jeRA1DrpvCQ6ficNb
entity:
name: fixed_product_market_maker
referenceName: polygon.fixed_product_market_makername
transforms:
- referenceName: negative_fpmm_scaled_liquidity_parameter
type: sql
sql: SELECT id FROM polygon.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primaryKey: id
sinks:
- type: postgres
sourceStreamName: negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secretName: API_POSTGRES_CREDENTIALS
referenceName: postgres_polygon_sink
```
You can start the above pipeline by running:
```bash
goldsky pipeline start pipeline.yaml
```
Or
```bash
goldsky pipeline apply pipeline.yaml --status ACTIVE
```
# ERC-1155 Transfers
Source: https://docs.goldsky.com/mirror/guides/token-transfers/ERC-1155-transfers
Create a table containing ERC-1155 Transfers for several, or all token contracts.
ERC-1155 is a standard for EVM ecosystems that allows for the creation of both fungible and non-fungible assets within a single contract. The process of transferring ERC-1155 tokens into a database is fundamental, unlocking opportunities for data analysis, tracking, and the development of innovative solutions.
This guide is part of a series of tutorials on how you can stream transfer data into your data warehouse using Mirror pipelines. Here we will be focusing on ERC-1155 transfers; visit the following guides for other types of transfers:
* [Native Transfers](/mirror/guides/token-transfers/Native-transfers)
* [ERC-20 Transfers](/mirror/guides/token-transfers/ERC-20-transfers)
* [ERC-721 Transfers](/mirror/guides/token-transfers/ERC-721-transfers)
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use [the PostgreSQL Sink](/mirror/sinks/postgres)
## Introduction
There are two main ways to stream ERC-1155 transfers with Mirror:
1. Use the readily available ERC-1155 dataset for the chain you are interested in: this is the easiest and quickest method to get you streaming token transfers into your sink of choice with minimum code.
2. Build the ERC-1155 Transfers pipeline from scratch using raw or decoded logs: this method takes more code and time to implement, but it's a great way to learn how you can use decoding functions in case you
want to build more customized pipelines.
Let's explore both methods below in more detail:
## Using the ERC-1155 Transfers Source Dataset
Every EVM chain has its own ERC-1155 dataset available for you to use as a source in your pipelines. You can check this by running the `goldsky dataset list` command and finding the EVM chain of your choice.
For this example, let's use the `apex` chain and create a simple pipeline definition using its ERC-1155 transfers dataset that writes the data into a PostgreSQL instance:
```yaml apex-erc1155-transfers.yaml
name: apex-erc1155-pipeline
resource_size: s
apiVersion: 3
sources:
apex.erc1155_transfers:
dataset_name: apex.erc1155_transfers
version: 1.3.0
type: dataset
start_at: earliest
transforms: {}
sinks:
postgres_apex.erc1155_transfers_public_apex_erc1155_transfers:
type: postgres
table: apex_erc1155_transfers
schema: public
secret_name:
description: "Postgres sink for Dataset: apex.erc1155_transfers"
from: apex.erc1155_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secret_name`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `public.apex_erc1155_transfers`.
If you use ClickHouse as a sink for this dataset, you'll need to add the following `schema_override` to avoid potential data precision errors for big numbers; see the example:
```
postgres_apex.erc1155_transfers_public_apex_erc1155_transfers:
type: clickhouse
table: apex_erc1155_transfers
secret_name:
description: "ClickHouse sink for Dataset: apex.erc1155_transfers"
schema_override:
amount: UInt256
token_id: UInt256
from: apex.erc1155_transfers
```
You can start the pipeline by running:
```bash
goldsky pipeline start apex-erc1155-pipeline.yaml
```
Or
```bash
goldsky pipeline apply apex-erc1155-pipeline.yaml --status ACTIVE
```
That's it! You should soon start seeing ERC-1155 token transfers in your database.
## Building ERC-1155 Transfers from scratch using logs
In the method we just explored, the ERC-1155 dataset that we used as the pipeline source encapsulates all the decoding logic that's explained in this section.
Read on if you are interested in learning how it's implemented, in case you want to consider extending or modifying this logic yourself.
There are two ways we can go about building this token transfers pipeline from scratch:
1. Use the `raw_logs` Direct Indexing dataset for that chain in combination with [Decoding Transform Functions](/reference/mirror-functions/decoding-functions) using the ABI of a specific ERC-1155 Contract.
2. Use the `decoded_logs` Direct Indexing dataset for that chain in which the decoding process has already been done by Goldsky. This is only available for certain chains as you can check in [this list](/mirror/sources/direct-indexing).
We'll primarily focus on the first decoding method using `raw_logs` and decoding functions as it's the default and most used way of decoding; we'll also present an example using `decoded_logs` and highlight the differences between the two.
### Building ERC-1155 Transfers using Decoding Transform Functions
In this example, we will stream all the `Transfer` events of all the ERC-1155 tokens for the [Scroll chain](https://scroll.io/). To that end, we will dynamically fetch the ABI of the [Rubyscore\_Scroll](https://scrollscan.com/token/0xdc3d8318fbaec2de49281843f5bba22e78338146) token from the Scrollscan API (available [here](https://api.scrollscan.com/api?module=contract\&action=getabi\&address=0xdc3d8318fbaec2de49281843f5bba22e78338146))
and use it to identify all the same events for the tokens in the chain. We have decided to use the ABI of this token for this example but any other ERC-1155 compliant token would also work.
ERC-1155 combines the features of ERC-20 and ERC-721 contracts and adds a few features.
Each transfer carries both a token ID and a value representing the quantity being transferred: an amount for fungible tokens, and typically `1` for tokens intended to represent NFTs, though exactly how these work depends on how the contract is implemented.
ERC-1155 also introduces new event signatures for transfers: `TransferSingle(address,address,address,uint256,uint256)` and `TransferBatch(address,address,address,uint256[],uint256[])`, the latter of which lets the contract transfer multiple token IDs at once to a single recipient.
This causes us some trouble since we want one row per transfer in our database, so we'll need some extra SQL logic in our pipeline to deal with this. To mitigate this complexity we have created two different transforms, each dealing with Single and Batch transfers separately.
We then aggregate both tables into a single view using a third transform.
Let's now see all these concepts applied in an example pipeline definition:
#### Pipeline Definition
```yaml scroll-erc1155-transfers.yaml
name: scroll-erc1155-transfers
apiVersion: 3
sources:
my_scroll_mainnet_raw_logs:
type: dataset
dataset_name: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
scroll_decoded:
primary_key: id
# Fetch the ABI from scrollscan for Rubyscore_Scroll token
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xdc3d8318fbaec2de49281843f5bba22e78338146', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM my_scroll_mainnet_raw_logs
scroll_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
erc1155_transfer_single:
primary_key: id
sql: >
SELECT
id,
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0) AS token_id,
COALESCE(TRY_CAST(event_params[5] AS DECIMAL(78)), 0) AS amount,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean WHERE topics LIKE '0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62%'
erc1155_transfer_batch:
primary_key: id
sql: >
WITH transfer_batch_logs AS (
SELECT
*,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[4])), ',', ' ')
) AS token_ids,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[5])), ',', ' ')
) AS amounts
FROM
scroll_clean
WHERE topics LIKE '0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb%'
)
SELECT
id || '_' || CAST(t.idx AS STRING) AS `id`,
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
COALESCE(TRY_CAST(token_ids[t.idx] AS DECIMAL(78)),0) AS token_id,
COALESCE(TRY_CAST(amounts[t.idx] AS DECIMAL(78)),0) AS amount,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM transfer_batch_logs
CROSS JOIN UNNEST(
CAST(
_gs_generate_series(
CAST(1 AS BIGINT),
CAST(COALESCE(CARDINALITY(token_ids), 0) AS BIGINT)
) AS ARRAY
)
) AS t (idx)
scroll_1155_transfers:
primary_key: id
sql: >
SELECT * FROM erc1155_transfer_single
UNION ALL
SELECT * FROM erc1155_transfer_batch WHERE amount > 0
sinks:
scroll_1155_sink:
type: postgres
table: erc1155_transfers
schema: mirror
secret_name:
description: Postgres sink for ERC1155 transfers
from: scroll_1155_transfers
```
```yaml scroll-erc1155-transfers.yaml
sources:
- type: dataset
referenceName: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
- referenceName: scroll_decoded
type: sql
primaryKey: id
# Fetch the ABI from scrollscan for Rubyscore_Scroll token
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xdc3d8318fbaec2de49281843f5bba22e78338146', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM scroll_mainnet.raw_logs
- referenceName: scroll_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
- referenceName: erc1155_transfer_single
primaryKey: id
type: sql
sql: >
SELECT
id,
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
COALESCE(TRY_CAST(event_params[4] AS NUMERIC), -999) AS token_id,
COALESCE(TRY_CAST(event_params[5] AS NUMERIC), -999) AS amount,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean WHERE topics LIKE '0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62%'
- referenceName: erc1155_transfer_batch
primaryKey: id
type: sql
sql: >
WITH transfer_batch_logs AS (
SELECT
*,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[4])), ',', ' ')
) AS token_ids,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[5])), ',', ' ')
) AS amounts
FROM
scroll_clean
WHERE topics LIKE '0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb%'
)
SELECT
id || '_' || CAST(t.idx AS STRING) AS `id`,
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
CAST(token_ids[t.idx] AS NUMERIC(78)) as token_id,
CAST(amounts[t.idx] AS NUMERIC(78)) as amount,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM transfer_batch_logs
CROSS JOIN UNNEST(
CAST(
_gs_generate_series(
CAST(1 AS BIGINT),
CAST(COALESCE(CARDINALITY(token_ids), 0) AS BIGINT)
) AS ARRAY
)
) AS t (idx)
- type: sql
referenceName: scroll_1155_transfers
primaryKey: id
sql: >
SELECT * FROM erc1155_transfer_single
UNION ALL
SELECT * FROM erc1155_transfer_batch WHERE amount > 0
sinks:
- type: postgres
table: erc1155_transfers
schema: mirror
secretName:
description: Postgres sink for ERC1155 transfers
referenceName: scroll_1155_sink
sourceStreamName: scroll_1155_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc1155_transfers`.
There are five transforms in this pipeline definition; we'll explain how each of them works, starting from the top:
##### Decoding Transforms
```sql Transform: scroll_decoded
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM scroll_mainnet.raw_logs
```
As explained in the [Decoding Contract Events guide](/mirror/guides/decoding-contract-events) we first make use of the `_gs_fetch_abi` function to get the ABI from Scrollscan and pass it as first argument
to the function `_gs_log_decode` to decode its topics and data. We store the result in a `decoded` [ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) which we unnest on the next transform.
```sql Transform: scroll_clean
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
```
In this second transform, we take the `event_params` and `event_signature` from the result of the decoding. We then filter the query on `decoded IS NOT NULL` to leave out potential null results from the decoder.
##### SingleTransfer Transform
```sql Transform: erc1155_transfer_single
SELECT
id,
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0) AS token_id,
COALESCE(TRY_CAST(event_params[5] AS DECIMAL(78)), 0) AS amount,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean WHERE topics LIKE '0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62%'
```
In this transform we focus on SingleTransfer events.
Similar to the [ERC-721 example](/mirror/guides/token-transfers/ERC-721-transfers), we use `event_params` to pull out the sender, recipient, and token ID. Note that the indexes we use are different since ERC-1155 events have a different `event_signature`.
```sql
COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0) AS token_id,
COALESCE(TRY_CAST(event_params[5] AS DECIMAL(78)), 0) AS amount,
```
1. `event_params[4]` is the fourth element of the `event_params` array, and for ERC-1155 this is the token ID.
2. `TRY_CAST(event_params[4] AS DECIMAL(78))` casts the string element `event_params[4]` to `DECIMAL(78)`. Token IDs can be as large as an unsigned 256-bit integer, so make sure your database can handle that; if not, cast to a different data type that your sink can handle. We use `TRY_CAST` because it prevents the pipeline from failing if the cast fails, returning a `NULL` value instead.
3. `COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0)`: `COALESCE` can take an arbitrary number of arguments and returns the first non-NULL value. Since `TRY_CAST` can return a `NULL`, we fall back to `0` when it does. This isn't strictly necessary, but it guarantees a non-NULL value; if you want to spot values that failed to cast, use a distinctive sentinel such as `-999` instead.
We repeat this process for `event_params[5]` which represents the amount of a token.
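As an illustration of how these two functions interact, the sketch below applies the same cast pattern to one well-formed and one malformed literal (the literals are made up for the example):
```sql
-- Illustrative only: TRY_CAST returns NULL on failure, and COALESCE replaces that NULL with 0
SELECT
  COALESCE(TRY_CAST('42' AS DECIMAL(78)), 0)           AS good_value, -- 42
  COALESCE(TRY_CAST('not-a-number' AS DECIMAL(78)), 0) AS bad_value   -- 0
```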
```sql
WHERE topics LIKE '0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62%'
```
We filter on a specific topic to get ERC-1155 single transfers; the topic above is the hash of the `event_signature` `TransferSingle(address,address,address,uint256,uint256)`. As with [ERC-721](/mirror/guides/token-transfers/ERC-721-transfers), we could use the decoded event signature as a filter instead.
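For reference, here is a minimal sketch of that alternative filter, assuming the `scroll_clean` transform above (which exposes the decoded signature as `event_name`):
```sql
-- Sketch: filter on the decoded event name instead of the raw topic hash
SELECT *
FROM scroll_clean
WHERE event_name = 'TransferSingle'
```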
##### BatchTransfers Transform
Now, let's look at the BatchTransfer events:
```sql Transform: erc1155_transfer_batch (subquery)
WITH transfer_batch_logs AS (
SELECT
*,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[4])), ',', ' ')
) AS token_ids,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[5])), ',', ' ')
) AS amounts
FROM
  scroll_clean
WHERE topics LIKE '0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb%'
)
```
The first thing we want to achieve is to decompose the string representations of the token IDs and their respective amounts into arrays that we can later unnest into separate rows.
This will allow us to pair each token ID with its amount much more easily as a second step.
This is the trickiest part of the transformation and involves some functionality that is niche to both Goldsky and Flink v1.17.
We'll start from the inside and work our way out again.
1. `TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[4]))`: Similar to the [ERC-721 example](/mirror/guides/token-transfers/ERC-721-transfers),
we use `event_params` to access the token\_id information. For ERC-1155, the string for batch transfers in element 4 looks like this when decoded: \[1 2 3 4 5 6]. We need to trim the leading and trailing \[ and ] characters before splitting it out into individual token IDs.
2. `_gs_split_string_by(...)`: This is a Goldsky UDF which splits strings by the space character only. If you need to split by another character, for now you can use `REGEXP_REPLACE(column, ',', ' ')` to replace commas with spaces.
3. `CROSS JOIN UNNEST(...) AS t (idx)`: This works like `UNNEST` in most other SQL dialects, but is a special case in Flink. It doesn't behave like a regular `CROSS JOIN`: instead of a full Cartesian product, each batch-transfer row is joined with the generated index series, so we end up with one row per index whose `token_id` and `amount` map correctly to each other.
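To make the trimming and splitting in steps 1 and 2 concrete, here is an illustrative sketch with a made-up input literal; the real pipeline applies the same chain to `event_params[4]` and `event_params[5]`:
```sql
-- Illustrative only: '[1,2,3]' -> '1,2,3' (brackets trimmed) -> '1 2 3' (commas to spaces) -> array of strings
SELECT _gs_split_string_by(
  REPLACE(
    TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM '[1,2,3]')),
    ',', ' '
  )
) AS token_ids
```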
Lastly, we filter on topic:
```sql
topics LIKE '0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb%'
```
This is the same as the other topic filters but it is using the topic hash of the batch transfer event signature.
Next, onto creating an index for each tokenId - amount pair:
```sql Transform: erc1155_transfer_batch (index series)
FROM transfer_batch_logs
CROSS JOIN UNNEST(
CAST(
_gs_generate_series(
CAST(1 AS BIGINT),
CAST(COALESCE(CARDINALITY(token_ids), 0) AS BIGINT)
) AS ARRAY
)
) AS t (idx)
```
In this step we generate a series of indexes that we can use to access each individual tokenId - amount pair within a transfer.
We do this with the Goldsky UDF `_gs_generate_series`, which generates an array of indexes for as many tokens as there are in the batch.
We combine these indexes with our existing table and use them to access each token ID and amount pair:
```sql
CAST(token_ids[t.idx] AS NUMERIC(78)) as token_id,
CAST(amounts[t.idx] AS NUMERIC(78)) as amount,
```
We also use the generated index to build the resulting `id` primary key for batch transfers:
```sql
id || '_' || CAST(t.idx AS STRING) AS `id`
```
The `id` coming from the source represents an entire batch transfer event, which can contain multiple tokens, so we concatenate the unnest index (`t.idx`) to the `id` to make the unnested rows unique.
##### Combining Single and Batch Transfers
```sql Transform: scroll_1155_transfers
SELECT * FROM erc1155_transfer_single
UNION ALL
SELECT * FROM erc1155_transfer_batch
WHERE amount > 0
```
This final transform creates a combined stream of all single and batch transfers.
#### Deploying the pipeline
Our last step is to deploy this pipeline and start sinking ERC-1155 transfer data into our database. Assuming we are using the same file name for the pipeline configuration as in this example,
we can use the [CLI pipeline create command](/reference/cli#pipeline-create) like this:
`goldsky pipeline create scroll-erc1155-transfers --definition-path scroll-erc1155-transfers.yaml`
After some time, you should see the pipeline start streaming Transfer data into your sink.
Remember that you can always speed up the streaming process by [updating](/reference/cli#pipeline-update) the `resourceSize` of the pipeline.
Here's an example transfer record from our sink:
| id | contract\_address | sender | recipient | token\_id | amount | block\_number | block\_hash | log\_index | transaction\_hash | transaction\_index |
| -------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------ | ------------------------------------------ | --------- | ----------- | ------------- | ------------------------------------------------------------------ | ---------- | ------------------------------------------------------------------ | ------------------ |
| log\_0x360fcd6ca8c684039c45642d748735645fac639099d8a89ec57ad2b274407c25\_7 | 0x7de37842bcf314c83afe83a8dab87f85ca3a2cee | 0x0000000000000000000000000000000000000000 | 0x16f6aff7a2d84b802b2ddf0f0aed49033b69f4f9 | 6 | 1 | 105651 | 0x360fcd6ca8c684039c45642d748735645fac639099d8a89ec57ad2b274407c25 | 7 | 0x5907ba72e32434938f45539b2792e4eacf0d141db7c4c101e207c1fb26c99274 | 5 |
We can find this [transaction in Scrollscan](https://scrollscan.com/tx/0x0a968c797271d18420261d22f1cef08b040c45b6dd219c9a53f76c1545e592ce). We see that it corresponds to a mint of 60 tokens.
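If you want to pull this record back out of the sink, a minimal lookup sketch (assuming the default `mirror.erc1155_transfers` table from the configuration above) would be:
```sql
-- Sketch: look up the example batch transfer by its transaction hash
SELECT id, token_id, amount, sender, recipient
FROM mirror.erc1155_transfers
WHERE transaction_hash = '0x5907ba72e32434938f45539b2792e4eacf0d141db7c4c101e207c1fb26c99274';
```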
This concludes our successful deployment of a Mirror pipeline streaming ERC-1155 Tokens from Scroll chain into our database using inline decoders. Congrats!
### ERC-1155 Transfers using decoded datasets
As explained in the Introduction, Goldsky provides decoded datasets for Raw Logs and Raw Traces for a number of different chains. You can check [this list](/mirror/sources/direct-indexing) to see if the chain you are interested in has these decoded datasets.
In these cases, there is no need for us to run Decoding Transform Functions as the dataset itself will already contain the event signature and event params decoded.
Below is an example pipeline definition for streaming ERC-1155 token transfers on the Ethereum chain using the `decoded_logs` dataset.
```yaml ethereum-decoded-logs-erc1155-transfers.yaml
sources:
- referenceName: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: earliest
description: Decoded logs for events emitted from contracts. Contains the
decoded event signature and event parameters, contract address, data,
topics, and metadata for the block and transaction.
transforms:
- type: sql
referenceName: erc1155_transfer_single
primaryKey: id
sql: >-
SELECT
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
COALESCE(TRY_CAST(event_params[4] AS DECIMAL(78)), 0) AS token_id,
COALESCE(TRY_CAST(event_params[5] AS DECIMAL(78)), 0) AS amount,
raw_log.block_number AS block_number,
raw_log.block_hash AS block_hash,
raw_log.log_index AS log_index,
raw_log.transaction_hash AS transaction_hash,
raw_log.transaction_index AS transaction_index,
id
FROM ethereum.decoded_logs WHERE raw_log.topics LIKE '0xc3d58168c5ae7397731d063d5bbf3d657854427343f4c083240f7aacaa2d0f62%'
AND address = '0xc36cf0cfcb5d905b8b513860db0cfe63f6cf9f5c'
- type: sql
referenceName: erc1155_transfer_batch
primaryKey: id
description: ERC1155 Transform
sql: >-
WITH transfer_batch_logs AS (
SELECT
*,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[4])), ',', ' ')
) AS token_ids,
_gs_split_string_by(
REPLACE(TRIM(LEADING '[' FROM TRIM(TRAILING ']' FROM event_params[5])), ',', ' ')
) AS amounts
FROM
ethereum.decoded_logs
WHERE raw_log.topics LIKE '0x4a39dc06d4c0dbc64b70af90fd698a233a518aa5d07e595d983b8c0526c8f7fb%'
AND address = '0xc36cf0cfcb5d905b8b513860db0cfe63f6cf9f5c'
)
SELECT
address AS contract_address,
lower(event_params[2]) AS sender,
lower(event_params[3]) AS recipient,
CAST(token_ids[t.idx] AS NUMERIC(78)) as token_id,
CAST(amounts[t.idx] AS NUMERIC(78)) as amount,
raw_log.block_number AS block_number,
raw_log.block_hash AS block_hash,
raw_log.log_index AS log_index,
raw_log.transaction_hash AS transaction_hash,
raw_log.transaction_index AS transaction_index,
id || '_' || CAST(t.idx AS STRING) AS `id`
FROM transfer_batch_logs
CROSS JOIN UNNEST(
CAST(
_gs_generate_series(
CAST(1 AS BIGINT),
CAST(COALESCE(CARDINALITY(token_ids), 0) AS BIGINT)
) AS ARRAY
)
) AS t (idx)
- type: sql
referenceName: ethereum_1155_transfers
primaryKey: id
sql: SELECT * FROM erc1155_transfer_single UNION ALL SELECT * FROM
erc1155_transfer_batch WHERE amount > 0
sinks:
- type: postgres
table: erc1155_transfers
schema: mirror
secretName:
description: Postgres sink for 1155 transfers
referenceName: transfers
sourceStreamName: ethereum_1155_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc1155_transfers`.
As you can see, it's pretty similar to the inline decoding pipeline method, but here we skip the decoding transforms and simply filter on `raw_log.topics`, just as we did in the previous method.
Assuming we are using the same filename for the pipeline configuration as in this example we can deploy this pipeline with the [CLI pipeline create command](/reference/cli#pipeline-create):
`goldsky pipeline create ethereum-erc1155-transfers --definition-path ethereum-decoded-logs-erc1155-transfers.yaml`
## Conclusion
In this guide, we have learnt how Mirror simplifies streaming ERC-1155 Transfer events into your database.
We first looked at the easy way of achieving this: making use of the readily available ERC-1155 dataset for the EVM chain and using it as the source of our pipeline.
We then took a deeper dive into the standard decoding method using Decoding Transform Functions, implementing an example on the Scroll chain.
We have also looked into an example implementation using the decoded\_logs dataset for Ethereum. Both are great decoding methods and depending on your use case and dataset availability you might prefer one over the other.
With Mirror, developers gain flexibility and efficiency in integrating blockchain data, opening up new possibilities for applications and insights. Experience the transformative power of Mirror today and redefine your approach to blockchain data integration.
# ERC-20 Transfers
Source: https://docs.goldsky.com/mirror/guides/token-transfers/ERC-20-transfers
Create a table containing ERC-20 Transfers for several or all token contracts
ERC-20 tokens provide a standardized format for fungible digital assets within EVM ecosystems. The process of transferring ERC-20 tokens into a database is fundamental, unlocking opportunities for data analysis, tracking, and the development of innovative solutions.
This guide is part of a series of tutorials on how you can stream transfer data into your data warehouse using Mirror pipelines. Here we will be focusing on ERC-20 Transfers; visit the following guides for other types of Transfers:
* [Native Transfers](/mirror/guides/token-transfers/Native-transfers)
* [ERC-721 Transfers](/mirror/guides/token-transfers/ERC-721-transfers)
* [ERC-1155 Transfers](/mirror/guides/token-transfers/ERC-1155-transfers)
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use [the PostgreSQL Sink](/mirror/sinks/postgres)
## Introduction
In order to stream all the ERC-20 Transfers of a chain there are two potential methods available:
1. Use the readily available ERC-20 dataset for the chain you are interested in: this is the easiest and quickest method to get you streaming token transfers into your sink of choice with minimum code.
2. Build the ERC-20 Transfers pipeline from scratch using raw or decoded logs: this method takes more code and time to implement but it's a great way to learn about how you can use decoding functions in case you
want to build more customized pipelines.
Let's explore both methods below in more detail:
## Using the ERC-20 Transfers Source Dataset
Every EVM chain has its own ERC-20 dataset available for you to use as a source in your pipelines. You can check this by running the `goldsky dataset list` command and finding the EVM chain of your choice.
For this example, let's use the `apex` chain and create a simple pipeline definition using its ERC-20 dataset that writes the data into a PostgreSQL instance:
```yaml apex-erc20-transfers.yaml
name: apex-erc20-pipeline
resource_size: s
apiVersion: 3
sources:
apex.erc20_transfers:
dataset_name: apex.erc20_transfers
version: 1.0.0
type: dataset
start_at: earliest
transforms: {}
sinks:
postgres_apex.erc20_transfers_public_apex_erc20_transfers:
type: postgres
table: apex_erc20_transfers
schema: public
secret_name:
description: 'Postgres sink for Dataset: apex.erc20_transfers'
from: apex.erc20_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `public.apex_erc20_transfers`.
You can start the above pipeline by running:
```bash
goldsky pipeline start apex-erc20-transfers.yaml
```
Or
```bash
goldsky pipeline apply apex-erc20-transfers.yaml --status ACTIVE
```
That's it! You should soon start seeing ERC-20 token transfers in your database.
## Building ERC-20 Transfers from scratch using logs
In the previous method we just explored, the ERC-20 dataset that we used as the source to the pipeline encapsulates all the decoding logic explained in this section.
Read on if you are interested in learning how it's implemented in case you want to consider extending or modifying this logic yourself.
There are two ways that we can go about building this token transfers pipeline from scratch:
1. Use the `raw_logs` Direct Indexing dataset for that chain in combination with [Decoding Transform Functions](/reference/mirror-functions/decoding-functions) using the ABI of a specific ERC-20 Contract.
2. Use the `decoded_logs` Direct Indexing dataset for that chain in which the decoding process has already been done by Goldsky. This is only available for certain chains as you can check in [this list](/mirror/sources/direct-indexing).
We'll primarily focus on the first decoding method using `raw_logs` and decoding functions as it's the default and most used way of decoding; we'll also present an example using `decoded_logs` and highlight the differences between the two.
### Building ERC-20 Transfers using Decoding Transform Functions
In this example, we will stream all the `Transfer` events of all the ERC-20 Tokens for the [Scroll chain](https://scroll.io/). To that end, we will dynamically fetch the ABI of the USDT token from the Scrollscan API (available [here](https://api.scrollscan.com/api?module=contract\&action=getabi\&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69))
and use it to identify all the same events for the tokens in the chain. We have decided to use the ABI of USDT token contract for this example but any other ERC-20 compliant token would also work.
We need to differentiate ERC-20 token transfers from ERC-721 (NFT) transfers since they have the same event signature in decoded data: `Transfer(address,address,uint256)`.
However, if we look closely at their event definitions we can appreciate that the number of topics differ:
* [ERC-20](https://ethereum.org/en/developers/docs/standards/tokens/erc-20/): `event Transfer(address indexed _from, address indexed _to, uint256 _value)`
* [ERC-721](https://ethereum.org/en/developers/docs/standards/tokens/erc-721/): `event Transfer(address indexed _from, address indexed _to, uint256 indexed _tokenId)`
ERC-20 Transfer events have three topics (one topic for event signature + 2 topics for the indexed params).
NFTs on the other hand have four topics as they have one more indexed param in the event signature.
We will use this as a filter in our pipeline transform to only index ERC-20 Transfer events.
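To make the topic-count check concrete, here is an illustrative sketch of how `SPLIT_INDEX` behaves on a comma-separated `topics` string; the placeholder values stand in for real topic hashes:
```sql
-- Illustrative only: SPLIT_INDEX is zero-based, so index 3 asks for a fourth topic
SELECT
  SPLIT_INDEX('0xsig,0xfrom,0xto', ',', 3)           AS erc20_fourth_topic,  -- NULL: only three topics
  SPLIT_INDEX('0xsig,0xfrom,0xto,0xtokenid', ',', 3) AS erc721_fourth_topic  -- '0xtokenid': four topics
```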
Let's now see all these concepts applied in an example pipeline definition:
#### Pipeline Definition
```yaml scroll-erc20-transfers.yaml
name: scroll-erc20-transfers
apiVersion: 3
sources:
my_scroll_mainnet_raw_logs:
type: dataset
dataset_name: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
scroll_decoded:
primary_key: id
# Fetch the ABI from scrollscan for USDT
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM my_scroll_mainnet_raw_logs
WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(topics, ',', 3) IS NULL
scroll_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
scroll_20_transfers:
primary_key: id
sql: >
SELECT
id,
address AS token_id,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
lower(event_params[3]) AS `value`,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
sinks:
scroll_20_sink:
type: postgres
table: erc20_transfers
schema: mirror
secret_name:
description: Postgres sink for ERC20 transfers
from: scroll_20_transfers
```
```yaml scroll-erc20-transfers.yaml
sources:
- type: dataset
referenceName: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
- referenceName: scroll_decoded
type: sql
primaryKey: id
# Fetch the ABI from scrollscan for USDT
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69', 'etherscan'),
`topics`,
`data`
) AS `decoded`
      FROM scroll_mainnet.raw_logs
      WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
        AND SPLIT_INDEX(topics, ',', 3) IS NULL
- referenceName: scroll_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
- referenceName: scroll_20_transfers
primaryKey: id
type: sql
sql: >
SELECT
id,
address AS token_id,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
lower(event_params[3]) AS `value`,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
sinks:
- type: postgres
table: erc20_transfers
schema: mirror
secretName:
description: Postgres sink for ERC20 transfers
referenceName: scroll_20_sink
sourceStreamName: scroll_20_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc20_transfers`.
There are three transforms in this pipeline definition; let's walk through how each of them works:
```sql Transform: scroll_decoded
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM scroll_mainnet.raw_logs
WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(topics, ',', 3) IS NULL
```
As explained in the [Decoding Contract Events guide](/mirror/guides/decoding-contract-events) we first make use of the `_gs_fetch_abi` function to get the ABI from Scrollscan and pass it as first argument
to the function `_gs_log_decode` to decode its topics and data. We store the result in a `decoded` [ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) which we unnest on the next transform.
We also limit the decoding to the relevant events using the topic filter and `SPLIT_INDEX`, so that only ERC-20 transfers are included.
* `topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'`: `topics` is a comma separated string. Each value in the string is a hash. The first is the hash of the full event\_signature (including arguments), in our case `Transfer(address,address,uint256)` for ERC-20, which is hashed to `0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef`. We use `LIKE` to only consider the first signature, with a `%` at the end, which acts as a wildcard.
* `SPLIT_INDEX(topics, ',', 3) IS NULL`: as mentioned in the introduction, ERC-20 transfers share the same `event_signature` as ERC-721 transfers. The difference between them is the number of topics associated with the event. ERC-721 transfers have four topics, and ERC-20 transfers have three.
```sql Transform: scroll_clean
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
```
In this second transform, we take the `event_params` and `event_signature` from the result of the decoding. We then filter the query on:
* `decoded IS NOT NULL`: to leave out potential null results from the decoder
* `decoded.event_signature = 'Transfer'`: the decoder will output the event name as event\_signature, excluding its arguments. We use it to filter only for Transfer events.
```sql Transform: scroll_20_transfers
SELECT
id,
address AS token_id,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
lower(event_params[3]) AS `value`,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
```
In this last transform we are essentially selecting all the Transfer information we are interested in having in our database.
We've included a number of columns that you may or may not need; the main columns for most purposes are `id`, `token_id` (the contract address, useful if you are syncing multiple contracts), `sender`, `recipient`, and `value`.
* `id`: This is the Goldsky provided `id`, it is a string composed of the dataset name, block hash, and log index, which is unique per event, here's an example: `log_0x60eaf5a2ab37c73cf1f3bbd32fc17f2709953192b530d75aadc521111f476d6c_18`
* `address AS token_id`: We rename the contract address column to `token_id` to make its role more explicit. You may also want to lower-case it, as we do with the other address columns, to make using this data simpler downstream.
* `lower(event_params[1]) AS sender`: Here we continue to lower-case values for consistency. In this case we're using the first element of the `event_params` array (using a 1-based index), and renaming it to `sender`. Each event parameter maps to an argument to the `event_signature`.
* `lower(event_params[2]) AS recipient`: Like the previous column, we're pulling the second element in the `event_params` array and renaming it to `recipient`.
* `lower(event_params[3]) AS value`: We're pulling the third element in the `event_params` array and renaming it to `value`; it represents the amount of tokens sent in the transfer.
Lastly, we are also adding more block metadata to the query to add context to each transaction:
```
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
```
It's worth mentioning that in this example we are interested in all the ERC-20 Transfer events, but if you would like to filter for specific contract addresses you could simply add a `WHERE` filter to this query with the addresses you are interested in, like: `WHERE address IN ('0xBC4CA0EdA7647A8aB7C2061c2E118A18a936f13D', '0xdac17f958d2ee523a2206206994597c13d831ec7')`
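For example, here is a sketch of the final transform restricted to two contracts; the addresses below are illustrative, and lower-casing both sides guards against case mismatches:
```sql
-- Sketch: scroll_20_transfers limited to specific token contracts
SELECT
  id,
  address AS token_id,
  lower(event_params[1]) AS sender,
  lower(event_params[2]) AS recipient,
  lower(event_params[3]) AS `value`
FROM scroll_clean
WHERE lower(address) IN (
  '0xbc4ca0eda7647a8ab7c2061c2e118a18a936f13d',
  '0xdac17f958d2ee523a2206206994597c13d831ec7'
)
```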
#### Deploying the pipeline
Our last step is to deploy this pipeline and start sinking ERC-20 transfer data into our database. Assuming we are using the same file name for the pipeline configuration as in this example,
we can use the [CLI pipeline create command](/reference/cli#pipeline-create) like this:
`goldsky pipeline create scroll-erc20-transfers --definition-path scroll-erc20-transfers.yaml`
After some time, you should see the pipeline start streaming Transfer data into your sink.
Remember that you can always speed up the streaming process by [updating](/reference/cli#pipeline-update) the `resourceSize` of the pipeline.
Here's an example transfer record from our sink:
| id | token\_id | sender | recipient | value | event\_name | block\_number | block\_hash | log\_index | transaction\_hash | transaction\_index |
| -------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------ | ------------------------------------------ | ----------------- | ----------- | ------------- | ------------------------------------------------------------------ | ---------- | ------------------------------------------------------------------ | ------------------ |
| log\_0x666622ad5c04eb5a335364d9268e24c64d67d005949570061d6c150271b0da12\_2 | 0x5300000000000000000000000000000000000004 | 0xefeb222f8046aaa032c56290416c3192111c0085 | 0x8c5c4595df2b398a16aa39105b07518466db1e5e | 22000000000000006 | Transfer | 5136 | 0x666622ad5c04eb5a335364d9268e24c64d67d005949570061d6c150271b0da12 | 2 | 0x63097d8bd16e34caacfa812d7b608c29eb9dd261f1b334aa4cfc31a2dab2f271 | 0 |
We can find this [transaction in Scrollscan](https://scrollscan.com/tx/0x63097d8bd16e34caacfa812d7b608c29eb9dd261f1b334aa4cfc31a2dab2f271). We see that it corresponds to the second internal transfer of Wrapped ETH (WETH).
This concludes our successful deployment of a Mirror pipeline streaming ERC-20 Tokens from Scroll chain into our database using inline decoders. Congrats!
### ERC-20 Transfers using decoded datasets
As explained in the Introduction, Goldsky provides decoded datasets for Raw Logs and Raw Traces for a number of different chains. You can check [this list](/mirror/sources/direct-indexing) to see if the chain you are interested in has these decoded datasets.
In these cases, there is no need for us to run Decoding Transform Functions as the dataset itself will already contain the event signature and event params decoded.
Below is an example pipeline definition for streaming ERC-20 token transfers on the Ethereum chain using the `decoded_logs` dataset.
```yaml ethereum-decoded-logs-erc20-transfers.yaml
sources:
- referenceName: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: earliest
description: Decoded logs for events emitted from contracts. Contains the
decoded event signature and event parameters, contract address, data,
topics, and metadata for the block and transaction.
transforms:
- type: sql
referenceName: ethereum_20_transfers
primaryKey: id
description: ERC20 Transfers
sql: >-
SELECT
address AS token_id,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
lower(event_params[3]) AS `value`,
raw_log.block_number AS block_number,
raw_log.block_hash AS block_hash,
raw_log.log_index AS log_index,
raw_log.transaction_hash AS transaction_hash,
raw_log.transaction_index AS transaction_index,
id
FROM ethereum.decoded_logs WHERE raw_log.topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(raw_log.topics, ',', 3) IS NULL
sinks:
- type: postgres
table: erc20_transfers
schema: mirror
secretName:
description: Postgres sink for ERC20 transfers
referenceName: ethereum_20_sink
sourceStreamName: ethereum_20_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc20_transfers`.
As you can see, it's pretty similar to the inline decoding pipeline method, but here we skip the decoding transforms and simply filter on `raw_log.topics`, just as we did in the previous method.
Assuming we are using the same filename for the pipeline configuration as in this example, and that you have added your own [secret](/mirror/manage-secrets),
we can deploy this pipeline with the [CLI pipeline create command](/reference/cli#pipeline-create):
`goldsky pipeline create ethereum-erc20-transfers --definition-path ethereum-decoded-logs-erc20-transfers.yaml`
## Conclusion
In this guide, we have learnt how Mirror simplifies streaming ERC-20 Transfer events into your database.
We first looked at the easy way of achieving this: making use of the readily available ERC-20 dataset for the EVM chain and using it as the source of our pipeline.
Next, we took a deeper dive into the standard decoding method using Decoding Transform Functions, implementing an example on the Scroll chain.
We have also looked into an example implementation using the decoded\_logs dataset for Ethereum. Both are great decoding methods and depending on your use case and dataset availability you might prefer one over the other.
With Mirror, developers gain flexibility and efficiency in integrating blockchain data, opening up new possibilities for applications and insights. Experience the transformative power of Mirror today and redefine your approach to blockchain data integration.
# ERC-721 Transfers
Source: https://docs.goldsky.com/mirror/guides/token-transfers/ERC-721-transfers
Create a table containing ERC-721 Transfers for several or all token contracts
ERC-721 tokens, also known as NFTs, provide a standardized format for non-fungible digital assets within EVM ecosystems. The process of transferring ERC-721 tokens into a database is fundamental, unlocking opportunities for data analysis, tracking, and the development of innovative solutions.
This guide is part of a series of tutorials on how you can stream transfer data into your data warehouse using Mirror pipelines. Here we will be focusing on ERC-721 Transfers; visit the following guides for other types of Transfers:
* [Native Transfers](/mirror/guides/token-transfers/Native-transfers)
* [ERC-20 Transfers](/mirror/guides/token-transfers/ERC-20-transfers)
* [ERC-1155 Transfers](/mirror/guides/token-transfers/ERC-1155-transfers)
## What you'll need
1. A Goldsky account and the CLI installed
2. A basic understanding of the [Mirror product](/mirror)
3. A destination sink to write your data to. In this example, we will use [the PostgreSQL Sink](/mirror/sinks/postgres)
## Introduction
In order to stream all the ERC-721 Transfers of a chain there are two potential methods available:
1. Use the readily available ERC-721 dataset for the chain you are interested in: this is the easiest and quickest method to get you streaming token transfers into your sink of choice with minimum code.
2. Build the ERC-721 Transfers pipeline from scratch using raw or decoded logs: this method takes more code and time to implement but it's a great way to learn about how you can use decoding functions in case you
want to build more customized pipelines.
Let's explore both methods below in more detail:
## Using the ERC-721 Transfers Source Dataset
Every EVM chain has its own ERC-721 dataset available for you to use as a source in your pipelines. You can check this by running the `goldsky dataset list` command and finding the EVM chain of your choice.
For this example, let's use the `apex` chain and create a simple pipeline definition using its ERC-721 dataset that writes the data into a PostgreSQL instance:
```yaml apex-erc721-tokens.yaml
name: apex-erc721-pipeline
resource_size: s
apiVersion: 3
sources:
apex.erc721_transfers:
dataset_name: apex.erc721_transfers
version: 1.0.0
type: dataset
start_at: earliest
transforms: {}
sinks:
postgres_apex.erc721_transfers_public_apex_erc721_transfers:
type: postgres
table: apex_erc721_transfers
schema: public
secret_name:
description: 'Postgres sink for Dataset: apex.erc721_transfers'
from: apex.erc721_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `public.apex_erc721_transfers`.
You can start the pipeline by running:
```bash
goldsky pipeline start apex-erc721-tokens.yaml
```
Or
```bash
goldsky pipeline apply apex-erc721-tokens.yaml --status ACTIVE
```
That's it! You should soon start seeing ERC-721 token transfers in your database.
## Building ERC-721 Transfers from scratch using logs
In the previous method we just explored, the ERC-721 dataset that we used as the source to the pipeline encapsulates all the decoding logic explained in this section.
Read on if you are interested in learning how it's implemented in case you want to consider extending or modifying this logic yourself.
There are two ways that we can go about building this token transfers pipeline from scratch:
1. Use the `raw_logs` Direct Indexing dataset for that chain in combination with [Decoding Transform Functions](/reference/mirror-functions/decoding-functions) using the ABI of a specific ERC-721 Contract.
2. Use the `decoded_logs` Direct Indexing dataset for that chain in which the decoding process has already been done by Goldsky. This is only available for certain chains as you can check in [this list](/mirror/sources/direct-indexing).
We'll primarily focus on the first decoding method using `raw_logs` and decoding functions as it's the default and most used way of decoding; we'll also present an example using `decoded_logs` and highlight the differences between the two.
### Building ERC-721 Transfers using Decoding Transform Functions
In this example, we will stream all the `Transfer` events of all the ERC-721 tokens for the [Scroll chain](https://scroll.io/). To that end, we will dynamically fetch the ABI of the [Cosmic Surprise](https://scrollscan.com/token/0xcf7f37b4916ac5c530c863f8c8bb26ec1e8d2ccb) token from the Scrollscan API (available [here](https://api.scrollscan.com/api?module=contract\&action=getabi\&address=0xcf7f37b4916ac5c530c863f8c8bb26ec1e8d2ccb))
and use it to identify all the same events for the tokens in the chain. We have decided to use the ABI of this NFT contract for this example but any other ERC-721 compliant token would also work.
We need to differentiate ERC-20 token transfers from ERC-721 (NFT) transfers since they have the same event signature in decoded data: `Transfer(address,address,uint256)`.
However, if we look closely at their event definitions we can appreciate that the number of topics differ:
* [ERC-20](https://ethereum.org/en/developers/docs/standards/tokens/erc-20/): `event Transfer(address indexed _from, address indexed _to, uint256 _value)`
* [ERC-721](https://ethereum.org/en/developers/docs/standards/tokens/erc-721/): `event Transfer(address indexed _from, address indexed _to, uint256 indexed _tokenId)`
ERC-20 Transfer events have three topics (one topic for event signature + 2 topics for the indexed params).
NFTs on the other hand have four topics as they have one more indexed param in the event signature.
We will use this as a filter in our pipeline transform to only index ERC-721 Transfer events.
Let's now see all these concepts applied in an example pipeline definition:
#### Pipeline Definition
```yaml scroll-erc721-transfers.yaml
name: scroll-erc721-transfers
apiVersion: 3
sources:
my_scroll_mainnet_raw_logs:
type: dataset
dataset_name: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
scroll_decoded:
primary_key: id
# Fetch the ABI from scrollscan for Cosmic Surprise token
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xcf7f37b4916ac5c530c863f8c8bb26ec1e8d2ccb', 'etherscan'),
`topics`,
`data`
) AS `decoded`
        FROM my_scroll_mainnet_raw_logs
        WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
          AND SPLIT_INDEX(topics, ',', 3) IS NOT NULL
scroll_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
scroll_721_transfers:
primary_key: id
sql: >
SELECT
id,
address AS contract_address,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999) AS token_id,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
sinks:
scroll_721_sink:
type: postgres
table: erc721_transfers
schema: mirror
secret_name:
description: Postgres sink for ERC721 transfers
from: scroll_721_transfers
```
```yaml scroll-erc721-transfers.yaml
sources:
- type: dataset
referenceName: scroll_mainnet.raw_logs
version: 1.0.0
transforms:
- referenceName: scroll_decoded
type: sql
primaryKey: id
# Fetch the ABI from scrollscan for Cosmic Surprise token
sql: >
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xcf7f37b4916ac5c530c863f8c8bb26ec1e8d2ccb', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM scroll_mainnet.raw_logs
WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(topics, ',', 3) IS NOT NULL
- referenceName: scroll_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object
sql: >
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
- referenceName: scroll_721_transfers
primaryKey: id
type: sql
sql: >
SELECT
id,
address AS contract_address,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999) AS token_id,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
sinks:
- type: postgres
table: erc721_transfers
schema: mirror
secretName:
description: Postgres sink for ERC721 transfers
referenceName: scroll_721_sink
sourceStreamName: scroll_721_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc721_transfers`.
There are three transforms in this pipeline definition; let's walk through how each of them works:
```sql Transform: scroll_decoded
SELECT
*,
_gs_log_decode(
_gs_fetch_abi('https://api.scrollscan.com/api?module=contract&action=getabi&address=0xc7d86908ccf644db7c69437d5852cedbc1ad3f69', 'etherscan'),
`topics`,
`data`
) AS `decoded`
FROM scroll_mainnet.raw_logs
WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(topics, ',', 3) IS NOT NULL
```
As explained in the [Decoding Contract Events guide](/mirror/guides/decoding-contract-events) we first make use of the `_gs_fetch_abi` function to get the ABI from Scrollscan and pass it as first argument
to the function `_gs_log_decode` to decode its topics and data. We store the result in a `decoded` [ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) which we unnest on the next transform.
We include the topic and `SPLIT_INDEX` filters here to limit decoding only to the relevant events.
* `topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'`: `topics` is a comma separated string. Each value in the string is a hash. The first is the hash of the full event\_signature (including arguments), in our case `Transfer(address,address,uint256)` for ERC-721, which is hashed to `0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef`. We use `LIKE` to only consider the first signature, with a `%` at the end, which acts as a wildcard.
* `SPLIT_INDEX(topics, ',', 3) IS NOT NULL`: as mentioned in the introduction, ERC-20 transfers share the same `event_signature` as ERC-721 transfers. The difference between them is the number of topics associated with the event. ERC-721 transfers have four topics, and ERC-20 transfers have three.
```sql Transform: scroll_clean
SELECT
*,
decoded.event_params AS `event_params`,
decoded.event_signature AS `event_name`
FROM scroll_decoded
WHERE decoded IS NOT NULL
AND decoded.event_signature = 'Transfer'
```
In this second transform, we take the `event_params` and `event_signature` from the result of the decoding. We then filter the query on:
* `decoded IS NOT NULL`: to leave out potential null results from the decoder
* `decoded.event_signature = 'Transfer'`: the decoder will output the event name as event\_signature, excluding its arguments. We use it to filter only for Transfer events.
```sql Transform: scroll_721_transfers
SELECT
id,
address AS contract_address,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999) AS token_id,
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
FROM scroll_clean
```
In this last transform we are essentially selecting all the Transfer information we are interested in having in our database.
We've included a number of columns that you may or may not need, the main columns needed for most purposes are: `id`, `contract_address` (if you are syncing multiple contract addresses), `sender`, `recipient` and `token_id`.
* `id`: This is the Goldsky provided `id`, it is a string composed of the dataset name, block hash, and log index, which is unique per event, here's an example: `log_0x60eaf5a2ab37c73cf1f3bbd32fc17f2709953192b530d75aadc521111f476d6c_18`
* `address AS contract_address`: We rename the column to `contract_address` to make its role more explicit. You may also want to lower-case it, as we do with the other address columns, to make using this data simpler downstream.
* `lower(event_params[1]) AS sender`: Here we continue to lower-case values for consistency. In this case we're using the first element of the `event_params` array (using a 1-based index), and renaming it to `sender`. Each event parameter maps to an argument to the `event_signature`.
* `lower(event_params[2]) AS recipient`: Like the previous column, we're pulling the second element in the `event_params` array and renaming it to `recipient`.
For the token\_id we introduce a few SQL functions `COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999) AS token_id`. We'll start from the inside and work our way out.
1. `event_params[3]` is the third element of the `event_params` array, and for ERC-721 this is the token ID. Although not covered in this example, since ERC-20 shares the same signature, this element represents a token balance rather than token ID if you're decoding ERC-20 transfers.
2. `TRY_CAST(event_params[3] AS NUMERIC)` casts the string element `event_params[3]` to `NUMERIC`. Token IDs can be as large as an unsigned 256-bit integer, so make sure your database can handle that; if not, cast to a different data type that your sink can handle. We use `TRY_CAST` because it prevents the pipeline from failing if the cast fails, returning a `NULL` value instead.
3. `COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999)`: `COALESCE` can take an arbitrary number of arguments and returns the first non-NULL value. Since `TRY_CAST` can return a `NULL` we're returning `-999` in case it does. This isn't strictly necessary but is useful to do in case you want to find offending values that were unable to be cast.
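If you do use the `-999` sentinel, a minimal follow-up query against the sink (assuming the default `mirror.erc721_transfers` table from this guide) can surface any rows that failed to cast:
```sql
-- Sketch: find transfers whose token_id could not be cast and fell back to the sentinel
SELECT id, contract_address, transaction_hash
FROM mirror.erc721_transfers
WHERE token_id = -999;
```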
Lastly, we are also adding more block metadata to the query to add context to each transaction:
```
event_name,
block_number,
block_hash,
log_index,
transaction_hash,
transaction_index
```
It's worth mentioning that in this example we are interested in all the ERC-721 Transfer events, but if you would like to filter for specific contract addresses you could simply add a `WHERE` filter to this query with the addresses you are interested in, like: `WHERE address IN ('0xBC4CA0EdA7647A8aB7C2061c2E118A18a936f13D', '0xdac17f958d2ee523a2206206994597c13d831ec7')`
#### Deploying the pipeline
Our last step is to deploy this pipeline and start sinking ERC-721 transfer data into our database. Assuming we are using the same file name for the pipeline configuration as in this example,
we can use the [CLI pipeline create command](/reference/cli#pipeline-create) like this:
`goldsky pipeline create scroll-erc721-transfers --definition-path scroll-erc721-transfers.yaml`
After some time, you should see the pipeline start streaming Transfer data into your sink.
Remember that you can always speed up the streaming process by [updating](/reference/cli#pipeline-update) the `resourceSize` of the pipeline.
Here's an example transfer record from our sink:
| id | contract\_address | sender | recipient | token\_id | event\_name | block\_number | block\_hash | log\_index | transaction\_hash | transaction\_index |
| --------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------ | ------------------------------------------ | --------- | ----------- | ------------- | ------------------------------------------------------------------ | ---------- | ------------------------------------------------------------------ | ------------------ |
| log\_0x5e3225c40254dd5b1b709152feafaa8437e505ae54c028b6d433362150f99986\_34 | 0x6e55472109e6abe4054a8e8b8d9edffcb31032c5 | 0xd2cda3fa01d34878bbe6496c7327b3781d4422bc | 0x6e55472109e6abe4054a8e8b8d9edffcb31032c5 | 38087399 | Transfer | 4057598 | 0x5e3225c40254dd5b1b709152feafaa8437e505ae54c028b6d433362150f99986 | 34 | 0xf06c42ffd407bb9abba8f00d4a42cb7f1acc1725c604b8895cdb5f785f827967 | 11 |
We can find this [transaction in Scrollscan](https://scrollscan.com/tx/0xf06c42ffd407bb9abba8f00d4a42cb7f1acc1725c604b8895cdb5f785f827967). We can see that it corresponds to the transfer of a MERK token.
This concludes our successful deployment of a Mirror pipeline streaming ERC-721 tokens from the Scroll chain into our database using inline decoders. Congrats!
### ERC-721 Transfers using decoded datasets
As explained in the Introduction, Goldsky provides decoded datasets for Raw Logs and Raw Traces for a number of different chains. You can check [this list](/mirror/sources/direct-indexing) to see if the chain you are interested in has these decoded datasets.
In these cases, there is no need for us to run Decoding Transform Functions as the dataset itself will already contain the event signature and event params decoded.
Below is an example pipeline definition for streaming ERC-721 transfers on the Ethereum chain using the `decoded_logs` dataset.
```yaml ethereum-decoded-logs-erc721-transfers.yaml
sources:
- referenceName: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: earliest
description: Decoded logs for events emitted from contracts. Contains the
decoded event signature and event parameters, contract address, data,
topics, and metadata for the block and transaction.
transforms:
- type: sql
referenceName: ethereum_721_transfers
primaryKey: id
description: ERC721 Transfers
sql: >-
SELECT
address AS contract_address,
lower(event_params[1]) AS sender,
lower(event_params[2]) AS recipient,
COALESCE(TRY_CAST(event_params[3] AS NUMERIC), -999) AS token_id,
raw_log.block_number AS block_number,
raw_log.block_hash AS block_hash,
raw_log.log_index AS log_index,
raw_log.transaction_hash AS transaction_hash,
raw_log.transaction_index AS transaction_index,
id
FROM ethereum.decoded_logs WHERE raw_log.topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
AND SPLIT_INDEX(raw_log.topics, ',', 3) IS NOT NULL
sinks:
- type: postgres
table: erc721_transfers
schema: mirror
secretName:
description: Postgres sink for 721 NFT transfers
referenceName: ethereum_721_sink
sourceStreamName: ethereum_721_transfers
```
If you copy and use this configuration file, make sure to update:
1. Your `secretName`. If you already [created a secret](/mirror/manage-secrets), you can find it via the [CLI command](/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `mirror.erc721_transfers`.
As you can see, this is very similar to the inline decoding method; here we simply create a transform that filters on `raw_log.topics`, just as we did in the previous method.
Assuming we are using the same filename for the pipeline configuration as in this example, we can deploy this pipeline with the [CLI pipeline create command](/reference/cli#pipeline-create):
`goldsky pipeline create ethereum-erc721-transfers --definition-path ethereum-decoded-logs-erc721-transfers.yaml`
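Once the pipeline is running, you can sanity-check the sink with an ordinary SQL query. The sketch below assumes the default `mirror.erc721_transfers` table and a Postgres sink, as configured above; adjust the schema and table names to whatever you configured.

```sql
-- Hypothetical sanity check: the most actively transferred collections in the sink.
SELECT contract_address, COUNT(*) AS transfer_count
FROM mirror.erc721_transfers
GROUP BY contract_address
ORDER BY transfer_count DESC
LIMIT 10;
```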
## Conclusion
In this guide, we have learned how Mirror simplifies streaming NFT Transfer events into your database.
We first looked at the easiest way of achieving this: making use of the readily available ERC-721 dataset of the EVM chain and using it as the source for our pipeline.
We then dove deeper into the standard decoding method using Decoding Transform Functions, implementing an example on the Scroll chain.
We also looked at an example implementation using the decoded\_logs dataset for Ethereum. Both are great decoding methods, and depending on your use case and dataset availability you might prefer one over the other.
With Mirror, developers gain flexibility and efficiency in integrating blockchain data, opening up new possibilities for applications and insights. Experience the transformative power of Mirror today and redefine your approach to blockchain data integration.
# Native Transfers
Source: https://docs.goldsky.com/mirror/guides/token-transfers/Native-transfers
Create a table containing transfers for the native token of a chain
This guide explains how to use [Raw Traces Direct Indexing sources](/mirror/sources/direct-indexing) to create a Mirror pipeline that allows you to stream all native transfers for a chain into your database. In the example below, we will focus on ETH transfers on the Ethereum network, but the same
logic applies to any EVM-compatible chain which has this source available.
This guide is part of a series of tutorials on how you can export transfer data into your data warehouse. Here we will be focusing on native transfers; visit the following guides for other types of transfers:
* [ERC-20 Transfers](/mirror/guides/token-transfers/ERC-20-transfers)
* [ERC-721 Transfers](/mirror/guides/token-transfers/ERC-721-transfers)
* [ERC-1155 Transfers](/mirror/guides/token-transfers/ERC-1155-transfers)
## What you'll need
1. A basic understanding of the Mirror product's more [basic ETL use case](/mirror/guides/export-events-to-database).
2. A basic understanding of SQL, though we use the syntax and functionality of [Flink v1.17](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/functions/systemfunctions/).
3. A destination sink to write your data to.
## Preface
When it comes to identifying native transfers on a chain it's important to highlight that there are [two types of accounts](https://ethereum.org/en/developers/docs/accounts/) that can interact with transactions:
* Externally Owned Accounts (EOA): controlled by an actual user.
* Contract Accounts: controlled by code.
Currently, transactions in a block can only be initiated by EOAs (this is something that could change in the future with the introduction of [Account Abstraction](https://ethereum.org/en/roadmap/account-abstraction/)).
For instance, take [block 16240000](https://etherscan.io/block/16240000); you will see that all transactions in it were initiated by EOAs.
A transaction initiated by an EOA can send value to another EOA as in [this transaction](https://etherscan.io/tx/0x7498065db91e8543c6eafed286687fe8006b9ff90081153f769ad47ce115afc8).
Alternatively, this EOA can call a smart contract's method and optionally send value with it as in [this transaction](https://etherscan.io/tx/0x7856bfef7e5da7b22fbdc2fa923bf29d040b2d1b3dbdb3b834dffdc06f4f0a17/advanced).
Smart contracts can in turn call other smart contracts, or send value directly to another EOA. These internal transactions initiated by smart contracts can also carry native value, so it is important to consider them.
In most chain explorers you can identify these internal transactions and their corresponding value transfers by [accessing the Advanced view mode](https://etherscan.io/tx/0x7856bfef7e5da7b22fbdc2fa923bf29d040b2d1b3dbdb3b834dffdc06f4f0a17#internal).
All of these types of transactions (EOA-initiated and internal) are available in our Raw Traces dataset, so we will use it as the source for our Mirror pipeline. You can see its data schema [here](/reference/schema/EVM-schemas#raw-traces).
## Pipeline YAML
There is one transform in this configuration and we'll explain how it works. If you copy and use this configuration file, make sure to update:
1. Your `secret_name` (v2: `secretName`). If you already [created a secret](https://docs.goldsky.com/mirror/manage-secrets), you can find it via the [CLI command](https://docs.goldsky.com/reference/cli#secret) `goldsky secret list`.
2. The schema and table you want the data written to, by default it writes to `public.eth_transfers`.
```yaml native-transfers.yaml
name: native-transfers
apiVersion: 3
sources:
my_ethereum_raw_traces:
dataset_name: ethereum.raw_traces
version: 1.1.0
type: dataset
start_at: earliest
transforms:
ethereum_eth_transfers_transform:
primary_key: id
description: ETH Transfers transform
sql: >
SELECT
id,
block_number,
block_hash,
block_timestamp,
transaction_hash,
transaction_index,
from_address,
to_address,
CASE
WHEN trace_address <> '' THEN 'Internal TX'
ELSE 'EOA TX'
END AS tx_type,
CASE
WHEN block_number <= 17999551 THEN COALESCE(TRY_CAST(`value` AS DECIMAL(100)) / 1e9, 0)
ELSE COALESCE(TRY_CAST(`value` AS DECIMAL(100)), 0)
END AS `value`,
call_type,
trace_address,
status
FROM
my_ethereum_raw_traces
WHERE
call_type <> 'delegatecall' and `value` > 0 and status = 1;
sinks:
postgres_ethereum.eth_transfers:
type: postgres
table: eth_transfers
schema: public
secret_name:
description: "Postgres sink for ethereum.eth_transfers"
from: ethereum_eth_transfers_transform
```
```yaml
sources:
- referenceName: ethereum.raw_traces
version: 1.1.0
type: dataset
startAt: earliest
description: Traces of all function calls made on the chain including metadata
for block, trace, transaction, and gas.
transforms:
- type: sql
referenceName: ethereum_eth_transfers_transform
primaryKey: id
description: ETH Transfers transform
sql: >
SELECT
id,
block_number,
block_hash,
block_timestamp,
transaction_hash,
transaction_index,
from_address,
to_address,
CASE
WHEN trace_address <> '' THEN 'Internal TX'
ELSE 'EOA TX'
END AS tx_type,
CASE
WHEN block_number <= 17999551 THEN COALESCE(TRY_CAST(`value` AS DECIMAL(100)) / 1e9, 0)
ELSE COALESCE(TRY_CAST(`value` AS DECIMAL(100)), 0)
END AS `value`,
call_type,
trace_address,
status
FROM
ethereum.raw_traces
WHERE
call_type <> 'delegatecall' and `value` > 0 and status = 1;
sinks:
- type: postgres
table: eth_transfers
schema: public
secretName:
description: "Postgres sink for ethereum.eth_transfers"
referenceName: postgres_ethereum.eth_transfers
sourceStreamName: ethereum_eth_transfers_transform
```
### Native Transfers Transform
We'll start at the top.
#### Traces context columns
```sql
SELECT
id,
block_number,
block_hash,
block_timestamp,
transaction_hash,
transaction_index,
from_address,
to_address,
```
These are optional columns from this dataset which we include to give us some context around the actual transfer.
#### Transaction Type
```sql
CASE
WHEN trace_address <> '' THEN 'Internal TX'
ELSE 'EOA TX'
END AS tx_type,
```
Here we look at the `trace_address` column to identify whether this is an initial EOA transaction or an internal one. Including this column is also optional.
#### Token Value
```sql
CASE
WHEN block_number <= 17999551 THEN COALESCE(TRY_CAST(`value` AS DECIMAL(100)) / 1e9, 0)
ELSE COALESCE(TRY_CAST(`value` AS DECIMAL(100)), 0)
END AS `value`,
```
Due to the nature of the dataset, we need to make this conversion because values before block 17999551 were wrongly multiplied by 1e9. This is only the case for the Ethereum dataset; other datasets do not need this adjustment.
Some columns are surrounded by backticks because they are reserved words in Flink SQL. Common columns that need backticks include `data`, `output`, and `value`; the full list can be found [here](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/overview/#reserved-keywords).
#### Filter
```sql
call_type,
trace_address,
status
```
We include these values in the SELECT statement as we will be making use of them in the filter explained below:
```sql
WHERE
call_type <> 'delegatecall' and `value` > 0 and status = 1;
```
Here we filter based on:
* `call_type <> 'delegatecall'`: [delegatecall](https://www.educative.io/answers/what-is-delegatecall-in-ethereum) is a type of function call where the called contract's code is executed with the state of the calling contract, including storage and balance. In some cases, it can mistakenly carry over the value transfer of the original calling contract, which would compromise our data quality due to duplicated value transfers. Since delegatecalls can never send value themselves, we can safely leave them out of our resulting dataset.
* `value > 0`: we want to make sure we track transactions with actual native value.
* `status = 1`: the Raw Traces dataset can contain traces which got reverted. With this filter, we make sure to consider only successful transactions.
## Deploying the pipeline
To deploy this pipeline and start sinking native transfer data into your database, simply execute:
`goldsky pipeline create native-transfers --definition-path native-transfers.yaml`
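Once data starts landing, a quick sanity check against the sink might look like the sketch below. It assumes the default `public.eth_transfers` table and a Postgres sink from the configuration above; the column names come straight from the transform.

```sql
-- Hypothetical check: native value moved and transfer counts per transaction type.
SELECT tx_type, COUNT(*) AS transfer_count, SUM(value) AS total_value
FROM public.eth_transfers
GROUP BY tx_type;
```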
# Introduction
Source: https://docs.goldsky.com/mirror/introduction
# Stream Onchain Data with Pipelines
Mirror streams **onchain data directly to your database**, with \<1s latency.
Using a database offers unlimited queries and the flexibility to easily combine onchain and offchain data together in one place.
You can [source](/mirror/sources/supported-sources) the data you want via
a subgraph or direct indexing, then use
[transforms](/mirror/transforms/transforms-overview) to further filter or map that
data.
Mirror can minimize your latency if you're [running an
app](/mirror/sinks/supported-sinks#for-apis-for-apps), or maximize your
efficiency if you're [calculating
analytics](/mirror/sinks/supported-sinks#for-analytics). You can even send
data to a [channel](/mirror/extensions/channels/overview) to level up your
data team.
Behind the scenes, Mirror automatically creates and runs data pipelines for you off a `.yaml` config file. Pipelines:
1. Are reorg-aware and update your datastores with the latest information
2. Fully manage backfills + edge streaming so you can focus on your product
3. Benefit from quality checks and automated fixes & improvements
4. Work with data across chains, harmonizing timestamps, etc. automatically
Set up your first database by [creating a pipeline](/mirror/create-a-pipeline) in 5 minutes.
# Database secrets
Source: https://docs.goldsky.com/mirror/manage-secrets
## Overview
In order for Goldsky to connect to your sink, you have to configure secrets. Secrets refer to your datastore credentials which are securely stored in your Goldsky account.
You can create and manage your secrets with the `goldsky secret` command. To see a list of available commands and how to use them, please refer to the output of `goldsky secret -h`.
For sink-specific secret information, please refer to the [individual sink pages](/mirror/sinks).
## Guided CLI experience
If you create a pipeline with `goldsky pipeline create `, there is no need to create a secret beforehand. The CLI will list existing secrets and offer you the option of creating a new secret as part of the pipeline creation flow.
# Example Pipelines Repo
Source: https://docs.goldsky.com/mirror/mirror-github
# Performance benchmark
Source: https://docs.goldsky.com/mirror/performance-benchmark
A test of Mirror's write speeds into a ClickHouse database.
## Overview
As a quick test of Mirror's write performance, we ran a pipeline of each size backfilling Ethereum blocks data (\~18.5M rows as of October 23, 2023) with an [unmodified schema](/reference/schema/EVM-schemas#raw-blocks).
This test was performed on a 720 GiB RAM, 180 vCPU [ClickHouse Cloud](https://clickhouse.com/) instance to ensure that non-pipeline factors (eg. disk IO, available memory, CPU cores) do not act as a bottleneck.
Each test was run on a completely clean instance (ie. no existing tables) with no other queries or commands running on the database.
## Results
| Pipeline size | Peak (rows/m) | Time to backfill |
| ------------- | ------------- | ---------------- |
| S | 350,000 | 53min |
| M | 950,000 | 20min |
| L | 2,100,000 | 9min |
| XL | 3,500,000 | \~5min |
| XXL | 6,300,000 | 3min |
# ClickHouse
Source: https://docs.goldsky.com/mirror/sinks/clickhouse
[ClickHouse](https://clickhouse.com/) is a highly performant and cost-effective OLAP database that can support real-time inserts. Mirror pipelines can write subgraph or blockchain data directly into ClickHouse with full data guarantees and reorganization handling.
Mirror can work with any ClickHouse setup, but we have several strong defaults. From our experimentation, the `ReplacingMergeTree` table engine with `append_only_mode` offers the best real-time data performance for large datasets.
The [ReplacingMergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree) engine is used for all sink tables by default. If you don't want to use a ReplacingMergeTree, you can pre-create the table with any table engine you'd like and disable `append_only_mode`.
Full configuration details for the ClickHouse sink are available on the [reference](/reference/config-file/pipeline#clickhouse) page.
## Secrets
**Use HTTP**
Mirror writes to ClickHouse via the `http` interface (often port `8443`), rather than the `tcp` interface (often port `9000`).
```shell
goldsky secret create --name A_CLICKHOUSE_SECRET --value '{
"url": "clickhouse://blah.host.com:8443?ssl=true",
"type": "clickHouse",
"username": "default",
"password": "qwerty123",
"databaseName": "myDatabase"
}'
```
## Required permissions
The user will need the following permissions for the target database.
* CREATE DATABASE permissions for that database
* INSERT, SELECT, CREATE, DROP table permissions for tables within that database
```sql
-- Create a dedicated user for the pipeline
CREATE USER username IDENTIFIED BY 'user_password';
-- Allow the pipeline to create the target database and manage tables within it
GRANT CREATE DATABASE ON goldsky.* TO username;
GRANT SELECT, INSERT, DROP, CREATE ON goldsky.* TO username;
```
It's highly recommended to assign a ROLE to the user as well, and restrict the amount of total memory and CPU the pipeline has access to. The pipeline will take what it needs to insert as fast as possible, and while that may be desired for a backfill, in a production scenario you may want to isolate those resources.
## Data consistency with ReplacingMergeTrees
With `ReplacingMergeTree` tables, we can write, overwrite, and flag rows with the same primary key for deletes without actually mutating. As a result, the actual raw data in the table may contain duplicates.
ClickHouse allows you to clean up duplicates and deletes from the table by running
```sql
OPTIMIZE TABLE your_table FINAL;
```
which will merge rows with the same primary key into one. This may not be deterministic and fully clean all data up, so it's recommended to also add the `FINAL` keyword after the table name for queries.
```SQL
SELECT *
FROM your_table FINAL
```
This will run a clean-up process, though there may be performance considerations.
## Append-Only Mode
**Proceed with Caution**
Without `append_only_mode=true` (v2: `appendOnlyMode=true`), the pipeline may hit ClickHouse mutation flush limits. Write speed will also be slower due to mutations.
Append-only mode means the pipeline will only *write* and not *update* or *delete* tables. There will be no mutations, only inserts.
This drastically increases insert speed and reduces Flush exceptions (which happen when too many mutations are queued up).
It's highly recommended as it can help you operate a large dataset with many writes with a small ClickHouse instance.
When `append_only_mode` (v2: `appendOnlyMode`) is `true` (default and recommended for ReplacingMergeTrees), the sink behaves the following way:
* All updates and deletes are converted to inserts.
* An `is_deleted` column is automatically added to the table. It contains `1` for deletes and `0` otherwise.
* If `versionColumnName` is specified, it's used as a [version number column](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#ver) for deduplication. If it's not specified, an `insert_time` column is automatically added to the table. It contains the insertion time and is used for deduplication.
* Primary key is used in the `ORDER BY` clause.
This allows us to handle blockchain reorganizations natively while providing high insert speeds.
When `append_only_mode` (v2: `appendOnlyMode`) is `false`:
* All updates and deletes are propagated as is.
* No extra columns are added.
* Primary key is used in the `PRIMARY KEY` clause.
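Putting this together, a typical read against an append-only ReplacingMergeTree sink deduplicates with `FINAL` and filters out rows flagged as deleted. The table name below is hypothetical; the `is_deleted` column is the one the sink adds automatically in append-only mode.

```sql
-- Sketch: current state of an append-only sink table, excluding rows marked as deleted.
SELECT *
FROM myDatabase.my_table FINAL
WHERE is_deleted = 0;
```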
# Elasticsearch
Source: https://docs.goldsky.com/mirror/sinks/elasticsearch
Give your users blazing-fast auto-complete suggestions, full-text fuzzy searches, and scored recommendations based off of on-chain data.
[Elasticsearch](https://www.elastic.co/) is the leading search datastore, used for a wide variety of use cases across billions of datapoints a day, including search, roll-up aggregations, and ultra-fast lookups on text data.
Goldsky supports real-time insertion into Elasticsearch, with event data updating in Elasticsearch indexes as soon as it gets finalized on-chain.
See the [Elasticsearch docs](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/elasticsearch-intro.html) to see more of what it can do!
Full configuration details for the Elasticsearch sink are available on the [reference](/reference/config-file/pipeline#elasticsearch) page.
Contact us at [sales@goldsky.com](mailto:sales@goldsky.com) to learn more about how we can power search for your on-chain data!
## Secrets
Create an Elasticsearch secret with the following CLI command:
```shell
goldsky secret create --name AN_ELASTICSEARCH_SECRET --value '{
"host": "Type.String()",
"username": "Type.String()",
"password": "Type.String()",
"type": "elasticsearch"
}'
```
# MySQL
Source: https://docs.goldsky.com/mirror/sinks/mysql
[MySQL](https://www.mysql.com/) is a powerful, open source relational database system used for OLTP workloads.
Mirror supports MySQL as a sink, allowing you to write data directly into MySQL. This provides a robust and flexible solution for both mid-sized analytical workloads and high performance REST and GraphQL APIs.
When you create a new pipeline, a table will be automatically created with columns from the source dataset. If a table is already created, the pipeline will write to it. As an example, you can set up partitions before you set up the pipeline, allowing you to scale MySQL even further.
Full configuration details for the MySQL sink are available on the [reference](/reference/config-file/pipeline#mysql) page.
## Role Creation
Here is an example snippet to give the permissions needed for pipelines.
```sql
-- Create a dedicated user for the pipeline
CREATE USER 'goldsky_writer'@'%' IDENTIFIED BY 'supersecurepassword';
-- In MySQL a schema is a database; allow the pipeline to create it if it does not exist
GRANT CREATE ON *.* TO 'goldsky_writer'@'%';
-- For the schema (database) you want the pipeline to write to:
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP ON goldsky.* TO 'goldsky_writer'@'%';
```
## Secret Creation
Create a MySQL secret with the following CLI command:
```shell
goldsky secret create --name A_MYSQL_SECRET --value '{
"type": "jdbc",
"protocol": "mysql",
"host": "db.host.com",
"port": 5432,
"databaseName": "myDatabase",
"user": "myUser",
"password": "myPassword"
}'
```
## Examples
### Getting an edge-only stream of decoded logs
This definition gets a real-time edge stream of decoded logs straight into a MySQL table named `eth_logs` in the `goldsky` schema, using the secret `A_MYSQL_SECRET` created above.
```yaml
name: ethereum-decoded-logs-to-mysql
apiVersion: 3
sources:
my_ethereum_decoded_logs:
dataset_name: ethereum.decoded_logs
version: 1.0.0
type: dataset
start_at: latest
transforms:
logs:
sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
my_ethereum_decoded_logs
primary_key: id
sinks:
my_mysql_sink:
type: mysql
table: eth_logs
schema: goldsky
secret_name: A_MYSQL_SECRET
from: logs
```
```yaml
sources:
- name: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: latest
transforms:
- sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
ethereum.decoded_logs
name: logs
type: sql
primaryKey: id
sinks:
- type: mysql
table: eth_logs
schema: goldsky
secretName: A_MYSQL_SECRET
sourceStreamName: logs
```
## Tips for backfilling large datasets into MySQL
While MySQL offers fast access to data, writing large backfills into MySQL can sometimes be hard to scale.
Often, pipelines are bottlenecked against sinks.
Here are some things to try:
### Avoid indexes on tables until *after* the backfill
Indexes increase the amount of writes needed for each insert. When doing many writes, they can slow down the process significantly if you're hitting resource limitations.
### Bigger batch\_sizes for the inserts
The `sink_buffer_max_rows` setting controls how many rows are batched into a single insert statement. Depending on the size of the events, you can increase this to help with write performance. `1000` is a good number to start with. The pipeline will collect data until the batch is full, or until the `sink_buffer_interval` is met.
### Temporarily scale up the database
Take a look at your database stats like CPU and Memory to see where the bottlenecks are. Often, big writes aren't blocked on CPU or RAM, but rather on network or disk I/O.
For Google Cloud SQL, there are I/O burst limits that you can surpass by increasing the amount of CPU.
For AWS RDS instances (including Aurora), the network burst limits are documented for each instance. A rule of thumb is to look at the `EBS baseline I/O` performance as burst credits are easily used up in a backfill scenario.
### Aurora Tips
When using Aurora for large datasets, make sure to use `Aurora I/O optimized`, which charges more for storage but gives you immense savings on I/O credits. If you're streaming an entire chain into your database, or have a very active subgraph, these savings can be considerable, and the disk performance is significantly more stable, resulting in a more stable CPU usage pattern.
# PostgreSQL
Source: https://docs.goldsky.com/mirror/sinks/postgres
[PostgreSQL](https://www.postgresql.org/) is a powerful, open source object-relational database system used for OLTP workloads.
Mirror supports PostgreSQL as a sink, allowing you to write data directly into PostgreSQL. This provides a robust and flexible solution for both mid-sized analytical workloads and high performance REST and GraphQL APIs.
When you create a new pipeline, a table will be automatically created with columns from the source dataset. If a table is already created, the pipeline will write to it. As an example, you can set up partitions before you set up the pipeline, allowing you to scale PostgreSQL even further.
PostgreSQL also supports Timescale hypertables if the hypertable is already set up. We have a separate Timescale sink in technical preview that will automatically set up hypertables for you - contact [support@goldsky.com](mailto:support@goldsky.com) for access.
Full configuration details for the PostgreSQL sink are available on the [reference](/reference/config-file/pipeline#postgresql) page.
## Role Creation
Here is an example snippet to give the permissions needed for pipelines.
```sql
CREATE ROLE goldsky_writer WITH LOGIN PASSWORD 'supersecurepassword';
-- Allow the pipeline to create schemas.
-- This is needed even if the schemas already exist
GRANT CREATE ON DATABASE postgres TO goldsky_writer;
-- For existing schemas that you want the pipeline to write to:
GRANT USAGE, CREATE ON SCHEMA goldsky TO goldsky_writer;
```
## Secret Creation
Create a PostgreSQL secret with the following CLI command:
```shell
goldsky secret create --name A_POSTGRESQL_SECRET --value '{
"type": "jdbc",
"protocol": "postgresql",
"host": "db.host.com",
"port": 5432,
"databaseName": "myDatabase",
"user": "myUser",
"password": "myPassword"
}'
```
## Examples
### Getting an edge-only stream of decoded logs
This definition gets a real-time edge stream of decoded logs straight into a Postgres table named `eth_logs` in the `goldsky` schema, using the secret `A_POSTGRESQL_SECRET` created above.
```yaml
name: ethereum-decoded-logs-to-postgres
apiVersion: 3
sources:
my_ethereum_decoded_logs:
dataset_name: ethereum.decoded_logs
version: 1.0.0
type: dataset
start_at: latest
transforms:
logs:
sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
my_ethereum_decoded_logs
primary_key: id
sinks:
my_postgres_sink:
type: postgres
table: eth_logs
schema: goldsky
secret_name: A_POSTGRESQL_SECRET
from: logs
```
```yaml
sources:
- name: ethereum.decoded_logs
version: 1.0.0
type: dataset
startAt: latest
transforms:
- sql: |
SELECT
id,
address,
event_signature,
event_params,
raw_log.block_number as block_number,
raw_log.block_hash as block_hash,
raw_log.transaction_hash as transaction_hash
FROM
ethereum.decoded_logs
name: logs
type: sql
primaryKey: id
sinks:
- type: postgres
table: eth_logs
schema: goldsky
secretName: A_POSTGRESQL_SECRET
sourceStreamName: logs
```
## Tips for backfilling large datasets into PostgreSQL
While PostgreSQL offers fast access to data, writing large backfills into PostgreSQL can sometimes be hard to scale.
Often, pipelines are bottlenecked against sinks.
Here are some things to try:
### Avoid indexes on tables until *after* the backfill
Indexes increase the amount of writes needed for each insert. When doing many writes, they can slow down the process significantly if you're hitting resource limitations.
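As a rough sketch of this tip, create the index only after the backfill has caught up. The table and column below are taken from the example pipeline above; the index name is hypothetical.

```sql
-- Hypothetical post-backfill index on the example sink table.
-- CONCURRENTLY avoids locking the table while the pipeline keeps writing.
CREATE INDEX CONCURRENTLY idx_eth_logs_block_number
  ON goldsky.eth_logs (block_number);
```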
### Bigger batch\_sizes for the inserts
The `sink_buffer_max_rows` setting controls how many rows are batched into a single insert statement. Depending on the size of the events, you can increase this to help with write performance. `1000` is a good number to start with. The pipeline will collect data until the batch is full, or until the `sink_buffer_interval` is met.
### Temporarily scale up the database
Take a look at your database stats like CPU and Memory to see where the bottlenecks are. Often, big writes aren't blocked on CPU or RAM, but rather on network or disk I/O.
For Google Cloud SQL, there are I/O burst limits that you can surpass by increasing the amount of CPU.
For AWS RDS instances (including Aurora), the network burst limits are documented for each instance. A rule of thumb is to look at the `EBS baseline I/O` performance as burst credits are easily used up in a backfill scenario.
# Provider Specific Notes
### AWS Aurora Postgres
When using Aurora for large datasets, make sure to use `Aurora I/O optimized`, which charges more for storage but gives you immense savings on I/O credits. If you're streaming an entire chain into your database, or have a very active subgraph, these savings can be considerable, and the disk performance is significantly more stable, resulting in a more stable CPU usage pattern.
### Supabase
Supabase's direct connection URLs only support IPv6 connections and will not work with our default validation. There are two solutions:
1. Use `Session Pooling`. In the connection screen, scroll down to see the connection string for the session pooler. This will be included in all Supabase plans and will work for most people. However, sessions will expire, and may lead to some warning logs in your pipeline logs. These will be dealt with gracefully and no action is needed. No data will be lost due to a session disconnection.

2. Alternatively, buy the IPv4 add-on if session pooling doesn't fit your needs; it provides more persistent direct connections.
# Mirror - Supported sinks
Source: https://docs.goldsky.com/mirror/sinks/supported-sinks
Sinks define the destination of your data. We support two broad categories of sinks based on their functionality and applicability:
* Standard Sinks: These sinks are destinations readily available for querying and analysis, such as traditional databases.
* Channel Sinks: These sinks serve as intermediate storage layers, facilitating further integration into your data stack. Examples: Kafka, AWS S3, or AWS SQS.
## Standard Sinks
Standard Sinks are the default and most popular type of sinks for Mirror. They are optimized for immediate querying and analysis, providing a seamless experience for real-time data access and operations. These sinks are:
* **Postgres**: stands out with its advanced features, extensibility, and strong ACID compliance.
* **Hosted Postgres**: managed by Goldsky via NeonDB. Store data securely, scale infinitely, and export your data if you need it.
* **MySQL**: stands out with its advanced features, extensibility, and strong ACID compliance.
* **ClickHouse**: delivers exceptional performance for OLAP queries with its columnar storage format.
* **Elasticsearch**: a powerful tool for real-time search and analytics on large datasets.
* **Timescale**: offers powerful time-series data management and analytics with PostgreSQL compatibility.
Goldsky Channels are storage layers designed to absorb the Goldsky firehose and let you stream data into alternative sinks. These channels are AWS S3, AWS SQS and Kafka.
A Webhook sink enables sending data to an external service via HTTP. This allows you to output pipeline results to your application server, to a third-party API, or a bot.
### What should I use?
#### For APIs for apps
For sub-second queries, typically you would choose a database that has row-based storage (i.e. it stores each row as it is instead of applying any sort of compression).
The drawback is that row-based stores take more space, which means large, non-indexed scans can take longer and storage costs can be higher.
1. [Postgres](/mirror/sinks/postgres) is the gold standard for application databases. It can scale almost infinitely with some management (you can use a Goldsky-hosted version so you don't have to worry about scaling), and can support very fast point lookups with proper indexing.
If you require super fast lookups by `transaction_hash` or a specific column, Postgres is a very safe choice to start with (see the sketch after this list). It's great as a backend for live data APIs.
However, it can be slow for analytics queries with a lot of aggregations. For that, you may want to look for an analytical database.
Great hosted solutions for Postgres include [NeonDB](https://neon.tech/), [AWS Aurora](https://aws.amazon.com/rds/aurora/), and [GCP CloudSQL](https://cloud.google.com/sql).
2. [Elasticsearch](/mirror/sinks/elasticsearch) is a no-sql database that allows for blazing fast lookups and searches. Elasticsearch is built around super-fast non-indexed scanning, meaning it can look at every single record to find the one you want. As a result, you can do queries like fuzzy matches and wildcard lookups with millisecond latency.
Common applications include search across multiple columns, 'instant' auto-complete, and more.
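As a rough illustration of the Postgres point-lookup pattern mentioned in item 1, the sketch below indexes a hypothetical transfers table on `transaction_hash` and then serves a single-row lookup from that index. The table and index names are assumptions, not part of any pipeline.

```sql
-- Hypothetical index to keep lookups by transaction hash sub-second.
CREATE INDEX idx_transfers_tx_hash
  ON mirror.erc721_transfers (transaction_hash);

-- Point lookup served by the index above.
SELECT *
FROM mirror.erc721_transfers
WHERE transaction_hash = '0xf06c42ffd407bb9abba8f00d4a42cb7f1acc1725c604b8895cdb5f785f827967';
```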
#### For Analytics
1. [ClickHouse](/mirror/sinks/clickhouse) is a very efficient choice for storage. You can store the entire Ethereum blockchain and pay around \$50 in storage.
We recommend considering ClickHouse as an alternative to Snowflake or BigQuery - it supports many of the same use cases and has additional features such as materialized views. We've seen our customers save tens of thousands of dollars using Goldsky and ClickHouse as a solution.
The pricing for managed ClickHouse is based on storage cost, then compute cost. The compute cost is constant and isn't based on the amount of data scanned, so you can run concurrent queries without increasing cost.
## Channel Sinks
Channel Sinks act as an extension of the default sinks, providing intermediate storage for more complex data integration scenarios. They are designed to handle high-throughput data streams and enable further processing within your data stack. Examples include:
* AWS S3: A scalable object storage service.
* AWS SQS: A fully managed message queue for microservices, distributed systems, and serverless applications.
* Kafka: A distributed event streaming platform.
For more information on Channel Sinks and how to integrate them, visit our [Channels documentation](/mirror/extensions/channels/overview).
# Timescale
Source: https://docs.goldsky.com/mirror/sinks/timescale
**Closed Beta**
This feature is in closed beta and only available for our enterprise customers.
Please contact us at [support@goldsky.com](mailto:support@goldsky.com) to request access to this feature.
We partner with [Timescale](https://www.timescale.com) to provide teams with real-time data access on on-chain data, using a database powerful enough for time series analytical queries and fast enough for transactional workloads like APIs.
Timescale support is in the form of hypertables - any dataset that has a `timestamp`-like field can be used to create a Timescale hypertable.
You can also use the traditional JDBC/Postgres sink with Timescale - you would just need to create the hypertable yourself.
You can use TimescaleDB for anything you would use PostgreSQL for, including directly serving APIs and other simple indexed table lookups. With Timescale hypertables, you can also run complex database queries like time-windowed aggregations, continuous group-bys, and more.
Learn more about Timescale here: [https://docs.timescale.com/api/latest/](https://docs.timescale.com/api/latest/)
# Webhook
Source: https://docs.goldsky.com/mirror/sinks/webhook
A Webhook sink allows you to send data to an external service via HTTP. This provides considerable flexibility for forwarding pipeline results to your application server, a third-party API, or a bot.
Webhook sinks ensure at least once delivery and manage back-pressure, meaning data delivery adapts based on the responsiveness of your endpoints. The pipeline sends a POST request with a JSON payload to a specified URL, and the receiver only needs to return a 200 status code to confirm successful delivery.
Here is a snippet of YAML that specifies a Webhook sink:
## Pipeline configuration
```yaml
sinks:
my_webhook_sink:
type: webhook
# The webhook url
url: Type.String()
# The object key coming from either a source or transform.
# Example: ethereum.raw_blocks.
from: Type.String()
# The name of a goldsky httpauth secret you created which contains a header that can be used for authentication. More on how to create these in the section below.
secret_name: Type.Optional(Type.String())
# Optional metadata that you want to send on every request.
headers:
SOME-HEADER-KEY: Type.Optional(Type.String())
# Whether to send only one row per http request (better for compatibility with third-party integrations - e.g bots) or to mini-batch it (better for throughput).
one_row_per_request: Type.Optional(Type.Boolean())
```
```yaml
sinks:
myWebhookSink:
type: webhook
# The webhook url
url: Type.String()
# The object key coming from either a source or transform.
# Example: ethereum.raw_blocks.
from: Type.String()
# The name of a goldsky httpauth secret you created which contains a header that can be used for authentication. More on how to create these in the section below.
secretName: Type.Optional(Type.String())
# Optional metadata that you want to send on every request.
headers:
SOME-HEADER-KEY: Type.Optional(Type.String())
# Whether to send only one row per http request (better for compatibility with third-party integrations - e.g bots) or to mini-batch it (better for throughput).
oneRowPerRequest: Type.Optional(Type.Boolean())
```
## Secret creation
Create a httpauth secret with the following CLI command:
```shell
goldsky secret create
```
Select `httpauth` as the secret type and then follow the prompts to finish creating your httpauth secret.
## Example Webhook sink configuration
```yaml
sinks:
my_webhook_sink:
type: webhook
url: https://my-webhook-service.com/webhook-1
from: ethereum.raw_blocks
secret_name: ETH_BLOCKS_SECRET
```
# Direct indexing
Source: https://docs.goldsky.com/mirror/sources/direct-indexing
With Mirror pipelines, you can access indexed on-chain data. Define it as a source and pipe it into any sink we support.
## Use-cases
* Mirror specific logs and traces from a set of contracts into a postgres database to build an API for your protocol
* ETL data into a data warehouse to run analytics
* Push the full blockchain into Kafka or S3 to build a datalake for ML
## Supported Chains
## Schema
The schema for each of these datasets can be found [here](/reference/schema/EVM-schemas).
## Backfill vs Fast Scan
Goldsky allows you to either backfill entire datasets or pre-filter the data based on specific attributes.
This allows for a cost- and time-efficient streaming experience based on your specific use case.
For more information on how to enable each streaming mode in your pipelines visit our [reference documentation](/reference/config-file/pipeline#backfill-vs-fast-scan).
# NFT datasets
Source: https://docs.goldsky.com/mirror/sources/nft-data
**Technical Preview**
This dataset is in technical preview - it's being used in production by customers already but we are onboarding new users slowly as we scale up our infrastructure.
[Email us](mailto:sales@goldsky.com) if you'd like to join our technical preview program.
Our NFT Metadata dataset includes:
* Image metadata
* Royalty metadata
* Floor price and previous sales
* Transfers
* Off-chain and on-chain Bids/Sales
* Rarity
As a Mirror source, you can sink it into any of our supported sinks, execute transformations and aggregations, and join it with other datasets.
# Subgraphs
Source: https://docs.goldsky.com/mirror/sources/subgraphs
You can use subgraphs as a pipeline source, allowing you to combine the flexibility of subgraph indexing with the expressiveness of the database of your choice.
This enables a lot of powerful use-cases:
* Reuse all your existing subgraph entities.
* Increase querying speeds drastically compared to graphql-engines.
* Flexible aggregations that weren't possible with just GraphQL.
* Analytics on protocols through Clickhouse, and more.
* Plug into BI tools, train AI, and export data for your users
Full configuration details for the Subgraph Entity source are available on the [reference](/reference/config-file/pipeline#subgraph-entity) page.
## Automatic Deduplication
Subgraphs natively support time travel queries. This means every historical version of every entity is stored. To do this, each row has an `id`, `vid`, and `block_range`.
When you update an entity in a subgraph mapping handler, a new row is created in the database with the same `id` but a new `vid` and `block_range`, and the old row's `block_range` is updated to have an end.
By default, pipelines **deduplicate** on `id`, to show only the latest row per `id`. In other words, historical entity state is not kept in the sink database. This saves a lot of database space and makes for easier querying, as additional deduplication logic is not needed for simple queries. In a postgres database for example, the pipeline will update existing rows with the values from the newest block.
This deduplication happens through setting the primary key in the data going through the pipeline. By default, the primary key is `id`.
If historical data is desired, you can set the primary key to `vid` through a transform.
```yaml
name: qidao-optimism-subgraph-to-postgres
apiVersion: 3
sources:
subgraph_account:
type: subgraph_entity
name: account
subgraphs:
- name: qidao-optimism
version: 1.1.0
transforms:
historical_accounts:
sql: >-
select * from subgraph_account
primary_key: vid
sinks:
postgres_account:
type: postgres
table: historical_accounts
schema: goldsky
secret_name: A_POSTGRESQL_SECRET
from: historical_accounts
```
```yaml
sources:
- type: subgraphEntity
# The deployment IDs you gathered above. If you put multiple,
# they must have the same schema
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
# A name, referred to later in the `sourceStreamName` of a transformation or sink
referenceName: account
entity:
# The name of the entities
name: account
transforms:
- referenceName: historical_accounts
type: sql
# The `account` referenced here is the referenceName set in the source
sql: >-
select * from account
primaryKey: vid
sinks:
- type: postgres
table: historical_accounts
schema: goldsky
secretName: A_POSTGRESQL_SECRET
# the `historical_accounts` is the referenceKey of the transformation made above
sourceStreamName: historical_accounts
```
In this case, all historical versions of the entity will be retained in the pipeline sink. If no table exists yet, it will be created automatically.
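With `vid` as the primary key, the sink keeps every historical version, so recovering the latest state of each entity becomes a query-time concern. A minimal Postgres sketch, assuming the `goldsky.historical_accounts` table from the example above and that `vid` increases with each new entity version:

```sql
-- Hypothetical query: latest version of each entity from a history-preserving sink.
SELECT DISTINCT ON (id) *
FROM goldsky.historical_accounts
ORDER BY id, vid DESC;
```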
## Using the wizard
### Subgraphs from your project
Use any of your own subgraphs as a pipeline source. Run `goldsky pipeline create`, select `Project Subgraph`, and push subgraph data into any of our supported sinks.
### Community subgraphs
When you create a new pipeline with `goldsky pipeline create `, select **Community Subgraphs** as the source type. This will display a list of available subgraphs to choose from. Select the one you are interested in and follow the prompts to complete the pipeline creation.
This will load the subgraph into your project and create a pipeline with that subgraph as the source.
# Mirror - Supported sources
Source: https://docs.goldsky.com/mirror/sources/supported-sources
Mirror data from community subgraphs or from your own custom subgraphs into any sink.
Mirror entire blockchains into your database for analysis, or filter/transform them to what you need.
Curated, NFT-specific datasets for token metadata, sales/listings/transfers activity, and more.
# Static IPs
Source: https://docs.goldsky.com/mirror/static-ips
## Overview
Goldsky can connect to your sinks using static IPs. This can be helpful if you want to further restrict access to your sinks and ensure that only Goldsky owned services can access them.
This feature is currently in a closed beta and only available for our enterprise customers. Please contact us at [support@goldsky.com](mailto:support@goldsky.com) (or through Slack/Telegram) to request access to this feature.
### Usage
1. Reach out to us to have this feature enabled for your account.
2. Whitelist the following IPs: `100.21.15.214`, `44.229.26.196`, `44.230.239.184`, `52.38.124.121`
3. Make sure you're using the latest version of the CLI: `curl | sh`
4. Deploy your pipelines with the flag for enabling static IPs by setting the `dedicated_egress_ip: true|false` in the YAML config of your pipeline
Example:
```
name: private-ip-pipeline
apiVersion: 3
resource_size: s
dedicated_egress_ip: true
sources:
reward:
type: subgraph_entity
subgraphs:
- name: rewards-subgraph
version: 1.0.1
name: reward_payout
transforms: {}
sinks:
rewards_payout_sink:
type: postgres
table: reward_payout
schema: rewards
secret_name: DB_WITH_IP_WHITELISTED
from: reward
```
# External Handler Transforms
Source: https://docs.goldsky.com/mirror/transforms/external-handlers
Transforming data with an external http service.
With external handler transforms, you can send data from your Mirror pipeline to an external service via HTTP and return the processed results back into the pipeline. This opens up a world of possibilities by allowing you to bring your own custom logic, programming languages, and external services into the transformation process.
[In this repo](https://github.com/goldsky-io/documentation-examples/tree/main/mirror-pipelines/goldsky-enriched-erc20-pipeline) you can see an example implementation of enriching ERC-20 Transfer Events with an HTTP service.
**Key Features of External Handler Transforms:**
* Send data to external services via HTTP.
* Supports a wide variety of programming languages and external libraries.
* Handle complex processing outside the pipeline and return results in real time.
* Guaranteed at least once delivery and back-pressure control to ensure data integrity.
### How External Handlers work
1. The pipeline sends a POST request to the external handler with a mini-batch of JSON rows.
2. The external handler processes the data and returns the transformed rows in the same format and order as received.
### Example workflow
1. The pipeline sends data to an external service (e.g. a custom API).
2. The service processes the data and returns the results to the pipeline.
3. The pipeline continues processing the enriched data downstream.
### Example HTTP Request
```json
POST /external-handler
[
{"id": 1, "value": "abc"},
{"id": 2, "value": "def"}
]
```
### Example HTTP Response
```json
[
{"id": 1, "transformed_value": "xyz"},
{"id": 2, "transformed_value": "uvw"}
]
```
### YAML config with an external transform
```YAML
transforms:
my_external_handler_transform:
type: handler # the transform type. [required]
primary_key: hash # [required]
url: http://example-url/example-transform-route # url that your external handler is bound to. [required]
headers: # [optional]
Some-Header: some_value # use http headers to pass any tokens your server requires for authentication or any metadata that you think is useful.
from: ethereum.raw_blocks # the input for the handler. Data sent to your handler will have the same schema as this source/transform. [required]
# A schema override signals to the pipeline that the handler will respond with a schema that differs from the upstream source/transform (in this case ethereum.raw_blocks).
# No override means that the handler will do some processing, but that its output will maintain the upstream schema.
# The return type of the handler is equal to the upstream schema after the override is applied. Make sure that your handler returns a response with rows that follow this schema.
schema_override: # [optional]
new_column_name: datatype # if you want to add a new column, do so by including its name and datatype.
existing_column_name: new_datatype # if you want to change the type of an existing column (e.g. cast an int to string), do so by including its name and the new datatype
other_existing_column_name: null # if you want to drop an existing column, do so by including its name and setting its datatype to null
```
### Schema override datatypes
When overriding the schema of the data returned by the handler, it's important to get the datatypes for each column right. The `schema_override` property is a map of column names to Flink SQL datatypes.
| Data Type | Notes |
| :------------- | :---------------------------------- |
| STRING | |
| BOOLEAN | |
| BYTE | |
| DECIMAL | Supports fixed precision and scale. |
| SMALLINT | |
| INTEGER | |
| BIGINT | |
| FLOAT | |
| DOUBLE | |
| TIME | Supports only a precision of 0. |
| TIMESTAMP | |
| TIMESTAMP\_LTZ | |
| ARRAY | |
| ROW | |
### Key considerations
* **Schema Changes:** If the external handler's output schema changes, you will need to redeploy the pipeline with the relevant `schema_override`.
* **Failure Handling:** In case of failures, the pipeline retries requests indefinitely with exponential backoff.
* **Networking & Performance:** For optimal performance, deploy your handler in a region close to where the pipelines are deployed (we use aws `us-west-2`). Aim to keep p95 latency under 100 milliseconds for best results.
* **Connection & Response times**: The maximum allowed response time is 5 minutes and the maximum allowed time to establish a connection is 1 minute.
### In-order mode for external handlers
In-Order mode allows for subgraph-style processing inside mirror. Records are emitted to the handler in the order that they appear on-chain.
**How to get started**
1. Make sure that the sources that you want to use currently support [Fast Scan](/mirror/sources/direct-indexing). If they don't, submit a request to support.
2. In your pipeline definition specify the `filter` and `in_order` attributes for your source.
3. Declare a transform of type handler or a sink of type webhook.
Simple transforms (e.g. filtering) between the source and the handler/webhook are allowed, but other complex transforms (e.g. aggregations, joins) can cause loss of ordering.
**Example YAML config, with in-order mode**
```YAML
name: in-order-pipeline
sources:
ethereum.raw_transactions:
dataset_name: ethereum.raw_transactions
version: 1.1.0
type: dataset
filter: block_number > 21875698 # [required]
in_order: true # [required] enables in-order mode on the given source and its downstream transforms and sinks.
sinks:
my_in_order_sink:
type: webhook
url: https://my-handler.com/process-in-order
headers:
WEBHOOK-SECRET: secret_two
secret_name: HTTPAUTH_SECRET_TWO
from: another_transform
my_sink:
type: webhook
url: https://python-handler.fly.dev/echo
from: ethereum.raw_transactions
```
**Example in-order webhook sink**
```javascript
const express = require('express');
const { Pool } = require('pg');
const app = express();
app.use(express.json());
// Database connection settings
const pool = new Pool({
user: 'your_user',
host: 'localhost',
database: 'your_database',
password: 'your_password',
port: 5432,
});
async function isDuplicate(client, key) {
const res = await client.query("SELECT 1 FROM processed_messages WHERE key = $1", [key]);
return res.rowCount > 0;
}
app.post('/webhook', async (req, res) => {
const client = await pool.connect();
try {
await client.query('BEGIN');
const payload = req.body;
const metadata = payload.metadata || {};
const data = payload.data || {};
const op = metadata.op;
const key = metadata.key;
if (!key || !op || !data) {
await client.query('ROLLBACK');
return res.status(400).json({ error: "Invalid payload" });
}
if (await isDuplicate(client, key)) {
await client.query('ROLLBACK');
return res.status(200).json({ message: "Duplicate request processed without write side effects" });
}
if (op === "INSERT") {
const fields = Object.keys(data);
const values = Object.values(data);
const placeholders = fields.map((_, i) => `$${i + 1}`).join(', ');
const query = `INSERT INTO my_table (${fields.join(', ')}) VALUES (${placeholders})`;
await client.query(query, values);
} else if (op === "DELETE") {
const conditions = Object.keys(data).map((key, i) => `${key} = $${i + 1}`).join(' AND ');
const values = Object.values(data);
const query = `DELETE FROM my_table WHERE ${conditions}`;
await client.query(query, values);
} else {
await client.query('ROLLBACK');
return res.status(400).json({ error: "Invalid operation" });
}
await client.query("INSERT INTO processed_messages (key) VALUES ($1)", [key]);
await client.query('COMMIT');
return res.status(200).json({ message: "Success" });
} catch (e) {
await client.query('ROLLBACK');
return res.status(500).json({ error: e.message });
} finally {
client.release();
}
});
app.listen(5000, () => {
console.log('Server running on port 5000');
});
```
**In-order mode tips**
* To observe records in order, either have a single instance of your handler responding to requests OR introduce some coordination mechanism to make sure that only one replica of the service can answer at a time.
* When deploying your service, avoid having old and new instances running at the same time. Instead, discard the current instance and incur a little downtime to preserve ordering.
* When receiving messages that have already been processed in the handler (pre-existing idempotency key or previous index (e.g already seen block number)) **don't** introduce any side effects on your side, but **do** respond to the message as usual (i.e., processed messages for handlers, success code for webhook sink) so that the pipeline knows to keep going.
### Useful tips
* **Schema Changes:** A change in the output schema of the external handler requires redeployment with `schema_override`.
* **Failure Handling:** The pipeline retries indefinitely with exponential backoff.
* **Networking:** Deploy the handler close to where the pipeline runs for better performance.
* **Latency:** Keep handler response times under 100ms to ensure smooth operation.
# SQL Transforms
Source: https://docs.goldsky.com/mirror/transforms/sql-transforms
Transforming blockchain data with Streaming SQL
## SQL Transforms
SQL transforms allow you to write SQL queries to modify and shape data from multiple sources within the pipeline. This is ideal for operations that need to be performed within the data pipeline itself, such as filtering, aggregating, or joining datasets.
Depending on how you choose to [source](/mirror/sources/supported-sources) your data, you might find that you run into 1 of 2 challenges:
1. **You only care about a few contracts**
Rather than fill up your database with a ton of extra data, you'd rather ***filter*** down your data to a smaller set.
2. **The data is still a bit raw**
Maybe you'd rather track gwei rounded to the nearest whole number instead of wei. You're looking to ***map*** data to a different format so you don't have to run this calculation over and over again.
### The SQL Solution
You can use SQL-based transforms to solve both of these challenges that normally would have you writing your own indexer or data pipeline. Instead, Goldsky can automatically run these for you using just 3 pieces of info:
* `name`: **A shortname for this transform**
You can refer to this from sinks via `from` or treat it as a table in SQL from other transforms.
* `sql`: **The actual SQL**
To filter your data, use a `WHERE` clause, e.g. `WHERE liquidity > 1000`.
To map your data, use an `AS` clause combined with `SELECT`, e.g. `SELECT wei / 1000000000 AS gwei`.
* `primary_key`: **A unique ID**
This should be unique, but you can also use this to intentionally de-duplicate data - the latest row with the same ID will replace all the others.
Combine them together into your [config](/reference/config-file/pipeline):
```yaml
transforms:
negative_fpmm_scaled_liquidity_parameter:
sql: SELECT id FROM polymarket.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primary_key: id
```
```yaml
transforms:
- referenceName: negative_fpmm_scaled_liquidity_parameter
type: sql
sql: SELECT id FROM polygon.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primaryKey: id
```
That's it. You can now filter and map data to exactly what you need.
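To see filtering and mapping together, here is a sketch of a transform that converts wei to gwei and drops zero-value rows. The `ethereum.raw_transactions` source and its column names are illustrative; run `goldsky dataset list` to see the datasets available to your project.

```yaml
transforms:
  transactions_in_gwei:
    sql: SELECT id, hash, value / 1000000000 AS value_gwei FROM ethereum.raw_transactions WHERE value > 0
    primary_key: id
```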
# Overview
Source: https://docs.goldsky.com/mirror/transforms/transforms-overview
Learn about Mirror's powerful transformation capabilities.
While simple pipelines let you stream real-time data from one of our datasets into your own destination, most teams also enrich and filter that data using transforms.
With transforms, you can decode data, call external APIs, check contract storage, and more. You can even call your own APIs to tie the pipeline into your existing systems seamlessly.
## [SQL Transforms](/mirror/transforms/sql-transforms)
SQL transforms allow you to write SQL queries to modify and shape data from multiple sources within the pipeline. This is ideal for operations that need to be performed within the data pipeline itself, such as filtering, aggregating, or joining datasets.
Depending on how you choose to [source](/mirror/sources/supported-sources) your data, you might find that you run into 1 of 2 challenges:
1. **You only care about a few contracts**
Rather than fill up your database with a ton of extra data, you'd rather ***filter*** down your data to a smaller set.
2. **The data is still a bit raw**
Maybe you'd rather track gwei rounded to the nearest whole number instead of wei. You're looking to ***map*** data to a different format so you don't have to run this calculation over and over again.
## [External Handler Transforms](/mirror/transforms/external-handlers)
With external handler transforms, you can send data from your Mirror pipeline to an external service via HTTP and return the processed results back into the pipeline. This opens up a world of possibilities by allowing you to bring your own custom logic, programming languages, and external services into the transformation process.
Key Features of External Handler Transforms:
* Send data to external services via HTTP.
* Supports a wide variety of programming languages and external libraries.
* Handle complex processing outside the pipeline and return results in real time.
* Guaranteed at least once delivery and back-pressure control to ensure data integrity.
### How External Handlers work
1. The pipeline sends a POST request to the external handler with a mini-batch of JSON rows.
2. The external handler processes the data and returns the transformed rows in the same format and order as received (see the sketch below).
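As a rough sketch of that request/response contract, assuming an Express-style service and that the mini-batch arrives as a JSON array of row objects (check the [External Handler Transforms](/mirror/transforms/external-handlers) guide for the exact payload format), a handler might look like the following. The `value` and `value_gwei` fields are illustrative.

```javascript
const express = require('express');

const app = express();
app.use(express.json({ limit: '10mb' }));

app.post('/transform', (req, res) => {
  const rows = req.body;
  if (!Array.isArray(rows)) {
    return res.status(400).json({ error: 'Expected a JSON array of rows' });
  }

  // Return one output row per input row, in the same order they arrived.
  const transformed = rows.map((row) => ({
    ...row,
    // Illustrative enrichment: derive a gwei amount from a wei column.
    value_gwei: Number(row.value) / 1e9,
  }));

  return res.json(transformed);
});

app.listen(5000, () => console.log('Handler listening on port 5000'));
```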
# Pricing
Source: https://docs.goldsky.com/pricing/summary
Understand how metered billing works on Goldsky
Our prices are quoted on a monthly basis for simpler presentation, but metered and billed on an hourly basis. This has a few key implications.
1. To account for the varying number of days in each month of the year, we conservatively estimate that each month has 730 hours. This means that the estimated monthly pricing shown is higher than what you would typically pay for the specified usage in most months.
2. All estimations of the number of subgraphs or Mirror pipelines assume "always-on" capacity. In practice, you can run double the number of subgraph workers or pipeline workers for half the time and pay the same price. This similarly holds for the "entities stored" metric in subgraphs.
## Subgraphs
We track usage based on two metrics: (1) the number of active subgraphs, and (2) the amount of data stored across all subgraphs in your project.
### Metering
#### Active Subgraphs
The number of active subgraph workers, tracked hourly. If you pause or delete a subgraph, it is no longer billed.
Examples:
1. If you have 10 active subgraphs, you use **10** *subgraph worker hours* per hour. At 730 hours per month, you incur **7,300** *subgraph worker hours*.
2. If you begin a period with 10 active subgraphs and delete all of them halfway through the period, you are billed the equivalent of 5 subgraphs for that period.
#### Subgraph Entities Stored
The number of entities stored across all subgraphs in your project, tracked hourly. If you delete a subgraph, stored entities are no longer tracked. All entities in a project count toward the project's usage on a cumulative basis.
Examples:
1. If you have 3 active subgraphs that cumulatively store 30,000 entities, you use **30,000** *subgraph entity storage hours* per hour.
At 730 hours per month, you incur `30,000 * 730 = 21,900,000` *subgraph entity storage hours* in that month.
2. If you begin a period with 3 active subgraphs, each with 10,000 entities, and you delete 2 of them after 10 days, you use **30,000** *subgraph entity storage hours* per hour for the first 10 days, and **10,000** per hour thereafter.
### Starter Plan
#### Active Subgraphs
Up to 3 active subgraphs per month.
#### Subgraph Storage
Up to 100,000 entities stored per month.
You incur usage for each hour that each subgraph in your project is deployed and active. If you have 2 subgraphs deployed and active for 2 hours each, you will accumulate 4 hours of usage.
When you exceed Starter Plan usage limits, subgraph indexing will be paused, but subgraphs will remain queryable.
### Scale Plan
| Active Subgraphs (subgraph worker-hours) | |
| ------------------------------------------ | ---------------------------------------------------------------- |
| First 2,250 worker-hours | Free (i.e., 3 always-on subgraphs) |
| Above 2,250 worker-hours                    | $0.05/hour (i.e., \~$36.50/month/additional subgraph)             |
| Subgraph Storage (subgraph entity storage-hours) | |
| -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| First 100k entities stored (i.e., up to 75,000,000 storage-hours) | Free |
| Up to 10M entities stored (i.e., up to 7.5B storage-hours)            | \~$4.00/month per 100k entities stored (i.e., $0.0053 per 100k entities stored/hour)                 |
| Above 10M entities stored (i.e., >7.5B storage-hours)                 | \~$1.05/month per 100k entities stored (i.e., $0.0014 per 100k entities stored/hour)                 |
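As a rough illustration: running 10 always-on subgraphs uses `10 * 730 = 7,300` worker-hours in a month. The first 2,250 worker-hours are free, so the remaining `7,300 - 2,250 = 5,050` worker-hours bill at $0.05/hour, or about $252.50 for that month. Entity storage is billed separately, per the second table above.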
## Mirror
### Metering
#### Active Pipeline Workers
The number of active workers, billed hourly. Pipeline resources can have multiple parallel workers, and each worker incurs usage separately.
| Resource Size | Workers |
| --------------- | ------- |
| small (default) | 1 |
| medium | 4 |
| large | 10 |
| x-large | 20 |
| xx-large | 40 |
If you have one small pipeline and one large pipeline each deployed for 2 hours, you will accumulate `1*2*1 + 1*2*10 = 2 + 20 = 22` hours of usage.
Note: Pipelines that use a single subgraph as a source, and webhooks or GraphQL APIs as sink(s), are not metered as pipelines. However, you still accumulate hourly subgraph usage.
Examples:
1. If you have **1** small pipeline, you use **1** *pipeline worker-hour* every hour. At 730 hours in the average month, you would incur **730** *pipeline worker-hours* for that month.
2. If you start with **10** small pipelines in a billing period and delete all of them halfway through the billing period, you are charged the equivalent of 5 pipeline workers for the full billing period.
3. If you have **2** large pipelines, you will be using **20** *pipeline worker-hours* every hour, equating to **14,600** *pipeline worker-hours* if you run them the entire month.
#### Pipeline Event Writes
The number of records written by pipelines in your project. For example, for a PostgreSQL sink, every row created, updated, or deleted counts as a "write". For a Kafka sink, every message counts as a write.
Examples:
1. If you have a pipeline that writes **20,000** records per day for 10 days, and then **20** records per day for 10 days, you will be using **200,200** pipeline event writes.
2. If you have two pipelines that each write 1 million events in one month, you are not charged for the first one million events, but the next one million are charged at \$1.00 per 100,000 events (i.e., \$10), as per the Scale Plan pricing below.
### Starter Plan
#### Active Pipeline Workers
Each billing cycle, you can run 1 small pipeline free of charge (\~730 pipeline worker-hours).
#### Pipeline Event Writes
You can write up to 1 million events to a sink, cumulatively across all pipelines, per billing cycle.
When you exceed Starter Plan limits, pipelines will be paused.
### Scale Plan
You will incur usage for each hour that each pipeline in your project is deployed and active.
Note: The pipeline `resource size` maps to the underlying VM size and acts as a multiplier on hourly usage.
| Active Pipelines (pipeline worker-hours) | |
| ------------------------------------------ | ------------------------------------------------- |
| First 750 worker-hours | Free (i.e., 1 always-on pipeline worker/month) |
| 751+ worker-hours                          | $0.10/hour (i.e., $73.00/month per worker)         |
| Pipeline Throughput (pipeline events written) | |
| ----------------------------------------------- | ------------------------- |
| First 1M events written | Free |
| Up to 100M events written | \$1.00 per 100,000 events |
| Above 100M events written | \$0.10 per 100,000 events |
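As a rough illustration: a single always-on `medium` pipeline runs 4 workers, or `4 * 730 = 2,920` worker-hours in a month. The first 750 worker-hours are free, so the remaining `2,170` worker-hours bill at $0.10/hour, roughly $217 for that month, plus any event writes beyond the first 1 million.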
## Hosted databases (Beta)
The first hosted database we are shipping is Postgres. **During the Beta period, Scale plan customers can use hosted Postgres free of charge.** After that, we will turn on metered billing; you can find the pricing details below.

We track usage based on two metrics: (1) the total amount of storage used, and (2) the amount of compute time (memory and CPU) used across all databases.
### Metering
#### Storage - Total storage used
We bill for database storage based on the **average amount of storage** you use during your monthly billing cycle. This means you're only charged for the actual amount of storage you've used, **averaged across the entire month**.
Example
1. Let's say you start your subscription and **add 3.3 GB of new data every day** for 30 days across all your databases. Here's how your usage looks:
* Day 1: 3.3 GB
* Day 2: 6.6 GB
* Day 3: 9.9 GB
* ...
* Day 30: 99 GB
Over the 30-day billing period, your total daily storage usage adds up to **1,534.5 GB-days**.
To find the monthly average, we divide by 30:
> **1,534.5 GB-days ÷ 30 days = 51.15 GB**
So at the end of your billing cycle, you'd be charged based on **51.15 GB of average storage**.
#### What is a GB-day?
A **GB-day** (gigabyte-day) is a simple way to represent **how much storage you used, and for how long**.
Think of it like this:
* If you use **1 GB** of storage for **1 day**, that's **1 GB-day**.
* If you use **3.3 GB** for **1 day**, that's **3.3 GB-days**.
* If you use **3.3 GB** for **2 days**, that's **6.6 GB-days**.
***
#### Utilization - Amount of compute time
The total amount of active compute time used by all databases, multiplied by the number of vCPUs in use. This is tracked hourly. If you delete or pause a pipeline that uses a hosted Postgres database, the database will transition to idle mode and **you won't incur utilization charges** during that time.
Note that if you query the database from an external source, like a DB visualization tool, you will be charged for utilization since the database is actively being queried.
#### There are two billing options
* Auto-scaling VCUs (recommended): You'll set a minimum and maximum number of VCUs and be charged a variable hourly rate depending on how much time is spent in each VCU range.
* Fixed VCUs: You'll be charged a fixed price per compute hour at the VCU multiple you set.
**We will be launching with autoscaling for the Beta period.**
Example
1. If you have an active database using a fixed 2 VCUs and running 12 hours a day, you'll be billed as follows: 2 VCUs × 12 h/day × 30 days = **720 compute-hours**.
### Starter Plan
Hosted databases are only available on Scale plans. You'll need to sign up for a Scale plan in order to use them.
### Scale Plan
| **Dimension** | **Unit** | **Price per unit** |
| :------------ | :----------------------------------- | :------------------------------------------------------------------- |
| Storage | 1 GB | **1GB free**, \$1.50 per GB, per month. Measured hourly in GB-hours. |
| Utilization | 1 Compute Hour | \$0.16 per vCPU, per month |
| Databases | No charge per additional DBs set up. | Free |
# Role-based access control
Source: https://docs.goldsky.com/rbac
Use RBAC to determine who can do what on your Goldsky project
## Overview
Goldsky supports Role Based Access Control (RBAC) to help you restrict what actions can be taken by different members of the team.
We support 4 different roles: `Owner`, `Admin`, `Editor` and `Viewer`. The permissions are listed below:
* `Owner`
* Can do everything an `Admin` can do
* Can add other `Owner`s to the project
* Can remove other `Owner`s from the project
* Can update the role of teammates to `Owner`
* Can change the subscription and billing information of the project
* `Admin`
* Can do everything an `Editor` can do
* Can invite non-`Owner` teammates to a project
* Can remove non-`Owner` teammates from a project
* Can update the role of non-`Owner` teammates on a project
* `Editor`
* Can do everything a `Viewer` can do
* Can create, update and delete API keys
* Can create, update and delete subgraphs
* Can create, update and delete pipelines
* Can create, update and delete secrets
* Can create, update and delete webhooks
* Can edit the name of a project
* `Viewer`
* Can view subgraphs
* Can view pipelines
* Can view secrets
* Can view webhooks
* Can view metrics
* Can view teammates
* Can leave a project
* Can create new projects
## Using the Webapp
### Adding a teammate to your project
When adding a teammate, you will be prompted to select the desired role for the new teammate(s). The default selected role is `Viewer`.
### Changing the role of teammates
You must be an `Admin` to change the role of your teammate(s).
To manage the RBAC settings for the team members of a given project, select the project and navigate to the [Settings](https://app.goldsky.com/dashboard/settings#team) menu.
Open the overflow menu and click `Update Role`.
## Using the Command Line
### Adding a teammate to your project
Use the `--role` flag of `goldsky project users invite` to select which role the invited users will have; you can pass as many emails to `--emails` as you want. The default role is `Viewer`.
```
goldsky project users invite --emails "<email-1>" "<email-2>" --role <Owner|Admin|Editor|Viewer>
```
### Changing the role of teammates
Use the `--role` flag of `goldsky project users update` to change the role of the user identified by `--email`.
```
goldsky project users update --email "<email>" --role <Owner|Admin|Editor|Viewer>
```
# CLI Reference
Source: https://docs.goldsky.com/reference/cli
Goldsky's command line interface reference
{/*}
This file is generated. Do not modify.
To update the file:
1. Navigate to the goldsky-io/goldsky monorepo
2. cd packages/cli && pnpm docs:reference:generate
3. Use the cli-reference.md content
{*/}
```
goldsky args
```
How to use:
```
goldsky args
Commands:
goldsky Get started with Goldsky [default]
goldsky login Log in to Goldsky to enable authenticated CLI commands
goldsky logout Log out of Goldsky on this computer
goldsky subgraph Commands related to subgraphs
goldsky project Commands related to project management
goldsky pipeline Commands related to Goldsky pipelines
goldsky dataset Commands related to Goldsky datasets
goldsky indexed Analyze blockchain data with indexed.xyz
goldsky secret Commands related to secret management
goldsky telemetry Commands related to CLI telemetry
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-v, --version Show version number [boolean]
-h, --help Show help [boolean]
```
## login
```
goldsky login
```
How to use:
```
goldsky login
Log in to Goldsky to enable authenticated CLI commands
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
## logout
```
goldsky logout
```
How to use:
```
goldsky logout
Log out of Goldsky on this computer
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
## subgraph
```
goldsky subgraph
```
How to use:
```
goldsky subgraph
Commands related to subgraphs
Commands:
goldsky subgraph deploy Deploy a subgraph to Goldsky
goldsky subgraph list [nameAndVersion] View deployed subgraphs and tags
goldsky subgraph delete Delete a subgraph from Goldsky
goldsky subgraph tag Commands related to tags
goldsky subgraph webhook Commands related to webhooks
goldsky subgraph log Tail a subgraph's logs
goldsky subgraph pause Pause a subgraph
goldsky subgraph start Start a subgraph
goldsky subgraph update Update a subgraph
goldsky subgraph init [nameAndVersion] Initialize a new subgraph project with basic scaffolding
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### subgraph deploy
```
goldsky subgraph deploy
```
How to use:
```
goldsky subgraph deploy
Deploy a subgraph to Goldsky
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--path Path to subgraph [string]
--description Description/notes for the subgraph [string]
--from-ipfs-hash IPFS hash of a publicly deployed subgraph [string]
--ipfs-gateway IPFS gateway to use if downloading the subgraph from IPFS [string] [default: "https://ipfs.network.thegraph.com"]
--from-abi Generate a subgraph from an ABI [string]
--from-url GraphQL endpoint for a publicly deployed subgraph [string]
--remove-graft Remove grafts from the subgraph prior to deployment [boolean] [default: false]
--start-block Change start block of your subgraph prior to deployment. If used in conjunction with --graft-from, this will be the graft block as well. [number]
--graft-from Graft from the latest block of an existing subgraph in the format / [string]
--enable-call-handlers Generate a subgraph from an ABI with call handlers enabled. Only meaningful when used with --from-abi [boolean] [default: false]
--tag Tag the subgraph after deployment, comma separated for multiple tags [string]
-h, --help Show help [boolean]
```
### subgraph list
```
goldsky subgraph list [nameAndVersion]
```
How to use:
```
goldsky subgraph list [nameAndVersion]
View deployed subgraphs and tags
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--filter Limit results to just tags or deployments [choices: "tags", "deployments"]
--summary Summarize subgraphs & versions without all their details [boolean] [default: false]
-h, --help Show help [boolean]
```
### subgraph delete
```
goldsky subgraph delete
```
How to use:
```
goldsky subgraph delete
Delete a subgraph from Goldsky
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-f, --force Force the deletion without prompting for confirmation [boolean] [default: false]
-h, --help Show help [boolean]
```
### subgraph tag
```
goldsky subgraph tag
```
How to use:
```
goldsky subgraph tag
Commands related to tags
Commands:
goldsky subgraph tag create Create a new tag
goldsky subgraph tag delete Delete a tag
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### subgraph tag create
```
goldsky subgraph tag create
```
How to use:
```
goldsky subgraph tag create
Create a new tag
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-t, --tag The name of the tag [string] [required]
-h, --help Show help [boolean]
```
#### subgraph tag delete
```
goldsky subgraph tag delete
```
How to use:
```
goldsky subgraph tag delete
Delete a tag
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-t, --tag The name of the tag to delete [string] [required]
-f, --force Force the deletion without prompting for confirmation [boolean] [default: false]
-h, --help Show help [boolean]
```
### subgraph webhook
```
goldsky subgraph webhook
```
How to use:
```
goldsky subgraph webhook
Commands related to webhooks
Commands:
goldsky subgraph webhook create Create a webhook
goldsky subgraph webhook delete [webhook-name] Delete a webhook
goldsky subgraph webhook list List webhooks
goldsky subgraph webhook list-entities List possible webhook entities for a subgraph
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### subgraph webhook create
```
goldsky subgraph webhook create
```
How to use:
```
goldsky subgraph webhook create
Create a webhook
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--name Name of the webhook, must be unique [string] [required]
--url URL to send events to [string] [required]
--entity Subgraph entity to send events for [string] [required]
--num-retries Number of times to retry sending an event [number] [default: 10]
--retry-interval Number of seconds to wait between retries [number] [default: 60]
--retry-timeout Number of seconds to wait for a response before retrying [number] [default: 30]
--secret The secret you will receive with each webhook request Goldsky sends [string]
-h, --help Show help [boolean]
```
#### subgraph webhook delete
```
goldsky subgraph webhook delete [webhook-name]
```
How to use:
```
goldsky subgraph webhook delete [webhook-name]
Delete a webhook
Positionals:
webhook-name Name of the webhook to delete [string]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--name Name of the webhook to delete [deprecated: Please use the positional argument instead.] [string]
-f, --force Force the deletion without prompting for confirmation [boolean] [default: false]
-h, --help Show help [boolean]
```
#### subgraph webhook list
```
goldsky subgraph webhook list
```
How to use:
```
goldsky subgraph webhook list
List webhooks
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### subgraph webhook list-entities
```
goldsky subgraph webhook list-entities
```
How to use:
```
goldsky subgraph webhook list-entities
List possible webhook entities for a subgraph
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### subgraph log
```
goldsky subgraph log
```
How to use:
```
goldsky subgraph log
Tail a subgraph's logs
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--since Return logs newer than a relative duration like now, 5s, 2m, or 3h [default: "1m"]
--format The format used to output logs, use text or json for easier parsed output, use pretty for more readable console output [choices: "pretty", "json", "text"] [default: "text"]
--filter The minimum log level to output [choices: "error", "warn", "info", "debug"] [default: "info"]
--levels The explicit comma separated log levels to include (error, warn, info, debug)
--interval The time in seconds to wait between checking for new logs [number] [default: 5]
-h, --help Show help [boolean]
```
### subgraph pause
```
goldsky subgraph pause
```
How to use:
```
goldsky subgraph pause
Pause a subgraph
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### subgraph start
```
goldsky subgraph start
```
How to use:
```
goldsky subgraph start
Start a subgraph
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### subgraph update
```
goldsky subgraph update
```
How to use:
```
goldsky subgraph update
Update a subgraph
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--public-endpoint Toggle public endpoint for the subgraph [string] [choices: "enabled", "disabled"]
--private-endpoint Toggle private endpoint for the subgraph [string] [choices: "enabled", "disabled"]
--description Description/notes for the subgraph [string]
-h, --help Show help [boolean]
```
### subgraph init
```
goldsky subgraph init [nameAndVersion]
```
How to use:
```
goldsky subgraph init [nameAndVersion]
Initialize a new subgraph project with basic scaffolding
Positionals:
nameAndVersion Name and version of the subgraph, e.g. 'my-subgraph/1.0.0' [string]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--target-path Target path to write subgraph files to [string]
--force Overwrite existing files at the target path [boolean] [default: false]
--from-config Path to instant subgraph JSON configuration file [string]
--abi ABI source(s) for contract(s) [string]
--contract Contract address(es) to watch for events [string]
--contract-events Event names to index for the contract(s) [string]
--contract-calls Call names to index for the contract(s) [string]
--network Network(s) to use for contract(s) reference our docs for supported subgraph networks: https://docs.goldsky.com/chains/supported-networks [string]
--contract-name Name of the contract(s) [string]
--start-block Block to start at for a contract on a specific network [string]
--description Subgraph description [string]
--call-handlers Enable call handlers for the subgraph [boolean]
--build Build the subgraph after writing files [boolean]
--deploy Deploy the subgraph after build [boolean]
-h, --help Show help [boolean]
```
## project
```
goldsky project
```
How to use:
```
goldsky project
Commands related to project management
Commands:
goldsky project users Commands related to the users of a project
goldsky project leave Leave a project
goldsky project list List all of the projects you belong to
goldsky project update Update a project
goldsky project create Create a project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### project users
```
goldsky project users
```
How to use:
```
goldsky project users
Commands related to the users of a project
Commands:
goldsky project users list List all users for this project
goldsky project users invite Invite a user to your project
goldsky project users remove Remove a user from your project
goldsky project users update Update a user's project permissions
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### project users list
```
goldsky project users list
```
How to use:
```
goldsky project users list
List all users for this project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### project users invite
```
goldsky project users invite
```
How to use:
```
goldsky project users invite
Invite a user to your project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--emails emails of users to invite [array] [required]
--role desired role of invited user(s) [string] [required] [choices: "Owner", "Admin", "Editor", "Viewer"] [default: "Viewer"]
-h, --help Show help [boolean]
```
#### project users remove
```
goldsky project users remove
```
How to use:
```
goldsky project users remove
Remove a user from your project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--email email of user to remove [string] [required]
-h, --help Show help [boolean]
```
#### project users update
```
goldsky project users update
```
How to use:
```
goldsky project users update
Update a user's project permissions
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--email email of user to remove [string] [required]
--role role of user to update [string] [required] [choices: "Owner", "Admin", "Editor", "Viewer"]
-h, --help Show help [boolean]
```
### project leave
```
goldsky project leave
```
How to use:
```
goldsky project leave
Leave a project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--projectId the ID of the project you want to leave [string] [required]
-h, --help Show help [boolean]
```
### project list
```
goldsky project list
```
How to use:
```
goldsky project list
List all of the projects you belong to
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### project update
```
goldsky project update
```
How to use:
```
goldsky project update
Update a project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--name the new name of the project [string] [required]
-h, --help Show help [boolean]
```
### project create
```
goldsky project create
```
How to use:
```
goldsky project create
Create a project
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--name the name of the new project [string] [required]
-h, --help Show help [boolean]
```
## pipeline
```
goldsky pipeline
```
How to use:
```
goldsky pipeline
Commands related to Goldsky pipelines
Commands:
goldsky pipeline get Get a pipeline
goldsky pipeline export [name] Export pipeline configurations
goldsky pipeline apply Apply the provided pipeline yaml config. This command creates the pipeline if it doesn't exist or updates the existing pipeline. This command is idempotent.
goldsky pipeline get-definition [deprecated] Get a shareable pipeline definition. Use "pipeline get --definition" instead.
goldsky pipeline create Create a pipeline
goldsky pipeline update [deprecated] Update a pipeline. Use "pipeline apply" instead.
goldsky pipeline delete Delete a pipeline
goldsky pipeline list List all pipelines
goldsky pipeline monitor Monitor a pipeline runtime
goldsky pipeline pause Pause a pipeline
goldsky pipeline start Start a pipeline
goldsky pipeline stop Stop a pipeline
goldsky pipeline info Display pipeline information
goldsky pipeline resize Resize a pipeline
goldsky pipeline validate [config-path] Validate a pipeline definition or config.
goldsky pipeline cancel-update Cancel in-flight update request
goldsky pipeline restart Restart a pipeline. Useful in scenarios where pipeline needs to be restarted without any configuration changes.
goldsky pipeline snapshots Commands related to snapshots
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### pipeline get
```
goldsky pipeline get
```
How to use:
```
goldsky pipeline get
Get a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--outputFormat, --output format of the output. Either json or table. Defaults to json. [deprecated] [string] [choices: "json", "table", "yaml"] [default: "yaml"]
--definition print the pipeline's definition only (sources, transforms, sinks) [boolean]
-v, --version pipeline version. Returns latest version of the pipeline if not set. [string]
-h, --help Show help [boolean]
```
### pipeline export
```
goldsky pipeline export [name]
```
How to use:
```
goldsky pipeline export [name]
Export pipeline configurations
Positionals:
name pipeline name [string]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--all Export pipeline configurations for all available pipelines [boolean]
-h, --help Show help [boolean]
```
### pipeline apply
```
goldsky pipeline apply
```
How to use:
```
goldsky pipeline apply
Apply the provided pipeline yaml config. This command creates the pipeline if it doesn't exist or updates the existing pipeline. This command is idempotent.
Positionals:
config-path path to the yaml pipeline config file. [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--from-snapshot Snapshot that will be used to start the pipeline. Applicable values are: 'last', 'new', 'none' or a snapshot-id. 'last' uses latest available snapshot. 'new' creates a new snapshot to use. 'none': does not use any snapshot aka starts from scratch. Including the option without any argument will start an interactive mode to select from a list of available snapshots. Defaults to 'new' [string]
--save-progress Attempt a snapshot of the pipeline before applying the update. Only applies if the pipeline already has status: ACTIVE and is running without issues. Defaults to saving progress unless pipeline is being updated to status=INACTIVE. [deprecated: Use '--from-snapshot'] [boolean]
--skip-transform-validation skips the validation of the transforms when updating the pipeline. Defaults to false [boolean]
--skip-validation skips the validation of the transforms when updating the pipeline. Defaults to false [deprecated] [boolean]
--use-latest-snapshot attempts to use the latest available snapshot. [deprecated: Use '--from-snapshot'] [boolean]
--status Status of the pipeline [string] [choices: "ACTIVE", "INACTIVE", "PAUSED"]
-h, --help Show help [boolean]
```
### pipeline get-definition
```
goldsky pipeline get-definition
```
How to use:
```
goldsky pipeline get-definition
[deprecated] Get a shareable pipeline definition. Use "pipeline get --definition" instead.
Positionals:
name pipeline name [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--outputFormat, --output format of the output. Either json or yaml. Defaults to yaml. [deprecated] [string] [choices: "json", "yaml"] [default: "yaml"]
-h, --help Show help [boolean]
```
### pipeline create
```
goldsky pipeline create
```
How to use:
```
goldsky pipeline create
Create a pipeline
Positionals:
name name of the new pipeline [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--output, --outputFormat format of the output. Either json or table. Defaults to table. [string] [choices: "json", "table", "yaml"] [default: "yaml"]
--resource-size, --resourceSize runtime resource size for when the pipeline runs [deprecated: Use 'pipeline resize'] [string] [required] [choices: "s", "m", "l", "xl", "xxl", "mem.l", "mem.xl", "mem.xxl"] [default: "s"]
--skip-transform-validation skips the validation of the transforms when creating the pipeline. [boolean]
--description the description of the new pipeline [deprecated: Use 'pipeline apply'] [string]
--definition definition of the pipeline that includes sources, transforms, sinks. Provided as json eg: `{sources: [], transforms: [], sinks:[]}` [deprecated: Use 'pipeline apply'] [string]
--definition-path path to a json/yaml file with the definition of the pipeline that includes sources, transforms, sinks. [deprecated: Use 'pipeline apply'] [string]
--status the desired status of the pipeline [deprecated: Use 'pipeline start/stop/pause'] [string] [choices: "ACTIVE", "INACTIVE"] [default: "ACTIVE"]
--use-dedicated-ip Whether the pipeline should use dedicated egress IPs [boolean] [required] [default: false]
-h, --help Show help [boolean]
```
### pipeline update
```
goldsky pipeline update
```
How to use:
```
goldsky pipeline update
[deprecated] Update a pipeline. Use "pipeline apply" instead.
Positionals:
name name of the pipeline to update. [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--outputFormat, --output format of the output. Either json or table. Defaults to json. [deprecated] [string] [required] [choices: "json", "table", "yaml"] [default: "yaml"]
--resource-size, --resourceSize runtime resource size for when the pipeline runs [string] [choices: "s", "m", "l", "xl", "xxl", "mem.l", "mem.xl", "mem.xxl"]
--status status of the pipeline [string] [choices: "ACTIVE", "INACTIVE", "PAUSED"]
--save-progress takes a snapshot of the pipeline before applying the update. Only applies if the pipeline already has status: ACTIVE. Defaults to saving progress unless pipeline is being updated to status=INACTIVE. [boolean]
--skip-transform-validation skips the validation of the transforms when updating the pipeline. [boolean]
--use-latest-snapshot attempts to use the latest available snapshot. [boolean]
--definition definition of the pipeline that includes sources, transforms, sinks. Provided as json eg: `{sources: [], transforms: [], sinks:[]}` [string]
--definition-path path to a json/yaml file with the definition of the pipeline that includes sources, transforms, sinks. [string]
--description description of the pipeline` [string]
-h, --help Show help [boolean]
```
### pipeline delete
```
goldsky pipeline delete
```
How to use:
```
goldsky pipeline delete
Delete a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-f, --force Force the deletion without prompting for confirmation [boolean] [default: false]
-h, --help Show help [boolean]
```
### pipeline list
```
goldsky pipeline list
```
How to use:
```
goldsky pipeline list
List all pipelines
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--output, --outputFormat format of the output. Either json or table. Defaults to json. [string] [choices: "json", "table", "yaml"] [default: "table"]
--outputVerbosity Either summary or all. Defaults to summary. [string] [choices: "summary", "usablewithapplycmd", "all"] [default: "summary"]
--include-runtime-details includes runtime details for each pipeline like runtime status and errors. Defaults to false. [boolean] [default: false]
-h, --help Show help [boolean]
```
### pipeline monitor
```
goldsky pipeline monitor
```
How to use:
```
goldsky pipeline monitor
Monitor a pipeline runtime
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--update-request monitor update request [boolean]
--max-refreshes, --maxRefreshes max. number of data refreshes. [number]
-v, --version pipeline version, uses latest version if not set. [string]
-h, --help Show help [boolean]
```
### pipeline pause
```
goldsky pipeline pause
```
How to use:
```
goldsky pipeline pause
Pause a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### pipeline start
```
goldsky pipeline start
```
How to use:
```
goldsky pipeline start
Start a pipeline
Positionals:
nameOrConfigPath pipeline name or config path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--use-latest-snapshot attempts to use the latest available snapshot. [deprecated: Use '--from-snapshot'] [boolean]
--from-snapshot Snapshot that will be used to start the pipeline. Applicable values are: 'last', 'new', 'none' or a snapshot-id. 'last' uses latest available snapshot. 'new' creates a new snapshot to use. 'none': does not use any snapshot aka starts from scratch. Including the option without any argument will start an interactive mode to select from a list of available snapshots. Defaults to 'new' [string]
-h, --help Show help [boolean]
```
### pipeline stop
```
goldsky pipeline stop
```
How to use:
```
goldsky pipeline stop
Stop a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### pipeline info
```
goldsky pipeline info
```
How to use:
```
goldsky pipeline info
Display pipeline information
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-v, --version pipeline version. Returns latest version of the pipeline if not set. [string]
-h, --help Show help [boolean]
```
### pipeline resize
```
goldsky pipeline resize
```
How to use:
```
goldsky pipeline resize
Resize a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
resource-size, resourceSize runtime resource size [string] [choices: "s", "m", "l", "xl", "xxl", "mem.l", "mem.xl", "mem.xxl"] [default: "s"]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### pipeline validate
```
goldsky pipeline validate [config-path]
```
How to use:
```
goldsky pipeline validate [config-path]
Validate a pipeline definition or config.
Positionals:
config-path path to the yaml pipeline config file. [string]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--definition definition of the pipeline that includes sources, transforms, sinks. Provided as json eg: `{sources: [], transforms: [], sinks:[]}` [deprecated: use config-path positional instead.] [string]
--definition-path path to a json/yaml file with the definition of the pipeline that includes sources, transforms, sinks. [deprecated: use config-path positional instead.] [string]
-h, --help Show help [boolean]
```
### pipeline cancel-update
```
goldsky pipeline cancel-update
```
How to use:
```
goldsky pipeline cancel-update
Cancel in-flight update request
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### pipeline restart
```
goldsky pipeline restart
```
How to use:
```
goldsky pipeline restart
Restart a pipeline. Useful in scenarios where pipeline needs to be restarted without any configuration changes.
Positionals:
nameOrConfigPath pipeline name or config path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--from-snapshot Snapshot that will be used to start the pipeline. Applicable values are: 'last', 'new', 'none' or a snapshot-id. 'last' uses latest available snapshot. 'new' creates a new snapshot to use. 'none': does not use any snapshot aka starts from scratch. Including the option without any argument will start an interactive mode to select from a list of available snapshots. Defaults to 'new' [string] [required]
--disable-monitoring Disables monitoring after the command is run. Defaults to false. [boolean] [default: false]
-h, --help Show help [boolean]
```
### pipeline snapshots
```
goldsky pipeline snapshots
```
How to use:
```
goldsky pipeline snapshots
Commands related to snapshots
Commands:
goldsky pipeline snapshots list List snapshots in a pipeline
goldsky pipeline snapshots create Attempts to take a snapshot of the pipeline
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### pipeline snapshots list
```
goldsky pipeline snapshots list
```
How to use:
```
goldsky pipeline snapshots list
List snapshots in a pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-v, --version pipeline version. Returns snapshots across all versions if not set. [string]
-h, --help Show help [boolean]
```
#### pipeline snapshots create
```
goldsky pipeline snapshots create
```
How to use:
```
goldsky pipeline snapshots create
Attempts to take a snapshot of the pipeline
Positionals:
nameOrConfigPath pipeline name or config file path [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
## dataset
```
goldsky dataset
```
How to use:
```
goldsky dataset
Commands related to Goldsky datasets
Commands:
goldsky dataset get Get a dataset
goldsky dataset list List datasets
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### dataset get
```
goldsky dataset get
```
How to use:
```
goldsky dataset get
Get a dataset
Positionals:
name dataset name [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--outputFormat the output format. Either json or yaml. Defaults to yaml [string]
-v, --version dataset version [string]
-h, --help Show help [boolean]
```
### dataset list
```
goldsky dataset list
```
How to use:
```
goldsky dataset list
List datasets
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
## indexed
```
goldsky indexed
```
How to use:
```
goldsky indexed
Analyze blockchain data with indexed.xyz
Commands:
goldsky indexed sync Commands related to syncing indexed.xyz real-time raw & decoded crypto datasets
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### indexed sync
```
goldsky indexed sync
```
How to use:
```
goldsky indexed sync
Commands related to syncing indexed.xyz real-time raw & decoded crypto datasets
Commands:
goldsky indexed sync decoded-logs Sync decoded logs for a smart contract from a network to this computer
goldsky indexed sync raw-blocks Sync all blocks from a network
goldsky indexed sync raw-logs Sync all logs from a network
goldsky indexed sync raw-transactions Sync all transactions from a network
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
#### indexed sync decoded-logs
```
goldsky indexed sync decoded-logs
```
How to use:
```
goldsky indexed sync decoded-logs
Sync decoded logs for a smart contract from a network to this computer
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--contract-address The contract address you are interested in [string] [default: ""]
--network The network of indexed.xyz data to synchronize [string] [default: "ethereum"]
-h, --help Show help [boolean]
```
#### indexed sync raw-blocks
```
goldsky indexed sync raw-blocks
```
How to use:
```
goldsky indexed sync raw-blocks
Sync all blocks from a network
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--network The network of indexed.xyz data to synchronize [string] [default: "ethereum"]
-h, --help Show help [boolean]
```
#### indexed sync raw-logs
```
goldsky indexed sync raw-logs
```
How to use:
```
goldsky indexed sync raw-logs
Sync all logs from a network
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--contract-address The contract address you are interested in [string] [default: ""]
--network The network of indexed.xyz data to synchronize [string] [default: "ethereum"]
-h, --help Show help [boolean]
```
#### indexed sync raw-transactions
```
goldsky indexed sync raw-transactions
```
How to use:
```
goldsky indexed sync raw-transactions
Sync all transactions from a network
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--network The network of indexed.xyz data to synchronize [string] [default: "ethereum"]
-h, --help Show help [boolean]
```
## secret
```
goldsky secret
```
How to use:
```
goldsky secret
Commands related to secret management
Commands:
goldsky secret create create a secret
goldsky secret list list all secrets
goldsky secret reveal reveal a secret
goldsky secret update update a secret
goldsky secret delete delete a secret
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### secret create
```
goldsky secret create
```
How to use:
```
goldsky secret create
create a secret
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--name the name of the new secret [string]
--value the value of the new secret in json [string]
--description the description of the new secret [string]
-h, --help Show help [boolean]
```
### secret list
```
goldsky secret list
```
How to use:
```
goldsky secret list
list all secrets
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### secret reveal
```
goldsky secret reveal
```
How to use:
```
goldsky secret reveal
reveal a secret
Positionals:
name the name of the secret [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### secret update
```
goldsky secret update
```
How to use:
```
goldsky secret update
update a secret
Positionals:
name the name of the secret [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
--value the new value of the secret [string]
--description the new description of the secret [string]
-h, --help Show help [boolean]
```
### secret delete
```
goldsky secret delete
```
How to use:
```
goldsky secret delete
delete a secret
Positionals:
name the name of the secret to delete [string] [required]
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-f, --force Force the deletion without prompting for confirmation [boolean] [default: false]
-h, --help Show help [boolean]
```
## telemetry
```
goldsky telemetry
```
How to use:
```
goldsky telemetry
Commands related to CLI telemetry
Commands:
goldsky telemetry status Display the CLI telemetry status
goldsky telemetry enable Enable anonymous CLI telemetry
goldsky telemetry disable Disable anonymous CLI telemetry
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### telemetry status
```
goldsky telemetry status
```
How to use:
```
goldsky telemetry status
Display the CLI telemetry status
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### telemetry enable
```
goldsky telemetry enable
```
How to use:
```
goldsky telemetry enable
Enable anonymous CLI telemetry
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
### telemetry disable
```
goldsky telemetry disable
```
How to use:
```
goldsky telemetry disable
Disable anonymous CLI telemetry
Options:
--token CLI Auth Token [string] [default: ""]
--color Colorize output [boolean] [default: true]
-h, --help Show help [boolean]
```
# Instant subgraph configuration
Source: https://docs.goldsky.com/reference/config-file/instant-subgraph
## Configuration schemas
Currently there is only a single configuration schema, [version 1](#version-1). This configuration file is required for [instant / no-code subgraphs](/subgraphs/guides/create-a-no-code-subgraph).
### Version 1
* **\[REQUIRED]** `version` (`string`, must be `"1"`) - The version of the configuration schema.
* ***\[OPTIONAL]*** `name` (`string`) - The name of the subgraph.
* **\[REQUIRED]** `abis` (map of `object`) - A map of ABI names to ABI source configurations.
* **\[REQUIRED]** `path` (`string`) - The path to the ABI source, relative to the configuration file.
* **\[REQUIRED]** `instances` (array of `object`) - A list of data source or data template instances to index.
* ***\[OPTIONAL]*** `enableCallHandlers` (`boolean`) - Whether to enable call handler indexing for the subgraph.
*Note that `abis` also supports inline ABI definitions, either as the raw ABI array or as the JSON string.*
#### Data source instance
Data sources are instances derived from a single contract address.
* **\[REQUIRED]** `abi` (`string`) - The name of the ABI source.
* **\[REQUIRED]** `address` (`string`) - The contract address to index.
* **\[REQUIRED]** `startBlock` (`number`) - The block to start indexing from.
* **\[REQUIRED]** `chain` (`string`) - The chain to index on.
* ***\[OPTIONAL]*** `enrich` (`object`) - An object containing enrichment configurations.
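Putting the version 1 fields and a single data source instance together, a minimal configuration might look like the sketch below. The subgraph name, ABI path, contract address, start block, and chain slug are placeholders; substitute the values for the contract you want to index.

```json
{
  "version": "1",
  "name": "my-token-subgraph",
  "abis": {
    "erc20": { "path": "./abis/erc20.json" }
  },
  "instances": [
    {
      "abi": "erc20",
      "address": "0x0000000000000000000000000000000000000000",
      "startBlock": 1000000,
      "chain": "mainnet"
    }
  ]
}
```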
#### Data template instance
Data templates are instances derived from an event emitted by a contract. The event signature must include an address parameter that contains the contract address that will be indexed.
* **\[REQUIRED]** `abi` (`string`) - The name of the ABI data template instance (e.g., the pool).
* **\[REQUIRED]** `source` (`object`) - The source event details to create a new data template instance.
* **\[REQUIRED]** `abi` (`string`) - The name of the ABI data template source (e.g., the factory).
* **\[REQUIRED]** `eventSignature` (`string`) - The event signature to listen for.
* **\[REQUIRED]** `addressParam` (`string`) - The parameter to extract the contract address from.
* ***\[OPTIONAL]*** `enrich` (`object`) - An object containing enrichment configurations.
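For comparison, here is a sketch of an entry in the `instances` array that defines a data template. The ABI names, event signature, and `pool` parameter are illustrative; the event signature must be one emitted by the source (factory) ABI and must include an address parameter that holds the address of the newly created contract.

```json
{
  "abi": "pool",
  "source": {
    "abi": "factory",
    "eventSignature": "PoolCreated(address token0, address token1, uint24 fee, address pool)",
    "addressParam": "pool"
  }
}
```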
#### Instance enrichment
Enrichments allow data source and template instances to be enriched by performing eth calls and mapping the outputs to one or more fields and/or entities.
* ***\[OPTIONAL]*** `options` (`object`) - enrichment options.
* ***\[OPTIONAL]*** `debugging` (`boolean`) - Flag to emit debugging logs.
* ***\[OPTIONAL]*** `imports` (array of `string`) - List of additional imports to include in the generated mapping file. You only need to include additional imports if you are using those types within your configuration.
* **\[REQUIRED]** `handlers` (map of `object`) - A map of trigger signatures to enrichment handler configurations (signature must be defined in the instance abi).
* ***\[OPTIONAL]*** `calls` (map of `object`) - A map of call reference names to eth call configurations. This can be omitted if mapping expressions do not require any eth calls.
* **\[REQUIRED]** `entities` (map of `object`) - A map of entity names to entity configurations.
#### Enrichment call configuration
Enrichment call configurations capture all information required to perform an eth call within the context of an existing event or call handler mapping function.
* ***\[OPTIONAL]*** `abi` (`string`) - The name of the abi defining the call to perform (if omitted then we'll use the instance abi).
* ***\[OPTIONAL]*** `source` (`string`) - The contract address source [expression](#enrichment-expressions) to use for the call (if omitted then we'll use the current instance source).
* **\[REQUIRED]** `name` (`string`) - The name of the eth call to perform. Note that this must be the exact name as defined in the ABI. The eth call invoker will actually call the `try_` function to safely handle a potential revert and prevent any errors in the subgraph due to an invalid eth call. If the eth call is required then the subgraph will result in an error state.
* ***\[OPTIONAL]*** `params` (`string`) - The parameter [expression](#enrichment-expressions) to use when performing the eth call (this can be omitted if the eth call requires no parameters, and must include all parameters separated by commas otherwise). e.g., `"event.params.owner, event.params.tokenId"`.
* ***\[OPTIONAL]*** `depends_on` (array of `string`) - List of call reference names that this call depends on (this should be used if a parameter is derived from a previously defined call).
* ***\[OPTIONAL]*** `required` (`boolean`) - Flag to indicate that the call must succeed for the enrichment to take place (if the call does not succeed then the enrichment is aborted and no enrichment entity mapping will take place).
* ***\[OPTIONAL]*** `declared` (`boolean`) - Flag to indicate that the call should be marked as declared, meaning that the call will be executed and the result cached prior to the mapping handler function being invoked.
* ***\[OPTIONAL]*** `conditions` (`object`) - Optional condition [expressions](#enrichment-expressions) to test before and after performing the call (if either condition fails then the enrichment is aborted and no enrichment entity mapping will take place).
* ***\[OPTIONAL]*** `pre` (`string`) - The condition to test before performing the call.
* ***\[OPTIONAL]*** `post` (`string`) - The condition to test after performing the call.
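As an illustration, a `calls` map combining several of these options might look like the following sketch (the `ownerOf`/`balanceOf` calls and parameter names are hypothetical and assume a matching instance ABI):
```json5
"calls": {
  "owner": {
    "name": "ownerOf",                 // eth call defined in the instance ABI (hypothetical)
    "params": "event.params.tokenId",
    "required": true                   // abort the enrichment if this call does not succeed
  },
  "balance": {
    "name": "balanceOf",
    "params": "calls.owner",           // parameter derived from the previous call
    "depends_on": ["owner"],           // ensure `owner` is invoked first
    "conditions": {
      "pre": "!event.params.tokenId.isZero()" // only enrich for non-zero token ids
    }
  }
}
```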
#### Enrichment entity configuration
Enrichment entity configurations are a map of field name and type to field value expressions. The configuration supports both a simplified single configuration and a multi-instance configuration. The single configuration is all that is needed for most use cases, but if you need to describe an enriched entity where multiple instances are created within a single mapping (think of creating the same entity with different ids for the same event or call handler), you can describe the entity as an array of configurations, each of which also includes an `id` expression for determining the unique `id` suffix.
* An entity field mapping key looks like `<field name> <field type>`, e.g., `tokenId uint256`
* the field name can be any valid GraphQL schema field identifier, typically this would either be a *camelCase* or *snake\_case* string
* the field type can be any valid ABI type name
* An entity field mapping value is an [expression](#enrichment-expressions), e.g., `calls.owner.toHexString()`
* it must return a value of the type specified in the field mapping key (i.e., `address` must be converted to `string` using `.toHexString()`)
When configuring an entity for multiple instances, the configuration takes the following form
* **\[REQUIRED]** `id` (`string`) - The [expression](#enrichment-expressions) to determine the unique id suffix for the entity instance.
* ***\[OPTIONAL]*** `explicit_id` (`boolean`) - Flag to indicate that the id expression should be used as the explicit id for the entity instance (if omitted then the `id` expression will be appended to the parent entity `id`).
* **\[REQUIRED]** `mapping` (map of `object`) - A map of field name and type to field value expressions (as described above).
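As a hypothetical sketch, a multi-instance entity configuration for a transfer-style event (the event parameters are illustrative) could look like:
```json5
"entities": {
  "BalanceChange": [
    {
      "id": "'from'", // suffix appended to the parent entity id
      "mapping": {
        "account address": "event.params.from.toHexString()",
        "delta int256": "event.params.value.neg()"
      }
    },
    {
      "id": "'to'",
      "mapping": {
        "account address": "event.params.to.toHexString()",
        "delta int256": "event.params.value"
      }
    }
  ]
}
```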
#### Enrichment expressions
Enrichment expressions are AssemblyScript expressions that can be used to produce static or dynamic values based on the available runtime context. The expression runtime context includes the `event` object (or the `call` object for call handlers), the (parent) `entity` object, and the `calls` object reference to all previously executed eth calls. Expressions can include any combination of string concatenation, type transformation, math result, or logical branching, meaning that there is a lot of customization available to the configuration when declaring an expression. Note however that static expressions may often be the most appropriate for simple enrichments.
Below each of the runtime context elements are described in more detail:
* `event` and `call` - The incoming event/call object to the mapping handler function. The parameters to this object will already be converted to the entity fields, one for each parameter defined in the corresponding ABI file.
* `entity` - The parent entity object to the mapping handler function, this entity will have already been saved before enrichment begins.
* `calls` - The object containing all previously executed eth calls. This object is used to reference the results of previous calls in the current call configuration. Calls not yet executed can still be referenced but they will be `null` until the call is invoked. Any calls that are marked `required` (or marked as a dependency of another call) will throw an error if accessed before the call is invoked.
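A few illustrative expressions as they might appear in an entity mapping (the entity and parameter names are hypothetical):
```json5
"entities": {
  "TransferNote": {
    // static expression
    "source string": "'enrichment'",
    // string concatenation and type conversion
    "key string": "event.params.owner.toHexString() + '-' + event.params.tokenId.toString()",
    // logical branching
    "kind string": "event.params.tokenId.isZero() ? 'genesis' : 'regular'"
  }
}
```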
## Explanation of common patterns
### Single source pattern
```json5
{
"version": "1",
"name": "TokenDeployed",
"abis": {
"TokenRegistry": {
"path": "./path/to/your/abi.json"
}
},
"instances": [
{
"abi": "TokenRegistry",
"address": "0x...",
"startBlock": 13983724,
"chain": "your_chain"
}
]
}
```
* `"version": "1"`: The version of this config, we only support a value of "1" right now.
* `"name": "TokenDeployed"`: The name of the event you want to track as specified in the ABI file.
* `"abis": { "TokenRegistry": { "path": "./path/to/your/abi.json" } }`: Mapping of ABIs names (can be anything you want) to ABI files.
* `"abi": "TokenRegistry"`: The ABI you want to track. This name must match a key in the `abis` object above.
* `"address": "0x...",`: The address of the contract.
* `"startBlock": 13983724`: The block from which you want your subgraph to start indexing (in most cases, this is the block that deployed your contract)
* `"chain": "your_chain"`: The chain you want to track this contract on
### Factory pattern
Some contracts create other child contracts, which then emit events that you need to track. The configuration can handle that by allowing you to specify a `source` inside an `instance` entry. The `source` tells the indexer which Factory contract event creates a new contract and which event argument contains the address of the new contract.
```json5
{
"version": "1",
"name": "TokenDeployed",
"abis": {
"Factory": {
"path": "./abis/factory.json"
},
"Pool": {
"path": "./abis/pool.json"
}
},
"instances": [
{
"abi": "Factory",
"address": "0xa98242820EBF3a405D265CCd22A4Ea8F64AFb281",
"startBlock": 16748800,
"chain": "bsc"
},
{
"abi": "Pool",
"source": {
"abi": "Factory",
"eventSignature": "PoolCreated(address,address,bool)",
"addressParam": "pool"
}
}
]
}
```
* `"Factory": { "path": "./abis/factory.json" }`: The path to the ABI for the Factory contract
* `"Pool": { "path": "./abis/pool.json"` }: The path the ABI for the contract deployed by the Factory contract
* `{ "abi": "Pool" }`: This is the main difference between the configuration for factory vs non-factory applications. In this example, the Factory contract makes new Pool contracts and the below configuration specifies that with this `source` object.
* `"source": { "abi": "Factory" }`: The ABI name which creates this contract.
* `"eventSignature": "PoolCreated(address,address,bool)",`: This is the signature of the event from the Factory contract which indicates that this contract was created.
* `"addressParam": "pool"`: The name of the parameter from the Factory contract's event that contains the new address to track.
In this pattern, there is a defined factory contract that makes many pools, and each pool needs to be tracked. We have two ABIs and the last `instance` entry looks for any `PoolCreated` event in the `Factory` ABI, gets a parameter from it, and uses that as a data source to watch for future `Pool` events in the `Pool` ABI.
### Enrichment pattern
```json5
{
"version": "1",
"name": "TokenDeployed",
"abis": {
"TokenRegistry": {
"path": "./path/to/your/abi.json"
}
},
"instances": [
{
"abi": "TokenRegistry",
"address": "0x...",
"startBlock": 13983724,
"chain": "your_chain"
"enrich": {
"Minted(address)": {
"calls": {
"balance": {
"name": "balanceOf",
"params": "event.params.owner"
},
},
"entities": {
"Balance": {
"owner address": "event.params.owner.toHexString()",
"balance uint256": "calls.balance"
}
}
}
}
}
]
}
```
* `"Minted(address)"`: the event signature (as defined in the `TokenRegistry` ABI) to perform the enrichment within.
* `"balance"`: the name of the call reference.
* `"name": "balanceOf"`: the name of the eth call to perform.
* `"params": "event.params.owner"`: the parameter to pass to the `balanceOf` eth call. `event` represents the incoming event object to the `Minted(address)` mapping handler function.
* `"Balance"`: the new enrichment entity name to create.
* `"owner address"`: the first field name and type for the entity. In this case we would see `Balance.owner` defined as a `String` in the generated schema because the `address` type serializes to a `String`.
* `"event.params.owner.toHexString()"`: the expression to determine the value for the `owner` field. `event` represents the incoming event object to the `Minted(address)` mapping handler function. Since `event.params.owner` is an `address` type, we need to convert it to a `String` using the `.toHexString()` method.
* `"balance uint256"`: the second field name and type for the entity. In this case we would see `Balance.balance` defined as a `BigInt` in the generated schema.
* `"calls.balance"`: the expression to determine the value for the `balance` field. `calls` represents the object containing all previously executed eth calls and `balance` refers to our call reference name.
## Examples
### Multi-chain
This example shows how to define multiple chains with many addresses.
```json
{
  "version": "1",
"name": "TokenDeployed",
"abis": {
"TokenRegistry": {
"path": "./abis/tokenRegistryAbi.json"
}
},
"instances": [
{
"abi": "TokenRegistry",
"address": "0x0A6f564C5c9BeBD66F1595f1B51D1F3de6Ef3b79",
"startBlock": 13983724,
"chain": "mainnet"
},
{
"abi": "TokenRegistry",
"address": "0x2d6775C1673d4cE55e1f827A0D53e62C43d1F304",
"startBlock": 13718798,
"chain": "avalanche"
},
{
"abi": "TokenRegistry",
"address": "0x10B84C73001745D969e7056D7ca474ce1D959FE8",
"startBlock": 59533,
"chain": "evmos"
},
{
"abi": "TokenRegistry",
"address": "0xa7E4Fea3c1468D6C1A3A77e21e6e43Daed855C1b",
"startBlock": 171256,
"chain": "moonbeam"
},
{
"abi": "TokenRegistry",
"address": "0x19d4b0F5871913c714554Bbb457F2a1549f52E04",
"startBlock": 1356181,
"chain": "milkomedac1"
}
]
}
```
This configuration results in multiple deployed subgraphs, each with an identical GraphQL schema for you to fetch data. If you prefer a combined view of the data across all deployed subgraphs, please have a look at [cross-chain subgraphs](/subgraphs/guides/create-a-multi-chain-subgraph).
### Nouns enrichment with balances on transfer
```json5
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": {
"path": "./abis/nouns.json"
}
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet",
"enrich": {
"handlers": {
"Transfer(indexed address,indexed address,indexed uint256)": {
"calls": {
"nouns_balance": {
"name": "balanceOf",
"params": "event.params.to"
}
},
"entities": {
"EnrichmentBalance": {
"tokenId uint256": "event.params.tokenId",
"previousOwner address": "event.params.from.toHexString()",
"owner address": "event.params.to.toHexString()",
"nouns uint256": "calls.nouns_balance"
}
}
}
}
}
}
]
}
```
This configuration will create a new `EnrichmentBalance` entity that contains a `nouns` balance field for each `Transfer` event that occurs on the `nouns` contract. `Transfer` entities will automatically define an `enrichmentBalances` field that will yield an array of enrichment balances for each transfer event. Similarly, all `EnrichmentBalance` entities will define a `transfer` field that will yield the `Transfer` entity that triggered the enrichment. Below is an example GraphQL query to fetch transfers and enrichment balances in various ways.
```graphql
query NounsTransfersAndBalancesDemo {
enrichmentBalances(first:1, orderBy:timestamp_, orderDirection:desc) {
id
timestamp_
tokenId
previousOwner
owner
nouns
transfer {
id
transactionHash_
}
}
transfers(first:1, orderBy:timestamp_, orderDirection:desc) {
id
transactionHash_
timestamp_
tokenId
from
to
enrichmentBalances {
id
nouns
}
}
}
```
# Mirror Pipeline Configuration Schema
Source: https://docs.goldsky.com/reference/config-file/pipeline
Schema details for pipeline configurations
We recently released v3 of pipeline configurations, which uses a more intuitive
and user-friendly format to define and configure pipelines using a yaml file.
For backward compatibility purposes, we still support the previous v2 format,
which is why you will find references to both formats in the yaml files
presented across the documentation. Feel free to use whichever is more
comfortable for you, but we encourage you to start migrating to the v3 format.
This page includes info on the full Pipeline configuration schema. For conceptual learning about Pipelines, please refer to the [About Pipeline](/mirror/about-pipeline) page.
Name of the pipeline. Must only contain lowercase letters, numbers, hyphens
and should be less than 50 characters.
[Sources](/reference/config-file#sources) represent the origin of the data flowing into the pipeline.
Supported source types:
* [Subgraph Entities](/reference/config-file/pipeline#subgraphentity)
* [Datasets](/reference/config-file/pipeline#dataset)
[Transforms](/reference/config-file#transforms) represent data transformation logic applied to a source and/or another transform in the pipeline.
If your pipeline does not need to transform data, this attribute can be an empty object.
Supported transform types:
* [SQL](/reference/config-file/pipeline#sql)
* [Handler](/reference/config-file/pipeline#handler)
[Sinks](/reference/config-file#sinks) represent the destination for source and/or transform data leaving the pipeline.
Supported sink types:
* [PostgreSQL](/reference/config-file/pipeline#postgresql)
* [Clickhouse](/reference/config-file/pipeline#clickhouse)
* [MySQL](/reference/config-file/pipeline#mysql)
* [Elastic Search](/reference/config-file/pipeline#elasticsearch)
* [Open Search](/reference/config-file/pipeline#opensearch)
* [Kafka](/reference/config-file/pipeline#kafka)
* [File](/reference/config-file/pipeline#file)
* [SQS](/reference/config-file/pipeline#sqs)
* [DynamoDb](/reference/config-file/pipeline#dynamodb)
* [Webhook](/reference/config-file/pipeline#webhook)
It defines the amount of compute power to add to the pipeline. It can take one
of the following values: "s", "m", "l", "xl", "xxl". For new pipeline
creation, it defaults to "s". For updates, it defaults to the current
resource\_size of the pipeline.
Description of the pipeline.
## Sources
Represents the origin of the data flowing into the pipeline. Each source has a unique name to be used as a reference in transforms/sinks.
`sources.<name>` is used as the referenceable name in other transforms and sinks.
`definition.sources[idx].referenceName` is used as the referenceable name in other transforms and sinks.
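For example, with the v3 format a source keyed `base_logs` can be referenced directly in a transform's SQL and in a sink's `from` attribute; the following minimal sketch reuses names and values that appear in the examples on this page:
```yaml
sources:
  base_logs:
    type: dataset
    dataset_name: base.logs
    version: 1.0.0
transforms:
  filtered_logs:
    type: sql
    primary_key: id
    sql: SELECT * FROM base_logs WHERE address = '0x21552aeb494579c772a601f655e9b3c514fda960'
sinks:
  my_postgres_sink:
    type: postgres
    from: filtered_logs
    table: filtered_logs
    schema: public
    secret_name: API_POSTGRES_CREDENTIALS
```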
### Subgraph Entity
Use your [subgraph](/mirror/sources/subgraphs) as a source for your pipeline.
#### Example
In the sources section of your pipeline configuration, you can add a `subgraph_entity` per subgraph entity that you want to use.
```yaml
sources:
subgraph_account:
type: subgraph_entity
name: account
subgraphs:
- name: qidao-optimism
version: 1.1.0
subgraph_market_daily_snapshot:
type: subgraph_entity
name: market_daily_snapshot
subgraphs:
- name: qidao-optimism
version: 1.1.0
```
In the sources section of your pipeline definition, you can add a `subgraphEntity` per subgraph entity that you want to use.
```yaml
sources:
- type: subgraphEntity
# The deployment IDs you gathered above. If you put multiple,
# they must have the same schema
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
# A name, referred to later in the `sourceStreamName` of a transformation or sink
referenceName: account
entity:
# The name of the entities
name: account
- type: subgraphEntity
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
referenceName: market_daily_snapshot
entity:
name: market_daily_snapshot
```
#### Schema
Unique name of the source. This is a user provided value.
Defines the type of the source, for Subgraph Entity sources, it is always `subgraph_entity`.
Description of the source
Entity `name` in your subgraph.
`earliest` processes data from the first block.
`latest` processes data from the latest block at pipeline start time.
Defaults to `latest`
Filter expression that does a [fast scan](/reference/config-file/pipeline#fast-scan) on the dataset. Only useful when `start_at` is set to `earliest`.
The expression follows the SQL standard for what comes after the WHERE clause. A few examples:
```yaml
address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' OR address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' AND amount > 500
```
References deployed subgraph(s) that have the entity mentioned in the `name` attribute.
```yaml
subgraphs:
- name: polymarket
version: 1.0.0
```
Supports subgraphs deployed across multiple chains (the cross-chain use case).
```yaml
subgraphs:
- name: polymarket
version: 1.0.0
- name: base
version: 1.1.0
```
[Cross-chain subgraph full example](/mirror/guides/merging-crosschain-subgraphs)
Unique name of the source. This is a user provided value.
Defines the type of the source, for Subgraph Entity sources, it is always `subgraphEntity`.
Description of the source
`earliest` processes data from the first block.
`latest` processes data from the latest block at pipeline start time.
Defaults to `latest`
Filter expression that does a [fast scan](/reference/config-file/pipeline#fast-scan) on the dataset. Only useful when `start_at` is set to `earliest`.
The expression follows the SQL standard for what comes after the WHERE clause. A few examples:
```yaml
address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' OR address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' AND amount > 500
```
References the entity of the deployed subgraph.
```yaml
entity:
name: fixed_product_market_maker
```
References deployed subgraph(s) that have the entity mentioned in the `entity.name` attribute.
The value for the `id` is the IPFS hash of the subgraph.
```yaml
deployments:
- id: QmVcgRByfiFSzZfi7RZ21gkJoGKG2jeRA1DrpvCQ6ficNb
```
Supports subgraphs deployed across multiple chains (the cross-chain use case):
```yaml
deployments:
- id: QmVcgRByfiFSzZfi7RZ21gkJoGKG2jeRA1DrpvCQ6ficNb
- id: QmaA9c8QcavxHJ7iZw6om2GHnmisBJFrnRm8E1ihBoAYjX
```
[Cross-chain subgraph full example](/mirror/guides/merging-crosschain-subgraphs)
### Dataset
Dataset lets you define [Direct Indexing](/mirror/sources/direct-indexing) sources. These data sources are curated by the Goldsky team, with automated QA guaranteeing correctness.
#### Example
```yaml
sources:
base_logs:
type: dataset
dataset_name: base.logs
version: 1.0.0
```
```yaml
sources:
- type: dataset
referenceName: base.logs
version: 1.0.0
```
#### Schema
Unique name of the source. This is a user provided value.
Defines the type of the source, for Dataset sources, it is always `dataset`
Description of the source
Name of a goldsky dataset. Please use `goldsky dataset list` and select your chain of choice.
Please refer to [supported chains](/mirror/sources/direct-indexing#supported-chains) for an overview of what data is available for individual chains.
Version of the goldsky dataset in `dataset_name`.
`earliest` processes data from the first block.
`latest` processes data from the latest block at pipeline start time.
Defaults to `latest`
Filter expression that does a [fast scan](/reference/config-file/pipeline#fast-scan) on the dataset. Only useful when `start_at` is set to `earliest`.
The expression follows the SQL standard for what comes after the WHERE clause. A few examples:
```yaml
address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' OR address = '0x21552aeb494579c772a601f655e9b3c514fda960'
address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' AND amount > 500
```
Unique name of the source. This is a user provided value.
Defines the type of the source, for Dataset sources, it is always `dataset`
Description of the source
Version of the goldsky dataset in `dataset_name`.
`earliest` processes data from the first block.
`latest` processes data from the latest block at pipeline start time.
Defaults to `latest`
Filter expression that does a [fast scan](/reference/config-file/pipeline#fast-scan) on the dataset. Only useful when `start_at` is set to `earliest`.
The expression follows the SQL standard for what comes after the WHERE clause. A few examples:
* `address = '0x21552aeb494579c772a601f655e9b3c514fda960'`
* `address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' OR address = '0x21552aeb494579c772a601f655e9b3c514fda960'`
* `address = '0xb794f5ea0ba39494ce839613ff2qasdf34353dga' AND amount > 500`
#### Fast Scan
Processing full datasets starting from `earliest` (aka doing a **Backfill**) requires the pipeline to process a significant amount of data, which affects how quickly it reaches the edge (the latest record in the dataset). This is especially true for datasets of larger chains.
However, in many use cases a pipeline may only be interested in a small subset of the historical data. In such cases, you can enable **Fast Scan** on your pipeline by defining the `filter` attribute in the `dataset` source.
The filter is pre-applied at the source level, making the initial ingestion of historical data much faster. When defining a `filter`, please be sure to use attributes that exist in the dataset. You can get the schema of the dataset by running `goldsky dataset get <dataset_name>`.
See example below where we pre-apply a filter based on contract address:
```yaml
sources:
base_logs:
type: dataset
dataset_name: base.logs
version: 1.0.0
filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960'
```
```yaml
sources:
- type: dataset
referenceName: base.logs
version: 1.0.0
filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960'
```
## Transforms
Represents data transformation logic applied to a source and/or another transform in the pipeline. Each transform has a unique name to be used as a reference in transforms/sinks.
`transforms.<name>` is used as the referenceable name in other transforms and sinks.
`definition.transforms[idx].referenceName` is used as the referenceable name in other transforms and sinks.
### SQL
SQL query that transforms or filters the data from a `source` or another `transform`.
#### Example
```yaml
transforms:
negative_fpmm_scaled_liquidity_parameter:
sql: SELECT id FROM polymarket.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primary_key: id
```
```yaml
transforms:
- referenceName: negative_fpmm_scaled_liquidity_parameter
type: sql
sql: SELECT id FROM polygon.fixed_product_market_maker WHERE scaled_liquidity_parameter < 0
primaryKey: id
```
#### Schema
Unique name of the transform. This is a user provided value.
Defines the type of the transform, for SQL transforms it is always `sql`
The SQL query to be executed on either source or transform in the pipeline.
The source data for the SQL transform is determined by the `FROM` clause of the query. Any source or transform can be referenced as a SQL table.
The primary key for the transformation. If any two rows have the same primary\_key, the pipeline will overwrite the row with the latest value.
Unique name of the transform. This is a user provided value.
Defines the type of the transform, for SQL transforms it is always `sql`
The SQL query to be executed on either source or transform in the pipeline.
The source data for the SQL transform is determined by the `FROM` clause of the query. Any source or transform can be referenced as a SQL table.
The primary key for the transformation. If any two rows have the same primaryKey, the pipeline will overwrite the row with the latest value.
### Handler
Lets you transform data by sending it to a [handler](/mirror/transforms/external-handlers) endpoint.
#### Example
```yaml
transforms:
my_external_handler_transform:
type: handler
primary_key: id
url: http://example-url/example-transform-route
from: ethereum.raw_blocks
```
#### Schema
Unique name of the transform. This is a user provided value.
Defines the type of the transform, for Handler transforms it is always `handler`
Endpoint to send the data for transformation.
Data source for the transform. Reference a source/transform defined in this pipeline.
Data sent to your handler will have the same schema as this source/transform.
The primary key for the transformation. If any two rows have the same primary\_key, the pipeline will overwrite the row with the latest value.
Allows overriding the schema of the response data returned by the handler. The default is to expect the same schema as the `source|transform` referenced in the `from` attribute.
A map of column names to Flink SQL datatypes. If the handler response schema changes, the pipeline needs to be re-deployed with this attribute updated (see the sketch after the data type table below).
To add a new attribute: `new_attribute_name: datatype`
To remove an existing attribute: `existing_attribute_name: null`
To change an existing attribute's datatype: `existing_attribute_name: datatype`
| Data Type | Notes |
| -------------- | ----------------------------------- |
| STRING | |
| BOOLEAN | |
| BYTE | |
| DECIMAL | Supports fixed precision and scale. |
| SMALLINT | |
| INTEGER | |
| BIGINT | |
| FLOAT | |
| DOUBLE | |
| TIME | Supports only a precision of 0. |
| TIMESTAMP | |
| TIMESTAMP\_LTZ | |
| ARRAY | |
| ROW | |
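As a sketch only: assuming this attribute is named `schema_override` (the attribute name, column names, and datatypes below are assumptions for illustration), extending the handler example above might look like:
```yaml
transforms:
  my_external_handler_transform:
    type: handler
    primary_key: id
    url: http://example-url/example-transform-route
    from: ethereum.raw_blocks
    schema_override:           # assumed attribute name
      enriched_note: STRING    # new column returned by the handler
      unused_column: null      # column removed from the handler response
```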
Headers to be sent in the request from the pipeline to the handler endpoint.
A common use case is to pass any tokens your server requires for authentication or any metadata.
Goldsky secret name that contains credentials for calls between the pipeline and the handler.
For handler transform, use the `httpauth` secret type.
Unique name of the transform. This is a user provided value.
Defines the type of the transform, for SQL transforms it is always `sql`
The SQL query to be executed on either source or transform in the pipeline.
The source data for the SQL transform is determined by the `FROM` clause of the query. Any source or transform can be referenced as a SQL table.
The primary key for the transformation. If any two rows have the same primaryKey, the pipeline will overwrite the row with the latest value.
## Sinks
Represents the destination for source and/or transform data leaving the pipeline. Since sinks represent the end of the dataflow in the pipeline, unlike sources and transforms, they do not need to be referenced elsewhere in the configuration.
Most sinks are either databases such as `postgresql`, `dynamodb`, etc., or channels such as `kafka`, `sqs`, etc.
Also, most sinks are provided by the user, so the pipeline needs credentials to be able to write data to the sink. Users therefore need to create a Goldsky Secret and reference it in the sink.
### PostgreSQL
Lets you sink data to a [PostgreSQL](/mirror/sinks/postgres) table.
#### Example
```yaml
sinks:
postgres_test_negative_fpmm_scaled_liquidity_parameter:
type: postgres
from: negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sinks:
- type: postgres
sourceStreamName: negative_fpmm_scaled_liquidity_parameter
referenceName: postgres_test_negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secretName: API_POSTGRES_CREDENTIALS
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for PostgreSQL it is always `postgres`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For postgres sink, use the `jdbc` secret type.
The number of records the pipeline will send together in a batch. Default `100`
The maximum time the pipeline will batch records before flushing to sink. Default: '1s'
Enables auto commit. Default: `true`
Rewrite individual insert statements into multi-value insert statements. Default `true`
Optional column that will be used to select the 'correct' row in case of conflict, using a 'greater value wins' strategy (i.e., later date, higher number).
The column must be numeric.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for PostgreSQL it is always `postgres`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For postgres sink, use the `jdbc` secret type.
The number of records the pipeline will send together in a batch. Default `100`
The maximum time the pipeline will batch events before flushing to sink. Default: '1s'
Enables auto commit. Default: `true`
Rewrite individual insert statements into multi-value insert statements. Default `true`
Optional column that will be used to select the 'correct' row in case of conflict, using a 'greater value wins' strategy (i.e., later date, higher number).
The column must be numeric.
### Clickhouse
Lets you sink data to a [Clickhouse](/mirror/sinks/clickhouse) table.
#### Example
A minimal v3 example, following the schema fields documented below (the sink name, table, and secret name are illustrative):
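```yaml
sinks:
  my_clickhouse_sink:
    type: clickhouse
    from: negative_fpmm_scaled_liquidity_parameter
    table: test_negative_fpmm_scaled_liquidity_parameter
    secret_name: CLICKHOUSE_CREDENTIALS # illustrative secret name
    append_only_mode: true
```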
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Clickhouse it is always `clickhouse`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For the Clickhouse sink, use the `jdbc` secret type.
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
The maximum time (in milliseconds) the pipeline will batch records. Default `1000`
The maximum time the pipeline will batch records before flushing to sink. Default: '1s'
Only do inserts on the table and not update or delete.
Increases insert speed and reduces Flush exceptions (which happen when too many mutations are queued up).
More details in the [Clickhouse](/mirror/sinks/clickhouse#append-only-mode) guide. Default `true`.
Column name to be used as a version number. Only used in `append_only_mode = true`.
Use a different primary key than the one that is automatically inferred from the source and/or transform.
Ability to override the automatic schema propagation from the pipeline to Clickhouse. Map of `column_name -> clickhouse_datatype`
Useful in situations when data type is incompatible between the pipeline and Clickhouse. Or when wanting to use specific type for a column.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Clickhouse it is always `clickhouse`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For the Clickhouse sink, use the `jdbc` secret type.
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
The maximum time (in milliseconds) the pipeline will batch records. Default `1000`
The maximum time the pipeline will batch records before flushing to sink. Default: '1s'
Only do inserts on the table and not update or delete.
Increases insert speed and reduces Flush exceptions (which happen when too many mutations are queued up).
More details in the [Clickhouse](/mirror/sinks/clickhouse#append-only-mode) guide. Default `true`.
Column name to be used as a version number. Only used in `append_only_mode = true`.
Use a different primary key than the one that is automatically inferred from the source and/or transform.
Ability to override the automatic schema propagation from the pipeline to Clickhouse. Map of `column_name -> clickhouse_datatype`
Useful in situations when data type is incompatible between the pipeline and Clickhouse. Or when wanting to use specific type for a column.
### MySQL
Lets you sink data to a [MySQL](/mirror/sinks/mysql) table.
#### Example
```yaml
sinks:
postgres_test_negative_fpmm_scaled_liquidity_parameter:
type: postgres
from: negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sinks:
- type: postgres
sourceStreamName: negative_fpmm_scaled_liquidity_parameter
referenceName: postgres_test_negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secretName: API_POSTGRES_CREDENTIALS
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for MySQL it is always `mysql`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Database name
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For the MySQL sink, use the `jdbc` secret type.
The maximum time (in milliseconds) the pipeline will batch events. Default `100`
The maximum time the pipeline will batch events before flushing to sink. Default: '1s'
Enables auto commit. Default: `true`
Rewrite individual insert statements into multi-value insert statements. Default `true`
Optional column that will be used to select the 'correct' row in case of conflict, using a 'greater value wins' strategy (i.e., later date, higher number).
The column must be numeric.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for MySQL it is always `mysql`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Database name
The destination table. It will be created if it doesn't exist. Schema is defined in the secret credentials.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For the MySQL sink, use the `jdbc` secret type.
The maximum time (in milliseconds) the pipeline will batch events. Default `100`
The maximum time the pipeline will batch events before flushing to sink. Default: '1s'
Enables auto commit. Default: `true`
Rewrite individual insert statements into multi-value insert statements. Default `true`
Optional column that will be used to select the 'correct' row in case of conflict, using a 'greater value wins' strategy (i.e., later date, higher number).
The column must be numeric.
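A minimal v3 sketch, assuming the MySQL sink uses `type: mysql` and a `database` attribute for the database name (both of these names are assumptions; the remaining attributes follow the common sink pattern used above):
```yaml
sinks:
  my_mysql_sink:
    type: mysql                       # assumed type name
    from: negative_fpmm_scaled_liquidity_parameter
    database: my_database             # assumed attribute name for the database
    table: test_negative_fpmm_scaled_liquidity_parameter
    secret_name: MYSQL_CREDENTIALS    # illustrative secret name (jdbc secret type)
```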
### Elastic Search
Lets you sink data to an [Elastic Search](/mirror/sinks/postgres) index.
#### Example
A minimal v3 example, mirroring the v2-format example shown under Open Search below (the index and secret names are illustrative):
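```yaml
sinks:
  my_elasticsearch_sink:
    type: elasticsearch
    from: negative_fpmm_scaled_liquidity_parameter
    index: test_negative_fpmm_scaled_liquidity_parameter
    secret_name: ELASTICSEARCH_CREDENTIALS # illustrative secret name
```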
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Elastic Search it is always `elasticsearch`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Elastic search index to write to.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Elastic Search sink, use the `elasticSearch` secret type.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Elastic Search it is always `elasticsearch`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Elastic search index to write to.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Elastic Search sink, use the `elasticSearch` secret type.
### Open Search
#### Example
```yaml
sinks:
  my_elasticsearch_sink:
    type: elasticsearch
    from: negative_fpmm_scaled_liquidity_parameter
    index: test_negative_fpmm_scaled_liquidity_parameter
    secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sinks:
- type: elasticsearch
sourceStreamName: negative_fpmm_scaled_liquidity_parameter
referenceName: postgres_test_negative_fpmm_scaled_liquidity_parameter
index: test_negative_fpmm_scaled_liquidity_parameter
secretName: API_POSTGRES_CREDENTIALS
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Elastic Search it is always `elasticsearch`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Elastic search index to write to.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Elastic Search sink, use the `elasticSearch` secret type.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Elastic Search it is always `elasticsearch`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Elastic search index to write to.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Elastic Search sink, use the `elasticSearch` secret type.
### Kafka
Lets you sink data to a [Kafka](/mirror/extensions/channels/kafka) topic.
#### Example
```yaml
sinks:
kafka_topic_sink:
type: kafka
from: my_source
topic: accounts
secret_name: KAFKA_SINK_SECRET_CR343D
topic_partitions: 2
```
```yaml
sinks:
- type: kafka
sourceStreamName: my_source
topic: accounts
secretName: KAFKA_SINK_SECRET_CR343D
topicPartitions: 2
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Kafka sink it is always `kafka`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Kafka topic name to write to. Will be created if it does not exist.
Number of partitions to be set on the topic. Only applicable if the topic does not exist.
When set to `true`, the sink will emit tombstone messages (null values) for DELETE operations instead of the actual payload. This is useful for maintaining the state in Kafka topics where the latest state of a key is required, and older states should be logically deleted. Default `false`
Format of the record in the topic. Supported types: `json`, `avro`. Requires Schema Registry credentials in the secret for `avro` type.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Kafka sink, use the `kafka` secret type.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Kafka sink it is always `kafka`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Kafka topic name to write to.
To be used when creating the topic, in case it does not exist.
When set to `true`, the sink will emit tombstone messages (null values) for DELETE operations instead of the actual payload. This is useful for maintaining the state in Kafka topics where the latest state of a key is required, and older states should be logically deleted. Default `false`
Format of the record in the topic. Supported types: `json`, `avro`. Requires Schema Registry credentials in the secret for `avro` type.
Default: `avro`
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For Kafka sink, use the `kafka` secret type.
### File
#### Example
```yaml
sinks:
s3_write:
type: file
path: s3://goldsky/linea/traces/
format: parquet
from: linea.traces
secret_name: GOLDSKY_S3_CREDS
```
```yaml
sinks:
- type: file
sourceStreamName: linea.traces
referenceName: s3_write
path: s3://goldsky/linea/traces/
secretName: GOLDSKY_S3_CREDS
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for File sink it is always `file`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Path to write to. Use prefix `s3://`. Currently, only `S3` is supported.
Format of the output file. Supported types: `parquet`, `csv`.
Enables auto-compaction which helps optimize the output file size. Default `false`
Columns to be used for partitioning. Multiple columns are comma-separated, e.g., `"col1,col2"`
The maximum sink file size before creating a new one. Default: `128MB`
The maximum time the pipeline will batch records before flushing to sink. Default: `30min`
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for File sink it is always `file`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Path to write to. Use prefix `s3://`. Currently, only `S3` is supported.
Format of the output file. Supported types: `parquet`, `csv`.
Enables auto-compaction which helps optimize the output file size. Default `false`
Columns to be used for partitioning. Multiple columns are comma-separated, e.g., `"col1,col2"`
The maximum sink file size before creating a new one. Default: `128MB`
The maximum time the pipeline will batch records before flushing to sink. Default: `30min`
### DynamoDB
#### Example
```yaml
sinks:
postgres_test_negative_fpmm_scaled_liquidity_parameter:
type: postgres
from: negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secret_name: API_POSTGRES_CREDENTIALS
```
```yaml
sinks:
- type: postgres
sourceStreamName: negative_fpmm_scaled_liquidity_parameter
referenceName: postgres_test_negative_fpmm_scaled_liquidity_parameter
table: test_negative_fpmm_scaled_liquidity_parameter
schema: public
secretName: API_POSTGRES_CREDENTIALS
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for DynamoDB it is always `dynamodb`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For DynamoDB sink, use the `dynamodb` secret type.
The destination table. It will be created if it doesn't exist.
Endpoint override, useful when writing to a DynamoDB VPC
Maximum number of requests in flight. Default `50`
Batch max size. Default: `25`
Maximum number of records to buffer. Default: `10000`
Fail the sink on write error. Default `false`
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for DynamoDB it is always `dynamodb`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
For DynamoDB sink, use the `dynamodb` secret type.
The destination table. It will be created if it doesn't exist.
Endpoint override, useful when writing to a DynamoDB VPC
Maximum number of requests in flight. Default `50`
Batch max size. Default: `25`
Maximum number of records to buffer. Default: `10000`
Fail the sink on write error. Default `false`
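A minimal v3 sketch, assuming the DynamoDB sink follows the same `type`/`from`/`table`/`secret_name` attribute pattern as the other sinks on this page (all names and values are illustrative):
```yaml
sinks:
  my_dynamodb_sink:
    type: dynamodb                    # assumed type name
    from: negative_fpmm_scaled_liquidity_parameter
    table: test_negative_fpmm_scaled_liquidity_parameter
    secret_name: DYNAMODB_CREDENTIALS # illustrative secret name (dynamodb secret type)
```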
### Webhook
#### Example
```yaml
sinks:
webhook_publish:
type: webhook
from: base.logs
url: https://webhook.site/d06324e8-d273-45b4-a18b-c4ad69c6e7e6
secret_name: WEBHOOK_SECRET_CM3UPDBJC0
```
```yaml
sinks:
- type: webhook
sourceStreamName: base.logs
referenceName: webhook_publish
url: https://webhook.site/d06324e8-d273-45b4-a18b-c4ad69c6e7e6
secretName: WEBHOOK_SECRET_CM3UPDBJC0
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Webhook sinks it is always `webhook`
Defines the URL to send the record(s) to.
Send only one record per call to the provided url
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
Use this if you do not want to expose authentication details in plain text in the `headers` attribute.
For webhook sink, use the `httpauth` secret type.
Headers to be sent in the request from the pipeline to the url
User provided description.
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for Webhook sinks it is always `webhook`
Defines the URL to send the record(s) to.
Send only one record per call to the provided url
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
Use this if you do not want to expose authentication details in plain text in the `headers` attribute.
Use `httpauth` secret type.
Headers to be sent in the request from the pipeline to the url
User provided description.
### SQS
Lets you sink data to an [AWS SQS](/mirror/extensions/channels/aws-sqs) queue.
#### Example
```yaml
sinks:
my_sqs_sink:
type: sqs
url: https://sqs.us-east-1.amazonaws.com/335342423/dev-logs
secret_name: SQS_SECRET_IAM
from: my_transform
```
```yaml
sinks:
- type: sqs
referenceName: my_sqs_sink
url: https://sqs.us-east-1.amazonaws.com/335342423/dev-logs
secretName: SQS_SECRET_IAM
sourceStreamName: my_transform
```
#### Schema
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for SQS it is always `sqs`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
Use this if you do not want to expose authentication details in plain text in the `headers` attribute.
For sqs sink, use the `sqs` secret type.
SQS queue URL
Fail the sink on write error. Default `false`
Unique name of the sink. This is a user provided value.
Defines the type of the sink, for SQS it is always `sqs`
User provided description.
Data source for the sink. Reference to either a source or a transform defined in this pipeline.
Goldsky secret name that contains credentials for calls between the pipeline and the sink.
Use this if you do not want to expose authentication details in plain text in the `headers` attribute.
For sqs sink, use the `sqs` secret type.
SQS queue URL
Fail the sink on write error. Default `false`
## Pipeline runtime attributes
While sources, transforms, and sinks define the business logic of your pipeline, there are also attributes that change the pipeline's execution/runtime behavior.
If you need a refresher on pipelines, make sure to check out [About Pipeline](/mirror/about-pipeline); here we'll just focus on the runtime-specific attributes.
The following are request-level attributes that only control the behavior of a particular request on the pipeline. These attributes should be passed via arguments to the `goldsky pipeline apply <config_file>` command.
Defines the desired status for the pipeline which can be one of the three: "ACTIVE", "INACTIVE", "PAUSED". If not provided it will default to the current status of the pipeline.
Defines whether the pipeline should attempt to create a fresh snapshot before this configuration is applied. The pipeline needs to be in a healthy state for snapshot to be created successfully. It defaults to `true`.
Defines whether the pipeline should be started from the latest available snapshot. This attribute is useful in restarting scenarios.
To restart a pipeline from scratch, use `--use_latest_snapshot false`. It defaults to `true`.
Instructs the pipeline to restart. Useful in scenarios where the pipeline needs to be restarted but no configuration change is needed. It defaults to `undefined`.
## Pipeline Runtime Commands
Commands that change the pipeline runtime. Many commands aim to abstract away the above attributes into meaningful actions.
#### Start
There are multiple ways to do this:
* `goldsky pipeline start <pipeline_name>`
* `goldsky pipeline apply <config_file> --status ACTIVE`
This command will have no effect on a pipeline that already has a desired status of `ACTIVE`.
#### Pause
Pause will attempt to take a snapshot and stop the pipeline so that it can be resumed later.
There are multiple ways to do this:
* `goldsky pipeline pause <pipeline_name>`
* `goldsky pipeline apply <config_file> --status PAUSED`
#### Stop
Stopping a pipeline does not attempt to take a snapshot.
There are multiple ways to do this:
* `goldsky pipeline stop <pipeline_name>`
* `goldsky pipeline apply <config_file> --status INACTIVE --from-snapshot none`
* `goldsky pipeline apply <config_file> --status INACTIVE --save-progress false` (prior to CLI version `11.0.0`)
#### Update
Make any needed changes to the pipeline configuration file and run `goldsky pipeline apply <config_file>`.
By default any update on a `RUNNING` pipeline will attempt to take a snapshot before applying the update.
If you'd like to avoid taking a snapshot as part of the update, run:
* `goldsky pipeline apply <config_file> --from-snapshot last`
* `goldsky pipeline apply <config_file> --save-progress false` (prior to CLI version `11.0.0`)
This is useful in situations where the pipeline is running into issues and a snapshot will not succeed, blocking the update that is meant to fix the issue.
#### Resize
Useful in scenarios where the pipeline is running into resource constraints.
There are multiple ways to do this:
* `goldsky pipeline resize <pipeline_name> <resource_size>`
* `goldsky pipeline apply <config_file>` with the config file having the attribute:
```yaml
resource_size: xl
```
#### Restart
Useful in the scenarios where a restart is needed but there are no changes in the configuration. For example, pipeline sink's database connection got stuck because the database has restarted.
There are multiple ways to restart a RUNNING pipeline without any configuration changes:
1. `goldsky pipeline restart <pipeline_name> --from-snapshot last|none`
The above command will attempt to restart the pipeline.
To restart with no snapshot aka from scratch, provide the `--from-snapshot none` option.
To restart with last available snapshot, provide the `--from-snapshot last` option.
2. `goldsky pipeline apply <config_file> --restart` (CLI version below 10.0.0)
By default, the above command will attempt a new snapshot and start the pipeline from that particular snapshot.
To avoid using any existing snapshot or triggering a new one (aka starting from scratch), add the `--from-snapshot none` option, or `--save-progress false --use-latest-snapshot false` if you are using a CLI version older than `11.0.0`.
#### Monitor
Provides pipeline runtime information that is helpful for monitoring/developing a pipeline. Although this command does not change the runtime, it provides info like status, metrics, and logs that helps with developing a pipeline.
`goldsky pipeline monitor <pipeline_name>`
# Event Decoding Functions
Source: https://docs.goldsky.com/reference/mirror-functions/decoding-functions
Mirror provides 3 custom functions which can be used within transforms to decode raw contract events during processing: `_gs_log_decode`, `_gs_tx_decode` and `_gs_fetch_abi`.
## \_gs\_log\_decode
This function decodes [Raw Logs](/reference/schema/EVM-schemas#raw-logs) data given a JSON string representing the ABI. Since it is stateless, it will scale very well across different pipeline sizes.
It will automatically use the matching event in the ABI, but partial ABIs that only contain the target event will also work.
```Function Function definition
_gs_log_decode(string abi, string topics, string data)
```
#### Params
A string representing a json array of events and functions
Name of the column in the dataset that contains the topics as a comma-separated string.
Name of the column in the dataset that contains the payload of the event as a string.
#### Return
The function will output a [nested ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) type with `event_params::TEXT[]` and `event_signature::TEXT`. If you're planning on using a sink that doesn't support nested ROWs, you may want to do a further transformation to unnest the result.
#### Examples
If you were using the [Raw Logs](/reference/schema/EVM-schemas#raw-logs) source dataset, you would call this function passing your ABI and using the 'topics' and 'data' columns. For example:
```sql
SELECT
_gs_log_decode('[
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"name": "src",
"type": "address"
},
{
"indexed": true,
"name": "dst",
"type": "address"
},
{
"indexed": false,
"name": "wad",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
}
]', `topics`, `data`) as decoded
from base.raw_logs
```
You would then be able to access both topics and data from the decoded column as `decoded.event_params` and `decoded.event_signature` in a second transform. See below for a complete example pipeline.
## \_gs\_tx\_decode
This function decodes [Raw Traces](/reference/schema/EVM-schemas#raw-traces) data given a JSON string representing the ABI. Since it is stateless, it will scale very well across different pipeline sizes.
It will automatically use the matching function in the ABI, but partial ABIs that only contain the target function will also work.
```Function Function definition
_gs_tx_decode(string abi, string input, string output)
```
#### Params
A string representing a json array of events and functions
Name of the column in the dataset that contains, as a string, the data sent along with the message call.
Name of the column in the dataset that contains, as a string, the data returned by the message call.
#### Return
The function will output a [nested ROW](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/types/#row) type with the name of the function along with its inputs and outputs as arrays of strings. If you're planning on using a sink that doesn't support nested ROWs, you may want to do a further transformation to unnest the result.
#### Examples
If you were using the [Raw Traces](/reference/schema/EVM-schemas#raw-traces) source dataset, you would call this function passing your ABI and using the 'input' and 'output' columns. For example:
```sql
SELECT
_gs_tx_decode('[
{
"inputs": [
{
"internalType": "address",
"name": "sharesSubject",
"type": "address"
},
{
"internalType": "uint256",
"name": "amount",
"type": "uint256"
}
],
"name": "getBuyPriceAfterFee",
"outputs": [
{
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"stateMutability": "view",
"type": "function"
}
]', `input`, `output`) as decoded
from base.raw_traces
```
You would then be able to access the function and its inputs and outputs from the decoded column as `decoded.function`, `decoded.decoded_inputs` and `decoded.decoded_outputs` in a second transform. You can see an example in [this guide](/mirror/guides/decoding-traces).
## \_gs\_fetch\_abi
We provide a convenient function for fetching ABIs, as often they are too big to copy and paste into the yaml definition.
The ABI will be fetched once when a pipeline starts. If a pipeline is updated to a new version, or restarted, the ABI will be fetched again. It will not be re-read at any point while a pipeline is running.
```Function Function definition
_gs_fetch_abi(string url, string type)
```
#### Params
The URL from where the ABI will be fetched
The type of url. Two types of value are accepted:
* `etherscan` for etherscan or etherscan-compatible APIs
* `raw` for json array ABIs.
If you use etherscan-compatible APIs, it's highly recommended to include your own API key when using this function with a large pipeline; otherwise, the number of workers may result in the API limits being surpassed.
#### Examples
Following up on the previous example, we can replace the raw ABI definition with a call to Basescan inside the \_gs\_log\_decode function:
```sql
select
  -- Call the EVM decoding function
  _gs_log_decode(
    -- This fetches the ABI from basescan, an `etherscan`-compatible site.
    _gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
    `topics`,
    `data`
  ) as `decoded`
from base.raw_logs
```
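If you have an API key for an Etherscan-compatible explorer, it can usually be supplied via the standard `apikey` query parameter on the same URL. A sketch with a placeholder key:
```sql
select
  -- Same Basescan call as above, with a placeholder `apikey` query parameter appended
  _gs_log_decode(
    _gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4&apikey=YOUR_API_KEY', 'etherscan'),
    `topics`,
    `data`
  ) as `decoded`
from base.raw_logs
```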
In some cases, you may prefer to host the ABI yourself and retrieve it from a separate server, such as a GitHub Gist:
```sql
select
  -- Call the EVM decoding function
  _gs_log_decode(
    -- This fetches the ABI from a GitHub Gist
    _gs_fetch_abi('https://gist.githubusercontent.com/JavierTrujilloG/bde43d5079ea5d03edcc68b4516fd297/raw/7b32cf313cd4810f65e726e531ad065eecc47dc1/friendtech_base.json', 'raw'),
    `topics`,
    `data`
  ) as `decoded`
from base.raw_logs
```
## Pipeline Example
Below is an example pipeline definition for decoding events for the Friendtech contract on Base. Make sure to visit [this guide](/mirror/guides/decoding-contract-events) for a more in-depth explanation of this pipeline:
```yaml
name: friendtech-decoded-events
apiVersion: 3
sources:
my_base_raw_logs:
type: dataset
dataset_name: base.raw_logs
version: 1.0.0
transforms:
friendtech_decoded:
primary_key: id
# Fetch the ABI from basescan, then use it to decode from the friendtech address.
sql: >
select
`id`,
_gs_log_decode(
_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
`topics`,
`data`
) as `decoded`,
block_number,
transaction_hash
from my_base_raw_logs
where address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
friendtech_clean:
primary_key: id
# Clean up the previous transform, unnest the values from the `decoded` object.
sql: >
select
`id`,
decoded.event_params as `event_params`,
decoded.event_signature as `event_signature`,
block_number,
transaction_hash
from friendtech_decoded
where decoded is not null
sinks:
friendtech_events:
secret_name: EXAMPLE_SECRET
type: postgres
from: friendtech_clean
schema: decoded_events
table: friendtech
```
```yaml
sources:
- type: dataset
referenceName: base.raw_logs
version: 1.0.0
transforms:
- referenceName: friendtech_decoded
type: sql
primaryKey: id
# Fetch the ABI from basescan, then use it to decode from the friendtech address.
sql: >
select
`id`,
_gs_log_decode(
_gs_fetch_abi('https://api.basescan.org/api?module=contract&action=getabi&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4', 'etherscan'),
`topics`,
`data`
) as `decoded`,
block_number,
transaction_hash
from base.raw_logs
where address='0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4'
- referenceName: friendtech_clean
primaryKey: id
type: sql
# Clean up the previous transform, unnest the values from the `decoded` object.
sql: >
select
`id`,
decoded.event_params as `event_params`,
decoded.event_signature as `event_signature`,
block_number,
transaction_hash
from friendtech_decoded
where decoded is not null
sinks:
- referenceName: friendtech_events
secretName: EXAMPLE_SECRET
type: postgres
sourceStreamName: friendtech_clean
schema: decoded_events
table: friendtech
```
# EVM chains' data schemas
Source: https://docs.goldsky.com/reference/schema/EVM-schemas
### Blocks
| Field | | Type |
| ------------------- | - | ------ |
| id | | STRING |
| number | | LONG |
| hash | | STRING |
| parent\_hash | | STRING |
| nonce | | STRING |
| sha3\_uncles | | STRING |
| logs\_bloom | | STRING |
| transactions\_root | | STRING |
| state\_root | | STRING |
| receipts\_root | | STRING |
| miner | | STRING |
| difficulty | | DOUBLE |
| total\_difficulty | | DOUBLE |
| size | | LONG |
| extra\_data | | STRING |
| gas\_limit | | LONG |
| gas\_used | | LONG |
| timestamp | | LONG |
| transaction\_count | | LONG |
| base\_fee\_per\_gas | | LONG |
| withdrawals\_root | | STRING |
### Enriched Transactions
| Field | | Type |
| ------------------------------------ | - | ------- |
| id | | STRING |
| hash | | STRING |
| nonce | | LONG |
| block\_hash | | STRING |
| block\_number | | LONG |
| transaction\_index | | LONG |
| from\_address | | STRING |
| to\_address | | STRING |
| value | | DECIMAL |
| gas | | DECIMAL |
| gas\_price | | DECIMAL |
| input | | STRING |
| max\_fee\_per\_gas | | DECIMAL |
| max\_priority\_fee\_per\_gas | | DECIMAL |
| transaction\_type | | LONG |
| block\_timestamp | | LONG |
| receipt\_cumulative\_gas\_used | | DECIMAL |
| receipt\_gas\_used | | DECIMAL |
| receipt\_contract\_address | | STRING |
| receipt\_status | | LONG |
| receipt\_effective\_gas\_price | | DECIMAL |
| receipt\_root\_hash | | STRING |
| receipt\_l1\_fee | | DECIMAL |
| receipt\_l1\_gas\_used | | DECIMAL |
| receipt\_l1\_gas\_price | | DECIMAL |
| receipt\_l1\_fee\_scalar | | DOUBLE |
| receipt\_l1\_blob\_base\_fee | | DECIMAL |
| receipt\_l1\_blob\_base\_fee\_scalar | | LONG |
| blob\_versioned\_hashes | | ARRAY |
| max\_fee\_per\_blob\_gas | | DECIMAL |
| receipt\_l1\_block\_number | | LONG |
| receipt\_l1\_base\_fee\_scalar | | LONG |
| gateway\_fee | | LONG |
| fee\_currency | | STRING |
| gateway\_fee\_recipient | | STRING |
### Logs
| Field | | Type |
| ------------------ | - | ------ |
| id | | STRING |
| block\_number | | LONG |
| block\_hash | | STRING |
| transaction\_hash | | STRING |
| transaction\_index | | LONG |
| log\_index | | LONG |
| address | | STRING |
| data | | STRING |
| topics | | STRING |
| block\_timestamp | | LONG |
### Raw traces
| Field | | Type |
| ------------------ | - | ------- |
| id | | STRING |
| block\_number | | LONG |
| block\_hash | | STRING |
| transaction\_hash | | STRING |
| transaction\_index | | LONG |
| from\_address | | STRING |
| to\_address | | STRING |
| value | | DECIMAL |
| input | | STRING |
| output | | STRING |
| trace\_type | | STRING |
| call\_type | | STRING |
| reward\_type | | STRING |
| gas | | LONG |
| gas\_used | | LONG |
| subtraces | | LONG |
| trace\_address | | STRING |
| error | | STRING |
| status | | LONG |
| trace\_id | | STRING |
| block\_timestamp | | LONG |
# Curated data schemas
Source: https://docs.goldsky.com/reference/schema/curated-schemas
## Token Transfers
### ERC-20
| Field | | Type |
| ------------------ | - | ------- |
| id | | STRING |
| contract\_address | | STRING |
| sender | | STRING |
| recipient | | STRING |
| amount | | DECIMAL |
| transaction\_hash | | STRING |
| block\_hash | | STRING |
| block\_number | | LONG |
| block\_timestamp | | LONG |
| transaction\_index | | LONG |
| log\_index | | LONG |
### ERC-721
| Field | | Type |
| ------------------ | - | ------- |
| id | | STRING |
| contract\_address | | STRING |
| sender | | STRING |
| recipient | | STRING |
| token\_id | | DECIMAL |
| transaction\_hash | | STRING |
| block\_hash | | STRING |
| block\_number | | LONG |
| block\_timestamp | | LONG |
| transaction\_index | | LONG |
| log\_index | | LONG |
### ERC-1155
| Field | | Type |
| ------------------ | - | ------- |
| id | | STRING |
| contract\_address | | STRING |
| sender | | STRING |
| recipient | | STRING |
| token\_id | | DECIMAL |
| amount | | DECIMAL |
| transaction\_hash | | STRING |
| block\_hash | | STRING |
| block\_number | | LONG |
| block\_timestamp | | LONG |
| transaction\_index | | LONG |
| log\_index | | LONG |
## Polymarket Datasets
### Global Open Interest
| Field | | Type |
| ------------ | - | ------- |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| amount | | DECIMAL |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
### User Positions
| Field | | Type |
| ------------- | - | ------ |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| user | | STRING |
| token\_id | | STRING |
| amount | | LONG |
| avg\_price | | LONG |
| realized\_pnl | | LONG |
| total\_bought | | LONG |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
### User Balances
| Field | | Type |
| ------------ | - | ------- |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| user | | STRING |
| asset | | STRING |
| balance | | DECIMAL |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
### Market Open Interest
| Field | | Type |
| ------------ | - | ------- |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| amount | | DECIMAL |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
### Order Filled
| Field | | Type |
| --------------------- | - | ------ |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| transaction\_hash | | STRING |
| timestamp | | LONG |
| order\_hash | | STRING |
| maker | | STRING |
| taker | | STRING |
| maker\_asset\_id | | STRING |
| taker\_asset\_id | | STRING |
| maker\_amount\_filled | | LONG |
| taker\_amount\_filled | | LONG |
| fee | | LONG |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
### Orders Matched
| Field | | Type |
| ------------ | - | ------- |
| vid | | LONG |
| block\_range | | STRING |
| id | | STRING |
| amount | | DECIMAL |
| \_gs\_chain | | STRING |
| \_gs\_gid | | STRING |
# NFT data schemas
Source: https://docs.goldsky.com/reference/schema/nft-data
Eight pre-curated, enriched NFT data streams are available, encompassing all ERC-721, ERC-1155, and major nonstandard collections (e.g. CryptoPunks).
| Data | Description |
| ----------------------- | ------------------------------------------------------------------------------------- |
| **Tokens** | Core, per-token metadata such as traits, token media links, and more. |
| **Owners** | Realtime wallet holdings for individual NFTs. |
| **Collections** | Summary information including social links, realtime floor price, and more. |
| **Sales and transfers** | All historical & realtime transfer events with marketplace sale data if applicable. |
| **Bids** | Current open bids at the individual NFT level. |
| **Bid events** | Realtime bid events (added, modified, removed) at the individual NFT level. |
| **Listings** | Currently active listings at the individual NFT level. |
| **Listing events** | Realtime listing events (added, modified, removed, sold) at the individual NFT level. |
# Non-EVM chains' data schemas
Source: https://docs.goldsky.com/reference/schema/non-EVM-schemas
## Beacon
### Attestations
| Field | | Type |
| ------------------- | - | ------ |
| id | | STRING |
| block\_root | | STRING |
| block\_slot | | LONG |
| block\_epoch | | LONG |
| block\_timestamp | | LONG |
| aggregation\_bits | | STRING |
| slot | | LONG |
| index | | LONG |
| beacon\_block\_root | | STRING |
| source\_epoch | | LONG |
| source\_root | | STRING |
| target\_epoch | | LONG |
| target\_root | | STRING |
| signature | | STRING |
### Attester Slashing
| Field | | Type |
| ----------------------------------- | - | ------ |
| id | | STRING |
| block\_timestamp | | LONG |
| attestation\_1\_attesting\_indices | | ARRAY |
| attestation\_1\_slot | | LONG |
| attestation\_1\_index | | LONG |
| attestation\_1\_beacon\_block\_root | | STRING |
| attestation\_1\_source\_epoch | | LONG |
| attestation\_1\_source\_root | | STRING |
| attestation\_1\_target\_epoch | | LONG |
| attestation\_1\_target\_root | | STRING |
| attestation\_1\_signature | | STRING |
| attestation\_2\_attesting\_indices | | ARRAY |
| attestation\_2\_slot | | LONG |
| attestation\_2\_index | | LONG |
| attestation\_2\_beacon\_block\_root | | STRING |
| attestation\_2\_source\_epoch | | LONG |
| attestation\_2\_source\_root | | STRING |
| attestation\_2\_target\_epoch | | LONG |
| attestation\_2\_target\_root | | STRING |
| attestation\_2\_signature | | STRING |
### Blocks
| Field | | Type |
| --------------------------------- | - | ------- |
| id | | STRING |
| block\_slot | | LONG |
| block\_epoch | | LONG |
| block\_timestamp | | LONG |
| block\_root | | STRING |
| proposer\_index | | LONG |
| skipped | | BOOLEAN |
| parent\_root | | STRING |
| state\_root | | STRING |
| randao\_reveal | | STRING |
| graffiti | | STRING |
| eth1\_block\_hash | | STRING |
| eth1\_deposit\_root | | STRING |
| eth1\_deposit\_count | | LONG |
| signature | | STRING |
| execution\_payload\_block\_number | | LONG |
| execution\_payload\_block\_hash | | STRING |
| blob\_gas\_used | | DECIMAL |
| excess\_blob\_gas | | DECIMAL |
| blob\_kzg\_commitments | | STRING |
| sync\_committee\_bits | | STRING |
| sync\_committee\_signature | | STRING |
### BLS Signature to Execution Address Changes
| Field | | Type |
| ---------------------- | - | ------ |
| id | | STRING |
| block\_slot | | LONG |
| block\_epoch | | LONG |
| block\_timestamp | | LONG |
| proposer\_index | | LONG |
| block\_root | | STRING |
| validator\_index | | LONG |
| from\_bls\_pubkey | | STRING |
| to\_execution\_address | | STRING |
| signature | | STRING |
### Deposits
| Field | | Type |
| ----------------------- | - | ------ |
| id | | STRING |
| block\_slot | | LONG |
| block\_epoch | | LONG |
| block\_timestamp | | LONG |
| block\_root | | STRING |
| pubkey | | STRING |
| withdrawal\_credentials | | STRING |
| amount | | LONG |
| signature | | STRING |
### Proposer Slashing
| Field | | Type |
| -------------------------- | - | ------ |
| id | | STRING |
| block\_timestamp | | LONG |
| header\_1\_slot | | LONG |
| header\_1\_proposer\_index | | LONG |
| header\_1\_parent\_root | | STRING |
| header\_1\_state\_root | | STRING |
| header\_1\_body\_root | | STRING |
| header\_1\_signature | | STRING |
| header\_2\_slot | | LONG |
| header\_2\_proposer\_index | | LONG |
| header\_2\_parent\_root | | STRING |
| header\_2\_state\_root | | STRING |
| header\_2\_body\_root | | STRING |
| header\_2\_signature | | STRING |
### Voluntary Exits
| Field | | Type |
| ---------------- | - | ------ |
| id | | STRING |
| block\_timestamp | | LONG |
### Withdrawals
| Field | | Type |
| ----------------- | - | ------ |
| id | | STRING |
| block\_slot | | LONG |
| block\_epoch | | LONG |
| block\_timestamp | | LONG |
| block\_root | | STRING |
| index | | LONG |
| normalized\_index | | LONG |
| validator\_index | | LONG |
| address | | STRING |
| amount | | LONG |
## Solana
### Edge Accounts
| Field | | Type |
| ----------------------- | - | ------- |
| id | | STRING |
| block\_slot | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| pubkey | | STRING |
| tx\_signature | | STRING |
| executable | | BOOLEAN |
| lamports | | DOUBLE |
| owner | | STRING |
| rent\_epoch | | DECIMAL |
| data | | STRING |
| program | | STRING |
| space | | LONG |
| account\_type | | STRING |
| is\_native | | BOOLEAN |
| mint | | STRING |
| mint\_authority | | STRING |
| state | | STRING |
| token\_amount | | DOUBLE |
| token\_amount\_decimals | | LONG |
| supply | | LONG |
| program\_data | | STRING |
| authorized\_voters | | STRING |
| authorized\_withdrawer | | STRING |
| prior\_voters | | STRING |
| node\_pubkey | | STRING |
| commission | | LONG |
| epoch\_credits | | STRING |
| votes | | STRING |
| root\_slot | | LONG |
| last\_timestamp | | STRING |
### Edge Blocks
| Field | | Type |
| --------------------- | - | ------- |
| id | | STRING |
| skipped | | BOOLEAN |
| slot | | LONG |
| parent\_slot | | LONG |
| hash | | STRING |
| timestamp | | LONG |
| height | | LONG |
| previous\_block\_hash | | STRING |
| transaction\_count | | LONG |
| leader\_reward | | LONG |
| leader | | STRING |
### Edge Instructions
| Field | | Type |
| ----------------- | - | ------ |
| id | | STRING |
| block\_slot | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| tx\_signature | | STRING |
| index | | LONG |
| parent\_index | | LONG |
| accounts | | STRING |
| data | | STRING |
| program | | STRING |
| program\_id | | STRING |
| instruction\_type | | STRING |
| params | | STRING |
| parsed | | STRING |
### Edge Rewards
| Field | | Type |
| ---------------- | - | ------ |
| id | | STRING |
| block\_slot | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| commission | | LONG |
| lamports | | LONG |
| post\_balance | | LONG |
| pub\_key | | STRING |
| reward\_type | | STRING |
### Edge Token Transfers
| Field | | Type |
| ---------------- | - | ------- |
| id | | STRING |
| block\_slot | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| source | | STRING |
| destination | | STRING |
| authority | | STRING |
| value | | DECIMAL |
| decimals | | DECIMAL |
| mint | | STRING |
| mint\_authority | | STRING |
| transfer\_type | | STRING |
| tx\_signature | | STRING |
### Edge Tokens
| Field | | Type |
| -------------------------- | - | ------- |
| id | | STRING |
| block\_slot | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| tx\_signature | | STRING |
| mint | | STRING |
| update\_authority | | STRING |
| name | | STRING |
| symbol | | STRING |
| uri | | STRING |
| seller\_fee\_basis\_points | | LONG |
| creators | | STRING |
| primary\_sale\_happened | | BOOLEAN |
| is\_mutable | | BOOLEAN |
| is\_nft | | BOOLEAN |
| token\_type | | STRING |
| retrieval\_timestamp | | LONG |
### Edge Transactions
| Field | | Type |
| ------------------------ | - | ------- |
| id | | STRING |
| index | | LONG |
| signature | | STRING |
| block\_hash | | STRING |
| recent\_block\_hash | | STRING |
| block\_slot | | LONG |
| block\_timestamp | | LONG |
| fee | | LONG |
| status | | LONG |
| err | | STRING |
| accounts | | STRING |
| log\_messages | | STRING |
| balance\_changes | | STRING |
| pre\_token\_balances | | STRING |
| post\_token\_balances | | STRING |
| compute\_units\_consumed | | DECIMAL |
### Edge Transactions with Instructions
| Field | | Type |
| ------------------------ | - | ------- |
| id | | STRING |
| index | | LONG |
| signature | | STRING |
| block\_hash | | STRING |
| recent\_block\_hash | | STRING |
| block\_slot | | LONG |
| block\_timestamp | | LONG |
| fee | | LONG |
| status | | LONG |
| err | | STRING |
| accounts | | STRING |
| log\_messages | | STRING |
| balance\_changes | | STRING |
| pre\_token\_balances | | STRING |
| post\_token\_balances | | STRING |
| compute\_units\_consumed | | DECIMAL |
| instructions | | STRING |
## Starknet
### Blocks
| Field | | Type |
| ----------------------------- | - | ------- |
| id | | STRING |
| number | | LONG |
| hash | | STRING |
| parent\_hash | | STRING |
| new\_root | | STRING |
| status | | STRING |
| sequencer\_address | | STRING |
| timestamp | | LONG |
| transaction\_count | | LONG |
| l1\_da\_mode | | STRING |
| l1\_gas\_price\_in\_fri | | DECIMAL |
| l1\_gas\_price\_in\_wei | | DECIMAL |
| l1\_data\_gas\_price\_in\_fri | | DECIMAL |
| l1\_data\_gas\_price\_in\_wei | | DECIMAL |
| starknet\_version | | STRING |
### Events
| Field | | Type |
| ------------------------------ | - | ------ |
| id | | STRING |
| block\_number | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| transaction\_hash | | STRING |
| transaction\_execution\_status | | STRING |
| transaction\_type | | STRING |
| transaction\_version | | LONG |
| from\_address | | STRING |
| data | | ARRAY |
| keys | | ARRAY |
### Messages
| Field | | Type |
| ------------------------------ | - | ------ |
| id | | STRING |
| block\_number | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| transaction\_hash | | STRING |
| transaction\_execution\_status | | STRING |
| transaction\_type | | STRING |
| transaction\_version | | LONG |
| message\_index | | LONG |
| from\_address | | STRING |
| to\_address | | STRING |
| payload | | ARRAY |
### Enriched Transactions
| Field | | Type |
| ------------------------------------------------ | - | ------- |
| id | | STRING |
| block\_number | | LONG |
| block\_hash | | STRING |
| block\_timestamp | | LONG |
| version | | LONG |
| hash | | STRING |
| transaction\_type | | STRING |
| execution\_status | | STRING |
| finality\_status | | STRING |
| nonce | | LONG |
| contract\_address | | STRING |
| entry\_point\_selector | | STRING |
| contract\_address\_salt | | STRING |
| class\_hash | | STRING |
| calldata | | ARRAY |
| constructor\_calldata | | ARRAY |
| signature | | ARRAY |
| sender\_address | | STRING |
| max\_fee | | DECIMAL |
| transaction\_index | | LONG |
| actual\_fee\_amount | | DECIMAL |
| actual\_fee\_unit | | STRING |
| resource\_bounds\_l1\_gas\_max\_amount | | LONG |
| resource\_bounds\_l1\_gas\_max\_price\_per\_unit | | LONG |
| resource\_bounds\_l2\_gas\_max\_amount | | LONG |
| resource\_bounds\_l2\_gas\_max\_price\_per\_unit | | LONG |
| message\_count | | LONG |
| event\_count | | LONG |
## Stellar
### Assets
| Field | | Type |
| ------------- | - | ------ |
| asset\_code | | STRING |
| asset\_issuer | | STRING |
| asset\_type | | STRING |
| id | | STRING |
### Contract Events
| Field | | Type |
| ------------------------------ | - | ------- |
| id | | STRING |
| transaction\_hash | | STRING |
| transaction\_id | | LONG |
| successful | | BOOLEAN |
| ledger\_sequence | | LONG |
| closed\_at | | LONG |
| in\_successful\_contract\_call | | BOOLEAN |
| contract\_id | | STRING |
| type | | INT |
| type\_string | | STRING |
| topics | | STRING |
| topics\_decoded | | STRING |
| data | | STRING |
| data\_decoded | | STRING |
| contract\_event\_xdr | | STRING |
### Raw Effects
| Field | | Type |
| -------------- | - | ------ |
| id | | STRING |
| address | | STRING |
| address\_muxed | | STRING |
| operation\_id | | LONG |
| type | | INT |
| type\_string | | STRING |
| closed\_at | | LONG |
| details | | STRING |
### Ledgers
| Field | | Type |
| ------------------------------ | - | ------ |
| id | | STRING |
| sequence | | LONG |
| ledger\_hash | | STRING |
| previous\_ledger\_hash | | STRING |
| ledger\_header | | STRING |
| transaction\_count | | INT |
| operation\_count | | INT |
| successful\_transaction\_count | | INT |
| failed\_transaction\_count | | INT |
| tx\_set\_operation\_count | | STRING |
| closed\_at | | LONG |
| total\_coins | | LONG |
| fee\_pool | | LONG |
| base\_fee | | LONG |
| base\_reserve | | LONG |
| max\_tx\_set\_size | | LONG |
| protocol\_version | | LONG |
| ledger\_id | | LONG |
| soroban\_fee\_write\_1kb | | LONG |
### Operations
| Field | | Type |
| ----------------------- | - | ---------------- |
| source\_account | | STRING |
| source\_account\_muxed | | STRING |
| type | | INT |
| type\_string | | STRING |
| details | | STRING |
| transaction\_id | | LONG |
| id | | LONG |
| closed\_at | | TIMESTAMP-MICROS |
| operation\_result\_code | | STRING |
| operation\_trace\_code | | STRING |
### Transactions
| Field | | Type |
| --------------------------------------- | - | ---------------- |
| id | | STRING |
| transaction\_hash | | STRING |
| ledger\_sequence | | LONG |
| account | | STRING |
| account\_muxed | | STRING |
| account\_sequence | | LONG |
| max\_fee | | LONG |
| fee\_charged | | LONG |
| operation\_count | | INT |
| tx\_envelope | | STRING |
| tx\_result | | STRING |
| tx\_meta | | STRING |
| tx\_fee\_meta | | STRING |
| created\_at | | TIMESTAMP-MICROS |
| memo\_type | | STRING |
| memo | | STRING |
| time\_bounds | | STRING |
| successful | | BOOLEAN |
| transaction\_id | | LONG |
| fee\_account | | STRING |
| fee\_account\_muxed | | STRING |
| inner\_transaction\_hash | | STRING |
| new\_max\_fee | | LONG |
| ledger\_bounds | | STRING |
| min\_account\_sequence | | LONG |
| min\_account\_sequence\_age | | LONG |
| min\_account\_sequence\_ledger\_gap | | LONG |
| extra\_signers | | ARRAY |
| closed\_at | | TIMESTAMP-MICROS |
| resource\_fee | | LONG |
| soroban\_resources\_instructions | | STRING |
| soroban\_resources\_read\_bytes | | STRING |
| soroban\_resources\_write\_bytes | | STRING |
| transaction\_result\_code | | STRING |
| inclusion\_fee\_bid | | LONG |
| inclusion\_fee\_charged | | LONG |
| resource\_fee\_refund | | LONG |
| non\_refundable\_resource\_fee\_charged | | LONG |
| refundable\_resource\_fee\_charged | | LONG |
| rent\_fee\_charged | | LONG |
### Trades
| Field | | Type |
| ---------------------------- | - | ------- |
| id | | STRING |
| order | | INT |
| ledger\_closed\_at | | LONG |
| selling\_account\_address | | STRING |
| selling\_asset\_code | | STRING |
| selling\_asset\_issuer | | STRING |
| selling\_asset\_type | | STRING |
| selling\_asset\_id | | LONG |
| selling\_amount | | DOUBLE |
| buying\_account\_address | | STRING |
| buying\_asset\_code | | STRING |
| buying\_asset\_issuer | | STRING |
| buying\_asset\_type | | STRING |
| buying\_asset\_id | | LONG |
| buying\_amount | | DOUBLE |
| price\_n | | LONG |
| price\_d | | LONG |
| selling\_offer\_id | | LONG |
| buying\_offer\_id | | LONG |
| selling\_liquidity\_pool\_id | | STRING |
| liquidity\_pool\_fee | | LONG |
| history\_operation\_id | | LONG |
| trade\_type | | INT |
| rounding\_slippage | | LONG |
| seller\_is\_exact | | BOOLEAN |
## Sui
### Checkpoints
| Field | | Type |
| ----------------------------- | - | ------- |
| sequence\_number | | LONG |
| checkpoint\_digest | | STRING |
| epoch | | LONG |
| tx\_digests | | STRING |
| network\_total\_transactions | | LONG |
| timestamp\_ms | | LONG |
| previous\_checkpoint\_digest | | STRING |
| total\_gas\_cost | | LONG |
| computation\_cost | | LONG |
| storage\_cost | | LONG |
| storage\_rebate | | LONG |
| non\_refundable\_storage\_fee | | LONG |
| checkpoint\_commitments | | STRING |
| validator\_signature | | STRING |
| successful\_tx\_num | | LONG |
| end\_of\_epoch\_data | | STRING |
| end\_of\_epoch | | BOOLEAN |
### Raw Transactions
| Field | | Type |
| -------------------- | - | ------ |
| tx\_sequence\_number | | LONG |
| tx\_digest | | STRING |
| sender\_signed\_data | | STRING |
| effects | | STRING |
### Events
| Field | | Type |
| ---------------------------- | - | ------ |
| tx\_sequence\_number | | LONG |
| event\_sequence\_number | | LONG |
| checkpoint\_sequence\_number | | LONG |
| transaction\_digest | | STRING |
| senders | | STRING |
| package | | STRING |
| module | | STRING |
| event\_type | | STRING |
| event\_type\_package | | STRING |
| event\_type\_module | | STRING |
| event\_type\_name | | STRING |
| bcs | | STRING |
| timestamp\_ms | | LONG |
### Epochs
| Field | | Type |
| ---------------------------------- | - | ------ |
| id | | STRING |
| epoch | | LONG |
| first\_checkpoint\_id | | LONG |
| epoch\_start\_timestamp | | LONG |
| reference\_gas\_price | | LONG |
| protocol\_version | | LONG |
| total\_stake | | LONG |
| storage\_fund\_balance | | LONG |
| system\_state | | STRING |
| epoch\_total\_transactions | | LONG |
| last\_checkpoint\_id | | LONG |
| epoch\_end\_timestamp | | LONG |
| storage\_fund\_reinvestment | | LONG |
| storage\_charge | | LONG |
| storage\_rebate | | LONG |
| stake\_subsidy\_amount | | LONG |
| total\_gas\_fees | | LONG |
| total\_stake\_rewards\_distributed | | LONG |
| leftover\_storage\_fund\_inflow | | LONG |
| epoch\_commitments | | STRING |
### Packages
| Field | | Type |
| ---------------------------- | - | ------ |
| package\_id | | STRING |
| checkpoint\_sequence\_number | | LONG |
| move\_package | | STRING |
# Subgraphs vs Mirror
Source: https://docs.goldsky.com/subgraph-vs-mirror
Wondering how to best leverage Goldsky's products? This guide explains the differences between Subgraphs and Mirror and how you can use them individually or together to meet your needs.
Goldsky offers two flagship products, [Subgraphs](/subgraphs/introduction) and [Mirror](/mirror/introduction), designed to help developers interact with blockchain data more efficiently. While both tools serve similar goals - retrieving, processing, and querying blockchain data - they differ significantly in how they handle data management, scalability, and customization.
Goldsky Subgraphs are a powerful abstraction built on top of blockchain indexing technology. They allow developers to define data sources, transformation logic, and queries using GraphQL against an instant API endpoint, making it easier to retrieve structured blockchain data. Subgraphs are particularly useful for dApps (decentralized applications) that need to query specific pieces of on-chain data efficiently.
Goldsky Mirror provides a different approach to managing blockchain and off-chain data, focusing on real-time data streaming directly to a database and offering full control and customization over the data pipeline. Instead of providing an API endpoint for querying data like Subgraphs, Mirror pipelines stream the raw or processed data directly to a database managed by the user. This setup gives users more flexibility in how they store, manipulate, and query this data.
Let's now look at the differences between both products across several important functional dimensions:
### 1. **Data Design**
* **Subgraphs**:
* The data model is optimized specifically for on-chain data:
* The entities and data types you define in subgraphs are tailored to represent blockchain data efficiently.
* You can create relationships and aggregations between on-chain entities (e.g. track the balance of a user for specific tokens)
* You can enrich entity data with other on-chain sources using `eth_call`. However, integrating off-chain data is not supported.
* On-chain data is restricted to EVM-compatible chains.
* **Mirror**:
* The data model is flexible and open to any type of data:
* You can combine blockchain data with your own data seamlessly, offering more complex and customizable use cases.
* With Mirror, the recommendation is to get data into your database and do further aggregations/enrichments downstream. You can also perform some pre-processing using its SQL dialect before writing to your database.
* On-chain data is not restricted to EVM chains. Mirror currently supports alternative L1s such as [Solana](https://docs.goldsky.com/mirror/cryptohouse), Sui, and many others.
### 2. **Infrastructure**
* **Subgraphs**:
* Provide an instant GraphQL API that's ready to use right out of the box for querying blockchain data.
* The entire infrastructure (including indexing, database and querying layer) is managed by Goldsky, minimizing setup time but limiting control over data handling.
* **Mirror**:
* Fully runs on Goldsky's infrastructure to stream data into your database in real time, but ultimately, you need to set up and manage your own data storage and querying infrastructure.
* Offers full control over data, allowing you to optimize infrastructure and scale as needed. This way, you can colocate the data with other data realms of your business and offer greater privacy and UX.
### 3. **Ecosystem & Development**
* **Subgraphs**:
* Established technology with a rich ecosystem. Numerous open-source repositories are available for reference, and [Goldsky's Instant Subgraphs](https://docs.goldsky.com/subgraphs/guides/create-a-no-code-subgraph) allow you to quickly create subgraphs with no code.
* There is a vast community around subgraph development, making it easier to find support, tutorials, and pre-built examples.
* **Mirror**:
* As a newer product, Mirror doesn't have as many public examples or pre-built repositories to reference. However, users can benefit from Goldsky's support to set up and optimize their pipelines. We also create [curated datasets](https://docs.goldsky.com/chains/supported-networks#curated-datasets) for specific use cases that are readily available for deployment.
### 4. **Scalability & Performance**
* **Subgraphs**:
* Perform well under low throughput conditions (less than 40-50 events per second). However, as event frequency increases, latency grows, and maintaining subgraphs becomes more complex.
* Goldsky offers the ability to fine-tune subgraphs, providing [custom indexers](https://docs.goldsky.com/subgraphs/serverless-vs-dedicated) that help optimize performance and efficiency. However, it's important to note that there's a limit to how much fine-tuning can be done within the subgraph framework.
* Multi-chain setups often require reindexing from scratch, which can be time-consuming when switching between chains. This can slow down applications that rely on frequent updates from multiple chains.
* **Mirror**:
* Designed for scalability. You can expand your infrastructure horizontally (adding more servers, optimizing queries, etc.) as the data load increases. A default Mirror pipeline writes about 2,000 rows a second, but you can scale up to 40 workers with an XXL Mirror pipeline. With that, you can see speeds of over 100,000 rows a second, backfilling the entire Ethereum blocks table in under 4 minutes.
* Thanks to its fast backfill capabilities and the fact that you can colocate data as you see fit, Mirror is optimized for multi-chain applications, as shown in [this article by Split's Engineering team](https://splits.org/blog/engineering-multichain/).
* Real-time streaming ensures that query latency is only limited by the performance of your own database, not by external API calls or indexing limitations.
### 5. **Expressiveness on data transformation**
* **Subgraphs**:
* Data transformation in subgraph mappings is very expressive because it's defined in JavaScript, a very popular language with lots of support and customization possibilities.
* **Mirror**:
* Data transformation in Mirror can be done in 2 ways:
* [SQL transforms](/mirror/transforms/sql-transforms): advantageous for users proficient in SQL, but it can feel a bit more rigid for developers with less experience in the language.
* [External Handlers](/mirror/transforms/external-handlers): you own the processing layer and have full flexibility in how you would like to transform the data, using the technology and framework of your choice.
### **Common Use Cases**
Now that we have covered the most important functional differences, let's look at some practical scenarios where it makes more sense to choose one technology over the other:
* **Subgraphs**:
* Best suited for applications that deal exclusively with on-chain data and don't need integration with off-chain sources.
* Ideal for predefined data models, such as dApps that need to query specific smart contract events or execute standard blockchain queries.
* Great for low to moderate traffic scenarios with relatively straightforward data structures and querying needs.
* **Mirror Pipelines**:
* A better fit for applications that require both on-chain and off-chain data, offering the flexibility to combine data sources for advanced analytics or decision-making.
* Ideal for multi-chain applications, as it simplifies the process of managing data across different blockchains without the need for reindexing. This is especially true if your application needs non-EVM data like Solana.
* Perfect for high-traffic applications, where low latency and real-time data access are critical to the performance and user experience.
### Subgraph + Mirror
Fortunately, you are not restricted to choosing one technology over the other: Subgraphs and Mirror can be combined to leverage the strengths of both by [defining subgraphs as the data source](https://docs.goldsky.com/mirror/sources/subgraphs) for your pipelines. This dual approach ensures that applications can benefit from the speed and convenience of instant APIs while also gaining full control over data storage and integration through Mirror.
### **Conclusion**
While both Subgraphs and Mirror Pipelines offer powerful solutions for interacting with blockchain data, choosing the right tool depends on your specific needs. In some cases, either technology may seem like a viable option, but it's important to carefully evaluate your requirements. If you're working with simpler on-chain data queries or need quick setup and ease of use, Subgraphs might be the best fit. On the other hand, for more complex applications that require real-time data streaming, multi-chain support, or the integration of off-chain data, Mirror Pipelines provide the flexibility and control you need. Remember that you are not constrained to one technology but that you can combine them to get the best of both worlds.
Ultimately, selecting the right solution, or a combination of both, depends on aligning with your project's performance, scalability, and infrastructure goals to ensure long-term success.
# Deploy a subgraph
Source: https://docs.goldsky.com/subgraphs/deploying-subgraphs
There are three primary ways to deploy a subgraph on Goldsky:
1. From source code
2. Migrating from The Graph or any other subgraph host
3. Via instant, no-code subgraphs
For any of the above, you'll need to have the Goldsky CLI installed and be logged in; you can do this by following the instructions below.
For these examples we'll use the Ethereum contract for [POAP](https://poap.xyz).
# From source code
If you've developed your own subgraph, you can deploy it from the source. In our example we'll work off of a clone of the [POAP subgraph](https://github.com/goldsky-io/poap-subgraph).
First we need to clone the Git repo.
```shell
git clone https://github.com/goldsky-io/poap-subgraph
```
Now change into that directory. From here, we'll build the subgraph from templates. Open source subgraphs have different instructions to get them to build, so check the `README.md` or look at the `package.json` for hints as to the correct build commands. Usually it's a two step process, but since POAP is deployed on multiple chains, there's one extra step at the start to generate the correct data from templates.
```shell
yarn prepare:mainnet
yarn codegen
yarn build
```
Then you can deploy the subgraph to Goldsky using the following command.
```shell
goldsky subgraph deploy poap-subgraph/1.0.0 --path .
```
# From The Graph or another host
For a detailed walkthrough, follow our [dedicated guide](/subgraphs/migrate-from-the-graph).
# Via instant, no-code subgraphs
For a detailed walkthrough, follow our [dedicated guide](/subgraphs/guides/create-a-no-code-subgraph).
# GraphQL Endpoints
Source: https://docs.goldsky.com/subgraphs/graphql-endpoints
All subgraphs come with a GraphQL interface that allows you to query the data in the subgraph. Traditionally these GraphQL
interfaces are completely public and can be accessed by anyone. Goldsky supports public GraphQL endpoints for both
subgraphs and their tags.
## Public endpoints
For example, in the Goldsky managed community project there exists the `uniswap-v3-base/1.0.0` subgraph with a tag of `prod`.
This subgraph has a [public endpoint](https://api.goldsky.com/api/public/project_cl8ylkiw00krx0hvza0qw17vn/subgraphs/uniswap-v3-base/1.0.0/gn)
and the tag `prod` also has a [public endpoint](https://api.goldsky.com/api/public/project_cl8ylkiw00krx0hvza0qw17vn/subgraphs/uniswap-v3-base/prod/gn).

In general, public endpoints come in the form of `https://api.goldsky.com/api/public/<project_id>/subgraphs/<subgraph_name>/<version_or_tag>/gn`
Goldsky adds rate limiting to all public endpoints to prevent abuse. We currently have a default rate limit of 50 requests per 10 seconds.
This can be unlocked by contacting us at [support@goldsky.com](mailto:support@goldsky.com).
One major downside of public endpoints is that they are completely public and can be accessed by anyone. This means that
anyone can query the data in the subgraph and potentially abuse the endpoint. This is why we also support private endpoints.
## \[*BETA*] Private endpoints
Private endpoints are only accessible by authenticated users. This means that you can control who can access the data in
your subgraph. Private endpoints are only available to users who have been granted access to the subgraph. Accessing
a private endpoint requires sending an `Authorization` header with the GraphQL request. The value of the `Authorization`
header should be in the form of `Bearer <token>`, where `<token>` is an API token that has been generated through
[Goldsky project general settings](https://app.goldsky.com/dashboard/settings#general). Remember that API tokens are scoped to specific projects. This means an API
token for `projectA` cannot be used to access the private endpoints of subgraphs in `projectB`.
Private endpoints can be toggled on and off for each subgraph and tag. This means that you can have a mix of public and
private endpoints for your subgraph. For example, you can have a public endpoint for your subgraph and a private endpoint
for a specific tag.
Here's an example of how to access a private endpoint using the GraphiQL interface:

Private subgraph endpoints follow the same format as public subgraph endpoints, except they start with `/api/private`
instead of `/api/public`. For example, the private endpoint for the `uniswap-v3-base/1.0.0` subgraph
would be `https://api.goldsky.com/api/private/project_cl8ylkiw00krx0hvza0qw17vn/subgraphs/uniswap-v3-base/1.0.0/gn`.
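For illustration, a query against that private endpoint could look like the following, where the bearer token value is a placeholder for an API token from your project settings:
```bash
curl https://api.goldsky.com/api/private/project_cl8ylkiw00krx0hvza0qw17vn/subgraphs/uniswap-v3-base/1.0.0/gn \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your-api-token>' \
  -d '{"query": "{ _meta { deployment } }"}'
```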
### Revoking access
To revoke access to a private endpoint, delete the API token that was used to access it. If you don't know which token is used to access the endpoint, you'll have to revoke all API tokens for all users that have access to the project. While this is not ideal during the beta, it will be addressed before this feature reaches general availability.
## Enabling and disabling public and private endpoints
By default, all new subgraphs and their tags come with the public endpoint enabled and the private endpoint disabled.
Both of these settings can be changed using the CLI and the webapp. To change either setting, you must have [`Editor` permissions](../rbac).
### CLI
To toggle one of these settings using the CLI you can use the `goldsky subgraph update` command with the
`--public-endpoint <enabled|disabled>` flag and/or the `--private-endpoint <enabled|disabled>` flag. Here's a complete example
disabling the public endpoint and enabling the private endpoint for the `prod` tag of the `uniswap-v3-base/1.0.0` subgraph:
```bash
goldsky subgraph update uniswap-v3-base/prod --public-endpoint disabled --private-endpoint enabled
```
### Dashboard
To toggle one of these settings using the dashboard webapp you can navigate to the subgraph detail page and use the relevant
toggles to enable or disable the public or private endpoints of the subgraph or its tags.
### Errors
Goldsky does not enforce CORS on our GraphQL endpoints. If you see an error that references CORS, or an error with the response code 429, you're likely seeing an issue with rate limiting. Rate limits can be unlocked on a case-by-case basis on the Scale plan and above. Please [reach out to us](mailto:support@goldsky.com?subject=Rate%20limits%20or%20errors) if you need help with rate limits or any GraphQL response errors.
# Create a multi-chain subgraph
Source: https://docs.goldsky.com/subgraphs/guides/create-a-multi-chain-subgraph
Use Mirror to sync multiple subgraphs to one table.
You can use subgraphs as a pipeline source, allowing you to combine the flexibility of subgraph indexing with the expressiveness of the database of your choice. **You can also push data from *multiple subgraphs* with the same schema into the same sink, allowing you to merge subgraphs across chains.**
## What you'll need
1. One or more subgraphs in your project - this can be from community subgraphs, a deployed subgraph, or a [no-code subgraph](/subgraphs/guides/create-a-no-code-subgraph).
If more than one subgraph is desired, they need to have the same graphql schema. You can use [this tool](https://www.npmjs.com/package/graphql-schema-diff) to compare schemas.
2. A working database supported by Mirror. For more information on setting up a sink, see the [sink documentation](/mirror/sinks/).
## Walkthrough
`goldsky secret list` will show you the database secrets available on your active project.
If you need to setup a secret, you can use `goldsky secret create -h`. [Here](/mirror/manage-secrets) is an example.
Open the [Subgraphs Dashboard](https://app.goldsky.com/subgraphs) and find the deployment IDs of each subgraph you would like to use as a source.
Run the following query against the subgraph to get the deployment ID.
```graphql
query {
_meta {
deployment
}
}
```
Open a text editor and create your definition, using the `subgraphEntity` source. In this example we will use subgraphs on Optimism and on BSC:
* `qidao-optimism` (`QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr`)
* `qidao-bsc` (`QmWgW69CaTwJYwcSdu36mkXgwWY11RjvX1oMGykrxT3wDS`)
They have the same schema, and we will be syncing the `account` and `event` entities from each.
Entities may be camelCased in the GraphQL API, but here they must be snake\_cased. For example, `dailySnapshot` will be `daily_snapshot` here.
```yaml qidao-crosschain.yaml
sources:
- type: subgraphEntity
# The deployment IDs you gathered above. If you put multiple,
# they must have the same schema
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
- id: QmWgW69CaTwJYwcSdu36mkXgwWY11RjvX1oMGykrxT3wDS
# A reference name, referred to later in the `sourceStreamName` of either a transformation or a sink.
referenceName: account
entity:
# The name of the entities
name: account
- type: subgraphEntity
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
- id: QmWgW69CaTwJYwcSdu36mkXgwWY11RjvX1oMGykrxT3wDS
referenceName: market_daily_snapshot
entity:
name: market_daily_snapshot
# We are just replicating data, so we don't need any SQL transforms.
transforms: []
sinks:
# In this example, we're using a postgres secret called SUPER_SECRET_SECRET.
# Feel free to change this out with any other type of sink.
- type: postgres
# The sourceStreamName matches the above `referenceNames`
sourceStreamName: account
table: qidao_accounts
schema: public
secretName: SUPER_SECRET_SECRET
- type: postgres
sourceStreamName: market_daily_snapshot
table: qidao_market_daily_snapshot
schema: public
secretName: SUPER_SECRET_SECRET
```
```shell
goldsky pipeline create qidao-crosschain --definition-path qidao-crosschain.yaml --status ACTIVE
```
You should see a response from the server like:
```
✓ Successfully validated --definition-path file
✓ Created pipeline with name: qidao-crosschain
name: qidao-crosschain
version: 1
project_id: project_cl8ylkiw00krx0hvza0qw17vn
status: INACTIVE
resource_size: s
is_deleted: false
created_at: 1697696162607
updated_at: 1697696162607
definition:
sources:
- type: subgraphEntity
entity:
name: account
referenceName: account
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
- id: QmWgW69CaTwJYwcSdu36mkXgwWY11RjvX1oMGykrxT3wDS
- type: subgraphEntity
entity:
name: market_daily_snapshot
referenceName: market_daily_snapshot
deployments:
- id: QmPuXT3poo1T4rS6agZfT51ZZkiN3zQr6n5F2o1v9dRnnr
- id: QmWgW69CaTwJYwcSdu36mkXgwWY11RjvX1oMGykrxT3wDS
...
```
Monitor the pipeline with `goldsky pipeline monitor qidao-crosschain`. The status should change from `STARTING` to `RUNNING` in a minute or so, and data will start appearing in your postgresql database.
Once you have multiple subgraphs being written into one destination database, you can set up a GraphQL API server with this database as a source; there are many options to do this:
* [Apollo Server](https://www.apollographql.com/docs/apollo-server/)
* [Express GraphQL](https://graphql.org/graphql-js/running-an-express-graphql-server/)
* [Hasura](https://hasura.io/) \[recommended for quick-start]
# Create no-code subgraphs
Source: https://docs.goldsky.com/subgraphs/guides/create-a-no-code-subgraph
## What you'll need
1. The contract address you're interested in indexing.
2. The ABI (Application Binary Interface) of the contract.
## Walkthrough
If the contract you're interested in indexing is a contract you deployed, then you'll have the contract address and ABI handy. Otherwise, you can use a mix of public explorer tools to find this information. For example, if we're interested in indexing the [friend.tech](http://friend.tech) contract...
1. Find the contract address from [Dappradar](https://dappradar.com/)
2. Click through to the [block explorer](https://basescan.org/address/0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4#code) where the ABI can be found under the `Contract ABI` section. You can also [click here](https://api.basescan.org/api?module=contract\&action=getabi\&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4\&format=raw) to download it.
Save the ABI to your local file system and make a note of the contract address. Also make a note of the block number the contract was deployed at; you'll need this in a later step.
The next step is to create the Instant Subgraph configuration file (e.g. `friendtech-config.json`). This file consists of five key sections:
1. Config version number
2. Config name
3. ABIs
4. Chains
5. Contract instances
### Version number
As of October 2023, our Instant Subgraph configuration system is on version 1. This may change in the future. This is **not the version number of your subgraph**, but of Goldsky's configuration file format.
### Config name
This is a name of your choice that helps you understand what this config is for. It is only used for internal debugging. For this guide, we'll use `friendtech`.
### ABIs, chains, and contract instances
These three sections are interconnected.
1. Name your ABI and enter the path to the ABI file you saved earlier (relative to where this config file is located). In this case, `ftshares` and `abi.json`.
2. Write out the contract instance, referencing the ABI you named earlier, the address it's deployed at, the chain it's on, and the start block.
```json friendtech-config.json
{
"version": "1",
"name": "friendtech",
"abis": {
"ftshares": {
"path": "./abi.json"
}
},
"instances": [
{
"abi": "ftshares",
"address": "0xCF205808Ed36593aa40a44F10c7f7C2F67d4A4d4",
"startBlock": 2430440,
"chain": "base"
}
]
}
```
The abi name in `instances` should match a key in `abis`; in this example, `ftshares`. It is possible to have more than one chain and more than one ABI. Multiple chains will result in multiple subgraphs. The file `abi.json` in this example should contain the friendtech ABI [downloaded from here](https://api.basescan.org/api?module=contract\&action=getabi\&address=0xcf205808ed36593aa40a44f10c7f7c2f67d4a4d4\&format=raw).
This configuration can handle multiple contracts with distinct ABIs, the same contract across multiple chains, or multiple contracts with distinct ABIs on multiple chains.
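As an illustrative sketch of the multi-chain case, the same ABI can simply be listed under multiple entries in `instances`, one per chain. The second chain slug, address, and start block below are placeholders rather than real deployments:
```json
{
  "version": "1",
  "name": "friendtech",
  "abis": {
    "ftshares": { "path": "./abi.json" }
  },
  "instances": [
    {
      "abi": "ftshares",
      "address": "0xCF205808Ed36593aa40a44F10c7f7C2F67d4A4d4",
      "startBlock": 2430440,
      "chain": "base"
    },
    {
      "abi": "ftshares",
      "address": "0x0000000000000000000000000000000000000000",
      "startBlock": 1,
      "chain": "some-other-chain"
    }
  ]
}
```
Each chain you list results in its own subgraph, as noted above.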
**For a complete reference of the various properties, please see the [Instant Subgraphs reference docs](/reference/config-file/instant-subgraph)**
With your configuration file ready, it's time to deploy the subgraph.
1. Open the CLI and log in to your Goldsky account with the command: `goldsky login`.
2. Deploy the subgraph using the command: `goldsky subgraph deploy name/version --from-abi <config-file-path>`, passing in the path to the config file you created. Note - do NOT pass in the ABI itself, but rather the config file defined above. Example: `goldsky subgraph deploy friendtech/1.0 --from-abi friendtech-config.json`
Goldsky will generate all the necessary subgraph code, deploy it, and return an endpoint that you can start querying.
Clicking the endpoint link takes you to a web client where you can browse the schema and draft queries to integrate into your app.
## Extending your subgraph with enrichments
Enrichments are a powerful way to add additional data to your subgraph by performing eth calls in the middle of an event or call handler.
See the [enrichments configuration reference](/reference/config-file/instant-subgraph#instance-enrichment) for more information on how to define these enrichments, and for an [example configuration with enrichments](/reference/config-file/instant-subgraph#nouns-enrichment-with-balances-on-transfer).
### Concepts
* Enrichments are defined at the instance level, and executed at the trigger handler level. This means that you can have different enrichments for different data sources or templates and that all enrichment executions are isolated to the handler they are being called from.
* any additional imports from `@graphprotocol/graph-ts` beyond `BigInt`, `Bytes`, and `store` can be declared in the `options.imports` field of the enrichment (e.g., `BigDecimal`).
* Enrichments always begin by performing all eth calls first, if any eth calls are aborted then the enrichment as a whole is aborted.
* calls marked as `required` or having another call declare them as a `depends_on` dependency will abort if the call is not successful, otherwise the call output value will remain as `null`.
* calls marked as `declared` will configure the subgraph to execute the call prior to invoking the mapping handler. This can be useful for performance reasons, but only works for eth calls that have no mapping handler dependencies.
* calls support `pre` and `post` expressions for `conditions` to test before and after the call, if either fails the call is aborted. Since these are expressions, they can be dynamic or constant values.
* call `source` is an expression and therefore allows for dynamic values using math or concatenations. If the `source` is simply a contract address then it will be automatically converted to an `Address` type.
* call `params` is an expression list and can also be dynamic values or constants.
* Enrichments support defining new entities as well as updating existing entities. If the entity name matches the trigger entity name, then the entity field mappings will be applied to the existing entity.
* entity names should be singular and capitalized; this will ensure that the generated code does not produce naming conflicts.
* entity field mapping values are expressions and can be dynamic or constant values.
* new enrichment entities are linked to the parent (trigger) entity that created them, with the parent (trigger) entity also linking to the new entity or entities in the opposite direction (always a collection type).
* note that while you can define existing entities that are not the trigger entity, you may not update existing entities, only create new instances of that entity.
* entities support being created multiple times in a single enrichment, but require a unique `id` expression to be defined for each entity; `id` can be a dynamic value or a constant. This `id` is appended to the parent entity `id` to create a unique `id` for each enrichment entity in the list.
* entities can be made mutable by setting the `explicit_id` flag to `true`, this will use the value of `id` without appending it to the parent entity `id`, creating an addressable entity that can be updated.
### Snippets
Below are some various examples of configurations for different scenarios. To keep each example brief, we will only show the `enrich` section of the configuration, and in most cases only the part of the `enrich` section that is relevant. See the [enrichments configuration reference](/reference/config-file/instant-subgraph#instance-enrichment) for the full configuration reference.
#### Options
Here we are enabling debugging for the enrichment (this will output the enrichment steps to the subgraph log), as well as importing `BigDecimal` for use in a `calls` or `entities` section.
```json
"enrich": {
"options": {
"imports": ["BigDecimal"],
"debugging": true
}
}
```
#### Call self
Here we are calling a function on the same contract as the trigger event. This means we can omit the `abi` and `source` configuration fields, as they are implied in this scenario; we only need to include the `name` and `params` fields (if the function declares parameters). We can refer to the result of this call using `calls.balance`.
```json
"calls": {
"balance": {
"name": "balanceOf",
"params": "event.params.owner"
}
}
```
#### Call dependency
Here we are creating a 2-call dependency, where the second call depends on the first call (the params are `calls.owner` meaning we need the value of the `owner` call before we can invoke `balanceOf`). This means that if the first call fails, the second call will not be executed. Calls are always executed in the order they are configured, so the second call will have access to the output of the first call (in this example, we use that output as a parameter to the second call). We can list multiple calls in the `depends_on` array to create a dependency graph (if needed). Adding a call to the `depends_on` array will not automatically re-order the calls, so be sure to list them in the correct order.
```json
"calls": {
"owner": {
"name": "ownerOf",
"params": "event.params.id"
},
"balance": {
"depends_on": ["owner"],
"name": "balanceOf",
"params": "calls.owner"
}
}
```
#### External contract call for known address
Here we are calling a function on an external contract, where we know the address of the contract ahead of time. In this case, we need to include the `abi` and `source` configuration fields.
```json
"calls": {
"usdc_balance": {
"abi": "erc20",
"source": "0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48",
"name": "balanceOf",
"params": "event.params.owner"
}
}
```
#### External contract call for dynamic address
Here we are setting up a 2-call chain to first determine the contract address and then call a function on that contract. In our example, the `contractAddress` function returns an `Address` type, so we can use the call result directly in the `source` field of the second call. If `contractAddress` instead returned a `string` type, we would use `"source": "Address.fromString(calls.contract_address)"`, though this would be an unusual case.
```json
"calls": {
"contract_address": {
"name": "contractAddress",
"params": "event.params.id"
},
"balance": {
"depends_on": ["contract_address"],
"abi": "erc20",
"source": "calls.contract_address",
"name": "balanceOf",
"params": "event.params.owner"
}
}
```
#### Required call
Here we are marking a call as required, meaning that if the call fails then the enrichment as a whole will be aborted. This is useful when you do not want to create a new entity (or enrich an existing entity) if the call does not return any meaningful data. Also note that when using `depends_on`, the dependency call is automatically marked as required. This should be used when the contract at the address being called may not always implement the function being called.
```json
"calls": {
"balance": {
"abi": "erc20",
"name": "balanceOf",
"source": "event.params.address",
"params": "event.params.owner",
"required": true
}
}
```
#### Pre and post conditions
Here we are using conditions to prevent a call from being executed or to abort the enrichment if the call result is not satisfactory. Avoiding an eth call can significantly improve performance if the inputs to the call are often invalid, and avoiding the creation of an entity can save on entity counts if the entity is not needed or useful for certain call results. Conditions are checked at their target site in the enrichment, and their negation is evaluated to decide whether to abort (e.g., `true` becomes `!(true)`, which is always false and therefore never aborts). In this example, we're excluding the call if the `owner` is in a deny list, and we're aborting the enrichment if the balance is `0`.
```json
"calls": {
"balance": {
"name": "balanceOf",
"params": "event.params.owner",
"conditions": {
"pre": "![Address.zero().toHexString(), \"0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03\"].includes(event.params.owner.toHexString())",
"post": "result.value.gt(BigInt.zero())"
}
}
}
```
#### Simple entity field mapping constant
Here we are simply replicating the `id` field from the event params into our enrichment entity. This can be useful if you want to filter or sort the enrichment entities by this field.
```json
"MyEntity": {
"id uint256": "event.params.id"
},
```
#### Simple entity field mapping expression
Here we are applying a serialization function to the value of a call result. This is necessary as the enrichment code generator does not resolve the effective type of an expression, so if there is a type mismatch a serialization function must be applied (in this case `String` vs `Address`).
```json
"MyEntity": {
"owner address": "calls.owner.toHexString()"
},
```
#### Complex entity field mapping expression
Here we are conditionally setting the value of `usd_balance` based on whether or not the `usdc_balance` call was successful. If the call was not successful, we set the value to `BigDecimal.zero()`; otherwise we divide the call result by `10^6` (USDC decimals) to convert the balance to a `USD` value.
```json
"MyEntity": {
"usd_balance fixed": "calls.usdc_balance === null ? BigDecimal.zero() : calls.usdc_balance!.divDecimal(BigInt.fromU32(10).pow(6).toBigDecimal())"
},
```
#### Multiple entity instances
Here we are creating multiple instances of an entity in a single enrichment. Each entity id will be suffixed with the provided `id` value.
```json
"MyEntity": [
{
"id": "'sender'",
"mapping": {
"balance fixed": "calls.balance"
}
},
{
"id": "'receiver'",
"mapping": {
"balance fixed": "calls.balance"
}
}
]
```
#### Addressable entity
Here we are creating an entity that is addressable by an explicit id. This means that we can update this entity with new values.
```json
"MyEntity": [
{
"id": "calls.owner.toHexString()",
"explicit_id": true,
"mapping": {
"current_balance fixed": "calls.balance"
}
}
]
```
*We must use an array for our entity definition to allow setting the `explicit_id` flag.*
## Examples
Here are some examples of various instant subgraph configurations. Each example builds on the previous example.
Each of these examples can be saved locally to a file (e.g., `subgraph.json`) and deployed using `goldsky subgraph deploy nouns/1.0.0 --from-abi subgraph.json`.
### Simple NOUNS example
This is a basic instant subgraph configuration, a great starting point for learning about instant subgraphs.
```json5 simple-nouns-config.json
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "from",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "to",
"type": "address"
},
{
"indexed": true,
"internalType": "uint256",
"name": "tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
}
]
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet"
}
]
}
```
### NOUNS enrichment with receiver balance on transfer
This example describes a very simple enrichment that adds a `balance` field to a `Balance` enrichment entity. This `balance` field is populated by calling the `balanceOf` function on the `to` address of the `Transfer` event.
```json5 nouns-balance-config.json
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "from",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "to",
"type": "address"
},
{
"indexed": true,
"internalType": "uint256",
"name": "tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
},
{
"inputs": [
{ "internalType": "address", "name": "owner", "type": "address" }
],
"name": "balanceOf",
"outputs": [
{ "internalType": "uint256", "name": "", "type": "uint256" }
],
"stateMutability": "view",
"type": "function"
}
]
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet",
"enrich": {
"handlers": {
"Transfer(indexed address,indexed address,indexed uint256)": {
"calls": {
"balance": {
"name": "balanceOf",
"params": "event.params.to",
"required": true
}
},
"entities": {
"Balance": {
"owner address": "event.params.to.toHexString()",
"balance uint256": "calls.balance"
}
}
}
}
}
}
]
}
```
### NOUNS enrichment with sender & receiver balance on transfer entities
This example alters our previous example by capturing the `balance` field on both `FromBalance` and `ToBalance` enrichment entities. This `balance` field is populated by calling the `balanceOf` function on both the `from` and `to` address of the `Transfer` event.
```json5 nouns-balance-config-2.json
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "from",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "to",
"type": "address"
},
{
"indexed": true,
"internalType": "uint256",
"name": "tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
},
{
"inputs": [
{ "internalType": "address", "name": "owner", "type": "address" }
],
"name": "balanceOf",
"outputs": [
{ "internalType": "uint256", "name": "", "type": "uint256" }
],
"stateMutability": "view",
"type": "function"
}
]
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet",
"enrich": {
"handlers": {
"Transfer(indexed address,indexed address,indexed uint256)": {
"calls": {
"from_balance": {
"name": "balanceOf",
"params": "event.params.from",
"required": true
},
"to_balance": {
"name": "balanceOf",
"params": "event.params.to",
"required": true
}
},
"entities": {
"FromBalance": {
"owner address": "event.params.from.toHexString()",
"balance uint256": "calls.from_balance"
},
"ToBalance": {
"owner address": "event.params.to.toHexString()",
"balance uint256": "calls.to_balance"
}
}
}
}
}
}
]
}
```
### NOUNS enrichment with mutable current balance on transfer for both sender & receiver
This example replaces the separate balance entities from the previous example with a single mutable `Balance` entity, so that both sender and receiver use the same entity type.
```json5 nouns-mutable-balance-config.json
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "from",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "to",
"type": "address"
},
{
"indexed": true,
"internalType": "uint256",
"name": "tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
},
{
"inputs": [
{ "internalType": "address", "name": "owner", "type": "address" }
],
"name": "balanceOf",
"outputs": [
{ "internalType": "uint256", "name": "", "type": "uint256" }
],
"stateMutability": "view",
"type": "function"
}
]
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet",
"enrich": {
"handlers": {
"Transfer(indexed address,indexed address,indexed uint256)": {
"calls": {
"from_balance": {
"name": "balanceOf",
"params": "event.params.from",
"required": true
},
"to_balance": {
"name": "balanceOf",
"params": "event.params.to",
"required": true
}
},
"entities": {
"Balance": [
{
"id": "event.params.from.toHexString()",
"explicit_id": true,
"mapping": {
"balance uint256": "calls.from_balance"
}
},
{
"id": "event.params.to.toHexString()",
"explicit_id": true,
"mapping": {
"balance uint256": "calls.to_balance"
}
}
]
}
}
}
}
}
]
}
```
We can now query the `Balance` entity by the owner address (`id`) to see the current balance.
```graphql
{
balance(id: "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03") {
id
balance
}
}
```
### NOUNS enrichment with declared eth call
This example alters our previous example by adding the `declared` flag to boost the performance of the `balanceOf` eth calls. Declared calls only work for eth calls that have no mapping handler dependencies; in other words, the call can be executed from the event params only. Also note that call handlers do not support declared calls (yet); if `declared` is set on a call handler enrichment, it will be ignored.
```json5 nouns-declared-calls-config.json
{
"version": "1",
"name": "nouns/1.0.0",
"abis": {
"nouns": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "from",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "to",
"type": "address"
},
{
"indexed": true,
"internalType": "uint256",
"name": "tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
},
{
"inputs": [
{ "internalType": "address", "name": "owner", "type": "address" }
],
"name": "balanceOf",
"outputs": [
{ "internalType": "uint256", "name": "", "type": "uint256" }
],
"stateMutability": "view",
"type": "function"
}
]
},
"instances": [
{
"abi": "nouns",
"address": "0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03",
"startBlock": 12985438,
"chain": "mainnet",
"enrich": {
"handlers": {
"Transfer(indexed address,indexed address,indexed uint256)": {
"calls": {
"from_balance": {
"name": "balanceOf",
"params": "event.params.from",
"required": true,
"declared": true
},
"to_balance": {
"name": "balanceOf",
"params": "event.params.to",
"required": true,
"declared": true
}
},
"entities": {
"Balance": [
{
"id": "event.params.from.toHexString()",
"explicit_id": true,
"mapping": {
"balance uint256": "calls.from_balance"
}
},
{
"id": "event.params.to.toHexString()",
"explicit_id": true,
"mapping": {
"balance uint256": "calls.to_balance"
}
}
]
}
}
}
}
}
]
}
```
# Declared eth-calls
Source: https://docs.goldsky.com/subgraphs/guides/declared-eth-calls
Improve your subgraph performance by declaring eth_calls
[Declarative eth\_calls](https://thegraph.com/docs/en/developing/creating-a-subgraph/#declared-eth_call) are a valuable subgraph feature that allows eth\_calls to be executed ahead of time, which significantly improves indexing performance.
Check out [this example](https://github.com/goldsky-io/documentation-examples/tree/main/goldsky-subgraphs/declared-eth-call-subgraph) for a practical implementation of this method using an ERC-20 subgraph on the Taiko network.
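In a handwritten subgraph, calls are declared in the manifest alongside the event handler that needs them. The exact syntax and minimum `specVersion` are covered in The Graph's documentation linked above; roughly, a declaration for an ERC-20 `balanceOf` call might look like the sketch below (the handler name, call label, and data source name are illustrative, and the linked repository contains a complete working example):
```yaml
# subgraph.yaml (excerpt): a rough sketch of a declared eth_call
eventHandlers:
  - event: Transfer(indexed address,indexed address,uint256)
    handler: handleTransfer
    calls:
      # label: Contract[address expression].function(arguments)
      toBalance: ERC20[event.address].balanceOf(event.params.to)
```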
# Send subgraph-driven webhooks
Source: https://docs.goldsky.com/subgraphs/guides/send-subgraph-driven-webhooks
Receive real-time HTTP POST requests based on your subgraphs.
Power Discord notifications, back-end operations, orderbooks, and more with webhooks for subgraphs. Receive real-time HTTP POST requests to your backends whenever a subgraph indexes a new event. Every project has webhooks enabled by default for free.
Let's speed-run a simple example of a webhook. We'll create a webhook that sends a POST request to a URL of your choice whenever a trade occurs on the X2Y2 exchange.
## What you'll need
1. One or more subgraphs in your project - this can be from community subgraphs, a deployed subgraph, or a [no-code subgraph](/subgraphs/guides/create-a-no-code-subgraph).
2. A webhook handler; making a fully functional webhook handler is out of scope for this walkthrough so we'll be using a test platform called [Webhook.site](https://webhook.site/).
## Walkthrough
Use Messari's [x2y2](https://thegraph.com/hosted-service/subgraph/messari/x2y2-ethereum) subgraph to index the x2y2 exchange.
```shell
> goldsky subgraph deploy x2y2/v1 --from-ipfs-hash Qmaj3MHPQ5AecbPuzUyLo9rFvuQwcAYpkXrf3dTUPV8rRu
Deploying Subgraph:
β Downloading subgraph from IPFS (This can take a while)
β Validating build path
β Packaging deployment bundle from /var/folders/p5/7qc7spd57jbfv00n84yzc97h0000gn/T/goldsky-deploy-Qmaj3MHPQ5AecbPuzUyLo9rFvuQwcAYpkXrf3dTUPV8rRu
```
Let's use a pre-made webhook handler by going to [webhook.site](https://webhook.site) and copying the URL. It may look something like `https://webhook.site/` followed by a unique ID.
Don't use the `https://webhook.site/#!/` format.
Any new webhook can be sent to this URL and we'll be able to see and inspect the request body.
Create a webhook to receive x2y2 trades.
```shell
> goldsky subgraph webhook create x2y2/v1 --name x2y2-trade-webhook --entity trade --url https://webhook.site/
β Creating webhook
Webhook 'x2y2-trade-webhook' created.
Make sure calls to your endpoint have the following value for the 'goldsky-webhook-secret' header: whs_01GNV4RMJCFVH14S4YAFW7RGQK
```
A secret will be generated for you to use in your webhook handler. This secret is used to authenticate the webhook request. You can ignore it for the purposes of this speed run.
Inspect the webhook.site URL (or your custom handler) again; you should see events start to stream in.
# Subgraph deploy wizard
Source: https://docs.goldsky.com/subgraphs/guides/subgraph-deploy-wizard
## What you'll need
1. The contract address(es) you're interested in indexing.
2. That's it!
## Walkthrough
We're going to build a subgraph to track the [Nouns contract](https://etherscan.io/address/0x9c8ff314c9bc7f6e59a9d9225fb22946427edc03) on `mainnet`.
```
goldsky subgraph init
```
*Remember to run `goldsky login` first if you haven't already authenticated with Goldsky.*
This will launch the wizard and guide you through the process of deploying a subgraph on Goldsky.
```
β Goldsky Subgraph configuration wizard
```
The name must start with a letter and contain only letters, numbers, underscores, and hyphens.
e.g., `nouns-demo`
```
β
β Subgraph name
β nouns-demo
β
```
*see [related argument documentation](#nameandversion-positional-argument)*
This will default to `1.0.0`, but you can change this to anything as long as it starts with a letter or number and contains only letters, numbers, underscores, hyphens, pluses, and periods.
e.g., `1.0.0-demo+docs`
```
β
β Subgraph version
β 1.0.0-demo+docs
β
```
*see [related argument documentation](#nameandversion-positional-argument)*
This can be any valid path on your system and will default to the subgraph name and version as parent and child directories respectively. The target path is where the no-code subgraph configuration will be written, as well as where any remotely fetched files will be saved. The target path is expanded, with `~` (user home directory) and environment variables being replaced accordingly.
*If you have already run through this guide, or you already have created `~/my-subgraphs/nouns-demo/1.0.0-demo+docs` then this step will be followed with a prompt to confirm overwriting existing files.*
e.g., `~/my-subgraphs/nouns-demo/1.0.0-demo+docs`
```
β
β Subgraph path
β ~/my-subgraphs/nouns-demo/1.0.0-demo+docs
β
```
*see [related argument documentation](#target-path)*
In most cases this can be left blank so that we automatically source ABIs from local and remote sources. If you have local path(s) that contain various ABIs, you can specify them here.
e.g., `~/my-subgraphs/abis`
*In this case, we'll leave this blank here because we haven't saved any ABIs locally to `~/my-subgraphs/abis` yet.*
```
β
β Contract ABI source
β path/to/abi, leave blank to skip
β
```
*see [related argument documentation](#abi)*
You can add any number of contract addresses here (as long as you add at least one). After entering all details about a contract, you'll be asked if you want to add another contract. Contract addresses must begin with a `0x` and be exactly `42` characters long.
e.g., `0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03`
```
β
β Contract address
β 0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03
β
```
*see [related argument documentation](#contract)*
Decide which network you would like to index for this contract; refer to our [supported networks](/chains/supported-networks) for the full list of available options. If the wrong network is selected, your contract may not exist on that network and no data will be indexed.
e.g., `mainnet`
```
β
β Contract network
β mainnet
β
```
*see [related argument documentation](#network)*
The start block will be automatically determined based on the network you specified in the previous step. A remote source is interrogated to determine this start block, but not all remote sources are able to respond with a valid start block value. If the remote source is unable to acquire a valid start block, the prompt will fall back to `0` and you'll be able to manually enter a start block. If you are unsure what the start block might be, using `0` is a safe bet but may result in a longer indexing time before any data is available.
e.g., `12985438`
*In this case, the wizard should have automatically determined the start block for our contract on `mainnet`. If there is a networking issue and the start block is not fetched automatically, please enter `12985438` manually.*
*On some networks, it may not be possible to automatically determine the start block for contracts deployed more than a year ago, due to a default configuration option in common RPC provider software.*
```
β
β Found start block: 12985438
β
β Start block
β 12985438
β
```
*see [related argument documentation](#start-block)*
In some cases, you may want to index the same contract on multiple networks. If this is the case, you can choose `Yes` to repeat the previous `2` steps for another network. If you only want to index this contract on one network, you can choose `No` and move on to the next step.
*In this case, we only want to index this contract on the `mainnet` network, so we'll choose `No`.*
```
β
β Add another network?
β β Yes / β No
β
```
The contract name will be used to produce generated subgraph code files. This should be a human-readable name that describes the contract you're indexing and must begin with a letter and contain only letters, numbers, hyphens, underscores, and spaces.
e.g., `NOUNS`
*The contract name does not need to be all caps; this is just a convention used in this example.*
```
β
β Contract name
β NOUNS
β
```
*see [related argument documentation](#contract-name)*
In some cases, you may want to index multiple contracts in the same subgraph. If this is the case, you can choose `Yes` to repeat all of the contract-related steps for another contract. If you only want to index this one contract, you can choose `No` and move on to the next step.
*In this case, we only want to index this one contract, so we'll choose `No`.*
```
β
β Add another contract?
β β Yes / β No
β
```
The subgraph description is only for your own reference and will not be used in the generated subgraph code. This can be any text you like, or left empty if no description is desired. The wizard will start with a generic default description.
e.g., `Goldsky Instant Subgraph for NOUNS`
*In this case, we'll accept the generic default description.*
```
β
β Subgraph description
β Goldsky Instant Subgraph for NOUNS
β
```
*see [related argument documentation](#description)*
By enabling call handlers, the subgraph will index all contract calls in addition to events. This will increase the amount of data indexed and may result in a longer indexing time. Choose `Yes` to include calls; otherwise, if you only want to index contract events, choose `No` and move on to the next step.
*In this case, we will include call handlers, so we'll choose `Yes`.*
```
β
β Enable subgraph call handlers?
β β Yes / β No
β
```
*see [related argument documentation](#call-handlers)*
We've finished collecting all the necessary information to initialize your subgraph. A brief summary of all your choices as well as a note on whether build and/or deploy is enabled by default is displayed (you will still have an option to cancel before building or deploying). If you're ready to proceed, choose `Yes` to generate the no-code subgraph configuration file. If anything doesn't look quite right you can choose `No` to abort the wizard and start over.
*In this case, we're happy with all our choices and will choose `Yes` to proceed.*
```
β
β Subgraph configuration summary
β
β Build and deploy will be performed
β
β Name: nouns-demo
β Description: Goldsky Instant Subgraph for NOUNS
β Version: 1.0.0-demo+docs
β TargetPath: /Users/someone/my-subgraphs/nouns-demo/1.0.0-demo+docs
β CallHandlers: enabled
β AbiSources:
β - /Users/someone/my-subgraphs/nouns-demo/1.0.0-demo+docs/abis
β Contracts:
β - Address: 0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03
β Name: NOUNS
β Networks:
β - Network: mainnet
β StartBlock: 12985438
β
ββββ
β
β Proceed with subgraph initialization?
β β Yes / β No
β
```
This step is where we fetch any missing ABIs from remote sources.
Once all no-code subgraph configuration files have been written to the target path, the wizard will ask if you would like to proceed with the build stage. This will compile the generated subgraph(s) into a deployable artifact. If you choose `Yes`, the wizard will run the build stage. If you choose `No`, the wizard will exit and all configuration files will remain in the target path.
*In this case, we will choose `Yes` to proceed with the build stage.*
*If you haven't yet logged in with `goldsky login`, the build step will abort with guidance to login first.*
```
β
β Subgraph configuration complete!
β Initializing subgraph nouns-demo/1.0.0-demo+docs
β
β Writing subgraph files to '/Users/someone/my-subgraphs/nouns-demo/1.0.0-demo+docs': All subgraph configuration files written!
β
β Proceed with subgraph build?
β β Yes / β No
β
```
Once the build stage has completed, the wizard will ask if you would like to proceed with the deploy stage. This will deploy the built subgraph(s) to Goldsky for the networks configured (1 subgraph per network). If you choose `Yes`, the wizard will run the deploy stage. If you choose `No`, the wizard will exit and all configuration files will remain in the target path.
*In this case, we will choose `Yes` to proceed with the deploy stage.*
```
β
β Building subgraphs: 1 subgraph built!
β
β Proceed with subgraph deploy?
β β Yes / β No
β
```
Our subgraph has now been successfully deployed to Goldsky. The wizard provides a summary of the files written locally, the builds and deploys that were performed, and links to the subgraph dashboard and the GraphiQL web interface to query the subgraph data.
```
β
β Deploying 1 subgraphs
β
β nouns-demo-mainnet/1.0.0-demo+docs deployed
β
β Subgraph initialization summary
β
β Configuration files:
β
β β’ β¦/nouns-demo/1.0.0-demo+docs/abis/nouns.json
β β’ β¦/nouns-demo/1.0.0-demo+docs/nouns-demo-mainnet-subgraph.json
β
β Build:
β
β β BUILT mainnet
β
β Deploy:
β
β β DEPLOYED nouns-demo-mainnet/1.0.0-demo+docs
β
ββββ
β
β Deployed subgraph summary
β
β nouns-demo-mainnet/1.0.0-demo+docs
β
β β’ Dashboard: https://app.goldsky.com/β¦/dashboard/subgraphs/nouns-demo-mainnet/1.0.0-demo+docs
β β’ Queries : https://api.goldsky.com/api/public/β¦/subgraphs/nouns-demo-mainnet/1.0.0-demo+docs/gn
β
ββββ
β
β Subgraph initialization complete!
```
*Most terminals will allow you to `Cmd+click` or `Ctrl+click` on the links to open them in your default browser.*
With our subgraph deployed we can now monitor its indexing progress and stats using the Goldsky Subgraph *Dashboard* link provided by the wizard. Over the next few minutes our subgraph will reach the edge of mainnet and our queryable data will be fully up to date.

*It could take up to a few hours for this subgraph to fully index.*
We can now use the GraphiQL *Queries* web interface link provided by the wizard to query the subgraph data. The GraphiQL web interface allows us to test out queries and inspect the indexed data for our subgraph. The GraphiQL link is also available from the Goldsky Subgraph dashboard. We can use the following query to monitor the latest (`5`) Nouns minted as the subgraph data is indexed.
```graphql
query LatestNouns($count: Int = 5) {
nounCreateds(first: $count, orderBy: tokenId, orderDirection: desc) {
id
block_number
transactionHash_
timestamp_
tokenId
seed_background
seed_body
seed_accessory
seed_head
seed_glasses
}
}
```
*We can query the data as it is being indexed, however until our indexing reaches the edge of the chain we won't be able to see the most recent on-chain data.*
## Wizard CLI arguments
The wizard CLI has many optional arguments that you can use to reduce the amount of manual input required. If sufficient arguments are provided, the wizard will run in non-interactive mode and automatically generate the no-code subgraph configuration file without any prompting. If some arguments are provided but not enough for non-interactive mode, the wizard will run in interactive mode and prompt you for any missing information but automatically prepare the default response with any arguments provided so that you may hit enter to use your supplied argument value.
All arguments are optional; if none are supplied then all information will be collected interactively.
### `nameAndVersion` positional argument
This is the only positional argument in the format `name`/`version`. It can be omitted completely, provided as only a `name`, or provided as the full `name` and `version` pair. If only the `name` is provided then the `/` should be omitted. It is not possible to only provide a `version` without a `name`.
* The `name` must start with a letter and contain only letters, numbers, underscores, and hyphens.
* The `version` must start with a letter or number and contain only letters, numbers, underscores, hyphens, pluses, and periods.
#### Examples
* `my-subgraph_2024/1.0.0`
* `my-subgraph_2024`
### `--target-path`
The target path can be an absolute or relative path to a local directory. If the directory does not yet exist, it will be created; if it does exist, the `--force` [argument](#force) must be provided to overwrite existing files.
#### Examples
All of these examples should result in the same target path (for a user named `someone`).
* `~/my-subgraphs`
* `$HOME/my-subgraphs`
* `/Users/someone/my-subgraphs`
* `$(pwd)/my-subgraphs`
### `--force`
This switch prevents the wizard from prompting you to confirm overwriting existing files (or from aborting in non-interactive mode).
#### Examples
* `--force` or `--force true` to overwrite
* `--no-force` or `--force false` to avoid overwriting
### `--from-config`
If you already have an existing no-code configuration file, you can provide the path to that file here. The wizard will use this file as a template and prompt you for any missing information, as well as attempt to fetch any remote files that are not present. Both JSON and YAML formats are supported, and the file must conform to the [version 1 schema](#version-1); a full command using this flag is sketched after the examples below.
#### Examples
* `~/my-subgraphs/my-subgraph_2024/1.0/subgraph_config.json`
* `~/my-subgraphs/my-subgraph_2024/1.0/subgraph_config.yaml`
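For reference, a complete (hypothetical) invocation that reuses an existing configuration file and combines it with the `--force` and `--deploy` flags described below might look like:
```shell
goldsky subgraph init \
  --from-config ~/my-subgraphs/my-subgraph_2024/1.0/subgraph_config.json \
  --force \
  --deploy
```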
### `--abi`
This argument provides the ABI sources; multiple sources can be provided by joining with a comma. Currently only local sources are supported. Known remote sources for ABIs on various supported networks will be automatically used if no local sources can provide an ABI.
#### Examples
* `~/my-subgraphs/abis`
* `~/my-subgraphs/abis,~/my-abis`
### `--contract`
This argument provides the contract address or addresses to index; multiple addresses can be provided by joining with a comma. Each address must begin with `0x` and be exactly `42` characters long. When supplying multiple contract addresses, interactive mode will provide defaults for each supplied contract successively and default to adding more contracts until all supplied contracts have been configured.
#### Examples
* `0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03`
* `0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03,0xA178b166bea52449d56895231Bb1194f20c2f102`
### `--contract-events`
This argument provides the contract events to index; multiple events can be provided by joining with a comma. Only valid event names for the contract ABI will be applied; any discrepancy will present the interactive event selector. When supplying no events, the interactive event selector will always appear.
#### Examples
* `NounCreated`
* `NounCreated,NounBurned`
### `--contract-calls`
This argument provides the contract calls to index; multiple calls can be provided by joining with a comma. Only valid call names for the contract ABI will be applied; any discrepancy will present the interactive call selector. When supplying no calls, the interactive call selector will always appear.
#### Examples
* `approve`
* `approve,burn`
### `--network`
This argument provides the network to index the contract on. The network must be one of the supported networks, refer to our [supported networks](/chains/supported-networks) for the full list of available options. Multiple networks can be provided by joining with a comma. When supplying multiple networks, interactive mode will provide defaults for each supplied network successively and default to adding more networks until all supplied networks have been configured. Note that multiple networks will be applied to each contract supplied, so multiple networks and multiple contracts result in the cartesian product of networks and contracts.
#### Examples
* `mainnet`
* `mainnet,xdai` *(for a single contract, means 2 networks for the same contract are indexed)*
* `mainnet,xdai` *(for two contracts, means 2 contracts for each network, 4 contracts total indexed, 2 per network)*
### `--start-block`
This argument provides the start block to index from, multiple start blocks can be provided by joining with a comma. When supplying multiple start blocks, interactive mode will provide defaults for each supplied start block successively and default to adding more start blocks until all supplied start blocks have been configured. Because a start block is required for each contract and network combination, multiple contracts and multiple networks result in the cartesian product of start blocks. In cases where the start block is not known ahead of time for some contract and network pairs, it can be left empty with successive commas to allow the wizard to attempt to determine the start block from a remote source.
#### Examples
* `12985438`
* `12985438,20922867`
* `12985438,,20922867` *(for 2 contracts and 2 networks, where we know the start blocks for both contracts on the 1st network but not the 2nd network)*
### `--contract-name`
This argument provides the contract name to use in the generated subgraph code; multiple contract names can be provided by joining with a comma. If any contract names contain spaces, the whole argument must be wrapped in quotes. Each contract name must start with a letter and contain only letters, numbers, hyphens, underscores, and spaces. When supplying multiple contract names, interactive mode will provide defaults for each supplied contract successively and default to adding more contracts until all supplied contracts have been configured.
#### Examples
* `My-Subgraph_Data`
* `"My Subgraph Data"`
* `"My Subgraph Data,My Other Subgraph Data"`
* `subgraph1,subgraph2`
### `--description`
This argument provides the description for the whole no-code subgraph deployment. If multiple networks are supplied, the same description will be used for the subgraph deployment on each network.
### `--call-handlers`
This switch enables call handlers for the subgraph. By default, call handlers are disabled and only events are indexed. Enabling call handlers will increase the amount of data indexed and may result in a longer indexing time but will provide more contract interaction data.
#### Examples
* `--call-handlers` or `--call-handlers true` to enable
* `--no-call-handlers` or `--call-handlers false` to disable
### `--build`
This switch enables the build stage after the wizard has completed writing the configuration files. By default, the build stage is enabled in interactive mode and disabled in non-interactive mode. Enabling the build stage will compile the generated subgraph(s) into a deployable artifact. Explicitly disabling the build stage will also prevent the deploy stage from running; `--no-build` is all that is required to stop after the files are written.
#### Examples
* `--build` or `--build true` to enable
* `--no-build` or `--build false` to disable
### `--deploy`
This switch enables the deploy stage after the wizard has completed building the subgraph(s). By default, the deploy stage is enabled in interactive mode and disabled in non-interactive mode. Enabling the deploy stage will deploy the built subgraph(s) to the specified network(s). Enabling the deploy stage will implicitly enable the build stage; `--deploy` is all that is required to run both build and deploy stages.
#### Examples
* `--deploy` or `--deploy true` to enable
* `--no-deploy` or `--deploy false` to disable
## Non-interactive mode
If you're looking to automate the process of deploying a subgraph, you can use the wizard in non-interactive mode by passing all the necessary arguments as flags. This can be useful if you're looking to deploy a subgraph as part of a CI/CD pipeline or other automated process. The command will still write all the necessary files to your target path, but it won't prompt you for any input. If the wizard cannot determine a required input value, the command will abort.
It is recommended to use `--force` and `--build` or `--deploy` flags when running the wizard in non-interactive mode. This will ensure that existing files are overwritten and that the subgraph is built and/or deployed after initialization.
### Examples
1. Deploy the **NOUNS** subgraph on `mainnet`
```
goldsky subgraph init nouns-demo/1.0.0 \
--contract 0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03 \
--network mainnet \
--start-block 12985438 \
--contract-name NOUNS \
--call-handlers \
--deploy
```
2. Deploy the **NOUNS** subgraph on `mainnet` with the interactive event and call selectors
```
goldsky subgraph init nouns-demo/1.0.0 \
--contract 0x9C8fF314C9Bc7F6e59A9d9225Fb22946427eDC03 \
--contract-events \
--contract-calls \
--network mainnet \
--start-block 12985438 \
--contract-name NOUNS \
--call-handlers \
--deploy
```
3. Deploy the **Uniswap v3** subgraph on `mainnet`
```
goldsky subgraph init uniswap-v3/1.0.0 \
--contract 0x1f9840a85d5aF5bf1D1762F925BDADdC4201F984 \
--network mainnet \
--start-block 10861674 \
--contract-name UniswapV3 \
--call-handlers \
--deploy
```
## Configuration schemas
See the [Instant subgraph configuration reference](/reference/config-file/instant-subgraph) for more information on the configuration schema.
# Introduction
Source: https://docs.goldsky.com/subgraphs/introduction
# Index Onchain Data with Subgraphs
Goldsky provides a completely backwards-compatible subgraph indexing solution. The core of the indexing uses exactly the same WASM processing layer, but in addition, Goldsky offers:
* a rewritten RPC layer, autoscaling query layer, and storage optimizations to improve reliability (99.9%+ uptime) and performance (up to 6x faster)
* webhooks support out of the box to enable notifications, messaging, and other push-based use cases
* support for custom EVM chains so you can index your own rollup or private blockchain seamlessly
* Deploy a subgraph to Goldsky's shared indexing infrastructure in a number of different ways.
* Migrate a subgraph from The Graph's hosted service or any other subgraph host.
* Use Tags to manage your subgraph endpoints and swap them in/out seamlessly.
* Learn the pros and cons of Goldsky's dedicated indexing infrastructure.
# Migrate from The Graph or another host
Source: https://docs.goldsky.com/subgraphs/migrate-from-the-graph
Goldsky provides a one-step migration for your subgraphs on The Graph's hosted service / decentralized network, or other subgraph host (including your own graph-node). This is a **drop-in replacement** with the following benefits:
* The same subgraph API that your apps already use, allowing for seamless, zero-downtime migration
* A load-balanced network of third-party and on-prem RPC nodes to improve performance and reliability
* Tagging and versioning to hotswap subgraphs, allowing for seamless updates on your frontend
* Alerts and auto-recovery in case of subgraph data consistency issues due to corruption from re-orgs or other issues
* A world-class team who monitors your subgraphs 24/7, with on-call engineering support to help troubleshoot any issues
## Migrate subgraphs to Goldsky
If you have subgraphs deployed to The Graph's hosted service, the following command seamlessly migrates your subgraph to Goldsky:
```bash
goldsky subgraph deploy your-subgraph-name/your-version --from-url <your-subgraph-query-url>
```
If you have subgraphs deployed to The Graph's decentralized network, use the IPFS hash instead (visible on The Graph's Explorer page for the specified subgraph):
```bash
goldsky subgraph deploy your-subgraph-name/your-version --from-ipfs-hash <deployment-hash>
```
You can get this IPFS deployment hash by querying any subgraph GraphQL endpoint with the following query:
```GraphQL
query {
_meta {
deployment
}
}
```
## Monitor indexing progress
Once you've started the migration with the above command, you can monitor your subgraph's indexing status with:
```bash
goldsky subgraph list
```
Alternatively, navigate to [app.goldsky.com](https://app.goldsky.com) to see your subgraphs, their indexing progress, and more.
# Choosing shared vs. dedicated
Source: https://docs.goldsky.com/subgraphs/serverless-vs-dedicated
## Serverless subgraphs
When you make a new subgraph on Goldsky, by default it's hosted on our highly resilient **Serverless Subgraph Platform**.
The platform is fully autoscaling, with a re-engineered RPC and storage layer, and is tuned for fast indexing across the majority of use-cases. It's also completely backwards compatible and runs the same WASM engine as the vanilla open-source graph-node engine.
* Optimized RPC multi-provider layer with a global cache that uses a combination of dedicated and commercial RPC APIs for uptime
* I/O optimized database with under 1ms average commit times
## Dedicated subgraph indexers
When you need improved customizability and performance, Goldsky offers dedicated subgraph indexing nodes. Dedicated machines are provisioned for your project, allowing for customization and optimization at both the indexing and querying layers.
### Indexing enhancements
* support for any EVM-compatible private chain or app chain
* custom RPC layer optimizations methods based on subgraph needs to improve indexing speed
### Querying enhancements
* enable caching with custom rules
* custom database optimizations to speed up specific query patterns, bringing expensive queries down from seconds to milliseconds
To launch a dedicated indexer, please contact us via email at [sales@goldsky.com](mailto:sales@goldsky.com) to get started within one business day.
### Limitations
By default, dedicated indexers are disconnected from Goldsky's [Mirror](mirror/sources/subgraphs) functionality; if you'd like to index and mirror a custom EVM chain, [contact us](mailto:sales@goldsky.com).
# Example Subgraphs Repo
Source: https://docs.goldsky.com/subgraphs/subgraphs-github
# Manage API endpoints with tags
Source: https://docs.goldsky.com/subgraphs/tags
Tags are used to maintain a consistent GraphQL endpoint. You can treat them like pointers or aliases to specific versions, allowing you to swap in new subgraphs in your app without changing your front-end code.
By default, subgraph API endpoints are named after the subgraph name and version, so if you update your subgraph to a new version, you'll need to update your front end to point to the new endpoint.
Using tags, you can manage your versions and seamlessly upgrade your subgraph version without having the URL change.
In this example, we'll assume you have already deployed a subgraph with the name and version `poap-subgraph/1.0.0`. We'll show you how to create a tag and how to move it to another subgraph version.
First, create a tag using the Goldsky CLI and associate it with your subgraph.
```shell
goldsky subgraph tag create poap-subgraph/1.0.0 --tag prod
```
We've now created a new tag called `prod`. Now our GraphQL endpoint will use the word `prod` instead of the version number. You should see the new GraphQL endpoint listed in your terminal after running the command.
Let's say you've upgraded your `poap-subgraph` to version `2.0.0` and want to start querying it with your `prod` GraphQL endpoint. It's as simple as creating the tag again on the new version.
```shell
goldsky subgraph tag create poap-subgraph/2.0.0 --tag prod
```
You should see the GraphQL endpoint again after running this command, and it will be the same URL as before. Now your queries will be routed to the `2.0.0` version of the subgraph seamlessly.
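Because the tagged endpoint stays stable across versions, your frontend can keep querying the same URL after each upgrade. As a rough sketch (the project ID segment of the URL is specific to your project, and the exact URL is shown in the CLI output and dashboard), a query against the `prod` tag might look like:
```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"query": "{ _meta { block { number } } }"}' \
  "https://api.goldsky.com/api/public/<your-project-id>/subgraphs/poap-subgraph/prod/gn"
```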
# Subgraph Webhooks
Source: https://docs.goldsky.com/subgraphs/webhooks
Create webhooks that trigger on every subgraph entity change
When you need to execute code or update a backend based on webhooks, you can use subgraph webhooks to send a payload to an HTTP server for every subgraph entity change.
See the [webhook quick start](/subgraphs/guides/send-subgraph-driven-webhooks) for a step-by-step guide to using this feature.
If you're using this feature to push and sync data to a database, consider using [mirror](/subgraphs/guides/create-a-multi-chain-subgraph) to sync subgraph data to your backend with guaranteed data delivery.
## How it works
When a subgraph handler does something like `entity.save()`, an update is written to an intermediate db which powers the subgraph API. This update is interpreted by a real-time watcher and sent to your webhook handler, with an `UPDATE`, `INSERT`, or `DELETE` operation.
### Interpreting entity updates
If you're tracking an immutable entity (as in one that does not get updated), then this section is not applicable.
Subgraphs store all versions of entities, each with a `block_range` column that shows the range of blocks for which that version is valid. This allows you to distinguish between an entity changing and a change being rolled back due to blockchain reorgs.
### Entity updates and removals
Updates (when an existing entity's `.save()` is called) are denoted in the subgraph entity system by a new version row being created, with a corresponding update to the previous version's row.
Suppose an entity with `id: 1` is created at block 1. A webhook will fire:
```json
{
op: "INSERT"
data: {
new: {
id: 1,
value: 1,
      vid: 1,
block_range: "[1,)"
},
old: null
}
}
```
At the next block, block 2, the entity is updated.
Two webhooks are then fired. One to track the new version being created,
```json
{
op: "INSERT"
data: {
new: {
id: 1,
value: 2,
vid: 2,
block_range: "[2,)"
},
old: null
}
}
```
Another to track the previous version being updated,
```json
{
op: "UPDATE"
data: {
new: {
id: 1,
value: 1,
vid: 1,
block_range: "[1,2)"
},
old: {
id: 1,
value: 1,
vid: 1,
block_range: "[1,)"
}
}
}
```
Similar to updates, entity removal in a subgraph mapping handler simply involves updating the block range associated with the entity. There is no actual row deletion outside of blockchain reorganizations.
Entities with a "closed" block range (e.g., \[123, 1234)) can be removed if they aren't needed for historical state.
It is recommended to maintain a "deleted\_at" and "updated\_at" timestamp in the local representation of the entity and keep them updated accordingly.
### Tracking the latest state
If your goal is to track the latest state of an entity for the most recent block, when you see any `INSERT` or `UPDATE` webhook, you can do an `upsert` in your database for the `id`. The `id` always tracks a unique entity. The `vid` in the payload denotes the version of the `id`, where the highest `vid` is the latest version.
### Handling Updates and Race Conditions
It is important to note that there is no guarantee of ordering between the insert and update operation webhooks, as they are part of the same atomic operation when a subgraph mapping handler runs.
An effective strategy involves utilizing the "deleted\_at" and "updated\_at" flags in the local representation to manage any potential race conditions.
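As a minimal sketch of these two strategies (upserting on `id`, discarding stale versions via `vid`, and keeping `updated_at`/`deleted_at` flags), the bookkeeping in a webhook handler might look like the following; the in-memory `Map` stands in for whatever database you actually use:
```typescript
type EntityRow = {
  id: string;
  vid: number;
  data: Record<string, unknown>;
  updatedAt: Date;
  deletedAt?: Date;
};

// Local store keyed by entity id; in practice this would be a database table with an upsert.
const store = new Map<string, EntityRow>();

function applyWebhook(
  op: "INSERT" | "UPDATE" | "DELETE",
  entity: { id: string; vid: string; block_range: string } & Record<string, unknown>
): void {
  const incomingVid = Number(entity.vid);
  const existing = store.get(entity.id);

  // The highest vid is the latest version of this id, so ignore payloads for older versions.
  if (existing && incomingVid < existing.vid) return;

  // An open block_range (e.g. "[1,)") means this version is still current;
  // a closed range (e.g. "[1,2)") means it was superseded or removed.
  const isClosed = !entity.block_range.endsWith(",)");

  store.set(entity.id, {
    id: entity.id,
    vid: incomingVid,
    data: entity,
    updatedAt: new Date(),
    deletedAt: op === "DELETE" || isClosed ? new Date() : undefined,
  });
}
```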
## Reference
### Create a new webhook
To create a new webhook for a subgraph entity:
```shell
goldsky subgraph webhook create my-subgraph/1.0.0 --name "" --url "" --entity ""
```
Optionally, you can also add `--secret "some-secret"` to have control over the secret you can use to identify valid traffic from Goldsky.
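For example, a handler might verify that header before processing the payload. The sketch below assumes an Express server and that the secret is stored in an environment variable; the `goldsky-webhook-secret` header name comes from the CLI output shown in the webhook walkthrough:
```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhooks/goldsky", (req, res) => {
  // Compare the header against the secret returned by `goldsky subgraph webhook create`
  // (or the value you supplied via --secret), stored here in an environment variable.
  if (req.header("goldsky-webhook-secret") !== process.env.GOLDSKY_WEBHOOK_SECRET) {
    return res.status(401).send("invalid webhook secret");
  }

  const { op, entity, data } = req.body; // op is INSERT, UPDATE, or DELETE
  console.log(`received ${op} for entity ${entity}`, data?.new ?? data?.old);
  res.status(200).send("ok");
});

app.listen(3000);
```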
### List webhooks
To see a list of already configured webhooks:
```shell
goldsky subgraph webhook list
```
### Delete a webhook
If you no longer need a webhook, you can delete it with the following command:
```shell
goldsky subgraph webhook delete
```
### Webhook Payload
The webhook payload is a JSON object with the following fields:
```json
{
"op": "INSERT", // Can be either INSERT, UPDATE, or DELETE
"data_source": "x2y2/v1", // The subgraph or indexer that is being tracked
"data": {
"old": null, // Entity Data, null if op is INSERT
"new": { // Entity data, null if op is DELETE
// This is an example from a subgraph tracking x2y2
"amount": "1",
"log_index": 268,
"price_eth": "0.017",
"strategy": "STANDARD_SALE",
"collection": "0x7bdb0a896efacdd130e764f426e555d1ebb52f54",
"seller": "0xd582a0530a1e5aee63052a68aa745657a8471504",
"transaction_hash": "0x996d3c9cda22fa47e9bb16e4837a28fccbd5643c952ed687a80fd97ceafb69c6",
"id": "0x996d3c9cda22fa47e9bb16e4837a28fccbd5643c952ed687a80fd97ceafb69c6-268",
"block_number": "16322627",
"vid": "1677156",
"timestamp": "1672705139",
"is_bundle": false,
"buyer": "0x539ea5d6ec0093ff6401dbcd14d049c37a77151b",
"block_range": "[16322627,)",
"token_id": "383"
}
},
"webhook_name": "x2y2-webhook", // Name of your webhook
"webhook_id": "webhook_clcfdc9gb00i50hyd43qeeidu" // Uniquely generated ID for the webhook
"id": "36a1a4a6-1411-4a13-939c-9dd6422b5674", // Unique ID for the event
"delivery_info": {
"max_retries": 10,
"current_retry": 0
},
"entity": "trade" // The subgraph entity being tracked
}
```
# Teams and projects
Source: https://docs.goldsky.com/teams-and-projects
Teams and projects help you collaborate with your team to build realtime data pipelines.
## Overview
Projects are the primary structure around which work on Goldsky is organized. Every project consists of a group of subgraphs and pipelines, as well as a list of team members who have access to that project.
To manage team members for a given project, select the project and navigate to the [Settings](https://app.goldsky.com/dashboard/settings) menu.
Goldsky supports [RBAC](/rbac) to help restrict who can do what.
From the settings menu, click `Invite User` and enter your team member's email address (invitees need to have a Goldsky account).
From the settings menu, click the three dots next to the team member's name and click "Remove Team Member".
Leaving a project is only available if you're not the only member of the project and it's not your only project.
You can remove yourself from a project by clicking the three dots next to your account under "Personal" and clicking "Leave Project".
## Using the Command Line to Manage Teams and Projects
Project and team management is also supported through the command line interface - you can find a description of all project and team-related commands in the [CLI Reference](/reference/cli#project).
* `goldsky login` Login using the API key that you generated in Settings at [https://app.goldsky.com](https://app.goldsky.com) for the given project.
* `goldsky project list` lists all projects you're a member of
* `goldsky project create --name ""` creates a new project. Note: this will not log you into that project; you need to go to [https://app.goldsky.com/dashboard/settings](https://app.goldsky.com/dashboard/settings) and generate the API key for that project, then use `goldsky login` with that key.
* `goldsky project update --name ""` will update the name of the currently active project.
* `goldsky project leave --projectId ""` will remove yourself from the project specified. Note: you cannot leave the project you're currently authenticated with.
* `goldsky project users list` will list all the team members of the currently active project
* `goldsky project users invite --emails "" "" (passing as many emails as you want) --role ` will add new users who already have Goldsky accounts to the active project. Note that if you enter email addresses of people who don't already have Goldsky accounts, you'll see an error message and none of the users will be invited, even if some people in the list already have Goldsky accounts. Use the `--role` flag to determine what permissions these new user(s) will have in your project.
* `goldsky project users remove --email ""` will remove a team member from the active project (you'll see an error message if they're not a team member of the project).
* `goldsky project users update --email "" --role ` will update the role of a user in the active project (you'll see an error message if they're not a team member of the project).