Name of the pipeline. Must contain only lowercase letters, numbers, and hyphens, and must be shorter than 50 characters.
Sources represent the origin of data into the pipeline. Supported source types:
Transforms represent data transformation logic to be applied to a source and/or another transform in the pipeline.
If your pipeline does not need to transform data, this attribute can be an empty object. Supported transform types:
Sinks represent the destination for source and/or transform data out of the pipeline. Supported sink types:
Defines the amount of compute power allocated to the pipeline. It can take one of the following values: “s”, “m”, “l”, “xl”, “xxl”. For new pipeline creation, it defaults to “s”. For updates, it defaults to the current resource_size of the pipeline.
Description of the pipeline.
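Putting the top-level attributes together, a minimal configuration might look like the following sketch. The name and description are placeholders, and the sources, transforms, and sinks bodies are covered section by section below:

```yaml
# Sketch of the overall config shape; values are illustrative only.
name: my-example-pipeline   # lowercase letters, numbers, hyphens; < 50 chars
description: Example pipeline showing the overall shape of a configuration.
resource_size: s            # one of: s, m, l, xl, xxl; defaults to s on creation
sources: {}                 # where data comes from (see Sources)
transforms: {}              # optional; may be an empty object (see Transforms)
sinks: {}                   # where data goes (see Sinks)
```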
Sources
Represents the origin of data into the pipeline. Each source has a unique name to be used as a reference in transforms/sinks: sources.<key_name> is the referenceable name in other transforms and sinks.
Subgraph Entity
Use your subgraph as a source for your pipeline.
Example
In the sources section of your pipeline configuration, you can add a subgraph_entity per subgraph entity that you want to use.
example.yaml
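A sketch of what example.yaml could contain, assuming a deployed subgraph named my-subgraph/1.0.0 with a Transfer entity (both hypothetical; the subgraphs key and its shape are an assumption based on the schema described below):

```yaml
sources:
  my_transfers:                   # sources.<key_name>, referenced by transforms/sinks
    type: subgraph_entity
    description: Transfer entities from my subgraph
    name: transfer                # entity name in your subgraph (hypothetical)
    start_at: earliest            # or latest (the default)
    subgraphs:                    # deployed subgraph(s) that contain the entity;
      - name: my-subgraph/1.0.0   # list several (one per chain) for cross-chain use
```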
Schema
Unique name of the source. This is a user-provided value.
Defines the type of the source. For Subgraph Entity sources, it is always subgraph_entity.
Description of the source.
Entity name in your subgraph.
earliest processes data from the first block. latest processes data from the latest block at pipeline start time. Defaults to latest.
Filter expression that does a fast scan on the dataset. Only useful when start_at is set to earliest. The expression follows the SQL standard for what comes after the WHERE clause (see Fast Scan below for an example).
References deployed subgraph(s) that have the entity mentioned in the name attribute. Supports subgraphs deployed across multiple chains, aka the cross-chain use case. Cross-chain subgraph full example
Dataset
Dataset lets you define Direct Indexing sources. These data sources are curated by the Goldsky team, with automated QA guaranteeing correctness.
Example
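A sketch, with a hypothetical dataset name and version (run goldsky dataset list to find real values for your chain):

```yaml
sources:
  ethereum_logs:                  # sources.<key_name>
    type: dataset
    description: Raw logs for Ethereum mainnet
    dataset_name: ethereum.logs   # hypothetical; pick from `goldsky dataset list`
    version: 1.0.0                # version of the dataset
    start_at: latest              # or earliest for a full backfill
```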
Schema
Unique name of the source. This is a user-provided value.
Defines the type of the source. For Dataset sources, it is always dataset.
Description of the source.
Name of a Goldsky dataset. Please use goldsky dataset list and select your chain of choice. Please refer to supported chains for an overview of what data is available for individual chains.
Version of the Goldsky dataset named in dataset_name.
earliest processes data from the first block. latest processes data from the latest block at pipeline start time. Defaults to latest.
Filter expression that does a fast scan on the dataset. Only useful when start_at is set to earliest. The expression follows the SQL standard for what comes after the WHERE clause (see Fast Scan below for an example).
Fast Scan
Processing full datasets (starting from earliest, aka doing a backfill) requires the pipeline to process a significant amount of data, which affects how quickly it reaches the edge (the latest record in the dataset). This is especially true for datasets of larger chains.
However, in many use cases, a pipeline may only be interested in a small subset of the historical data. In such cases, you can enable Fast Scan on your pipeline by defining the filter attribute in the dataset source.
The filter is pre-applied at the source level, making the initial ingestion of historical data much faster. When defining a filter, please be sure to use attributes that exist in the dataset. You can get the schema of the dataset by running goldsky dataset get <dataset_name>.
See the example below, where we pre-apply a filter based on contract address:
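A sketch of such a dataset source; the dataset name and contract address are hypothetical:

```yaml
sources:
  filtered_logs:
    type: dataset
    dataset_name: ethereum.logs   # hypothetical dataset
    version: 1.0.0
    start_at: earliest            # backfill from the first block
    # Fast Scan: the filter is pre-applied at the source. SQL WHERE-clause
    # syntax; `address` must exist in the dataset schema
    # (check with `goldsky dataset get <dataset_name>`).
    filter: address = '0x1234567890abcdef1234567890abcdef12345678'
```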
Transforms
Represents data transformation logic to be applied to a source and/or another transform in the pipeline. Each transform has a unique name to be used as a reference in transforms/sinks: transforms.<key_name> is the referenceable name in other transforms and sinks.
SQL
SQL query that transforms or filters the data from a source or another transform.
Example
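A sketch of a SQL transform, reading from the hypothetical filtered_logs source above; the type, sql, and primary_key attribute names are assumptions based on common Goldsky examples:

```yaml
transforms:
  large_block_logs:   # transforms.<key_name>
    type: sql         # assumed type value for SQL transforms
    # Query a source (or another transform) by its key name.
    sql: >
      SELECT address, data, block_number
      FROM filtered_logs
      WHERE block_number > 1000000
    primary_key: id   # hypothetical primary key column
```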
Schema
Handler
Lets you transform data by sending it to a handler endpoint.
Example
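A sketch of a handler transform; every attribute name here (type, from, url, secret_name) is hypothetical and stands in for the real schema documented below:

```yaml
transforms:
  enriched_logs:
    type: handler                     # assumed type value
    from: filtered_logs               # hypothetical reference to a source
    url: https://example.com/enrich   # your endpoint that receives and returns rows
    secret_name: MY_HANDLER_SECRET    # hypothetical secret for endpoint auth
```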
Schema
Sinks
Represents the destination for source and/or transform data out of the pipeline. Since sinks represent the end of the dataflow, they do not need to be referenced elsewhere in the configuration, unlike sources and transforms. Most sinks are user-provided, so the pipeline needs credentials to be able to write data to them: create a Goldsky Secret and reference it in the sink.
PostgreSQL
Lets you sink data to a PostgreSQL table.
Example
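A sketch of a PostgreSQL sink, writing out the hypothetical large_block_logs transform from earlier; attribute names follow common Goldsky examples but treat them as assumptions:

```yaml
sinks:
  postgres_logs:                      # sinks.<key_name>
    type: postgres
    from: large_block_logs            # source or transform to write out
    schema: public                    # destination schema
    table: large_block_logs           # destination table
    secret_name: MY_POSTGRES_SECRET   # Goldsky secret holding DB credentials
```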
Schema
Clickhouse
Lets you sink data to a Clickhouse table.
Example
Schema
MySQL
Lets you sink data to a MySQL table.
Example
Schema
Elastic Search
Lets you sink data to an Elastic Search index.
Example
Schema
Open Search
Lets you sink data to an Open Search index.
Example
Schema
Kafka
Lets you sink data to a Kafka topic.
Example
Schema
File
Example
Schema
DynamoDB
Lets you sink data to a DynamoDB table.
Example
Schema
Webhook
Lets you sink data to a webhook endpoint.
Example
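A sketch of a webhook sink; the url and secret_name attribute names are hypothetical:

```yaml
sinks:
  notify_webhook:
    type: webhook                     # assumed type value
    from: large_block_logs            # source or transform to send
    url: https://example.com/ingest   # hypothetical endpoint receiving the rows
    secret_name: MY_WEBHOOK_SECRET    # hypothetical secret for endpoint auth
```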
Schema
SQS
Lets you sink data to an AWS SQS queue.
Example
Schema
Pipeline runtime attributes
While sources, transforms, and sinks define the business logic of your pipeline, there are also attributes that change the pipeline's execution/runtime. If you need a refresher on the basics of pipelines, make sure to check out About Pipeline; here we'll focus on the specific attributes. The following are request-level attributes that only control the behavior of a particular request on the pipeline. These attributes should be passed via arguments to the goldsky pipeline apply <config_file> <arguments/flags> command.
Defines the desired status for the pipeline which can be one of the three: “ACTIVE”, “INACTIVE”, “PAUSED”. If not provided it will default to the current status of the pipeline.
Defines whether the pipeline should attempt to create a fresh snapshot before this configuration is applied. The pipeline needs to be in a healthy state for the snapshot to be created successfully. It defaults to true.
Defines whether the pipeline should be started from the latest available snapshot. This attribute is useful in restart scenarios. To restart a pipeline from scratch, use --use_latest_snapshot false. It defaults to true.
Instructs the pipeline to restart. Useful in scenarios where the pipeline needs to be restarted but no configuration change is needed. It defaults to undefined.
Pipeline Runtime Commands
Commands that change the pipeline runtime. Many commands aim to abstract away the above attributes into meaningful actions.
Start
There are multiple ways to do this:
goldsky pipeline start <name_or_path_to_config_file>
goldsky pipeline apply <name_or_path_to_config_file> --status ACTIVE
Both set the desired status of the pipeline to ACTIVE.
Pause
Pause will attempt to take a snapshot and stop the pipeline so that it can be resumed later. There are multiple ways to do this:
goldsky pipeline pause <name_or_path_to_config_file>
goldsky pipeline apply <name_or_path_to_config_file> --status PAUSED
Stop
Stopping a pipeline does not attempt to take a snapshot. There are multiple ways to do this:
goldsky pipeline stop <pipeline_name (if exists) or path_to_config>
goldsky pipeline apply <path_to_config> --status INACTIVE --from-snapshot none
goldsky pipeline apply <path_to_config> --status INACTIVE --save-progress false (prior to CLI version 11.0.0)
Update
Make any needed changes to the pipeline configuration file and run goldsky pipeline apply <name_or_path_to_config_file>.
By default, any update on a RUNNING pipeline will attempt to take a snapshot before applying the update.
If you'd like to avoid taking a snapshot as part of the update, run:
goldsky pipeline apply <name_or_path_to_config_file> --from-snapshot last
goldsky pipeline apply <name_or_path_to_config_file> --save-progress false (prior to CLI version 11.0.0)
Resize
Useful in scenarios where the pipeline is running into resource constraints. There are multiple ways to do this:
goldsky pipeline resize <resource_size>
goldsky pipeline apply <name_or_path_to_config_file> with the config file having the attribute:
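For example, to move the pipeline to an extra-large instance, the config would contain:

```yaml
resource_size: xl   # one of: s, m, l, xl, xxl
```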
Restart
Useful in scenarios where a restart is needed but there are no changes in the configuration. For example, the pipeline sink's database connection got stuck because the database restarted. There are multiple ways to restart a RUNNING pipeline without any configuration changes:
goldsky pipeline restart <path_to_config_or_name> --from-snapshot last|none
To restart without taking a snapshot, provide the --from-snapshot none option.
To restart with the last available snapshot, provide the --from-snapshot last option.
goldsky pipeline apply <path_to_configuration> --restart (CLI version below 10.0.0)
Use --from-snapshot none, or --save-progress false --use-latest-snapshot false if you are using a CLI version older than 11.0.0.
Monitor
Provides pipeline runtime information that is helpful for monitoring/developing a pipeline. Although this command does not change the runtime, it provides info like status, metrics, and logs that helps with developing a pipeline.
goldsky pipeline monitor <name_or_path_to_config_file>