Schema details for pipeline configurations
`sources.<key_name>`

The key name is used as the referenceable name in other transforms and sinks.

`subgraph_entity` source

Define one source per subgraph entity that you want to use.

- `name` — the name of the entity in your subgraph.
- `start_at` — `earliest` processes data from the first block; `latest` processes data from the latest block at pipeline start time. Defaults to `latest`.
- `filter` — applies when `start_at` is set to `earliest`. The expression follows the SQL standard for what comes after the `WHERE` clause; for example, a condition on the entity's `name` attribute.
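A minimal sketch of a `subgraph_entity` source using only the attributes described above. The entity name and source key are illustrative, and a real source will typically also need attributes identifying which subgraph deployment to read from:

```yaml
sources:
  my_entities:                # referenceable name for transforms and sinks
    type: subgraph_entity
    name: transfer            # entity name in your subgraph
    start_at: earliest        # or latest (the default)
```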
`dataset` source

- `dataset_name` — run `goldsky dataset list` and select your chain of choice. Please refer to supported chains for an overview of what data is available for individual chains.
- `start_at` — `earliest` processes data from the first block; `latest` processes data from the latest block at pipeline start time. Defaults to `latest`.

Starting from `earliest` (aka doing a backfill) requires the pipeline to process a significant amount of data, which affects how quickly it reaches the edge (the latest record in the dataset). This is especially true for the datasets of larger chains.

However, in many use cases a pipeline may only be interested in a small subset of the historical data. In such cases, you can enable Fast Scan on your pipeline by defining the `filter` attribute in the `dataset` source.

The filter is pre-applied at the source level, making the initial ingestion of historical data much faster. When defining a `filter`, please be sure to use attributes that exist in the dataset. You can get the schema of the dataset by running `goldsky dataset get <dataset_name>`.

See the example below, where we pre-apply a filter based on contract address:
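A sketch of Fast Scan with a filter on contract address. The dataset name, column name, and address are hypothetical; verify the real attribute names with `goldsky dataset get <dataset_name>`:

```yaml
sources:
  my_logs:
    type: dataset
    dataset_name: ethereum.raw_logs          # hypothetical dataset
    start_at: earliest                       # backfill from the first block
    # Fast Scan: the filter is pre-applied at the source level
    filter: address = '0x<contract_address>'
```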
`transforms.<key_name>`

The key name is used as the referenceable name in other transforms and sinks. A transform reads from a `source` or from another `transform`.

`sql` transform

A SQL query. The input is defined in the `FROM <table_name>` part of the query; any source or transform can be referenced as a SQL table.

`handler` transform

- `from` — the `source|transform` the handler reads from.
- A map of column names to Flink SQL datatypes describing the handler response. If the handler response schema changes, the pipeline needs to be re-deployed with this attribute updated.
  - To add a new attribute: `new_attribute_name: datatype`
  - To remove an existing attribute: `existing_attribute_name: null`
  - To change an existing attribute's datatype: `existing_attribute_name: datatype`
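For example, a `sql` transform reading from a source key (all names here are illustrative):

```yaml
transforms:
  big_transfers:              # referenceable name for other transforms and sinks
    sql: >
      SELECT id, sender, amount
      FROM my_logs            -- any source or transform key works as a table
      WHERE amount > 0
```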
Complete list of supported datatypes
Data Type | Notes |
---|---|
STRING | |
BOOLEAN | |
BYTE | |
DECIMAL | Supports fixed precision and scale. |
SMALLINT | |
INTEGER | |
BIGINT | |
FLOAT | |
DOUBLE | |
TIME | Supports only a precision of 0. |
TIMESTAMP | |
TIMESTAMP_LTZ | |
ARRAY | |
ROW | |
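As an illustration of the schema-map edit rules above. The surrounding key names, including `schema`, are assumptions for illustration; only the `name: datatype` / `name: null` patterns come from this section:

```yaml
transforms:
  my_handler:
    type: handler
    from: my_logs
    schema:                   # column name -> Flink SQL datatype
      amount_usd: DOUBLE      # add a new attribute
      old_field: null         # remove an existing attribute
      amount: DECIMAL         # change an existing attribute's datatype
```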
Sinks

Sinks are the destinations a pipeline writes to: databases such as `postgresql`, `dynamodb`, etc., or channels such as `kafka`, `sqs`, etc.

Most sinks are provided by the user, hence the pipeline needs credentials to be able to write data to a sink. Users need to create a Goldsky Secret and reference it in the sink. Each sink expects a matching secret type (for example, the webhook sink uses the `httpauth` secret type).
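A sketch of how a sink references a Goldsky Secret. Attribute names such as `secret_name`, `schema`, and `table` are assumptions for illustration; only the sink type and the secret requirement come from this section:

```yaml
sinks:
  my_postgres:
    type: postgresql
    from: big_transfers       # a source or transform key
    secret_name: MY_PG_SECRET # Goldsky Secret holding the credentials
    schema: public
    table: transfers
```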
`postgresql` sink

Use the `jdbc` secret type.
`clickhouse` sink

Use the `jdbc` secret type.

- `append_only_mode` — write records as inserts only. Some options are only applicable when `append_only_mode = true`.
- A map of `column_name -> clickhouse_datatype` overrides. Useful in situations when a data type is incompatible between the pipeline and ClickHouse, or when you want to use a specific type for a column.
`elasticsearch` sink

Use the `elasticSearch` secret type.
secret type.kafka
true
, the sink will emit tombstone messages (null values) for DELETE operations instead of the actual payload. This is useful for maintaining the state in Kafka topics where the latest state of a key is required, and older states should be logically deleted. Default false
json
, avro
. Requires Schema Registry credentials in the secret for avro
type.kafka
secret type.file
s3://
. Currently, only S3
is supported.parquet
, csv
.false
"col1,col2"
128MB
30min
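A sketch of a `file` sink under the constraints above. The `path` and `format` attribute names are assumptions; only the `s3://` prefix and the `parquet`/`csv` values come from this section:

```yaml
sinks:
  my_archive:
    type: file
    from: big_transfers
    path: s3://my-bucket/transfers/   # currently only S3 is supported
    format: parquet                   # or csv
```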
`dynamodb` sink

Use the `dynamodb` secret type.
`webhook` sink

Custom HTTP headers can be provided via the `headers` attribute. For the webhook sink, use the `httpauth` secret type.

`sqs` sink

Custom headers can be provided via the `headers` attribute. For the sqs sink, use the `sqs` secret type.
Deploy the pipeline using the `goldsky pipeline apply <config_file> <arguments/flags>` command.

To avoid resuming from the latest snapshot, pass `--use_latest_snapshot false`; it defaults to `true`.
To start a pipeline, run either:

goldsky pipeline start <name_or_path_to_config_file>
goldsky pipeline apply <name_or_path_to_config_file> --status ACTIVE

Both set the pipeline's desired status to ACTIVE.
To pause a pipeline, run either:

goldsky pipeline pause <name_or_path_to_config_file>
goldsky pipeline apply <name_or_path_to_config_file> --status PAUSED
To stop a pipeline, run any of:

goldsky pipeline stop <pipeline_name(if exists) or path_to_config>
goldsky pipeline apply <path_to_config> --status INACTIVE --from-snapshot none
goldsky pipeline apply <path_to_config> --status INACTIVE --save-progress false (prior to CLI version 11.0.0)

To update a pipeline, run goldsky pipeline apply <name_or_path_to_config_file>.
By default any update on a RUNNING
pipeline will attempt to take a snapshot before applying the update.
If you’d like to avoid taking a snapshot as part of the update, run:

goldsky pipeline apply <name_or_path_to_config_file> --from-snapshot last
goldsky pipeline apply <name_or_path_to_config_file> --save-progress false (prior to CLI version 11.0.0)

To resize a pipeline, run goldsky pipeline resize <resource_size>, or run goldsky pipeline apply <name_or_path_to_config_file> with the resource size attribute updated in the config file.
To restart a pipeline, run:

goldsky pipeline restart <path_to_config_or_name> --from-snapshot last|none

To restart without a snapshot, provide the --from-snapshot none option.
To restart with the last available snapshot, provide the --from-snapshot last option.

goldsky pipeline apply <path_to_configuration> --restart (CLI version below 10.0.0)

Use --save-progress false --use-latest-snapshot false instead of --from-snapshot none if you are using a CLI version older than 11.0.0.
To monitor a pipeline, run goldsky pipeline monitor <name_or_path_to_config_file>.