The sub-second realtime and reorg-aware advantages of Mirror are greatly
diminished when using our S3 connector due to the constraints of file-based
storage. If possible, we highly recommend using one of the other channels
or sinks instead!
Files are created in Parquet format and emitted on an interval, essentially mimicking a mini-batch system.
Data is also append-only, so if there is a reorg, rows with the same id will be emitted again. It's up to the downstream consumers of this data to deduplicate it.
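As a sketch of how a downstream consumer might deduplicate, the following DuckDB query (any Parquet-capable engine works similarly) keeps one row per id. The bucket path follows the partitioning example below, and using block_timestamp to decide which duplicate is "latest" is an assumption to adjust for your own pipeline:
-- Deduplicate the append-only output by keeping the newest row per id.
-- Path and ordering column are illustrative, not part of the sink's contract.
-- (Reading from s3:// requires DuckDB's httpfs extension with S3 credentials configured.)
SELECT *
FROM read_parquet('s3://test-bucket/base/transfers/erc20/**/*.parquet')
QUALIFY row_number() OVER (PARTITION BY id ORDER BY block_timestamp DESC) = 1;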
Full configuration details for this sink are available on the reference page.
Secrets
Create an AWS S3 secret with the following CLI command:
goldsky secret create --name AN_AWS_S3_SECRET --value '{
  "accessKeyId": "<your-access-key-id>",
  "secretAccessKey": "<your-secret-access-key>",
  "region": "<your-aws-region>",
  "type": "s3"
}'
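Once created, the secret is referenced by name from a pipeline's sink configuration via the secret_name field, as shown in the example below.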
Partitioning
This sink supports folder-based partitioning through the partition_columns
option.
In this example, files are written to a separate folder for each day, based on the block_timestamp
of each Base transfer.
s3://test-bucket/base/transfers/erc20/<yyyy-MM-dd>
name: example-partition
apiVersion: 3
sources:
  base.transfers:
    dataset_name: base.erc20_transfers
    version: 1.2.0
    type: dataset
    start_at: latest
transforms:
  transform_transactions:
    type: sql
    primary_key: id
    sql: |-
      select *, from_unixtime(block_timestamp/1000, 'yyyy-MM-dd') as dt
      from base.transfers
sinks:
  filesink_transform_transactions:
    secret_name: S3_SECRET
    path: s3://test-bucket/base/transfers/erc20/
    type: file
    format: parquet
    partition_columns: dt
    from: transform_transactions
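Because each day lands in its own folder, downstream readers can scan only the partitions they need. A hypothetical single-day read in DuckDB (the 2024-01-01 folder name is illustrative and follows the path layout shown above):
-- Read only one day's partition instead of the whole bucket.
SELECT count(*) AS transfer_count
FROM read_parquet('s3://test-bucket/base/transfers/erc20/2024-01-01/*.parquet');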