The sub-second, realtime, and reorg-aware advantages of Mirror are greatly diminished when you use our S3 connector. This is because of the constraints of file-based storage: output is written in batches and is append-only. If possible, we highly recommend using one of the other sinks instead!

The files are created in Parquet format.

Files are emitted on an interval, so the sink effectively behaves like a mini-batch system.

Data is also append-only. If there is a reorg, rows with an id that was already written will be emitted again. Downstream consumers of this data are responsible for deduplicating it.
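A downstream deduplication step could look like the minimal Python sketch below. It rests on several assumptions that do not come from the sink itself:

- Each emitted Parquet file has already been loaded as a list of row dicts, for example with `pyarrow.parquet.read_table(path).to_pylist()`.
- Files are processed in the order the sink emitted them.
- The most recently emitted row for a given `id` is the one to keep.

The function name `deduplicate_by_id` and the example columns are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def deduplicate_by_id(batches: Iterable[Iterable[Row]], key: str = "id") -> List[Row]:
    """Collapse append-only sink output so each id appears once.

    Assumption: `batches` arrive in emission order, so a later row with the
    same id (e.g. one re-emitted after a reorg) replaces the earlier one.
    """
    latest: Dict[Any, Row] = {}
    for batch in batches:
        for row in batch:
            # Last write wins; an existing key keeps its original position in the dict.
            latest[row[key]] = row
    return list(latest.values())


if __name__ == "__main__":
    # Hypothetical rows standing in for two Parquet files emitted by the sink.
    file_1 = [
        {"id": "0xaa", "block_number": 100, "value": 1},
        {"id": "0xbb", "block_number": 100, "value": 2},
    ]
    file_2 = [{"id": "0xbb", "block_number": 100, "value": 3}]  # re-emitted after a reorg
    print(deduplicate_by_id([file_1, file_2]))
```

For large datasets, the same last-write-wins rule is usually applied in the query engine that reads the files. That engine needs some column that reflects emission order.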

Pipeline example

sinks:
  - referenceName: Type.String()
    type: file
    sourceStreamName: Type.String()
    description: Type.Optional(Type.String())
    secretName: Type.String()
    path: Type.String()
    partitionColumns: Type.Optional(Type.String())

Secrets

Create an AWS S3 secret with the following CLI command:

goldsky secret create AN_AWS_S3_SECRET --value '{
  "accessKeyId": "Type.String()",
  "secretAccessKey": "Type.String()",
  "region": "Type.String()"
}'
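If you script secret creation, hand-editing JSON inside shell quotes is easy to get wrong. The Python sketch below builds the same `goldsky secret create` invocation shown above as an argument list and lets `json.dumps` produce the `--value` payload. It only assembles the command; actually running it (for example with `subprocess.run(argv, check=True)`) is left to the caller. The helper name `build_secret_command` and the placeholder credentials are hypothetical.

```python
import json
import shlex
from typing import List


def build_secret_command(secret_name: str, access_key_id: str,
                         secret_access_key: str, region: str) -> List[str]:
    """Return argv for `goldsky secret create` with a JSON-encoded --value.

    json.dumps handles quoting and escaping, so credential values never have
    to be spliced into a JSON string by hand.
    """
    value = json.dumps({
        "accessKeyId": access_key_id,
        "secretAccessKey": secret_access_key,
        "region": region,
    })
    return ["goldsky", "secret", "create", secret_name, "--value", value]


if __name__ == "__main__":
    # Placeholder credentials; load real ones from a secure source instead.
    argv = build_secret_command(
        "AN_AWS_S3_SECRET", "EXAMPLE_ACCESS_KEY_ID", "EXAMPLE_SECRET", "us-east-1"
    )
    # Show a shell-safe rendering of the command for review.
    print(shlex.join(argv))
```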