
Overview

By default, pipelines run as long-running deployments that continuously process data. Job mode allows you to run a pipeline as a one-time task that runs to completion and then exits.
Job mode requires sources that can signal completion. Solana datasets always support this, as do EVM datasets configured for fast scan (a filter: expression with start_at: earliest or omitted). Plain EVM datasets without fast scan cannot be used with job: true — see limitations below.

When to use job mode

Use job mode for:
  • Historical Solana backfills: Process a specific range of Solana blocks using end_block — the pipeline self-terminates when the range completes.
  • Historical EVM backfills: Backfill filtered EVM data using fast scan. Include an upper bound on block_number inside the filter: expression (e.g., ... AND block_number <= 20000000) and the pipeline self-terminates once the bounded scan is complete. end_block is not supported on EVM sources.
  • One-time data migrations: Move data from one system to another.
  • Testing and development: Quick runs without maintaining a long-running deployment.

Configuration

Add job: true to your pipeline configuration along with end_block on your Solana source:
name: solana-token-backfill
resource_size: m
job: true

sources:
  token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 250000000
    end_block: 250000002

transforms:
  filtered_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM token_transfers

sinks:
  postgres_output:
    type: postgres
    from: filtered_transfers
    schema: public
    table: token_transfers
    secret_name: MY_POSTGRES
    primary_key: id

How termination works

Job mode requires every source in the pipeline to be bounded — to have a finite end so the pipeline knows when it's done. Otherwise the engine rejects job: true at deploy time with an error naming the offending sources. There are two ways to produce a bounded source:
  • Solana datasets are bounded by start_block + end_block (or block_ranges for multiple windows).
  • EVM datasets with fast scan — a filter: expression with start_at: earliest (or omitted) — are bounded by including an upper limit on block_number inside the filter: (e.g., ... AND block_number <= 20000000). The top-level end_block field is not supported on EVM dataset sources; it is silently ignored.
For a bounded source:
  1. The source processes blocks up to and including the upper bound (Solana end_block or the block_number limit in an EVM filter:).
  2. The engine waits for the checkpoint covering that bound to finalize, ensuring data is fully persisted to sinks.
  3. The pipeline process exits cleanly.
Setting job: true deploys the pipeline as a Kubernetes Job instead of a Deployment, which means:
  • No automatic restarts on failure (backoff_limit: 0).
  • Auto-cleanup 1 hour after the process exits (success or failure).

Job behavior

When job: true is set:
  1. No restarts: Failed jobs do not automatically restart (unlike deployments)
  2. Auto-cleanup: Jobs are automatically deleted 1 hour after termination (success or failure)
  3. No restart command: goldsky turbo restart is not supported for jobs — delete and re-apply instead
  4. Cannot switch modes in place: A pipeline name deployed as a job cannot be redeployed as a deployment (or vice versa) — you must delete it first
Switching between job and deployment mode requires deleting the existing pipeline first. If you try to deploy a pipeline with job: false over an existing job (or vice versa), you’ll receive a conflict error. Run goldsky turbo delete <pipeline-name> before redeploying with the new mode.
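Using only the CLI commands this page references, a mode switch might look like the following sketch (the local file name solana-token-backfill.yaml is hypothetical):

```shell
# 1. Remove the pipeline deployed in the old mode
goldsky turbo delete solana-token-backfill

# 2. Re-apply the configuration with job: toggled to the new mode
#    (solana-token-backfill.yaml is a hypothetical local file)
goldsky turbo apply -f solana-token-backfill.yaml
```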

Example: Solana block range processing

name: solana-backfill
resource_size: l
job: true

sources:
  solana_txs:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000
    end_block: 312100000

transforms:
  processed_txs:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM solana_txs

sinks:
  clickhouse_archive:
    type: clickhouse
    from: processed_txs
    table: solana_historical_txs
    primary_key: id
    secret_name: MY_CLICKHOUSE
To backfill multiple disjoint Solana slot windows in a single job, use block_ranges instead of start_block/end_block. It accepts a JSON array of [start, end] pairs and terminates cleanly once every range has been processed.
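A sketch of that variant, assuming the JSON-array-of-pairs syntax described above (the specific ranges here are illustrative):

```yaml
sources:
  solana_txs:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    # Two disjoint windows; the job exits after both complete.
    block_ranges: [[312000000, 312100000], [315000000, 315100000]]
```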

Example: EVM fast-scan backfill

name: base-usdc-backfill
resource_size: m
job: true

sources:
  base_usdc_logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x833589fcd6edb6e08f4c7c32d4f71b54bda02913' AND block_number <= 20000000

transforms:
  parsed_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM base_usdc_logs

sinks:
  postgres_output:
    type: postgres
    from: parsed_transfers
    schema: public
    table: base_usdc_transfers_backfill
    primary_key: id
    secret_name: MY_POSTGRES
The filter + start_at: earliest combination enables fast scan, which bounds the source to a backfill over the Goldsky data lake. The block_number <= 20000000 clause inside the filter makes the backfill finite, so the pipeline terminates cleanly once block 20,000,000 is processed.
EVM dataset sources do not support the top-level end_block field — it is silently ignored. Always express the upper block bound inside the filter: expression. end_block works for Solana sources only.

Limitations

Job mode requires every source in the pipeline to be bounded. In practice that means:
  • Plain EVM datasets cannot use job mode. Without a filter:, an EVM dataset is a continuous stream from the chain tip with no end. Applying it with job: true is rejected at deploy time with a validation error.
  • Not every chain supports fast scan. Even with a filter:, EVM job mode only works on chains where the underlying datasets support fast scan. Check the Fast Scan column in supported chains — chains not marked as supporting fast scan in that column, or specific datasets marked with an asterisk (*), cannot be used with job: true.
  • SQL-level filtering does not bound a source. WHERE block_number BETWEEN ... in a transform filters rows but does not cause the source to stop — the pipeline keeps consuming data indefinitely. To terminate, the bound must be on the source itself: a block_number clause inside an EVM filter: (fast scan), or end_block on a Solana source.
  • end_block is Solana-only. The top-level end_block field on EVM dataset sources is silently ignored. Express the upper bound inside the filter: expression.
To run a bounded EVM job, configure fast scan on the source and include a block_number upper bound in the filter:
sources:
  filtered_base_logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960' AND block_number <= 25000000
If your use case can’t be expressed with fast scan (e.g., you need the full, unfiltered dataset), run the pipeline as a streaming deployment and delete it once the data you need is ingested:
goldsky turbo delete <pipeline-name>

Monitoring job status

View job status with:
# Check job status (shows JOB_MODE column)
goldsky turbo list

# View job logs (only available while the job is running or after it has Succeeded)
goldsky turbo logs <pipeline-name>
--follow is not useful for job logs — for a completed job the stream returns immediately, and for a failed job logs are unavailable (status is Failed, which rejects log requests). Logs are also deleted when the job is auto-cleaned up 1 hour after termination, so retrieve any logs you need before that window closes.

Job vs deployment

| Feature | Job (job: true) | Deployment (job: false, default) |
| --- | --- | --- |
| Duration | Runs to completion (bounded sources) | Continuous processing |
| Restart on failure | No | Yes (automatic) |
| Resource cleanup | Auto-deleted 1 hour after termination | Persists until deleted |
| goldsky turbo restart | Not supported | Supported |
| Use case | Solana backfills; EVM fast-scan backfills | Real-time streaming |
| Switch to the other mode | Must delete first | Must delete first |

Best practices

For Solana sources, always define start_block and end_block:
sources:
  solana_data:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 250000000
    end_block: 251000000
For EVM sources, use fast scan (filter + start_at: earliest) and put the upper block_number bound inside the filter: expression. end_block is not supported on EVM dataset sources — it is silently ignored.
sources:
  evm_data:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960' AND block_number <= 25000000
For large backfills, use larger resource sizes to process data faster:
resource_size: l  # Use large for big historical jobs
Jobs auto-delete 1 hour after termination — fetch logs before the cleanup window closes:
goldsky turbo logs <pipeline-name>
Always delete completed jobs before redeploying:
goldsky turbo delete <pipeline-name>
goldsky turbo apply -f <pipeline-file>