
Overview

By default, pipelines run as long-running deployments that continuously process data. Job mode allows you to run a pipeline as a one-time task that runs to completion and then exits.
Job mode requires sources that can signal completion. Solana datasets always support this, as do EVM datasets configured for fast scan (a filter: expression with start_at: earliest or omitted). Plain EVM datasets without fast scan cannot be used with job: true — see limitations below.

When to use job mode

Use job mode for:
  • Historical Solana backfills: Process a specific range of Solana blocks using end_block — the pipeline self-terminates when the range completes.
  • Historical EVM backfills: Backfill filtered EVM data using fast scan. Include an upper bound on block_number inside the filter: expression (e.g., ... AND block_number <= 20000000) and the pipeline self-terminates once the bounded scan is complete. end_block is not supported on EVM sources.
  • One-time data migrations: Move data from one system to another.
  • Testing and development: Quick runs without maintaining a long-running deployment.

Configuration

Add job: true to your pipeline configuration along with end_block on your Solana source:
name: solana-token-backfill
resource_size: m
job: true

sources:
  token_transfers:
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 250000000
    end_block: 250000002

transforms:
  filtered_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM token_transfers

sinks:
  postgres_output:
    type: postgres
    from: filtered_transfers
    schema: public
    table: token_transfers
    secret_name: MY_POSTGRES
    primary_key: id

How termination works

Job mode requires every source in the pipeline to be bounded — to have a finite end so the pipeline knows when it's done. Otherwise the engine rejects job: true at deploy time with an error naming the offending sources. There are two ways to produce a bounded source:
  • Solana datasets are bounded by start_block + end_block (or block_ranges for multiple windows).
  • EVM datasets with fast scan — a filter: expression with start_at: earliest (or omitted) — are bounded by including an upper limit on block_number inside the filter: (e.g., ... AND block_number <= 20000000). The top-level end_block field is not supported on EVM dataset sources; it is silently ignored.
For a bounded source:
  1. The source processes blocks up to and including the upper bound (Solana end_block or the block_number limit in an EVM filter:).
  2. The engine waits for the checkpoint covering that bound to finalize, ensuring data is fully persisted to sinks.
  3. The pipeline process exits cleanly.
Setting job: true deploys the pipeline as a Kubernetes Job instead of a Deployment, which means:
  • No automatic restarts on failure (backoff_limit: 0).
  • Auto-cleanup 1 hour after the process exits (success or failure).

Job behavior

When job: true is set:
  1. No restarts: Failed jobs do not automatically restart (unlike deployments)
  2. Auto-cleanup: Jobs are automatically deleted 1 hour after termination (success or failure)
  3. No restart command: goldsky turbo restart is not supported for jobs — delete and re-apply instead
  4. Cannot switch modes in place: A pipeline name deployed as a job cannot be redeployed as a deployment (or vice versa) — you must delete it first
Switching between job and deployment mode requires deleting the existing pipeline first. If you try to deploy a pipeline with job: false over an existing job (or vice versa), you’ll receive a conflict error. Run goldsky turbo delete <pipeline-name> before redeploying with the new mode.
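Using only the CLI commands this page references, a mode switch might look like the following sketch (the local file name solana-token-backfill.yaml is hypothetical):

```shell
# 1. Remove the pipeline deployed in the old mode
goldsky turbo delete solana-token-backfill

# 2. Re-apply the configuration with job: toggled to the new mode
#    (solana-token-backfill.yaml is a hypothetical local file)
goldsky turbo apply -f solana-token-backfill.yaml
```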

Example: Solana block range processing

name: solana-backfill
resource_size: l
job: true

sources:
  solana_txs:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 312000000
    end_block: 312100000

transforms:
  processed_txs:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM solana_txs

sinks:
  clickhouse_archive:
    type: clickhouse
    from: processed_txs
    table: solana_historical_txs
    primary_key: id
    secret_name: MY_CLICKHOUSE
To backfill multiple disjoint Solana slot windows in a single job, use block_ranges instead of start_block/end_block. It accepts a JSON array of [start, end] pairs and terminates cleanly once every range has been processed.
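A sketch of that variant, assuming the JSON-array-of-pairs syntax described above (the specific ranges here are illustrative):

```yaml
sources:
  solana_txs:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    # Two disjoint windows; the job exits after both complete.
    block_ranges: [[312000000, 312100000], [315000000, 315100000]]
```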

Example: EVM fast-scan backfill

name: base-usdc-backfill
resource_size: m
job: true

sources:
  base_usdc_logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x833589fcd6edb6e08f4c7c32d4f71b54bda02913' AND block_number <= 20000000

transforms:
  parsed_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM base_usdc_logs

sinks:
  postgres_output:
    type: postgres
    from: parsed_transfers
    schema: public
    table: base_usdc_transfers_backfill
    primary_key: id
    secret_name: MY_POSTGRES
The filter + start_at: earliest combination enables fast scan, which bounds the source to a backfill over the Goldsky data lake. The block_number <= 20000000 clause inside the filter makes the backfill finite, so the pipeline terminates cleanly once block 20,000,000 is processed.
EVM dataset sources do not support the top-level end_block field — it is silently ignored. Always express the upper block bound inside the filter: expression. end_block works for Solana sources only.

Limitations

Job mode requires every source in the pipeline to be bounded. In practice that means:
  • Plain EVM datasets cannot use job mode. Without a filter:, an EVM dataset is a continuous stream from the chain tip with no end. Applying it with job: true is rejected at deploy time with a validation error.
  • Not every chain supports fast scan. Even with a filter:, EVM job mode only works on chains where the underlying datasets support fast scan. Check the Fast Scan column in supported chains — chains not marked as supporting fast scan in that column, or specific datasets marked with an asterisk (*), cannot be used with job: true.
  • SQL-level filtering does not bound a source. WHERE block_number BETWEEN ... in a transform filters rows but does not cause the source to stop — the pipeline keeps consuming data indefinitely. To terminate, the bound must be on the source itself: a block_number clause inside an EVM filter: (fast scan), or end_block on a Solana source.
  • end_block is Solana-only. The top-level end_block field on EVM dataset sources is silently ignored. Express the upper bound inside the filter: expression.
To run a bounded EVM job, configure fast scan on the source and include a block_number upper bound in the filter:
sources:
  filtered_base_logs:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960' AND block_number <= 25000000
If your use case can’t be expressed with fast scan (e.g., you need the full, unfiltered dataset), run the pipeline as a streaming deployment and delete it once the data you need is ingested:
goldsky turbo delete <pipeline-name>

Monitoring job status

View job status with:
# Check job status (shows JOB_MODE column)
goldsky turbo list

# View job logs (only available while the job is running or after it has Succeeded)
goldsky turbo logs <pipeline-name>
--follow is not useful for job logs — for a completed job the stream returns immediately, and for a failed job logs are unavailable (status is Failed, which rejects log requests). Logs are also deleted when the job is auto-cleaned up 1 hour after termination, so retrieve any logs you need before that window closes.

Job vs deployment

| Feature | Job (job: true) | Deployment (job: false, default) |
| --- | --- | --- |
| Duration | Runs to completion (bounded sources) | Continuous processing |
| Restart on failure | No | Yes (automatic) |
| Resource cleanup | Auto-deleted 1 hour after termination | Persists until deleted |
| goldsky turbo restart | Not supported | Supported |
| Use case | Solana backfills; EVM fast-scan backfills | Real-time streaming |
| Switch to the other mode | Must delete first | Must delete first |

Best practices

For Solana sources, always define start_block and end_block:
sources:
  solana_data:
    type: dataset
    dataset_name: solana.transactions
    version: 1.0.0
    start_block: 250000000
    end_block: 251000000
For EVM sources, use fast scan (filter + start_at: earliest) and put the upper block_number bound inside the filter: expression. end_block is not supported on EVM dataset sources — it is silently ignored.
sources:
  evm_data:
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: earliest
    filter: address = '0x21552aeb494579c772a601f655e9b3c514fda960' AND block_number <= 25000000
For large backfills, use larger resource sizes to process data faster:
resource_size: l  # Use large for big historical jobs
Jobs auto-delete 1 hour after termination — fetch logs before the cleanup window closes:
goldsky turbo logs <pipeline-name>
Always delete completed jobs before redeploying:
goldsky turbo delete <pipeline-name>
goldsky turbo apply -f <pipeline-file>