Transforms sit between sources and sinks in your pipeline, processing data as it flows through. They allow you to:
  • Filter and project data with SQL
  • Enrich data by calling external HTTP APIs
  • Execute custom JavaScript/TypeScript logic with WebAssembly
  • Create dynamic lookup tables

Transform Types

  • SQL Transform: use familiar SQL for filtering, projections, and column transformations
  • Dynamic Tables: create updatable lookup tables for filtering
  • HTTP Handlers: call external APIs to enrich your data
  • WebAssembly Scripts: execute custom JavaScript/TypeScript code

Transform Chaining

You can chain multiple transforms together, with each transform receiving the output of the previous one:
sources:
  raw_events:
    type: dataset
    dataset_name: ethereum.logs

transforms:
  # First transform: filter to a specific contract
  filtered_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT * FROM raw_events
      WHERE address = lower('0x...')

  # Second transform: enrich with external data
  enriched_events:
    type: handler
    from: filtered_events
    url: https://api.example.com/enrich
    primary_key: log_index

  # Third transform: final formatting
  final_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT
        transaction_hash,
        enriched_data,
        block_timestamp
      FROM enriched_events

sinks:
  postgres_sink:
    type: postgres
    from: final_events
    # ...

Referencing Upstream Data

How a transform declares its input depends on the transform type:
  • sql transforms reference the upstream source or transform by name in the SQL FROM clause. They do not accept a top-level from field.
  • handler and script transforms use a top-level from field to name the upstream source or transform.
  • dynamic_table transforms are populated either by an inline sql query that reads from the pipeline, or externally via writes to the backing table; they do not take a from field (see the sketch after the example below).
transforms:
  # SQL: upstream is named in the SQL FROM clause
  filtered:
    type: sql
    primary_key: id
    sql: SELECT * FROM my_source

  # SQL chained on top of another transform — again, referenced in FROM
  projected:
    type: sql
    primary_key: id
    sql: SELECT id, data FROM filtered

  # Handler / script: upstream is named in the top-level `from` field
  enriched:
    type: handler
    from: projected
    url: https://api.example.com/enrich
    primary_key: id
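The example above covers the sql and handler cases. dynamic_table has no from field to show; the following is a rough sketch of the inline-sql form described above, where the transform name and column names are hypothetical and additional fields may apply:
transforms:
  # Dynamic table: keyed by `column` rather than `primary_key`, and populated
  # here by an inline SQL query that reads from the pipeline
  allowed_addresses:
    type: dynamic_table
    column: address
    sql: SELECT address FROM my_source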

Primary Keys

sql, handler, and script transforms require a primary_key field that names the column uniquely identifying each row. (dynamic_table does not take primary_key — it uses column instead to name its key column.)
transforms:
  my_transform:
    type: sql
    primary_key: id # or transaction_hash, log_index, etc.
    sql: SELECT id, data FROM source
The primary key is used for:
  • Upsert operations in sinks
  • Deduplication
  • Ordering guarantees

Transform Naming

Like sources, transforms are referenced by the name you give them:
transforms:
  step_1:
    type: sql
    primary_key: id
    sql: SELECT * FROM source

  step_2:
    type: sql
    primary_key: id
    sql: SELECT * FROM step_1 # Reference the upstream transform by name
Choose descriptive names that indicate what the transform does (e.g., filtered_transfers, enriched_events).

Performance Considerations

SQL transforms
  • Highly optimized using Apache DataFusion
  • Projections (selecting specific columns) are very efficient
  • Filters are pushed down to reduce data movement
  • Joins, aggregations, and window functions are not supported in streaming mode; use dynamic tables for lookup-style joins

HTTP handlers
  • Add latency due to external API calls
  • Send multiple rows per request (one_row_per_request: false) when the endpoint supports it, to reduce per-row overhead; see the sketch after this list
  • Consider caching frequently accessed data on the handler side
  • Set appropriate timeouts on the external service

WebAssembly scripts
  • Execute in a sandboxed environment
  • TypeScript is transpiled to JavaScript at runtime
  • Keep scripts simple for best performance
  • Complex calculations are fine, but avoid heavy I/O
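To illustrate the batching point above, here is a minimal handler sketch. It reuses only fields shown elsewhere on this page; the transform name and upstream are placeholders, and it assumes one_row_per_request is set on the handler transform itself:
transforms:
  enriched_batched:
    type: handler
    from: filtered_events
    url: https://api.example.com/enrich
    primary_key: log_index
    # Assumed placement: send multiple rows per request instead of one at a
    # time, provided the endpoint accepts batched payloads
    one_row_per_request: false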

Data Flow

Understanding how data flows through transforms:
Source → Transform 1 → Transform 2 → Transform 3 → Sink
   │          │            │             │          │
   └─ RecordBatch ──────────────────────────────────┘
Data is passed between operators as RecordBatches (a columnar data format), which enables:
  • Efficient memory usage
  • Fast serialization/deserialization
  • Vectorized processing

Special Column: _gs_op

All data includes a special _gs_op column that tracks the operation type:
  • i - Insert (new record)
  • u - Update (modified record)
  • d - Delete (removed record)
You can use this in SQL transforms:
transforms:
  inserts_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source
      WHERE _gs_op = 'i'
The _gs_op column is automatically maintained by Turbo Pipelines and should be preserved in your transforms if you need upsert semantics in your sink.
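For example, when a SQL transform projects specific columns, one straightforward way to preserve _gs_op is to select it explicitly so the sink can still distinguish inserts, updates, and deletes. A minimal sketch, assuming an upstream source with id and data columns:
transforms:
  projected_with_op:
    type: sql
    primary_key: id
    sql: |
      -- Keep _gs_op alongside the projected columns so upsert/delete
      -- semantics survive into the sink
      SELECT id, data, _gs_op FROM my_source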

Best Practices

1. Start with SQL
   Use SQL transforms for filtering and basic transformations whenever possible; they’re the most performant.

2. Keep transforms focused
   Each transform should do one thing well. Chain multiple simple transforms rather than creating one complex transform.

3. Validate before deploying
   Run goldsky turbo validate <file> to check your pipeline config before deploying.

4. Monitor performance
   Use logs and metrics to identify slow transforms and optimize accordingly.