
# Overview

> Process and transform streaming data with SQL, HTTP handlers, and TypeScript

Transforms sit between sources and sinks in your pipeline, processing data as it flows through. They allow you to:

* Filter and project data with SQL
* Enrich data by calling external HTTP APIs
* Execute custom JavaScript/TypeScript logic with WebAssembly
* Create dynamic lookup tables
* Throttle stream throughput to a fixed rate

## Transform Types

<CardGroup cols={2}>
  <Card title="SQL Transform" icon="code" href="/turbo-pipelines/transforms/sql">
    Use familiar SQL for filtering, projections, and column transformations
  </Card>

  <Card title="Dynamic Tables" icon="table" href="/turbo-pipelines/transforms/dynamic-tables">
    Create updatable lookup tables for filtering
  </Card>

  <Card title="HTTP Handlers" icon="webhook" href="/turbo-pipelines/transforms/http-handler">
    Call external APIs to enrich your data
  </Card>

  <Card title="WebAssembly Scripts" icon="bolt" href="/turbo-pipelines/transforms/typescript">
    Execute custom JavaScript/TypeScript code
  </Card>

  <Card title="Throttle" icon="gauge" href="/turbo-pipelines/transforms/throttle">
    Cap stream throughput to a fixed records-per-second rate
  </Card>
</CardGroup>

## Transform Chaining

You can chain multiple transforms together, with each transform receiving the output of the previous one:

```yaml theme={null}
sources:
  raw_events:
    type: dataset
    dataset_name: ethereum.logs

transforms:
  # First transform: filter to a specific contract
  filtered_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT * FROM raw_events
      WHERE address = lower('0x...')

  # Second transform: enrich with external data
  enriched_events:
    type: handler
    from: filtered_events
    url: https://api.example.com/enrich
    primary_key: log_index

  # Third transform: final formatting
  final_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT
        log_index,
        transaction_hash,
        enriched_data,
        block_timestamp
      FROM enriched_events

sinks:
  postgres_sink:
    type: postgres
    from: final_events
    # ...
```

## Referencing Upstream Data

How a transform declares its input depends on the transform type:

* **`sql` transforms** reference the upstream source or transform by name in the SQL `FROM` clause. They do not accept a top-level `from` field.
* **`handler` and `script` transforms** use a top-level `from` field to name the upstream source or transform.
* **`dynamic_table` transforms** are populated either by an inline `sql` query that reads from the pipeline, or externally via writes to the backing table; they do not take a `from` field.

```yaml theme={null}
transforms:
  # SQL: upstream is named in the SQL FROM clause
  filtered:
    type: sql
    primary_key: id
    sql: SELECT * FROM my_source

  # SQL chained on top of another transform — again, referenced in FROM
  projected:
    type: sql
    primary_key: id
    sql: SELECT id, data FROM filtered

  # Handler / script: upstream is named in the top-level `from` field
  enriched:
    type: handler
    from: projected
    url: https://api.example.com/enrich
    primary_key: id
```

## Primary Keys

`sql`, `handler`, and `script` transforms require a `primary_key` field that names the column uniquely identifying each row. (`dynamic_table` does not take `primary_key` — it uses `column` instead to name its key column.)

```yaml theme={null}
transforms:
  my_transform:
    type: sql
    primary_key: id # or transaction_hash, log_index, etc.
    sql: SELECT id, data FROM source
```

The primary key is used for:

* Upsert operations in sinks
* Deduplication
* Ordering guarantees

## Transform Naming

Like sources, transforms are referenced by the name you give them:

```yaml theme={null}
transforms:
  step_1:
    type: sql
    primary_key: id
    sql: SELECT * FROM source

  step_2:
    type: sql
    primary_key: id
    sql: SELECT * FROM step_1 # Reference the upstream transform by name
```

Choose descriptive names that indicate what the transform does (e.g., `filtered_transfers`, `enriched_events`).

## Performance Considerations

<AccordionGroup>
  <Accordion title="SQL Transforms">
    * SQL transforms are highly optimized using Apache DataFusion
    * Projections (selecting specific columns) are cheap because data moves through the pipeline in a columnar format
    * Filters are pushed down to reduce data movement
    * **Note**: Joins, aggregations, and window functions are not supported in streaming mode. Use [dynamic tables](/turbo-pipelines/transforms/dynamic-tables) for lookup-style joins.
  </Accordion>

  <Accordion title="HTTP Handlers">
    * HTTP handlers add latency due to external API calls
    * Send multiple rows per request (`one_row_per_request: false`) when the endpoint supports it, to reduce per-row overhead
    * Consider caching frequently accessed data on the handler side
    * Set appropriate timeouts on the external service
  </Accordion>

  <Accordion title="WebAssembly Scripts">
    * WASM transforms execute in a sandboxed environment
    * TypeScript is transpiled to JavaScript at runtime
    * Keep scripts simple for best performance
    * Complex calculations are fine, but avoid heavy I/O
  </Accordion>
</AccordionGroup>
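
As an illustration of the HTTP handler advice above, the sketch below filters with SQL first so that fewer rows reach the external API, then sends the remaining rows in multi-row requests. It is a minimal example under assumptions: `raw_events` is a source defined elsewhere in the pipeline, the transform names and URL are placeholders, and `one_row_per_request` is used as named above. Check the HTTP handler reference for its default value and the exact request format.

```yaml theme={null}
transforms:
  # Filter with SQL first so fewer rows are sent to the external API
  contract_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT * FROM raw_events
      WHERE address = lower('0x...')

  # Send multiple rows per request instead of one request per row
  # (assumes the endpoint accepts multi-row requests)
  enriched_events:
    type: handler
    from: contract_events
    url: https://api.example.com/enrich
    primary_key: log_index
    one_row_per_request: false
```

Placing the SQL filter ahead of the handler keeps the slow, network-bound step working on the smallest possible stream.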

## Data Flow

Data moves through the pipeline in one direction, from the source through each transform in order and on to the sink:

```
Source → Transform 1 → Transform 2 → Transform 3 → Sink
   │          │            │             │          │
   └─ RecordBatch ──────────────────────────────────┘
```

Data is passed between operators as **RecordBatches** (a columnar data format), which enables:

* Efficient memory usage
* Fast serialization/deserialization
* Vectorized processing

## Special Column: `_gs_op`

All data includes a special `_gs_op` column that tracks the operation type:

* `i` - Insert (new record)
* `u` - Update (modified record)
* `d` - Delete (removed record)

You can use this in SQL transforms:

```yaml theme={null}
transforms:
  inserts_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source
      WHERE _gs_op = 'i'
```

<Note>
  The `_gs_op` column is automatically maintained by Turbo Pipelines and should
  be preserved in your transforms if you need upsert semantics in your sink.
</Note>
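
When a SQL transform selects specific columns instead of using `SELECT *`, one way to follow the note above is to list `_gs_op` explicitly in the projection. This is a minimal sketch, assuming `_gs_op` can be selected like any other column (as its use in the `WHERE` clause suggests); the transform name `slim_events` and the `id` and `data` columns are illustrative.

```yaml theme={null}
transforms:
  slim_events:
    type: sql
    primary_key: id
    # Keep _gs_op in the projection so downstream sinks can still
    # tell inserts, updates, and deletes apart
    sql: |
      SELECT
        id,
        data,
        _gs_op
      FROM source
```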

## Best Practices

<Steps>
  <Step title="Start with SQL">
    Use SQL transforms for filtering and basic transformations whenever possible; they are the most performant transform type.
  </Step>

  <Step title="Keep transforms focused">
    Each transform should do one thing well. Chain multiple simple transforms
    rather than creating one complex transform.
  </Step>

  <Step title="Validate before deploying">
    Run `goldsky turbo validate <file>` to check your pipeline config before deploying.
  </Step>

  <Step title="Monitor performance">
    Use logs and metrics to identify slow transforms and optimize accordingly.
  </Step>
</Steps>
