> ## Documentation Index
> Fetch the complete documentation index at: https://docs.goldsky.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

> Process and transform streaming data with SQL, HTTP handlers, and TypeScript

Transforms sit between sources and sinks in your pipeline, processing data as it flows through. They allow you to:

* Filter and project data with SQL
* Enrich data by calling external HTTP APIs
* Execute custom JavaScript/TypeScript logic with WebAssembly
* Create dynamic lookup tables
* Throttle stream throughput to a fixed rate

## Transform Types

* Use familiar SQL for filtering, projections, and column transformations
* Create updatable lookup tables for filtering
* Call external APIs to enrich your data
* Execute custom JavaScript/TypeScript code
* Cap stream throughput to a fixed records-per-second rate

## Transform Chaining

You can chain multiple transforms together, with each transform receiving the output of the previous one:

```yaml theme={null}
sources:
  raw_events:
    type: dataset
    dataset_name: ethereum.logs

transforms:
  # First transform: filter to a specific contract
  filtered_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT * FROM raw_events
      WHERE address = lower('0x...')

  # Second transform: enrich with external data
  enriched_events:
    type: handler
    from: filtered_events
    url: https://api.example.com/enrich
    primary_key: log_index

  # Third transform: final formatting
  final_events:
    type: sql
    primary_key: log_index
    sql: |
      SELECT
        transaction_hash,
        enriched_data,
        block_timestamp
      FROM enriched_events

sinks:
  postgres_sink:
    type: postgres
    from: final_events
    # ...
```

## Referencing upstream data

How a transform declares its input depends on the transform type:

* **`sql` transforms** reference the upstream source or transform by name in the SQL `FROM` clause. They do not accept a top-level `from` field.
* **`handler` and `script` transforms** use a top-level `from` field to name the upstream source or transform.
* **`dynamic_table` transforms** are populated either by an inline `sql` query that reads from the pipeline, or externally via writes to the backing table; they do not take a `from` field.

```yaml theme={null}
transforms:
  # SQL: upstream is named in the SQL FROM clause
  filtered:
    type: sql
    primary_key: id
    sql: SELECT * FROM my_source

  # SQL chained on top of another transform (again, referenced in FROM)
  projected:
    type: sql
    primary_key: id
    sql: SELECT id, data FROM filtered

  # Handler / script: upstream is named in the top-level `from` field
  enriched:
    type: handler
    from: projected
    url: https://api.example.com/enrich
    primary_key: id
```

## Primary Keys

`sql`, `handler`, and `script` transforms require a `primary_key` field that names the column uniquely identifying each row. (`dynamic_table` does not take `primary_key`; it uses `column` instead to name its key column.)

```yaml theme={null}
transforms:
  my_transform:
    type: sql
    primary_key: id  # or transaction_hash, log_index, etc.
    sql: SELECT id, data FROM source
```

The primary key is used for:

* Upsert operations in sinks
* Deduplication
* Ordering guarantees

## Transform Naming

Like sources, transforms are referenced by the name you give them:

```yaml theme={null}
transforms:
  step_1:
    type: sql
    primary_key: id
    sql: SELECT * FROM source

  step_2:
    type: sql
    primary_key: id
    sql: SELECT * FROM step_1  # Reference the upstream transform by name
```

Choose descriptive names that indicate what the transform does (e.g., `filtered_transfers`, `enriched_events`).

## Performance Considerations

* SQL transforms are highly optimized using Apache DataFusion
* Projections (selecting specific columns) are very efficient
* Filters are pushed down to reduce data movement
* **Note**: Joins, aggregations, and window functions are not supported in streaming mode.
  Use [dynamic tables](/turbo-pipelines/transforms/dynamic-tables) for lookup-style joins.
* HTTP handlers add latency due to external API calls
* Send multiple rows per request (`one_row_per_request: false`) when the endpoint supports it, to reduce per-row overhead
* Consider caching frequently accessed data on the handler side
* Set appropriate timeouts on the external service
* WASM transforms execute in a sandboxed environment
* TypeScript is transpiled to JavaScript at runtime
* Keep scripts simple for best performance
* Complex calculations are fine, but avoid heavy I/O

## Data Flow

Understanding how data flows through transforms:

```
Source → Transform 1 → Transform 2 → Transform 3 → Sink
   │          │             │             │         │
   └─ RecordBatch ──────────────────────────────────┘
```

Data is passed between operators as **RecordBatches** (columnar data format), which enables:

* Efficient memory usage
* Fast serialization/deserialization
* Vectorized processing

## Special Column: `_gs_op`

All data includes a special `_gs_op` column that tracks the operation type:

* `i` - Insert (new record)
* `u` - Update (modified record)
* `d` - Delete (removed record)

You can use this in SQL transforms:

```yaml theme={null}
transforms:
  inserts_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source
      WHERE _gs_op = 'i'
```

The `_gs_op` column is automatically maintained by Turbo Pipelines and should be preserved in your transforms if you need upsert semantics in your sink.

## Best Practices

* Use SQL transforms for filtering and basic transformations whenever possible - they're the most performant.
* Each transform should do one thing well. Chain multiple simple transforms rather than creating one complex transform.
* Run `goldsky turbo validate ` to check your pipeline config before deploying.
* Use logs and metrics to identify slow transforms and optimize accordingly.
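
To make the relationship between `primary_key` and `_gs_op` more concrete, the TypeScript sketch below models how a sink could apply a stream of rows as keyed upserts and deletes. It is a conceptual illustration only, not how Turbo Pipelines or any sink is implemented: `ChangeRow` and `applyChanges` are hypothetical names, and the `id` and `data` columns are assumed for the example.

```typescript
// Conceptual model only: how a sink could apply _gs_op by primary key.
// `ChangeRow` and `applyChanges` are hypothetical names, not Goldsky APIs.
type GsOp = "i" | "u" | "d";

interface ChangeRow {
  id: string;     // plays the role of the transform's primary_key column
  _gs_op: GsOp;   // operation type carried with every row
  data?: string;  // payload; omitted on deletes in this sketch
}

function applyChanges(
  table: Map<string, ChangeRow>,
  batch: ChangeRow[],
): Map<string, ChangeRow> {
  for (const row of batch) {
    if (row._gs_op === "d") {
      table.delete(row.id);    // a delete removes the keyed row
    } else {
      table.set(row.id, row);  // inserts and updates both upsert by key
    }
  }
  return table;
}

// Two inserts, then an update to "a" and a delete of "b".
const table = applyChanges(new Map<string, ChangeRow>(), [
  { id: "a", _gs_op: "i", data: "v1" },
  { id: "b", _gs_op: "i", data: "v1" },
  { id: "a", _gs_op: "u", data: "v2" },
  { id: "b", _gs_op: "d" },
]);
```

In this model, a transform that dropped `_gs_op` would leave the sink unable to tell the delete of `b` from an ordinary row, which is one way to read the advice to preserve the column when you need upsert semantics.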