Transforms sit between sources and sinks in your pipeline, processing data as it flows through. They allow you to:
- Filter and project data with SQL
- Enrich data by calling external HTTP APIs
- Execute custom JavaScript/TypeScript logic with WebAssembly
- Create dynamic lookup tables
Transform Types
SQL Transform
Use familiar SQL for filtering, projections, and column transformations
Dynamic Tables
Create updatable lookup tables for filtering
HTTP Handlers
Call external APIs to enrich your data
WebAssembly Scripts
Execute custom JavaScript/TypeScript code
Transform Chaining
You can chain multiple transforms together, with each transform receiving the output of the previous one.
Referencing upstream data
How a transform declares its input depends on the transform type:
- sql transforms reference the upstream source or transform by name in the SQL FROM clause. They do not accept a top-level from field.
- handler and script transforms use a top-level from field to name the upstream source or transform.
- dynamic_table transforms are populated either by an inline sql query that reads from the pipeline, or externally via writes to the backing table; they do not take a from field.
Primary Keys
sql, handler, and script transforms require a primary_key field that names the column uniquely identifying each row. (dynamic_table does not take primary_key; it uses column instead to name its key column.)
The primary key is used for:
- Upsert operations in sinks
- Deduplication
- Ordering guarantees
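To make the role of the key column concrete, here is a minimal TypeScript sketch of deduplication by primary key. It is only an illustration of the idea, not a description of how Turbo Pipelines implements it; the Row shape, the id column, and the dedupeByKey helper are all hypothetical names chosen for this example.

```typescript
// Hypothetical row shape; `id` plays the role of the primary_key column.
type Row = { id: string; amount: number };

// Keep only the latest row seen for each primary key value.
function dedupeByKey<T>(rows: T[], key: (row: T) => string): T[] {
  const latest = new Map<string, T>();
  for (const row of rows) {
    // A later row with the same key overwrites the earlier one.
    latest.set(key(row), row);
  }
  return Array.from(latest.values());
}

const rows: Row[] = [
  { id: "a", amount: 1 },
  { id: "b", amount: 2 },
  { id: "a", amount: 3 }, // same key as the first row
];

const deduped = dedupeByKey(rows, (r) => r.id);
console.log(deduped);
```

Without a column that uniquely identifies each row, there would be no way to tell that the first and third rows describe the same record.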
Transform Naming
Like sources, transforms are referenced by the name you give them. Use descriptive names (e.g., filtered_transfers, enriched_events).
Performance Considerations
SQL Transforms
- SQL transforms are highly optimized using Apache DataFusion
- Projections (selecting specific columns) are very efficient
- Filters are pushed down to reduce data movement
- Note: Joins, aggregations, and window functions are not supported in streaming mode. Use dynamic tables for lookup-style joins.
HTTP Handlers
- HTTP handlers add latency due to external API calls
- Send multiple rows per request (one_row_per_request: false) when the endpoint supports it, to reduce per-row overhead
- Consider caching frequently accessed data on the handler side
- Set appropriate timeouts on the external service
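The batching and caching advice can be sketched on the handler side roughly as follows. This is an assumption-laden illustration: the InputRow and EnrichedRow shapes, the makeCachedEnricher helper, and the lookup function are hypothetical, and the actual request and response format a handler must speak is not shown here. The lookup is kept synchronous for brevity; a real one would usually be asynchronous.

```typescript
// Hypothetical row shapes; not a Goldsky-defined payload format.
type InputRow = { id: string; token: string };
type EnrichedRow = InputRow & { symbol: string };

// Wrap an expensive lookup so each distinct token is resolved at most once.
function makeCachedEnricher(lookup: (token: string) => string) {
  const cache = new Map<string, string>();
  return function enrichBatch(rows: InputRow[]): EnrichedRow[] {
    return rows.map((row) => {
      let symbol = cache.get(row.token);
      if (symbol === undefined) {
        symbol = lookup(row.token); // cache miss: pay the lookup cost once
        cache.set(row.token, symbol);
      }
      return { ...row, symbol };
    });
  };
}

// Stand-in for a slow database or third-party call; counts its invocations.
let lookupCalls = 0;
const enrichBatch = makeCachedEnricher((token) => {
  lookupCalls += 1;
  return token.toUpperCase();
});

// One request carrying several rows, two of which share a token.
const enriched = enrichBatch([
  { id: "1", token: "usdc" },
  { id: "2", token: "weth" },
  { id: "3", token: "usdc" },
]);
console.log(enriched.length, lookupCalls);
```

Receiving many rows in one request is what lets a cache like this pay off within a single call, since repeated values are resolved once per batch rather than once per row.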
WebAssembly Scripts
- WASM transforms execute in a sandboxed environment
- TypeScript is transpiled to JavaScript at runtime
- Keep scripts simple for best performance
- Complex calculations are fine, but avoid heavy I/O
Data Flow
Understanding how data flows through transforms:
- Efficient memory usage
- Fast serialization/deserialization
- Vectorized processing
Special Column: _gs_op
All data includes a special _gs_op column that tracks the operation type:
- i: Insert (new record)
- u: Update (modified record)
- d: Delete (removed record)
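As a rough illustration of what these operation codes mean downstream, the TypeScript sketch below applies a batch of change rows to an in-memory keyed table, the way an upserting sink might. The ChangeRow shape, the id and balance columns, and the applyChanges helper are hypothetical; only the _gs_op column and its i, u, and d values come from the description above.

```typescript
// Operation codes carried in the _gs_op column: insert, update, delete.
type GsOp = "i" | "u" | "d";

// Hypothetical row shape; `id` stands in for the primary key column.
interface ChangeRow {
  _gs_op: GsOp;
  id: string;
  balance: number;
}

// Apply change rows in order to a table keyed by primary key.
function applyChanges(table: Map<string, number>, changes: ChangeRow[]): void {
  for (const row of changes) {
    if (row._gs_op === "d") {
      table.delete(row.id); // delete removes the record
    } else {
      table.set(row.id, row.balance); // insert and update both upsert by key
    }
  }
}

const table = new Map<string, number>();
applyChanges(table, [
  { _gs_op: "i", id: "alice", balance: 10 },
  { _gs_op: "i", id: "bob", balance: 5 },
  { _gs_op: "u", id: "alice", balance: 12 },
  { _gs_op: "d", id: "bob", balance: 5 },
]);
console.log(Array.from(table.entries()));
```

If a transform dropped the _gs_op column, a consumer like this could no longer tell an update from a delete, which is why the column needs to survive to the sink.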
The _gs_op column is automatically maintained by Turbo Pipelines and should be preserved in your transforms if you need upsert semantics in your sink.
Best Practices
Start with SQL
Use SQL transforms for filtering and basic transformations whenever possible - they’re the most performant.
Keep transforms focused
Each transform should do one thing well. Chain multiple simple transforms
rather than creating one complex transform.
Validate before deploying
Run goldsky turbo validate <file> to check your pipeline config before deploying.