What are Transforms?
Transforms sit between sources and sinks in your pipeline, processing data as it flows through. They allow you to:

- Filter and project data with SQL
- Enrich data by calling external HTTP APIs
- Execute custom JavaScript/TypeScript logic with WebAssembly
- Create dynamic lookup tables
Transform Types
- SQL Transform: Use familiar SQL for filtering, projections, and column transformations
- Dynamic Tables: Create updatable lookup tables for filtering
- HTTP Handlers: Call external APIs to enrich your data
- WebAssembly Scripts: Execute custom JavaScript/TypeScript code
Transform Chaining
You can chain multiple transforms together, with each transform receiving the output of the previous one.

The from Field
By default, a transform receives data from the source. Use the from field to receive data from another transform:
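A minimal sketch of what this might look like in a YAML pipeline definition. Only the from field and the idea of named transforms come from this page; the transforms, type, and sql keys are illustrative assumptions, not a documented schema.

```yaml
transforms:
  filtered_transfers:
    type: sql                   # assumed field name
    sql: SELECT * FROM transfers WHERE amount > 0
  enriched_events:
    type: sql                   # assumed field name
    from: filtered_transfers    # receives the output of filtered_transfers instead of the source
    sql: SELECT *, amount / 1e18 AS amount_eth FROM filtered_transfers  # upstream table name assumed
```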
Primary Keys
Most transforms require a primary_key field that uniquely identifies each row. The primary key is used for:
- Upsert operations in sinks
- Deduplication
- Ordering guarantees
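Continuing the hypothetical shape sketched above, the primary_key field would sit alongside the transform definition:

```yaml
transforms:
  filtered_transfers:
    sql: SELECT id, sender, recipient, amount FROM transfers WHERE amount > 0
    primary_key: id   # uniquely identifies each row; used for upserts, dedup, and ordering
```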
Transform Naming
Like sources, transforms are referenced by the name you give them (e.g. filtered_transfers, enriched_events).
Performance Considerations
SQL Transforms
- SQL transforms are highly optimized using Apache DataFusion
- Projections (selecting specific columns) are very efficient
- Filters are pushed down to reduce data movement
- Note: Joins and aggregations are currently disabled in streaming mode
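For example (same hypothetical shape as the earlier sketches), the following transform stays on the fast path by projecting only the needed columns and filtering early, while avoiding joins and aggregations:

```yaml
transforms:
  filtered_transfers:
    primary_key: id
    # Projection plus filter only: the narrow column list limits data movement
    # and the WHERE clause can be pushed down. No joins or aggregations are
    # used, since those are currently disabled in streaming mode.
    sql: |
      SELECT id, sender, amount
      FROM transfers
      WHERE amount > 1000000
```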
HTTP Handlers
- HTTP handlers add latency due to external API calls
- Use batching when possible to reduce API calls
- Consider caching frequently accessed data
- Set appropriate timeouts for the external service
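Purely as a hedged sketch (none of these field names are confirmed by this page), an HTTP handler might expose settings along these lines:

```yaml
transforms:
  enriched_events:
    from: filtered_transfers
    primary_key: id
    # Hypothetical fields: url, timeout_ms, and batch_size are illustrative
    # assumptions, not the documented schema.
    url: https://api.example.com/enrich
    timeout_ms: 5000    # bound the latency added by the external call
    batch_size: 100     # send rows in batches to reduce the number of API calls
```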
WebAssembly Scripts
- WASM transforms execute in a sandboxed environment
- TypeScript is transpiled to JavaScript at runtime
- Keep scripts simple for best performance
- Complex calculations are fine, but avoid heavy I/O
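As a hedged sketch only, assuming scripts are referenced by path from the pipeline definition (the script field is an assumption):

```yaml
transforms:
  enriched_events:
    from: filtered_transfers
    primary_key: id
    # Hypothetical field: script points at the TypeScript/JavaScript to run.
    # Keep the logic simple and avoid heavy I/O inside the script.
    script: ./transforms/enrich.ts
```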
Data Flow
As data flows through transforms, processing is designed for:

- Efficient memory usage
- Fast serialization/deserialization
- Vectorized processing
Special Column: _gs_op
All data includes a special _gs_op column that tracks the operation type:
- i: Insert (new record)
- u: Update (modified record)
- d: Delete (removed record)
The _gs_op column is automatically maintained by Turbo Pipelines and should be preserved in your transforms if you need upsert semantics in your sink.
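For example, when a SQL transform projects specific columns, include _gs_op in the SELECT list so the operation type reaches the sink (hypothetical shape, as in the earlier sketches):

```yaml
transforms:
  filtered_transfers:
    primary_key: id
    # _gs_op is carried through the projection so the sink can apply
    # inserts, updates, and deletes correctly.
    sql: |
      SELECT id, sender, amount, _gs_op
      FROM transfers
      WHERE amount > 0
```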
Best Practices

1. Start with SQL: Use SQL transforms for filtering and basic transformations whenever possible; they're the most performant.
2. Keep transforms focused: Each transform should do one thing well. Chain multiple simple transforms rather than creating one complex transform.
3. Test locally: Use the validate command to test your transform logic before deploying.
4. Monitor performance: Use logs and metrics to identify slow transforms and optimize accordingly.