Overview
The HTTP Handler transform allows you to enrich streaming data by calling external HTTP endpoints. This is useful for:
- Enriching blockchain data with off-chain information
- Calling ML models for predictions or classifications
- Integrating with third-party APIs for additional context
- Running custom business logic hosted in external services
Configuration
Parameters
- Transform type: must be handler
- Source: the source or transform to read data from
- URL: the HTTP endpoint URL to call. Must be a fully-qualified URL (e.g., https://api.example.com/enrich)
- Primary key: the column that uniquely identifies each row
- one_row_per_request: true sends each row individually as a single JSON object; false (default) sends multiple rows as a JSON array (batched)
Request Format
Single Row Mode (one_row_per_request: true)
When enabled, each row is sent as an individual HTTP POST request whose JSON body is a single object. Use this mode when:
- Your API doesn’t support batch processing
- Each request requires significant processing time
- You need real-time, row-by-row processing
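For illustration, this Python sketch builds the kind of request single row mode sends: one POST per row, with the row serialized as a JSON object. The row fields (id, from_address, value) are hypothetical, and the handler builds and sends these requests itself, so the sketch only shows the wire format and sends nothing.

```python
import json
import urllib.request


def build_single_row_request(url: str, row: dict) -> urllib.request.Request:
    """Build one POST request whose body is a single JSON object."""
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical row; real column names depend on your source.
row = {"id": "tx-1", "from_address": "0xabc", "value": 42}
req = build_single_row_request("https://api.example.com/enrich", row)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```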
Batch Mode (one_row_per_request: false)
When disabled, multiple rows are sent together in one request as a JSON array. Use this mode when:
- Your API supports batch processing
- You want to reduce network overhead
- Higher throughput is needed
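A similar Python sketch for batch mode groups rows into JSON array bodies. The batch_size argument is a hypothetical stand-in: this page does not document how the handler chooses the size of each batch.

```python
import json
from typing import Iterable, Iterator, List


def batch_bodies(rows: Iterable[dict], batch_size: int) -> Iterator[str]:
    """Yield JSON-array request bodies holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield json.dumps(batch)


rows = [{"id": i} for i in range(5)]
for body in batch_bodies(rows, batch_size=2):
    print(body)
```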
Response Format
Your HTTP endpoint must return JSON with the same structure as the input, plus any additional fields you want to add.
Single Row Response
Batch Response
The response must include all original fields plus any new enriched fields. Missing original fields will cause errors.
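Before wiring an endpoint into a pipeline, you might check its responses locally against this rule. The Python sketch below matches request and response rows by primary key and reports original fields that went missing; the key name id and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Set


def missing_original_fields(request_row: dict, response_row: dict) -> Set[str]:
    """Original field names that the response row dropped."""
    return set(request_row) - set(response_row)


def check_batch_response(
    request_rows: Iterable[dict], response_rows: Iterable[dict], key: str = "id"
) -> Dict[object, Set[str]]:
    """Map each primary-key value to the original fields missing from its response row."""
    by_key = {row[key]: row for row in response_rows if key in row}
    problems: Dict[object, Set[str]] = {}
    for request_row in request_rows:
        response_row = by_key.get(request_row[key])
        if response_row is None:
            problems[request_row[key]] = set(request_row)  # the whole row is missing
            continue
        missing = missing_original_fields(request_row, response_row)
        if missing:
            problems[request_row[key]] = missing
    return problems


sent = [{"id": 1, "wallet": "0xa"}, {"id": 2, "wallet": "0xb"}]
returned = [{"id": 1, "wallet": "0xa", "label": "exchange"}, {"id": 2, "label": "unknown"}]
print(check_batch_response(sent, returned))
```

An empty result means every original field came back, which is the condition the handler requires.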
Example: Enrich Transfers with Wallet Labels
Example API Implementation
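A minimal Python sketch of an endpoint for the wallet-label example, using only the standard library's http.server. It assumes each row carries an address field and adds a wallet_label field from a hypothetical in-memory table; both field names and the table contents are illustrative. It accepts either a single object or an array, so it works in both request modes, and answers malformed input with a 400.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical label table; a real service would query a database or an API.
WALLET_LABELS = {
    "0x" + "1" * 40: "example-exchange",
}


def enrich(payload):
    """Add a wallet_label field to one row (dict) or to a batch (list of dicts)."""
    rows = payload if isinstance(payload, list) else [payload]
    if not all(isinstance(row, dict) for row in rows):
        raise ValueError("expected a JSON object or an array of objects")
    enriched = [
        # Keep every original field and add the new one.
        {**row, "wallet_label": WALLET_LABELS.get(str(row.get("address", "")).lower(), "unknown")}
        for row in rows
    ]
    # Mirror the input shape: object in, object out; array in, array out.
    return enriched if isinstance(payload, list) else enriched[0]


class EnrichHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = enrich(json.loads(self.rfile.read(length)))
        except ValueError as exc:  # also covers json.JSONDecodeError
            self.send_error(400, str(exc))
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), EnrichHandler).serve_forever()
```

Call serve() to listen on port 8080. The Python documentation does not recommend http.server for production, so a real deployment would run equivalent logic behind HTTPS on a production-grade server.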
A simple HTTP endpoint that enriches wallet data only needs to accept the rows posted by the handler, add label fields, and return every original field unchanged.
Example: ML Model Integration
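A sketch of the core of a classification endpoint in Python. The model is replaced by a hypothetical threshold_model placeholder so the example runs without ML libraries, and the value feature and classification field are illustrative names. Scoring the whole batch in one model call is a common way to amortize inference overhead, which pairs well with batch mode.

```python
from typing import Callable, Dict, List

Features = List[List[float]]


def classify_batch(rows: List[Dict], predict: Callable[[Features], List[str]]) -> List[Dict]:
    """Run one model call per batch and attach a classification field to every row."""
    # Hypothetical feature extraction: one numeric feature per row.
    features = [[float(row.get("value", 0))] for row in rows]
    labels = predict(features)  # a single model call for the whole batch
    if len(labels) != len(rows):
        raise RuntimeError("model returned a different number of predictions than rows")
    # Keep all original fields and add the prediction.
    return [{**row, "classification": label} for row, label in zip(rows, labels)]


def threshold_model(features: Features) -> List[str]:
    """Placeholder for a real model: flags large transfers."""
    return ["large" if f[0] >= 1_000_000 else "normal" for f in features]


rows = [{"id": 1, "value": 5}, {"id": 2, "value": 2_000_000}]
print(classify_batch(rows, threshold_model))
```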
Call a machine learning model to classify transactions.
Error Handling and Retries
The HTTP handler includes built-in retry logic:
- Transient errors (network timeouts, 5xx responses): Retried with exponential backoff
- Permanent errors (4xx responses, invalid JSON): Pipeline fails after max retries
- Timeout: Configurable timeout per request (default: 30 seconds)
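To illustrate exponential backoff, here is a generic Python sketch around a hypothetical send() callable that returns (status, body) or raises TimeoutError. It is not the handler's implementation: it gives up on non-retryable statuses immediately, whereas the handler's actual retry count and treatment of 4xx responses are as described in the list above. Passing sleep in as a parameter makes the delays observable in tests.

```python
import time
from typing import Callable, Tuple


class PermanentError(Exception):
    """A failure that retrying will not fix (in this sketch: any non-2xx status below 500)."""


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(); retry timeouts and 5xx responses with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            status, body = send()
        except TimeoutError:
            status, body = None, ""  # a network timeout counts as transient
        if status is not None and 200 <= status < 300:
            return body
        if status is not None and status < 500:
            raise PermanentError(f"HTTP {status}")
        if attempt < max_retries:
            # Delays grow as base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still failing after {max_retries} retries")
```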
Performance Considerations
Latency Impact
HTTP handlers add latency to your pipeline:
- Each request takes at least the network round-trip time
- Plus your endpoint’s processing time
- Use batching (one_row_per_request: false) to reduce overhead
- Consider caching frequently requested data in your API
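One way to follow the caching advice inside your endpoint is a small time-to-live cache, sketched below in Python. TTLCache and its loader callback are hypothetical names; this version never evicts expired keys, so for a large key space you would bound its size or use an existing caching library.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Remember loaded values for ttl_seconds so repeated lookups skip the slow path."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached value
        value = loader(key)  # slow path, e.g. a database query
        self._data[key] = (now, value)
        return value
```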
Throughput
To maximize throughput:
- Use batch mode when possible (10-100 rows per batch works well)
- Ensure your API can handle concurrent requests
- Scale your API horizontally if it becomes a bottleneck
- Monitor API response times in your pipeline logs
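A back-of-the-envelope estimate shows why batching and concurrency matter. The sketch assumes every request takes the same time regardless of batch size, which real endpoints only approximate, and the 100 ms latency used below is an illustrative figure.

```python
import math


def estimated_seconds(rows: int, batch_size: int, seconds_per_request: float, concurrency: int = 1) -> float:
    """Rough wall-clock time: number of request rounds times per-request latency."""
    requests = math.ceil(rows / batch_size)
    rounds = math.ceil(requests / concurrency)  # requests in flight at once
    return rounds * seconds_per_request


print(estimated_seconds(10_000, batch_size=1, seconds_per_request=0.1))
print(estimated_seconds(10_000, batch_size=100, seconds_per_request=0.1))
```

With 10,000 rows at 100 ms per request, sending one row per request sequentially takes about 1,000 seconds, while batches of 100 need only 100 requests, or about 10 seconds.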
Backpressure
If your HTTP endpoint is slow:
- The entire pipeline will slow down to match
- This prevents data loss and memory overflow
- Scale your API or optimize its response time
- Monitor logs for HTTP handler performance metrics
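Backpressure can be pictured as a bounded queue between the pipeline and a slow consumer. In the Python sketch below, a worker thread stands in for the HTTP call; once the queue is full, put() blocks the producer, so memory use stays bounded and the producer slows to the consumer's pace. This models the general technique, not the handler's internals.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_with_backpressure(n_items: int, maxsize: int, delay: float) -> Tuple[List[int], int]:
    """Feed items through a bounded queue to a slow worker; return results and peak queue size."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    processed: List[int] = []

    def worker() -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel: no more items
                return
            time.sleep(delay)  # stands in for a slow HTTP call
            processed.append(item)

    thread = threading.Thread(target=worker)
    thread.start()
    peak = 0
    for i in range(n_items):
        q.put(i)  # blocks while the queue is full: this is the backpressure
        peak = max(peak, q.qsize())
    q.put(None)
    thread.join()
    return processed, peak


results, peak = run_with_backpressure(n_items=20, maxsize=5, delay=0.01)
print(len(results), peak)
```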
Security Best Practices
1. Use HTTPS
Always use HTTPS endpoints to encrypt data in transit.
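If you manage pipeline definitions with your own tooling, a pre-deploy check like this hypothetical require_https_url helper can catch plain-HTTP or relative URLs; it relies only on Python's urllib.parse.

```python
from urllib.parse import urlparse


def require_https_url(url: str) -> str:
    """Reject endpoint URLs that are not fully-qualified https:// URLs."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected a fully-qualified https:// URL, got {url!r}")
    return url
```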
2. Implement Authentication
Require an API key or token in your endpoint.
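Inside the endpoint, compare the presented key against the expected one in constant time. This page does not say how the key reaches your endpoint (for example, a request header or a token in the URL), so the sketch takes the presented key as a plain argument; is_authorized is a hypothetical helper.

```python
import hmac
from typing import Optional


def is_authorized(presented: Optional[str], expected: str) -> bool:
    """Return True only if a non-empty presented key matches the expected key."""
    if not presented or not expected:
        return False  # missing key, or the server was started without one
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

When this returns False, the endpoint would typically respond with 401 Unauthorized.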
3. Validate Input
Always validate incoming data in your endpoint.
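A Python sketch of input validation for an endpoint that expects EVM-style addresses (0x followed by 40 hex characters). The required fields (id, address) are hypothetical. Because the handler treats 4xx responses as permanent errors, reserve a 400 for input that is genuinely malformed.

```python
import re
from typing import Any, List, Sequence

ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def validate_rows(payload: Any, required: Sequence[str] = ("id", "address")) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    rows = payload if isinstance(payload, list) else [payload]
    problems: List[str] = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {i}: expected a JSON object")
            continue
        for field in required:
            if field not in row:
                problems.append(f"row {i}: missing field {field!r}")
        address = row.get("address")
        if address is not None and not (isinstance(address, str) and ADDRESS_RE.match(address)):
            problems.append(f"row {i}: malformed address")
    return problems
```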
4. Rate Limiting
Implement rate limiting to prevent abuse.
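A token bucket is a common rate-limiting technique; the sketch below is a minimal single-process version with an injectable clock, and TokenBucket is a hypothetical class. A rejection such as 429 Too Many Requests is a 4xx, which the error-handling rules above classify as a permanent error, so size the limit to your pipeline's expected load.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow about `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```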
Limitations
Debugging
View logs to debug HTTP handler issues. Logs include:
- HTTP status codes (200 = success, 4xx/5xx = errors)
- Response times
- Retry attempts
- Error messages from your endpoint
Common errors:
- “Connection refused”: Your endpoint is not reachable
- “Timeout”: Your endpoint is too slow; optimize it or increase the timeout
- “Schema mismatch”: The response doesn’t include all original fields
- “Invalid JSON”: Your endpoint returned malformed JSON
Best Practices
1. Filter before enriching
Only send rows that need enrichment, to reduce API calls.
2. Use batching when possible
Batch mode reduces network overhead.
3. Keep endpoints fast
Aim for under 100ms response times:
- Cache frequently accessed data
- Use database indexes
- Optimize expensive computations
- Consider async processing for slow operations
4. Monitor your API
Track metrics like:
- Request rate
- Response times (p50, p95, p99)
- Error rates
- Resource usage (CPU, memory)
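Percentiles such as p50, p95, and p99 can be computed from recorded response times with the nearest-rank method, as in this Python sketch; percentile is a hypothetical helper, and a monitoring system would normally compute these for you.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of response-time samples."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [float(x) for x in range(1, 101)]
print({f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)})
```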
5. Handle failures gracefully
Make your endpoint resilient:
- Return partial results on partial failures
- Log errors for debugging
- Implement circuit breakers for downstream dependencies
- Provide fallback values when enrichment fails
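A Python sketch of per-row failure isolation: each row is enriched independently, failures are logged, and a fallback value is used so one bad lookup does not fail the whole batch. The lookup callable and the label field are hypothetical. Original fields are merged last so they are always returned unchanged, which keeps the response valid.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("enrich")


def enrich_with_fallback(rows: List[Dict], lookup: Callable[[Dict], Dict], fallback: Dict) -> List[Dict]:
    """Enrich each row independently; on failure, log it and use fallback values for that row."""
    out: List[Dict] = []
    for row in rows:
        try:
            extra = lookup(row)
        except Exception:  # e.g. a downstream service timed out
            log.exception("enrichment failed for row %r", row.get("id"))
            extra = fallback
        # Original fields win any key collision, so they come back unchanged.
        out.append({**extra, **row})
    return out
```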