> ## Documentation Index
> Fetch the complete documentation index at: https://docs.goldsky.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Turbo SQL Functions Reference

> Comprehensive guide to custom SQL functions for blockchain data processing

## Overview

Turbo pipelines SQL transforms are powered by Apache DataFusion and include custom functions designed specifically for blockchain and Web3 data processing. These functions extend standard SQL with capabilities for:

* Decoding EVM logs and Solana instructions
* Working with 256-bit integers (U256/I256) common in smart contracts
* Processing blockchain-specific data formats
* Advanced array and JSON manipulation
* Cryptographic operations

<Note>
  All standard DataFusion SQL functions are also available. This reference
  focuses on Turbo-specific custom functions.
</Note>

## EVM & Ethereum Functions

### evm\_log\_decode

Decodes an EVM event log using an ABI definition.

**Aliases:** `evm_decode_log`, `_gs_log_decode`

**Signature:**

```sql theme={null}
evm_log_decode(abi_json, topics, data) → struct(event_signature, event_params)
```

**Parameters:**

| Parameter  | Type   | Description                                                                                                      |
| ---------- | ------ | ---------------------------------------------------------------------------------------------------------------- |
| `abi_json` | string | The event ABI as a JSON array (not a Solidity event signature). Use `_gs_fetch_abi()` to fetch this dynamically. |
| `topics`   | string | Comma-separated hex topic values (e.g. `"0xddf252ad...,0x00000000...from,0x00000000...to"`)                      |
| `data`     | string | Hex-encoded non-indexed event data                                                                               |

<Tip>
  When filtering by event signature (the first topic), you can also use `topics LIKE '0xSIGNATURE%'` as an alternative to `SPLIT_INDEX`. This only works for matching the first topic — use `SPLIT_INDEX` when you need to access topics at other positions.
</Tip>

**Returns:** A struct with two fields:

* `event_signature` (string) — the event name (e.g. `"Transfer"`)
* `event_params` (array of strings) — decoded parameter values in positional order

<Note>
  Parameters are returned by position, not by name. To access the third parameter, use `decoded.event_params[3]`, not `decoded['paramName']`.
</Note>

**Example: Decode an ERC-20 Transfer event**

```sql theme={null}
SELECT
  evm_decode_log(
    '[{"anonymous":false,"inputs":[{"indexed":true,"name":"from","type":"address"},{"indexed":true,"name":"to","type":"address"},{"indexed":false,"name":"value","type":"uint256"}],"name":"Transfer","type":"event"}]',
    topics,
    data
  ) as decoded,
  decoded.event_signature as event_name,
  decoded.event_params[1] as from_address,
  decoded.event_params[2] as to_address,
  decoded.event_params[3] as value
FROM logs
```

**Example: Using `_gs_fetch_abi` for dynamic ABI lookup**

Instead of hardcoding the ABI JSON, fetch it at runtime from Etherscan:

```sql theme={null}
SELECT
  evm_decode_log(
    _gs_fetch_abi(
      'https://api.etherscan.io/api?module=contract&action=getabi&address=0x1F98431c8aD98523631AE4a59f267346ea31F984&apikey=YOUR_KEY',
      'etherscan'
    ),
    topics,
    data
  ) as decoded
FROM logs
WHERE topics LIKE '0x783cca1c%'  -- PoolCreated event signature
```

<Tip>
  The decoder uses caching internally for performance. The same ABI can be
  reused across millions of records efficiently.
</Tip>

### \_gs\_fetch\_abi

Fetches an ABI JSON from a URL. Results are cached for the lifetime of the pipeline.

**Signature:**

```sql theme={null}
_gs_fetch_abi(url, source_type) → string
```

**Parameters:**

| Parameter     | Type   | Values                                                                                                                   |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------------------ |
| `url`         | string | Full URL to fetch ABI from                                                                                               |
| `source_type` | string | `'etherscan'` (extracts `result` field from Etherscan API response) or `'raw'` (uses response body directly as ABI JSON) |

**Returns:** ABI JSON string that can be passed to `evm_decode_log` or `_gs_decode_instruction_data`

**Example: Fetch ABI from Etherscan**

```sql theme={null}
SELECT
  evm_decode_log(
    _gs_fetch_abi(
      'https://api.etherscan.io/api?module=contract&action=getabi&address=0x...&apikey=YOUR_KEY',
      'etherscan'
    ),
    topics,
    data
  ) as decoded
FROM logs
```

**Example: Fetch raw ABI from URL**

```sql theme={null}
SELECT
  evm_decode_log(
    _gs_fetch_abi('https://example.com/abi.json', 'raw'),
    topics,
    data
  ) as decoded
FROM logs
```

<Tip>
  The fetched ABI is cached internally for performance. The same URL will only be fetched once, even when processing millions of records.
</Tip>

### \_gs\_keccak256

Compute Keccak256 hash (same as Solidity's `keccak256`).

**Signature:**

```sql theme={null}
_gs_keccak256(input) -> VARCHAR
```

**Parameters:**

* `input` - String to hash (can be regular string or hex with "0x" prefix)

**Returns:** Hex-encoded hash with "0x" prefix

**Example: Compute Event Signatures**

```yaml theme={null}
sql: |
  SELECT
    _gs_keccak256('Transfer(address,address,uint256)') as transfer_signature,
    _gs_keccak256('Approval(address,address,uint256)') as approval_signature
  -- Results:
  -- 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
  -- 0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925
```

**Example: Verify Topic Matches**

```yaml theme={null}
sql: |
  SELECT *
  FROM ethereum_logs
  WHERE SPLIT_INDEX(topics, ',', 0) = _gs_keccak256('Transfer(address,address,uint256)')
```

### \_gs\_hex\_to\_byte / \_gs\_byte\_to\_hex

Convert between hex strings and bytes.

**Signatures:**

```sql theme={null}
_gs_hex_to_byte(hex_string) -> BINARY
_gs_byte_to_hex(bytes) -> VARCHAR
```

<Note>
  `_gs_hex_to_byte` accepts input with or without a `0x` prefix. `_gs_byte_to_hex` returns a bare hex string (no `0x` prefix).
</Note>

**Example:**

```yaml theme={null}
sql: |
  SELECT
    _gs_hex_to_byte('0x1234') as bytes,
    _gs_byte_to_hex(data) as hex_string
  FROM ethereum_transactions
```

### to\_checksum\_address

Converts a hex address to its [EIP-55](https://eips.ethereum.org/EIPS/eip-55) checksummed form (mixed-case encoding). Optionally accepts a chain ID for [EIP-1191](https://eips.ethereum.org/EIPS/eip-1191) chain-specific checksums.

**Signature:**

```sql theme={null}
to_checksum_address(address) -> VARCHAR
to_checksum_address(address, chain_id) -> VARCHAR
```

**Parameters:**

| Parameter  | Type              | Description                                                               |
| ---------- | ----------------- | ------------------------------------------------------------------------- |
| `address`  | string            | Hex address with or without `0x` prefix. Must be 40 hex characters.       |
| `chain_id` | bigint (optional) | Chain ID for EIP-1191 chain-specific checksumming. Omit for plain EIP-55. |

**Returns:** A `0x`-prefixed checksummed address, or `NULL` for invalid input.

**Example:**

```yaml theme={null}
sql: |
  SELECT
    to_checksum_address(from_address) as from_checksummed,
    to_checksum_address(to_address) as to_checksummed
  FROM erc20_transfers
```

### reverse\_bytes32

Reverses the byte order of a 32-byte value. Useful for endianness conversion of blockchain hashes and 256-bit integers.

**Signature:**

```sql theme={null}
reverse_bytes32(value) -> BINARY(32)
```

**Parameters:**

| Parameter | Type                  | Description      |
| --------- | --------------------- | ---------------- |
| `value`   | `FixedSizeBinary(32)` | A 32-byte value. |

**Returns:** The same 32 bytes in reversed order. `NULL` is passed through.

**Example:**

```yaml theme={null}
sql: |
  SELECT
    transaction_hash,
    reverse_bytes32(transaction_hash) as le_hash
  FROM ethereum_transactions
```

## Solana Functions

Turbo pipelines include specialized functions for decoding Solana program instructions and analyzing transaction data.

Many of the below are already used in our default `solana.instructions` dataset and other datasets.

### \_gs\_decode\_instruction\_data (IDL Decode)

Decode Solana program instruction or event data using an IDL (Interface Definition Language) specification. This is the most flexible Solana decoding function - it works with **any custom Solana program** that has an IDL.

**Signature:**

```sql theme={null}
_gs_decode_instruction_data(idl_json, data) -> STRUCT<name: VARCHAR, value: VARCHAR>
```

**Parameters:**

* `idl_json` - IDL specification as JSON string (can be fetched with `_gs_fetch_abi()`)
* `data` - Base58-encoded instruction data

**Returns:** Struct containing:

* `name` - Instruction or event name (e.g., "initialize", "swap", "transfer")
* `value` - JSON string with decoded parameter values

**How it works:**

1. Parses the IDL JSON specification
2. Attempts to decode as an instruction first
3. Falls back to event decoding if instruction decoding fails
4. Returns both the name and decoded values

**Example: Access Decoded Fields after decoding**

```yaml theme={null}
sql: |
  WITH decoded AS (
    SELECT
      id,
      _gs_decode_instruction_data(
        _gs_fetch_abi('https://api.example.com/jupiter-v6.json', 'raw'),
        data
      ) as decoded_ix,
      accounts
    FROM solana_instructions
    WHERE program_id = 'JUP6LkbZbjS1jKKwapdHNy74zcZ3tLUZoi5QNyVTaV4'
  )
  SELECT
    id,
    decoded_ix.name as instruction_name,
    -- Parse the JSON value to extract specific fields
    json_value(decoded_ix.value, '$.amountIn') as amount_in,
    json_value(decoded_ix.value, '$.minimumAmountOut') as min_amount_out,
    accounts[1] as user_account
  FROM decoded
  WHERE decoded_ix.name = 'sharedAccountsRoute'
```

**Example: Decode with Inline IDL**

For simpler programs, you can provide the IDL directly:

```yaml theme={null}
sql: |
  SELECT
    id,
    _gs_decode_instruction_data(
      '{"instructions":[{"name":"initialize","accounts":[{"name":"authority","isMut":false,"isSigner":true}],"args":[{"name":"bump","type":"u8"}]}]}',
      data
    ) as decoded
  FROM solana_instructions
```

<Tip>
  **Performance**: IDL decoders are cached internally. Reusing the same IDL JSON
  string (via `_gs_fetch_abi()`) across multiple rows is very efficient - the
  IDL is only parsed once.
</Tip>

<Note>
  **When to use:**

  * Custom Solana programs with available IDL specifications
  * Decoding instructions from DEXes, DeFi protocols, NFT marketplaces
  * Any program where you need flexible, schema-driven decoding

  **Use program-specific decoders instead for:**

  * Solana system programs (Token, System, Stake, Vote, etc.) - use `_gs_solana_decode_*` functions below
  * Better performance for known programs with hardcoded decoders
</Note>

### \_gs\_solana\_decode\_token\_program\_instruction

Decode Solana Token Program (SPL Token) instructions.

**Signature:**

```sql theme={null}
_gs_solana_decode_token_program_instruction(data, accounts) -> STRUCT
```

* `data` — Base58-encoded instruction data
* `accounts` — List of account public keys for the instruction

**Example: Decode Token Instructions**

```yaml theme={null}
transforms:
  decoded_token_ops:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        signature,
        _gs_solana_decode_token_program_instruction(data, accounts) as decoded
      FROM solana_ix
      WHERE program_id = 'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA'
```

### \_gs\_solana\_decode\_system\_program\_instruction

Decode Solana System Program instructions (transfers, account creation, etc.). No IDL required.

**Signature:**

```sql theme={null}
_gs_solana_decode_system_program_instruction(data, accounts) -> STRUCT
```

* `data` — Base58-encoded instruction data
* `accounts` — List of account public keys for the instruction

**Example: Decode System Program Instructions**

```yaml theme={null}
transforms:
  system_decoded:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        block_slot,
        signature,
        _gs_solana_decode_system_program_instruction(data, accounts) as decoded
      FROM solana_ix
      WHERE program_id = '11111111111111111111111111111111'
```

### \_gs\_solana\_get\_accounts

Build a unified list of accounts for a Solana transaction, annotated with source, writable, and signer flags. This is the same logic used to produce the `accounts` column on `solana.instructions`.

**Signature:**

```sql theme={null}
_gs_solana_get_accounts(
  account_keys,
  loaded_readonly,
  loaded_writable,
  num_required_signatures,
  num_readonly_signed_accounts,
  num_readonly_unsigned_accounts,
  source
) -> LIST<STRUCT<pubkey: VARCHAR, source: VARCHAR, writable: BOOLEAN, signer: BOOLEAN>>
```

**Parameters:**

* `account_keys` - `LIST<VARCHAR>` of account pubkeys from the message header
* `loaded_readonly` - `LIST<VARCHAR>` of readonly pubkeys loaded via address lookup tables
* `loaded_writable` - `LIST<VARCHAR>` of writable pubkeys loaded via address lookup tables
* `num_required_signatures` - `UINT8` header field
* `num_readonly_signed_accounts` - `UINT8` header field
* `num_readonly_unsigned_accounts` - `UINT8` header field
* `source` - `VARCHAR` tag stored on each resulting account record

### \_gs\_solana\_get\_balance\_changes

Pair a transaction's unified account list with its `preBalances` / `postBalances` arrays to produce per-account SOL balance changes.

**Signature:**

```sql theme={null}
_gs_solana_get_balance_changes(accounts, pre_balances, post_balances)
  -> LIST<STRUCT<account: VARCHAR, before: UINT64, after: UINT64>>
```

**Parameters:**

* `accounts` - `LIST<STRUCT<pubkey, source, writable, signer>>` (typically the output of `_gs_solana_get_accounts`)
* `pre_balances` - `LIST<UINT64>` of lamport balances before the transaction
* `post_balances` - `LIST<UINT64>` of lamport balances after the transaction

**Example: Monitor Large Balance Changes**

```yaml theme={null}
sql: |
  WITH balance_changes AS (
    SELECT
      signature,
      _gs_solana_get_balance_changes(
        _gs_solana_get_accounts(
          accountKeys,
          loadedReadonly,
          loadedWritable,
          numRequiredSignatures,
          numReadonlySignedAccounts,
          numReadonlyUnsignedAccounts,
          'transaction'
        ),
        preBalances,
        postBalances
      ) as changes
    FROM solana_transactions
  )
  SELECT
    signature,
    change.account,
    change.after - change.before as net_change
  FROM balance_changes
  CROSS JOIN UNNEST(changes) AS change
  WHERE ABS(CAST(change.after AS BIGINT) - CAST(change.before AS BIGINT)) > 1000000000  -- > 1 SOL
```

### Other Solana Decoders

All program-specific decoders take `(data, accounts)` and follow the same pattern:

* `_gs_solana_decode_associated_token_program_instruction(data, accounts)` - Decode ATA program
* `_gs_solana_decode_stake_program_instruction(data, accounts)` - Decode staking operations
* `_gs_solana_decode_vote_program_instruction(data, accounts)` - Decode vote program
* `_gs_solana_decode_bpf_loader_instruction(data, accounts)` - Decode BPF loader
* `_gs_solana_decode_bpf_upgradeable_loader_instruction(data, accounts)` - Decode upgradeable programs
* `_gs_solana_decode_address_lookup_table_instruction(data, accounts)` - Decode address lookup tables

### \_gs\_from\_base58

Decode Base58 strings (common in Solana for addresses and signatures).

**Signature:**

```sql theme={null}
_gs_from_base58(base58_string) -> BINARY
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    _gs_from_base58(account_address) as decoded_address
  FROM solana_accounts
```

## Large Numbers (U256 & I256)

Blockchain smart contracts frequently use 256-bit integers for token amounts, balances, and calculations. These functions provide precise arithmetic without JavaScript's number precision limits.

### U256 Functions (Unsigned 256-bit)

#### to\_u256

Convert various types to U256.

**Signature:**

```sql theme={null}
to_u256(value) -> U256
```

**Accepts:** VARCHAR, INT64, UINT64, INT32, UINT32, INT16, UINT16, INT8, UINT8

**Example:**

```yaml theme={null}
sql: |
  SELECT
    to_u256('1000000000000000000') as one_eth_wei,
    to_u256(value) as value_u256
  FROM erc20_transfers
```

#### u256\_to\_string

Convert U256 back to decimal string.

**Signature:**

```sql theme={null}
u256_to_string(u256_value) -> VARCHAR
```

#### u256\_add / u256\_sub / u256\_mul / u256\_div / u256\_mod

U256 arithmetic operations.

**Signatures:**

```sql theme={null}
u256_add(a, b) -> U256
u256_sub(a, b) -> U256
u256_mul(a, b) -> U256
u256_div(a, b) -> U256
u256_mod(a, b) -> U256
```

<Tip>
  **Automatic Operator Rewriting**: Once values are cast to U256 or I256 types, you can use standard SQL operators (`+`, `-`, `*`, `/`, `%`) instead of explicit function calls. Turbo's SQL preprocessor automatically rewrites these to the appropriate functions.

  ```sql theme={null}
  -- These are equivalent:
  to_u256(value) / to_u256('1000000')
  u256_div(to_u256(value), to_u256('1000000'))

  -- Use the cleaner syntax:
  to_u256(value) + to_u256(fee)  -- Automatically becomes u256_add()
  to_u256(balance) * to_u256(2)  -- Automatically becomes u256_mul()
  ```
</Tip>

**Example: Calculate Token Amounts in ETH**

```yaml theme={null}
transforms:
  token_amounts_eth:
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        from_address,
        to_address,
        value,
        -- Convert wei to ETH (divide by 10^18) - using clean operator syntax
        u256_to_string(
          to_u256(value) / to_u256('1000000000000000000')
        ) as amount_eth,
        -- Calculate 1% fee
        u256_to_string(
          to_u256(value) / to_u256('100')
        ) as fee_1_percent
      FROM erc20_transfers
```

**Example: Sum Multiple Token Amounts**

```yaml theme={null}
sql: |
  SELECT
    to_address,
    -- Add two transfer amounts - using clean operator syntax
    u256_to_string(
      to_u256(transfer1_value) + to_u256(transfer2_value)
    ) as total_received
  FROM combined_transfers
```

<Warning>
  U256 division truncates (no decimals). For percentage calculations or conversions to smaller units, multiply first, then divide.

  Good: `(amount * to_u256('100')) / to_u256('1000000')` (multiply by 100 first)

  Bad: `(amount / to_u256('1000000')) * to_u256('100')` (loses precision)
</Warning>

### I256 Functions (Signed 256-bit)

Similar to U256 but supports negative numbers.

**Available functions:**

* `to_i256(value)` - Convert to I256
* `i256_to_string(i256_value)` - Convert to string
* `i256_add(a, b)` / `i256_sub(a, b)` - Addition/subtraction
* `i256_mul(a, b)` / `i256_div(a, b)` / `i256_mod(a, b)` - Multiplication/division/modulo
* `i256_neg(value)` - Negate (multiply by -1)
* `i256_abs(value)` - Absolute value

**Example: Track Profit/Loss**

```yaml theme={null}
sql: |
  SELECT
    address,
    -- Calculate profit/loss using clean operator syntax
    i256_to_string(
      to_i256(current_balance) - to_i256(initial_balance)
    ) as profit_loss,
    -- Get absolute value of change
    i256_to_string(
      i256_abs(
        to_i256(current_balance) - to_i256(initial_balance)
      )
    ) as abs_change
  FROM balances
```

<Note>
  I256 and U256 operators work the same way - standard operators (`+`, `-`, `*`,
  `/`, `%`) are automatically rewritten to their corresponding function calls by
  the SQL preprocessor.
</Note>

## Array Processing Functions

### array\_filter\_in

Filter array elements where a struct field matches any value in a provided list. This is particularly useful for preventing overflow panics when processing large nested arrays by filtering **before** unnesting.

**Signature:**

```sql theme={null}
array_filter_in(array, field_name, values_list) -> LIST
```

**Parameters:**

* `array` - Array of structs to filter
* `field_name` - Name of the struct field to match against (Utf8 string)
* `values_list` - Array of values to match (supports Utf8 and Int64 types)

**Returns:** Filtered array containing only elements where the specified field matches any value in the list

**Use case: Preventing overflow panics**

When processing data with multiple nested unnest operations (e.g., transactions → operations → ledger\_entry\_changes), the row count can explode exponentially. In extreme cases, this causes an overflow panic in Arrow's take function because Arrow uses i32 offsets internally for `ListArray`. When the total number of rows after unnesting exceeds \~2.1 billion, buffer offset calculations overflow.

By filtering arrays **before** unnesting, you reduce both memory usage and processing time while avoiding these overflow issues.

**Example: Filter Stellar ledger entry changes**

```sql theme={null}
-- Before: unnest explodes ALL ledger_entry_changes, then filters
SELECT ... FROM operations, unnest(ledger_entry_changes) AS change
WHERE change.ledger_entry_type IN ('account', 'trustline')

-- After: filter FIRST, then unnest only matching elements
SELECT ... FROM operations, unnest(
  array_filter_in(ledger_entry_changes, 'ledger_entry_type', ARRAY['account', 'trustline'])
) AS change
```

**Example: Filter Stellar operations by type**

```yaml theme={null}
sql: |
  SELECT
    signature,
    unnest(
      array_filter_in(o.operation.ledger_entry_changes, 'ledger_entry_type', ARRAY['account', 'trustline'])
    ) AS change
  FROM stellar_transactions t,
       unnest(t.operations) AS o
```

### to\_large\_list

Convert a List array (i32 offsets) to a LargeList array (i64 offsets). This allows processing arrays that would otherwise overflow the i32 offset limit during operations like unnest.

**Signature:**

```sql theme={null}
to_large_list(array) -> LARGE_LIST
```

**Parameters:**

* `array` - List, LargeList, or FixedSizeList array to convert

**Returns:** LargeList array with i64 offsets (virtually unlimited capacity vs \~2.1B for i32)

**Use case: Safety net for large arrays**

Use `to_large_list` when you need ALL elements from an array but the arrays might be extremely large. This changes the offset type from i32 (\~2.1B max) to i64, preventing overflow in Arrow's take function.

**Example: Process large arrays safely**

```sql theme={null}
SELECT unnest(to_large_list(my_array_column)) AS element FROM table
```

**Example: Stellar ledger entry changes**

```yaml theme={null}
sql: |
  SELECT
    signature,
    unnest(to_large_list(o.operation.ledger_entry_changes)) AS change
  FROM stellar_transactions t,
       unnest(t.operations) AS o
```

**When to use each function:**

| Function          | Best for                                                  | Trade-off                               |
| ----------------- | --------------------------------------------------------- | --------------------------------------- |
| `array_filter_in` | When you only need a subset of array elements             | Reduces both memory and processing time |
| `to_large_list`   | When you need ALL elements but arrays might be very large | Safety net with minimal overhead        |

### array\_filter

Filter array elements based on field value.

**Signature:**

```sql theme={null}
array_filter(array, field_name, value) -> LIST
```

**Parameters:**

* `array` - Array of structs to filter
* `field_name` - Name of field to match
* `value` - Value to match against

**Example: Filter Logs by Event Name**

```yaml theme={null}
sql: |
  SELECT
    signature,
    -- Get only 'Transfer' events from instruction logs
    array_filter(
      instruction_logs,
      'event_name',
      'Transfer'
    ) as transfer_events
  FROM solana_transactions
```

**Example: Filter Failed Instructions**

```yaml theme={null}
sql: |
  SELECT
    slot,
    array_filter(
      transactions,
      'err',
      null
    ) as successful_transactions
  FROM solana_blocks
```

### array\_filter\_first

Like `array_filter` but returns only the first matching element.

**Signature:**

```sql theme={null}
array_filter_first(array, field_name, value) -> STRUCT (or NULL)
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    signature,
    array_filter_first(
      instructions,
      'programId',
      'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA'
    ) as first_token_instruction
  FROM solana_transactions
```

### array\_enumerate

Add index to each array element.

**Signature:**

```sql theme={null}
array_enumerate(array) -> LIST<STRUCT<index: INT, value: T>>
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    signature,
    array_enumerate(instructions) as indexed_instructions
  FROM solana_transactions

  -- Access with: indexed_instructions[1].index, indexed_instructions[1].value
```

### \_gs\_zip\_arrays

Combine multiple arrays element-wise.

**Signature:**

```sql theme={null}
_gs_zip_arrays(array1, array2, ...) -> LIST<STRUCT>
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    _gs_zip_arrays(
      account_keys,
      pre_balances,
      post_balances
    ) as account_balance_changes
  FROM solana_transactions
```

## String & Encoding Functions

### string\_to\_array

Split string into array by delimiter.

**Signature:**

```sql theme={null}
string_to_array(string, delimiter) -> LIST<VARCHAR>
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    string_to_array(metadata_tags, ',') as tags_array
  FROM nft_metadata
```

### SPLIT\_INDEX

Split a string by a delimiter and return the element at a given index. This is particularly useful for extracting individual topics from the `topics` field in `raw_logs` datasets, which is a comma-separated string.

**Signature:**

```sql theme={null}
SPLIT_INDEX(string, delimiter, index) -> VARCHAR
```

**Parameters:**

* `string` - The string to split
* `delimiter` - The delimiter to split on
* `index` - 0-based index of the element to return

**Returns:** The element at the given index, or `NULL` if the index is out of bounds.

**Example: Extract event signature from raw logs**

```yaml theme={null}
sql: |
  SELECT *
  FROM base_logs
  WHERE SPLIT_INDEX(topics, ',', 0) = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
```

**Example: Extract indexed parameters from topics**

```yaml theme={null}
sql: |
  SELECT
    id,
    SPLIT_INDEX(topics, ',', 0) as event_signature,
    SPLIT_INDEX(topics, ',', 1) as indexed_param_1,
    SPLIT_INDEX(topics, ',', 2) as indexed_param_2
  FROM base_logs
```

**Example: Distinguish ERC-721 from ERC-20 transfers**

```yaml theme={null}
sql: |
  SELECT *
  FROM ethereum_logs
  WHERE SPLIT_INDEX(topics, ',', 0) = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
    AND SPLIT_INDEX(topics, ',', 3) IS NOT NULL  -- ERC-721 has 4 topics, ERC-20 has 3
```

### Regular Expression Functions

Turbo pipelines include Flink-compatible regex functions:

**regexp\_extract** - Extract regex match groups

```sql theme={null}
regexp_extract(string, pattern, group_index) -> VARCHAR
```

**regexp\_replace** - Replace regex matches

```sql theme={null}
regexp_replace(string, pattern, replacement) -> VARCHAR
```

**regexp\_count** - Count regex matches

```sql theme={null}
regexp_count(string, pattern) -> INT
```

**Example: Extract Address from Log Message**

```yaml theme={null}
sql: |
  SELECT
    regexp_extract(
      log_message,
      'address: (0x[0-9a-fA-F]{40})',
      1
    ) as extracted_address
  FROM contract_logs
```

### URL Functions

**parse\_url** - Extract URL components

```sql theme={null}
parse_url(url, component) -> VARCHAR
-- component: 'HOST', 'PATH', 'QUERY', 'PROTOCOL', etc.
```

**url\_encode / url\_decode** - URL encoding/decoding

```sql theme={null}
url_encode(string) -> VARCHAR
url_decode(string) -> VARCHAR
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    parse_url(metadata_url, 'HOST') as host,
    parse_url(metadata_url, 'PATH') as path
  FROM nft_metadata
```

### Other String Functions

* `bin(number)` - Convert to binary representation
* `translate(string, from_chars, to_chars)` - Character translation
* `elt(index, string1, string2, ...)` - Return element at index
* `locate(substring, string)` - Find substring position
* `unhex(hex_string)` - Decode hex string

Plus all standard SQL string functions: `lower`, `upper`, `trim`, `substring`, `concat`, `replace`, `reverse`, etc.

## JSON Functions

Turbo pipelines have comprehensive JSON support compatible with Flink SQL.

### json\_query

Query JSON documents using path expressions.

**Signature:**

```sql theme={null}
json_query(json_string, path) -> VARCHAR
```

**Example: Extract NFT Metadata**

```yaml theme={null}
transforms:
  nft_attributes:
    type: sql
    primary_key: token_id
    sql: |
      SELECT
        token_id,
        json_query(metadata, '$.name') as nft_name,
        json_query(metadata, '$.description') as description,
        json_query(metadata, '$.image') as image_url,
        json_query(metadata, '$.attributes') as attributes_json
      FROM nft_tokens
```

### json\_value

Extract scalar value from JSON.

**Signature:**

```sql theme={null}
json_value(json_string, path) -> VARCHAR
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    json_value(metadata, '$.properties.category') as category,
    json_value(metadata, '$.properties.rarity') as rarity
  FROM nft_metadata
```

### parse\_json / try\_parse\_json

Parse JSON string to validate or convert.

**Signatures:**

```sql theme={null}
parse_json(json_string) -> JSON  -- Errors on invalid JSON
try_parse_json(json_string) -> JSON  -- Returns NULL on invalid JSON
```

**Example: Safe JSON Parsing**

```yaml theme={null}
sql: |
  SELECT
    token_id,
    try_parse_json(metadata_string) as parsed_metadata,
    -- Check if JSON is valid
    CASE
      WHEN try_parse_json(metadata_string) IS NOT NULL THEN true
      ELSE false
    END as is_valid_json
  FROM nft_tokens
```

### is\_json

Check if string is valid JSON.

**Signature:**

```sql theme={null}
is_json(string) -> BOOLEAN
```

### json\_object / json\_array

Construct JSON objects and arrays.

**Signatures:**

```sql theme={null}
json_object(key1, value1, key2, value2, ...) -> JSON
json_array(value1, value2, ...) -> JSON
```

**Example: Build JSON Response**

```yaml theme={null}
sql: |
  SELECT
    json_object(
      'address', from_address,
      'amount', value,
      'timestamp', block_timestamp
    ) as transfer_json
  FROM erc20_transfers
```

**Null handling variants:**

* `json_object_absent_on_null` - Omit null fields
* `json_array_absent_on_null` - Omit null elements

### json\_exists

Check if JSON path exists.

**Signature:**

```sql theme={null}
json_exists(json_string, path) -> BOOLEAN
```

**Example:**

```yaml theme={null}
sql: |
  SELECT *
  FROM nft_metadata
  WHERE json_exists(metadata, '$.attributes.rare')
```

### json\_string

Converts any value to its JSON-encoded string representation. Primitives are quoted; structs and arrays are serialized to JSON objects/arrays.

**Signature:**

```sql theme={null}
json_string(value) -> VARCHAR
```

**Parameters:**

| Parameter | Type | Description                                              |
| --------- | ---- | -------------------------------------------------------- |
| `value`   | any  | Any Arrow-supported value, including structs and arrays. |

**Returns:** A JSON string. `NULL` input is passed through.

**Example:**

```yaml theme={null}
sql: |
  SELECT
    json_string('hello')         as quoted_string,   -- '"hello"'
    json_string(42)              as number,          -- '42'
    json_string(struct(a => 1))  as struct_as_json   -- '{"a":1}'
  FROM events
```

## Time Functions

### now / current\_time / current\_date

Get current timestamp, time, or date.

**Signatures:**

```sql theme={null}
now() -> TIMESTAMP
current_time() -> TIME
current_date() -> DATE
```

<Warning>
  These functions are **volatile** - they return different values on each call.
  Use them for adding processing timestamps, not for filtering historical data.
</Warning>

**Example: Add Processing Timestamp**

```yaml theme={null}
sql: |
  SELECT
    *,
    now() as processed_at,
    current_date() as processing_date
  FROM ethereum_transactions
```

### date\_part

Extract part of a timestamp.

**Signature:**

```sql theme={null}
date_part(part, timestamp) -> INT
-- part: 'year', 'month', 'day', 'hour', 'minute', 'second', 'epoch', etc.
```

**Example: Analyze by Hour**

```yaml theme={null}
sql: |
  SELECT
    date_part('hour', to_timestamp(blockTime)) as hour_of_day,
    date_part('epoch', to_timestamp(blockTime)) - blockTime as age_seconds
  FROM solana_blocks
```

### to\_timestamp / to\_timestamp\_micros

Convert Unix timestamp to TIMESTAMP.

**Signatures:**

```sql theme={null}
to_timestamp(seconds) -> TIMESTAMP
to_timestamp_micros(microseconds) -> TIMESTAMP
```

**Example:**

```yaml theme={null}
sql: |
  SELECT
    slot,
    to_timestamp(blockTime) as block_timestamp,
    to_timestamp_micros(blockTime * 1000000) as block_timestamp_precise
  FROM solana_blocks
```

## Hash & ID Functions

### \_gs\_xxhash

Fast non-cryptographic hash function (XXH3 128-bit). Returns a 32-character hex string.

**Signature:**

```sql theme={null}
_gs_xxhash(input) -> VARCHAR
```

**Example: Create Deterministic IDs**

```yaml theme={null}
sql: |
  SELECT
    _gs_xxhash(concat(transaction_hash, '_', log_index::VARCHAR)) as unique_id,
    transaction_hash,
    log_index
  FROM ethereum_logs
```

### uuid7

Generates a UUIDv7 — a time-ordered UUID that embeds the current timestamp. Useful as a primary key when you want inserts to be monotonically sorted by time.

**Signature:**

```sql theme={null}
uuid7() -> VARCHAR
```

**Returns:** A newly generated UUIDv7 string (e.g. `018f2a5c-1234-7abc-8def-0123456789ab`). Called once per row, so every row gets a distinct value.

**Example:**

```yaml theme={null}
sql: |
  SELECT
    uuid7() as event_id,
    transaction_hash,
    block_number
  FROM ethereum_logs
```

## Dynamic Table Functions

### dynamic\_table\_check

Check if a value exists in a dynamic table (async function).

**Signature:**

```sql theme={null}
dynamic_table_check(table_name, value) -> BOOLEAN
```

**Parameters:**

* `table_name` - Reference name of the dynamic table (from your pipeline config)
* `value` - Value to check for existence

**Returns:** `true` if value exists in the table, `false` otherwise

See the [Dynamic Tables](/turbo-pipelines/transforms/dynamic-tables) documentation for complete details.

**Example:**

```yaml theme={null}
transforms:
  tracked_wallets:
    type: dynamic_table
    backend_type: Postgres
    backend_entity_name: user_wallets
    secret_name: MY_POSTGRES

  filtered_transfers:
    type: sql
    primary_key: id
    sql: |
      SELECT *
      FROM erc20_transfers
      WHERE dynamic_table_check('tracked_wallets', from_address)
         OR dynamic_table_check('tracked_wallets', to_address)
```

## Function Composition

Functions can be composed to build complex transformations.

**Example: Decode Log and Convert Value**

```yaml theme={null}
sql: |
  WITH decoded AS (
    SELECT
      id,
      evm_decode_log(abi, topics, data) as evt,
      block_timestamp
    FROM ethereum_logs
  )
  SELECT
    id,
    evt.event_signature as event_name,
    evt.event_params[1] as from_address,
    evt.event_params[2] as to_address,
    -- Convert wei to ETH using U256 with clean operator syntax
    u256_to_string(
      to_u256(evt.event_params[3]) / to_u256('1000000000000000000')
    ) as amount_eth,
    block_timestamp
  FROM decoded
```

**Example: Filter Array and Decode First Match**

```yaml theme={null}
sql: |
  SELECT
    signature,
    _gs_solana_decode_token_program_instruction(
      array_filter_first(
        instructions,
        'programId',
        'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA'
      ).data,
      array_filter_first(
        instructions,
        'programId',
        'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA'
      ).accounts
    ) as decoded_token_instruction
  FROM solana_transactions
```

## Performance Considerations

<AccordionGroup>
  <Accordion title="Function Caching">
    Some functions use internal caching for performance:

    * `evm_decode_log` caches ABI decoders (reuse the same ABI string)
    * Regex functions cache compiled patterns
    * For best performance, use literal strings for patterns/ABIs when possible
  </Accordion>

  <Accordion title="Async Functions">
    `dynamic_table_check` is async and may perform I/O operations (database
    lookups). Use it strategically: - Good: Filter with WHERE clause using dynamic
    table - Avoid: Using in complex calculations or deeply nested expressions
  </Accordion>

  <Accordion title="U256/I256 Operations">
    Large number operations are more expensive than native integers: - Convert to
    U256 only when needed for precision - Do as much filtering as possible before
    U256 operations - Chain operations to minimize conversions:
    `u256_to_string(u256_div(...))` not `u256_to_string(...) /
            u256_to_string(...)`
  </Accordion>

  <Accordion title="Array Operations">
    Array processing can be expensive on large arrays:

    * Filter arrays early in the pipeline
    * Use `array_filter_first` instead of `array_filter` when you only need one element
    * Consider if SQL filtering can reduce array size before processing
  </Accordion>
</AccordionGroup>

## Best Practices

<Steps>
  <Step title="Use type-specific functions">
    Use U256/I256 for token amounts, regular INT for counters and IDs
  </Step>

  <Step title="Validate data">
    Use `try_parse_json` instead of `parse_json` when data quality is uncertain
  </Step>

  <Step title="Minimize function calls">
    Store function results in CTEs when used multiple times: `sql WITH decoded
            AS ( SELECT id, evm_decode_log(abi, topics, data) as evt FROM logs ) SELECT
            evt.event_signature, evt.event_params[1], evt.event_params[2] FROM decoded `
  </Step>

  <Step title="Handle NULLs">
    Many blockchain fields can be NULL - use COALESCE or NULL checks
  </Step>

  <Step title="Use appropriate hash functions">
    * `_gs_keccak256` for EVM-compatible hashing (event signatures, etc.)
    * `_gs_xxhash` for fast non-cryptographic hashing (IDs, deduplication)
  </Step>
</Steps>

## Common Patterns

### Pattern: Decode and Transform ERC-20 Transfers

```yaml theme={null}
transforms:
  decoded_erc20_transfers:
    type: sql
    primary_key: id
    sql: |
      WITH decoded AS (
        SELECT
          id,
          address as contract_address,
          transaction_hash,
          evm_decode_log(
            '[{"anonymous":false,"inputs":[{"indexed":true,"name":"from","type":"address"},{"indexed":true,"name":"to","type":"address"},{"indexed":false,"name":"value","type":"uint256"}],"name":"Transfer","type":"event"}]',
            topics,
            data
          ) as evt,
          block_number,
          block_timestamp
        FROM ethereum_logs
        WHERE SPLIT_INDEX(topics, ',', 0) = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
      )
      SELECT
        id,
        contract_address,
        transaction_hash,
        lower(evt.event_params[1]) as from_address,
        lower(evt.event_params[2]) as to_address,
        evt.event_params[3] as value_wei,
        u256_to_string(
          to_u256(evt.event_params[3]) / to_u256('1000000000000000000')
        ) as value_eth,
        block_number,
        block_timestamp
      FROM decoded
```

### Pattern: Filter Solana Token Transfers

```yaml theme={null}
transforms:
  solana_token_transfers:
    type: sql
    primary_key: signature
    sql: |
      WITH token_ix AS (
        SELECT
          signature,
          slot,
          blockTime,
          array_filter(
            instructions,
            'programId',
            'TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA'
          ) as token_instructions
        FROM solana_transactions
      )
      SELECT
        signature,
        slot,
        to_timestamp(blockTime) as block_timestamp,
        _gs_solana_decode_token_program_instruction(
          token_instructions[1].data,
          token_instructions[1].accounts
        ) as decoded_instruction,
        array_length(token_instructions) as token_instruction_count
      FROM token_ix
      WHERE array_length(token_instructions) > 0
```

### Pattern: Process NFT Metadata with JSON

```yaml theme={null}
transforms:
  nft_enriched:
    type: sql
    primary_key: token_id
    sql: |
      SELECT
        token_id,
        contract_address,
        -- Extract metadata fields safely
        COALESCE(
          json_value(metadata, '$.name'),
          'Unknown'
        ) as name,
        json_value(metadata, '$.description') as description,
        json_value(metadata, '$.image') as image_url,
        -- Check for rarity trait
        json_exists(metadata, '$.attributes[?(@.trait_type == "Rarity")]') as has_rarity,
        -- Validate metadata
        is_json(metadata) as valid_metadata
      FROM nft_tokens
      WHERE metadata IS NOT NULL
        AND is_json(metadata)
```
