Skip to main content

Overview

Write data to a PostgreSQL database with automatic table creation and upsert support. You have two paths:
  • Goldsky-hosted Postgres (recommended for most users). Goldsky provisions and manages the database for you via NeonDB. No database administration, no networking setup. Available on Scale plans and above.
  • Bring your own Postgres. Connect Turbo to a Postgres database you already run (Neon, Supabase, RDS, Cloud SQL, self-hosted, and similar).
Pick a path below to get the connection details, then use the same sinks configuration for both.
Hosted Postgres works the same way for both Mirror and Turbo pipelines. This page is the source of truth for both. Mirror users are welcome here.

Configuration

The same sink configuration works for both Goldsky-hosted and bring-your-own Postgres. Only the secret behind secret_name differs.
sinks:
  my_postgres_sink:
    type: postgres
    from: my_transform
    schema: public
    table: my_table
    secret_name: MY_POSTGRES_SECRET # or your hosted Postgres secret name
    primary_key: id # Optional, enables upsert behavior

Parameters

type
string
required
Must be postgres.
from
string
required
Name of the transform or source to read data from.
schema
string
required
PostgreSQL schema name (for example, public, analytics). Created automatically if it doesn’t exist.
table
string
required
Table name to write to. Created automatically if it doesn’t exist, with columns inferred from the upstream Arrow schema.
secret_name
string
required
Name of a Goldsky secret holding the Postgres connection details (host, port, user, password, databaseName). For hosted Postgres, use the secret name shown on the hosted database detail page.
primary_key
string
Comma-separated list of columns to use as the table’s primary key. When set, writes become upserts (INSERT ... ON CONFLICT DO UPDATE). When omitted, writes are plain inserts and no primary key is added to the auto-created table.
on_conflict
string
default:"update"
Behavior when a row with the same primary_key already exists. update runs ON CONFLICT ... DO UPDATE SET, nothing runs ON CONFLICT ... DO NOTHING. Only applies when primary_key is set.
batch_size
integer
Maximum number of rows to accumulate before flushing to Postgres. Falls back to the engine default when unset.
batch_flush_interval
string
Maximum time to wait before flushing a partial batch, parsed as a humantime duration (for example, "500ms", "1s", "2s"). Falls back to the engine default when unset.
parallelism
integer
default:"1"
Number of parallel writer tasks. Each task processes a slice of the accumulated batch concurrently. Increase for sink-bound pipelines.

Behavior

  • Auto schema and table creation: both the schema and the table are created with CREATE ... IF NOT EXISTS on first write. Columns are derived from the upstream Arrow schema.
  • No automatic schema migrations: if the table already exists, the engine does not ALTER TABLE to add or change columns. Upstream schema changes that add new columns will fail at insert time. Drop or manually migrate the table first.
  • Upsert vs. insert: setting primary_key produces INSERT ... ON CONFLICT (<pk>) DO UPDATE SET .... Without primary_key, rows are plain INSERTs and no primary key constraint is created.
  • Type mapping: Arrow types are mapped automatically. Int32 to INTEGER, Int64 to BIGINT, Utf8 to TEXT, Boolean to BOOLEAN, Decimal(p, s) to NUMERIC(p, s), structs/lists/maps to JSONB.
  • 256-bit integers (U256 / I256): see 256-bit integers below.
  • UInt64: stored as NUMERIC (because BIGINT is signed and cannot hold the full unsigned range).

256-bit integers (U256 / I256)

EVM-native datasets contain values that exceed any standard SQL integer type. For example, uint256 on value, balance, totalSupply, raw wei amounts, and ERC-20 amount columns. Turbo carries these through the pipeline as Arrow FixedSizeBinary(32) columns tagged with internal U256 or I256 metadata. When the Postgres sink writes one of these columns, it does the following automatically:
  • Auto-created tables: the column is created as NUMERIC(78, 0). 78 digits is the smallest decimal precision that can represent every U256 value (2^256 − 1 has 78 decimal digits) and every I256 value. 0 scale means whole numbers only, no fractional part is ever produced.
  • In-flight encoding: at write time, the sink converts each 32-byte big-endian value to its full decimal string representation (no scientific notation, no truncation) and sends that string in the INSERT. Postgres parses the string directly into the NUMERIC column. The conversion does not round, does not lose precision, and is independent of any client locale.
  • Existing tables with a different type: if the destination table already exists and the matching column is not NUMERIC(78, 0) (or a wider NUMERIC that can hold the same range), the sink fails the write instead of silently coercing or truncating. To migrate, drop the table and let the sink recreate it, or manually ALTER the column to NUMERIC(78, 0).

Querying NUMERIC(78, 0) values

NUMERIC(78, 0) values are returned by Postgres drivers in different ways depending on the language. A few common cases:
  • psql / generic SQL: use the value as-is. Casts like CAST(value AS NUMERIC) / 1e18 work, but mixing NUMERIC with floating-point loses precision. Keep arithmetic in NUMERIC if you care about exactness.
  • Node.js (pg library): NUMERIC is returned as a JavaScript string by default to avoid silent precision loss. Convert with BigInt(row.value) rather than Number(row.value).
  • Python (psycopg2 / psycopg3): returned as decimal.Decimal, which is full-precision.
  • Go (pgx): use pgtype.Numeric or scan into a *big.Int / *big.Float. Scanning into int64 will overflow.
Do not scan a NUMERIC(78, 0) column into a 64-bit integer or a double. The value will overflow or lose precision silently in most drivers.

Pre-converting in a transform

If your downstream consumers cannot work with arbitrary-precision decimals, convert before the sink:
  • SQL transform: cast to a smaller type only when you know the value fits. For example, ERC-20 amounts scaled by decimals often fit in DECIMAL(38, 18). Out-of-range values will raise an error.
  • TypeScript transform: use BigInt for math and return the result as a string. The TypeScript sandbox supports BigInt natively. See the TypeScript transform docs.

Tips for backfilling large datasets into PostgreSQL

While PostgreSQL offers fast access of data, writing large backfills into PostgreSQL can sometimes be hard to scale. Pipelines are often bottlenecked against sinks. Things to try:

Avoid indexes on tables until after the backfill

Indexes increase the amount of writes needed for each insert. When doing many writes, inserts can slow down significantly if you’re hitting resource limits.

Bigger batch sizes for the inserts

The batch_size setting controls how many rows are batched into a single insert statement. Depending on the size of the events, you can increase this to help with write performance. 1000 is a good number to start with. The pipeline will collect data until the batch is full, or until batch_flush_interval is met.

Temporarily scale up the database

Look at your database stats like CPU and memory to see where the bottlenecks are. Big writes are often not blocked on CPU or RAM, but on network or disk I/O. For Google Cloud SQL, there are I/O burst limits that you can surpass by increasing the amount of CPU. For AWS RDS instances (including Aurora), the network burst limits are documented for each instance. A rule of thumb is to look at the EBS baseline I/O performance, since burst credits are easily used up in a backfill scenario.

Provider-specific notes

AWS Aurora Postgres

When using Aurora for large datasets, use Aurora I/O optimized. It charges for more storage but gives you immense savings on I/O credits. If you’re streaming the entire chain into your database or have a very active subgraph, these savings can be considerable, and the disk performance is significantly more stable, resulting in a more stable CPU usage pattern.

Supabase

Supabase’s direct connection URLs only support IPv6 connections and will not work with our default validation. There are two solutions:
  1. Use Session Pooling. In the Supabase connection screen, scroll down to see the connection string for the session pooler. This is included in all Supabase plans and works for most people. Sessions will expire and may produce some warning logs in your pipeline logs. These are handled gracefully and no action is needed. No data will be lost due to a session disconnection.
  2. Alternatively, buy the IPv4 add-on if session pooling doesn’t fit your needs. It can lead to more persistent direct connections.

Neon

Use the connection string from your Neon project dashboard. SSL is required (?sslmode=require is included by default in the connection string Neon gives you).

Self-hosted PostgreSQL

Set listen_addresses = '*' in postgresql.conf, add a pg_hba.conf rule for your Goldsky writer role, and ensure your host firewall allows inbound TCP on the Postgres port.

Pricing

Hosted Postgres usage (CPU-hours and storage) is metered hourly and billed under the Hosted databases line on the pricing page. Paused or deleted pipelines transition the database to idle, and you are not charged for utilization during idle time (storage is still billed). Bring-your-own Postgres has no Goldsky database charges. You pay your own provider directly.

Example

sinks:
  postgres_transfers:
    type: postgres
    from: filtered_transfers
    schema: public
    table: erc20_transfers
    secret_name: MY_POSTGRES
    primary_key: id