> ## Documentation Index
> Fetch the complete documentation index at: https://docs.goldsky.com/llms.txt
> Use this file to discover all available pages before exploring further.

# ClickHouse

[ClickHouse](https://clickhouse.com/) is a highly performant and cost-effective OLAP database that can support real-time inserts. Mirror pipelines can write subgraph or blockchain data directly into ClickHouse with full data guarantees and reorganization handling.

Mirror can work with any ClickHouse setup, but we have several strong defaults. From our experimentation, the `ReplacingMergeTree` table engine with `append_only_mode` offers the best real-time data performance for large datasets.

[ReplacingMergeTree](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree) engine is used for all sink tables by default. If you don't want to use a ReplacingMergeTree, you can pre-create the table with any data engine you'd like. If you don't want to use a ReplacingMergeTree, you can disable `append_only_mode`.

Full configuration details for Clickhouse sink is available in the [reference](/mirror/reference/config-file/pipeline#clickhouse) page.

## Secrets

<Warning>
  **Use HTTP**
  Mirror writes to ClickHouse via the `http`  interface (often port `8443`), rather than the `tcp` interface (often port `9000`).
</Warning>

```shell theme={null}

goldsky secret create --name A_CLICKHOUSE_SECRET --value '{
  "url": "clickhouse://blah.host.com:8443?ssl=true",
  "type": "clickHouse",
  "username": "default",
  "password": "qwerty123",
  "databaseName": "myDatabase"
}'
```

## Required permissions

The user will need the following permissions for the target database.

* CREATE DATABASE permissions for that database
* INSERT, SELECT, CREATE, DROP table permissions for tables within that database

```sql theme={null}
CREATE USER 'username' IDENTIFIED WITH password 'user_password';

GRANT CREATE DATABASE ON goldsky.* TO 'username';
GRANT SELECT, INSERT, DROP, CREATE ON goldsky.* TO 'username';
```

It's highly recommended to assign a ROLE to the user as well, and restrict the amount of total memory and CPU the pipeline has access to. The pipeline will take what it needs to insert as fast as possible, and while that may be desired for a backfill, in a production scenario you may want to isolate those resources.

## Data consistency with ReplacingMergeTrees

With `ReplacingMergeTree` tables, we can write, overwrite, and flag rows with the same primary key for deletes without actually mutating. As a result, the actual raw data in the table may contain duplicates.

ClickHouse allows you to clean up duplicates and deletes from the table by running

```sql theme={null}
OPTIMIZE <tablename> FINAL;
```

which will merge rows with the same primary key into one. This may not be deterministic and fully clean all data up, so it's recommended to also add the `FINAL` keyword after the table name for queries.

```SQL theme={null}
SELECT <columns>
FROM <table name> FINAL
```

This will run a clean-up process, though there may be performance considerations.

## Append-Only Mode

<Warning>
  **Proceed with Caution**

  Without `append_only_mode=true`, the pipeline may hit ClickHouse mutation flush limits. Write speed will also be slower due to mutations.
</Warning>

Append-only mode means the pipeline will only *write* and not *update* or *delete* tables. There will be no mutations, only inserts.

This drastically increases insert speed and reduces Flush exceptions (which happen when too many mutations are queued up).

It's highly recommended as it can help you operate a large dataset with many writes with a small ClickHouse instance.

When `append_only_mode` is `true` (default and recommended for ReplacingMergeTrees), the sink behaves the following way:

* All updates and deletes are converted to inserts.
* `is_deleted` column is automatically added to a table. It contains `1` in case of deletes, `0` otherwise.
* If `versionColumnName` is specified, it's used as a [version number column](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#ver) for deduplication. If it's not specified, `insert_time` column is automatically added to a table. It contains insertion time and is used for deduplication.
* Primary key is used in the `ORDER BY` clause.

This allows us to handle blockchain reorganizations natively while providing high insert speeds.

When `append_only_mode` is `false`:

* All updates and deletes are propagated as is.
* No extra columns are added.
* Primary key is used in the `PRIMARY KEY` clause.
