ReplacingMergeTree
table engine with append_only_mode
offers the best real-time data performance for large datasets.
ReplacingMergeTree engine is used for all sink tables by default. If you don’t want to use a ReplacingMergeTree, you can pre-create the table with any data engine you’d like. If you don’t want to use a ReplacingMergeTree, you can disable append_only_mode
.
Full configuration details for Clickhouse sink is available in the reference page.
Secrets
Use HTTP
Mirror writes to ClickHouse via the
http
interface (often port 8443
), rather than the tcp
interface (often port 9000
).Required permissions
The user will need the following permissions for the target database.- CREATE DATABASE permissions for that database
- INSERT, SELECT, CREATE, DROP table permissions for tables within that database
Data consistency with ReplacingMergeTrees
WithReplacingMergeTree
tables, we can write, overwrite, and flag rows with the same primary key for deletes without actually mutating. As a result, the actual raw data in the table may contain duplicates.
ClickHouse allows you to clean up duplicates and deletes from the table by running
FINAL
keyword after the table name for queries.
Append-Only Mode
Proceed with CautionWithout
append_only_mode=true
(v2: appendOnlyMode=true
), the pipeline may hit ClickHouse mutation flush limits. Write speed will also be slower due to mutations.append_only_mode
(v2: appendOnlyMode
) is true
(default and recommended for ReplacingMergeTrees), the sink behaves the following way:
- All updates and deletes are converted to inserts.
is_deleted
column is automatically added to a table. It contains1
in case of deletes,0
otherwise.- If
versionColumnName
is specified, it’s used as a version number column for deduplication. If it’s not specified,insert_time
column is automatically added to a table. It contains insertion time and is used for deduplication. - Primary key is used in the
ORDER BY
clause.
append_only_mode
(v2: appendOnlyMode
) is false
:
- All updates and deletes are propagated as is.
- No extra columns are added.
- Primary key is used in the
PRIMARY KEY
clause.