Role Creation
Here is an example snippet to give the permissions needed for pipelines.
Secret Creation
Create a MySQL secret with the following CLI command:
Examples
Getting an edge-only stream of decoded logs
This definition sends a real-time, edge-only stream of decoded logs straight into a MySQL table named eth_logs in the goldsky schema, using the secret A_MYSQL_SECRET created above.
Tips for backfilling large datasets into MySQL
While MySQL offers fast access to data, writing large backfills into MySQL can be hard to scale. Pipelines are often bottlenecked by their sinks. Here are some things to try:
Avoid indexes on tables until after the backfill
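As a minimal sketch of this tip, a secondary index can be dropped before the backfill and recreated once it has finished. The index name idx_eth_logs_address and the address column are hypothetical and only illustrate the pattern; the sketch leaves the primary key in place and touches only a secondary index.

```sql
-- List the indexes currently defined on the target table.
SHOW INDEX FROM goldsky.eth_logs;

-- Before the backfill: drop the (hypothetical) secondary index.
ALTER TABLE goldsky.eth_logs DROP INDEX idx_eth_logs_address;

-- ... run the backfill ...

-- After the backfill: recreate the index in a single pass.
-- Assumes `address` is a VARCHAR column; a TEXT column would need a prefix length.
CREATE INDEX idx_eth_logs_address ON goldsky.eth_logs (address);
```

Building the index once over the finished table avoids paying the index-maintenance cost on every inserted row.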
Indexes increase the number of writes needed for each insert. When doing many writes, this overhead can slow the backfill significantly if the database is hitting resource limits.
Bigger batch_sizes for the inserts
The sink_buffer_max_rows setting controls how many rows are batched into a single insert statement. Depending on the size of the events, you can increase this to help with write performance. 1000 is a good number to start with. The pipeline will collect data until the batch is full, or until the sink_buffer_interval is met.
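To make the batching concrete, the sketch below shows what a batch of three rows looks like as a single multi-row insert, rather than three separate statements. It is an illustration only: the exact statement the pipeline issues is not shown here, and the column names and values are hypothetical.

```sql
-- One round trip carries the whole batch instead of one row per statement.
-- Hypothetical columns and values; the real eth_logs table follows the dataset schema.
INSERT INTO goldsky.eth_logs (id, block_number, address)
VALUES
  ('log-1', 19000000, '0xabc1'),
  ('log-2', 19000000, '0xabc2'),
  ('log-3', 19000001, '0xabc3');
```

A larger batch size spreads the per-statement overhead (network round trip, parsing, commit) over more rows, which is why raising it can help write throughput.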
Temporarily scale up the database
Take a look at your database stats like CPU and memory to see where the bottlenecks are. Often, big writes aren’t blocked on CPU or RAM, but rather on network or disk I/O. For Google Cloud SQL, there are I/O burst limits that you can surpass by increasing the amount of CPU. For AWS RDS instances (including Aurora), the network burst limits are documented for each instance. A rule of thumb is to look at the EBS baseline I/O performance, as burst credits are easily used up in a backfill scenario.
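Alongside the cloud provider's dashboards, one rough way to check for I/O pressure from inside MySQL is to sample the InnoDB status counters twice during a backfill and compare the differences. These are standard MySQL status variables, but managed services such as Aurora may report some of them differently, so treat this as a sketch rather than a definitive diagnostic.

```sql
-- Run once, wait a minute or so, run again, and compare the deltas.
SHOW GLOBAL STATUS LIKE 'Innodb_data_writes';            -- data write operations
SHOW GLOBAL STATUS LIKE 'Innodb_data_written';           -- bytes written
SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';            -- fsync() calls
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';  -- waits for a free buffer pool page
SHOW GLOBAL STATUS LIKE 'Innodb_rows_inserted';          -- rows inserted into InnoDB tables
```

If the write and fsync counters climb quickly while Innodb_rows_inserted grows slowly and CPU stays low, disk I/O is a likely bottleneck.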
Aurora Tips
When using Aurora for large datasets, make sure to use Aurora I/O optimized, which charges more for storage but gives you immense savings on I/O credits. If you’re streaming the entire chain into your database, or have a very active subgraph, these savings can be considerable. Disk performance is also significantly more stable, which results in a more stable CPU usage pattern.