We have the 7 Best Practice Tips for PostgreSQL Bulk Data Loading. This document will describe the tips from EDB Cloud Service.
Tip 1: Target Cluster is Single Node
Synchronous commits can reduce the commit speed in a high-availability (HA) cluster. If the standby cannot keep up with the replication, it may lead to significant lag between the primary and standby.
To optimize the data loading process, consider scaling down the cluster to a single node before loading the data.
Once the data load is complete, you can scale back up to an HA cluster. Although the standby pod may take some time to join, this will not affect the primary pod.
Tip 2: Use a higher CPU core and Memory size instance type.
EDB Cloud Service is good at managing the compute size. Using a higher CPU core and Memory size will speed up the data load.
After the restoration, you can scale down the compute size.
Tip 3: Engage Cloud Support team
You can engage EDB Cloud Support, and they can keep monitoring your cluster usage and tune the settings properly at the backend.
Tip 4: Enable separated WAL storage for the target cluster
Enable separate WAL storage and set an appropriate volume size. There will be large WAL files generated during the data load, so please do not allocate a very small WAL volume.
Other general suggestions:
-
maintenance_work_mem
: Increase this to allocate more memory for maintenance tasks during the restore. A value like 2GB to 4GB can be effective. -
checkpoint_timeout
: Set a longer timeout (e.g., 15min or higher) to reduce the frequency of checkpoints. -
max_wal_size
: Increase this to handle more transaction logs before a checkpoint is forced.