Problem Description
Disk space can grow very fast in some specific situations and, therefore, we need to constantly monitor the total occupation of the Postgres cluster volume in order to prevent any service interruption.
When we first create a PostgreSQL cluster with BigAnimal we have to have an exact idea about the total size of the cluster and, consequently, make a decision about the proper initial disk size.
When you do that, you know you have to take into account possible future cluster growth, disk space needed for temporary files, disk space needed for some maintenance commands like VACUUM FULL etcetera...
Even with a considerable initial amount of allocated space, you may need to increase the size of the initial volume from time to time.
For these reasons it is essential monitoring the disk space occupation and in particular everything that can cause significant size variations.
Solutions
We provide you with some tips and suggestions about what to monitor and check if the size of your cluster is bigger than expected.
1. Cluster log files
BigAnimal does not store cluster log files in the Kubernetes nodes.
The database logs are exported and you can find log entries and messages along with some metrics in Azure or AWS logs (Azure log analytics workspaces and AWS CloudWatch)
2. WAL (Transactional logs)
PostgreSQL cluster produces constantly transactional logs called WALs to persist and database change and be able to recover databases.
You can check the transactional log total size from the statistics exported to Azure and AWS checking these metrics:
cnp_collector_wal_records
cnp_collector_wal_bytes
or, if you have already asked superuser privileges for edb_admin user, running the following query
select pg_size_pretty(sum(size)) from pg_ls_waldir();
If you observe a constant increase or significant spikes of WAL size, you may want to add some disk space.
3. Temporary Files
As any other database, PostgreSQL needs to create temporary files for its normal operations.
You can check any increase or significant spikes of temporary files size from the statistics exported to Azure and AWS checking these metrics:
cnp_pg_stat_database_temp_bytes
cnp_pg_stat_database_temp_files
or for terminal executing the following query
select SUM(temp_files) as "Temporary Files", pg_size_pretty(SUM(temp_bytes)) as "Size of Total Temporary Files" from pg_stat_database;
Note that the output of this query is cumulative by design.
It is extremely important monitoring this data constantly to be aware of the fluctuations of the number and size of the cluster temporary files.
4. Databases Size
It is crucial to check the databases size to estimate the possible future growth.
You can check any increase or significant spikes of the databases size for the statistics exported to Azure and AWS checking this metric:
cnp_pg_database_size_bytes
You can also check the cluster size for terminal executing the following query:
select pg_size_pretty(SUM(pg_database_size(datname))) from pg_database;
Sometimes the size of the database increases because of some table bloating.
To check that, you can run the query reported in the Postgres wiki here:
https://wiki.postgresql.org/wiki/Show_database_bloat
and if tables appear to be bloated you can run some maintenance command to reduce it.
5. Scale storage
You can scale cluster storage at BigAnimal portal. Document has instructions.
6. Logical Replication Slots
In the case of an Extreme High Availability cluster (EHA), we should keep to monitor the size of the logical replication.
You can use the following query to check the WAL size retained by the logical replication slots
select slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as "Replication Slot Size", active from pg_replication_slots;
Also it is very important to check the value of the field "wal_status" of the system catalog view pg_replication_slot
If set to extended, the WALs files held by the logical replication slots exceed the parameter wal_keep_size.
For more information on this please check the following documentation page
https://www.postgresql.org/docs/13/view-pg-replication-slots.html