Cleanup of a bulky Postgres table

From my experience working at Simpl.

I worked on an interesting task along with the devops team of cleaning up a bulky legacy Postgres table in the db, which was causing cluster-wide performance degradation (Eg: Auto-vacuum processes were triggering frequently, locking table and impacting writes across the entire database cluster). Example: Auto vacuum table load increase in cluster The table was loaded with real-time data from a Kafka consumer running round the clock. It had over ~ 1.6B rows having roughly a month of data; ~ 600GB total size; Of which indexes size ~ 125GB, toast size ~ 125GB. A daily batch job was running to delete data older than 30 days. Command which it used: DELETE FROM <tableNameX> WHERE created_at < '<current_time_minus_thirty_days>' on the writer postgres instance - and this is not an effective command (discussed later)

Query to get metadata of all tables in db:

SELECT
    t.table_schema || '.' || t.table_name AS table_full_name,
    COALESCE(c.reltuples, 0) AS estimated_rows,
    pg_size_pretty(pg_total_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))) AS total_size,
    pg_size_pretty(pg_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))) AS table_size,
    pg_size_pretty(pg_indexes_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))) AS indexes_size,
    pg_size_pretty(pg_total_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name)) - pg_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))) AS toast_size
FROM
    information_schema.tables t
JOIN
    pg_class c ON c.relname = t.table_name
JOIN
    pg_namespace n ON n.nspname = t.table_schema AND n.oid = c.relnamespace
WHERE
    t.table_schema NOT IN ('information_schema', 'pg_catalog')
    AND t.table_type = 'BASE TABLE'
ORDER BY
    pg_total_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name)) DESC;

Crisp points learned & strategy:

Impact: Decrease in load (blue) Decreased cluster load after activity - blue chunk in graph