What is this issue?
Replication lag measures how far behind a replica is from the primary database.
PG Pilot measures this in bytes of WAL (Write-Ahead Log) not yet consumed.
Lag thresholds:
- Warning: > 1 GB lag
- Critical: > 5 GB lag
High lag means:
- Read replicas are serving stale data
- Failover would result in data loss
- WAL files are accumulating on the primary
Why it matters
Stale Reads
Applications reading from replicas get outdated data
Data Loss Risk
Failover to a lagging replica loses uncommitted transactions
Disk Space
Primary retains WAL files until replicas consume them
Recovery Time
High lag means longer time to catch up during maintenance
How PG Pilot detects it
```sql
SELECT
slot_name,
active,
pg_size_pretty(
pg_current_wal_lsn() - confirmed_flush_lsn
) AS lag_bytes,
pg_current_wal_lsn() - confirmed_flush_lsn AS lag_raw
FROM pg_replication_slots
WHERE slot_type = 'logical'
UNION ALL
SELECT
slot_name,
active,
pg_size_pretty(
pg_current_wal_lsn() - restart_lsn
) AS lag_bytes,
pg_current_wal_lsn() - restart_lsn AS lag_raw
FROM pg_replication_slots
WHERE slot_type = 'physical';
```
How to fix it
Check replica status
Verify the replica is running and connected:
SELECT * FROM pg_stat_replication;Look for the replica's state (should be streaming)
and sent_lsn vs replay_lsn.
Check replica performance
On the replica, check if it's keeping up:
SELECT
pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn(),
pg_last_xact_replay_timestamp();If replay is behind receive, the replica is I/O bound.
Investigate bottlenecks
Common causes of lag:
- Slow I/O on replica (check disk metrics)
- Network issues between primary and replica
- Heavy read load on replica affecting replay
- Replica running on undersized hardware
Tune replica settings
Increase replay parallelism:
-- On replica (requires restart)
max_parallel_apply_workers_per_subscription = 4For physical replication, ensure sufficient:
- max_wal_senders on primary
- Network bandwidth
- Disk I/O capacity
Prevention
- Size replicas appropriately for the write load
- Monitor lag continuously with PG Pilot
- Set up alerts before lag becomes critical
- Test failover procedures regularly