What is this issue?
PostgreSQL tracks WAL status for each replication slot:
- reserved: WAL is safely retained (normal)
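- extended: max_wal_size is exceeded, but the WAL is still retained by the slot or by wal_keep_size (warning)
- unreserved: The slot no longer retains the required WAL, and some of it will be removed at the next checkpoint (critical)
- lost: Required WAL has been removed and the slot is no longer usable (critical - data loss)
The "unreserved" and "lost" statuses appear only when max_slot_wal_keep_size is set to a non-negative value.
The "lost" status means the consumer cannot catch up from this slot. A physical replica must be rebuilt
from a fresh base backup.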
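The thresholds behind these statuses come from three server settings. As a quick check, the query below lists their current values. It uses only the standard pg_settings view; which values suit a given workload is not something this sketch can decide.

```sql
-- Show the settings that decide when a slot leaves the 'reserved' state.
-- max_slot_wal_keep_size = -1 means slots may retain WAL without limit,
-- in which case 'unreserved' and 'lost' never appear.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'wal_keep_size', 'max_slot_wal_keep_size')
ORDER BY name;
```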
Why it matters
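- Data Loss: Once WAL is lost, the consumer is missing changes it can no longer receive from this slot
- Replica Rebuild: A physical standby needs a new base backup and a full resync
- Replication Failure: Logical replication cannot resume from the slot and may fail permanently
- Recovery Impact: Point-in-time recovery may be affected, for example when the slot feeds a WAL archive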
How PG Pilot detects it
```sql
SELECT
    slot_name,
    slot_type,
    active,
    wal_status,
    safe_wal_size,
    pg_size_pretty(
        pg_current_wal_lsn() - restart_lsn
    ) AS lag
FROM pg_replication_slots
WHERE wal_status != 'reserved';
```
How to fix it
For 'extended' status
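This is a warning. WAL is being retained beyond max_wal_size because a slot or wal_keep_size still needs it.
- Check why the consumer is behind
- Verify that the consumer is healthy and connected
- Consider whether max_slot_wal_keep_size or wal_keep_size needs adjustment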
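To see why a consumer is behind, it can help to join each affected slot to its walsender session. The following is a minimal sketch, assuming it is run on the primary; it relies only on the standard pg_replication_slots and pg_stat_replication views, and the column selection is just one reasonable choice.

```sql
-- For each non-reserved slot, show the connected consumer (if any) and its lag.
-- A NULL pid means nothing is currently streaming from the slot.
SELECT
    s.slot_name,
    s.wal_status,
    s.active,
    r.pid,
    r.application_name,
    r.client_addr,
    r.state,
    r.replay_lag,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), s.restart_lsn)) AS retained_wal
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
WHERE s.wal_status <> 'reserved';
```

A slot with no matching session usually points to a stopped or disconnected consumer, while a connected session with a growing replay_lag points to a consumer that is running but too slow.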
For 'unreserved' status
Immediate action needed:
```sql
-- Check how many more bytes of WAL can be written before the slot is lost
SELECT slot_name, safe_wal_size
FROM pg_replication_slots
WHERE slot_name = 'your_slot';
```
Then do one of the following:
- Fix the consumer and let it catch up quickly
- Increase max_slot_wal_keep_size or wal_keep_size temporarily
- Accept that you may need to rebuild
For 'lost' status
The slot has lost required WAL. You must:
1. Drop the replication slot:
   ```sql
   SELECT pg_drop_replication_slot('slot_name');
   ```
2. For physical replication: Take a new base backup and rebuild the standby
3. For logical replication: Recreate the subscription
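For the logical replication case, recreating the subscription could look roughly like the sketch below. It assumes the built-in publish/subscribe feature and is run on the subscriber. The names my_sub and my_pub and the connection string are hypothetical placeholders, not values from this page.

```sql
-- Run on the subscriber. my_sub, my_pub, and the connection string are placeholders.
-- The publisher-side slot is already gone, so detach the subscription from it first;
-- otherwise DROP SUBSCRIPTION tries to drop the remote slot and fails.
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub SET (slot_name = NONE);
DROP SUBSCRIPTION my_sub;

-- Recreating the subscription creates a new slot and, by default, copies the tables again.
-- Empty the target tables first, or the initial copy can hit duplicate-key errors.
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
    PUBLICATION my_pub;
```

Other CDC tools that manage their own slots have their own resync procedures, which this sketch does not cover.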
Increase WAL retention
To prevent future issues:
```sql
ALTER SYSTEM SET wal_keep_size = '10GB';
SELECT pg_reload_conf();
```
Alternatively, rely on replication slots: a slot retains WAL until its consumer confirms it, up to max_slot_wal_keep_size when that limit is set.
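Because the "unreserved" and "lost" statuses only occur when slot retention is capped, reviewing that cap is a natural companion step. The sketch below inspects and raises max_slot_wal_keep_size; the 50GB figure is an illustrative assumption, not a recommendation.

```sql
-- Inspect the current cap on slot-retained WAL (-1 means no cap).
SHOW max_slot_wal_keep_size;

-- Example value only: allow slots to retain up to 50GB before WAL becomes removable.
-- Size this against the free space on the volume that holds pg_wal.
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```

A higher cap trades disk space for tolerance of slow consumers, so it is worth pairing with disk-usage monitoring on the pg_wal volume.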
Prevention
- Monitor WAL status with PG Pilot
- Set appropriate wal_keep_size as a safety net
- Ensure replicas and CDC tools keep up
- Alert on non-reserved WAL status
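An alert on non-reserved status can be driven by a query along the following lines. This is a sketch for a generic monitoring job, not a description of how PG Pilot implements its check; the severity labels simply mirror the status list at the top of this page.

```sql
-- One row per slot that needs attention, with an illustrative severity label.
-- 'critical' sorts before 'warning', so the most urgent slots come first.
SELECT
    slot_name,
    slot_type,
    wal_status,
    CASE wal_status
        WHEN 'extended'   THEN 'warning'
        WHEN 'unreserved' THEN 'critical'
        WHEN 'lost'       THEN 'critical'
    END AS severity,
    pg_size_pretty(safe_wal_size) AS safe_wal_size
FROM pg_replication_slots
WHERE wal_status IN ('extended', 'unreserved', 'lost')
ORDER BY severity, slot_name;
```

An empty result means every slot is either reserved or has no WAL reserved yet.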