Disaster Recovery

Write-Ahead Log (WAL)

Rivellum uses a write-ahead log with CRC32C checksums for crash recovery. Every state-modifying operation is logged before application.

WAL Configuration

[wal]
wal_dir = "/var/lib/rivellum/wal"
wal_fsync_policy = "strict"    # strict | balanced | dev_fast
Policy     Behavior                  Use Case
strict     fsync after every write   Production (highest durability)
balanced   Periodic fsync            Production (good durability, better throughput)
dev_fast   No fsync                  Development/testing only

Crash Recovery

On startup, the node automatically replays the WAL through the last fully written record. Records with bad CRC32C checksums are skipped, since they indicate writes left incomplete by the crash.
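
The replay behavior can be sketched in a few lines. This is an illustration only, not the node's actual on-disk record format: it frames each record as a length, a checksum, and a payload, and uses Python's zlib.crc32 as a stand-in for CRC32C (which is not in the standard library).

```python
import struct
import zlib

def append_record(buf: bytearray, payload: bytes) -> None:
    """Frame a record as [length u32][checksum u32][payload]."""
    buf += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def replay(buf: bytes) -> list[bytes]:
    """Return all fully written records; stop at a torn or corrupt tail."""
    records, off = [], 0
    while off + 8 <= len(buf):
        length, crc = struct.unpack_from("<II", buf, off)
        payload = buf[off + 8 : off + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # checksum mismatch: an incomplete write from a crash
        records.append(payload)
        off += 8 + length
    return records

wal = bytearray()
append_record(wal, b"set x=1")
append_record(wal, b"set y=2")
torn = bytes(wal[:-3])    # simulate a crash mid-write of the second record
print(replay(torn))       # → [b'set x=1']: only the fully written record survives
```

The key property is that a half-written tail record fails its checksum and is discarded, so replay never applies a partially persisted operation.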

Snapshots

Create a Snapshot

rivellum-node snapshot create \
    --data-dir /var/lib/rivellum \
    --out /backups/snapshot-$(date +%Y%m%d-%H%M%S)

List Snapshots

rivellum-node snapshot list --data-dir /var/lib/rivellum

Restore from Snapshot

# Stop the node first
systemctl stop rivellum-node

# Restore
rivellum-node snapshot restore \
    --from /backups/snapshot-20250101-120000 \
    --data-dir /var/lib/rivellum

# Start the node — it will replay WAL entries after the snapshot
systemctl start rivellum-node

Backup Strategy

Production Recommendations

  1. Automated snapshots: Schedule snapshots every 6-12 hours
  2. Off-site storage: Copy snapshots to remote storage (S3, GCS, etc.)
  3. WAL retention: Keep WAL files for at least 24 hours after snapshot
  4. Test restores: Periodically verify backup integrity by restoring to a standby node

Automated Backup Script

#!/bin/bash
set -euo pipefail

BACKUP_DIR="/backups/rivellum"
DATA_DIR="/var/lib/rivellum"
RETENTION_DAYS=7

# Create a timestamped snapshot
SNAP_NAME="snapshot-$(date +%Y%m%d-%H%M%S)"
rivellum-node snapshot create --data-dir "$DATA_DIR" --out "$BACKUP_DIR/$SNAP_NAME"

# Prune snapshots older than the retention window (top-level entries only,
# so files inside a snapshot directory are never matched individually)
find "$BACKUP_DIR" -maxdepth 1 -name 'snapshot-*' -mtime +"$RETENTION_DAYS" -exec rm -rf {} +

State Recovery Scenarios

Scenario           Recovery Method
Process crash      Automatic WAL replay on restart
Data corruption    Restore from the most recent snapshot
Hardware failure   Restore a snapshot on new hardware, then sync from peers
Full resync        Delete the data directory and restart the node; it syncs from genesis

Per-Lane State

Each lane has its own RocksDB instance and WAL partition. During recovery:

  • Lane states are restored independently
  • The MetaRoot aggregator recomputes the global root from lane roots
  • COW snapshots ensure consistent point-in-time state
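
As an illustration of the aggregation step, recomputing a global root from per-lane roots might look like the sketch below. The actual MetaRoot scheme is not documented here, so both the hash function and the plain-concatenation ordering are assumptions.

```python
import hashlib

def meta_root(lane_roots: list[bytes]) -> bytes:
    """Recompute a global root by hashing lane roots in lane order (assumed scheme)."""
    h = hashlib.sha256()
    for root in lane_roots:
        h.update(root)
    return h.digest()
```

Because each lane is restored independently, the aggregator only needs the recovered per-lane roots, not the lane data itself.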

For monitoring and log configuration, see Logging & Monitoring.