rivellum Backup and Restore Operations Guide

Overview

This guide covers state management, backup strategies, and disaster recovery procedures for rivellum nodes. The snapshot and pruning infrastructure enables operators to:

  • Create point-in-time snapshots of blockchain state
  • Restore from snapshots for disaster recovery or testing
  • Prune old ledger entries to manage disk usage on long-running nodes
  • Verify snapshot integrity through chain ID and state root checks

Snapshot Management

What is a Snapshot?

A snapshot is a complete copy of the blockchain state at a specific height, including:

  • State Database (state.db/) - All account balances, nonces, and contract state
  • Metadata (snapshot_meta.json) - Height, state root, timestamp, chain ID, version

Snapshots do not include the ledger (transaction history), which can be replayed from genesis or other nodes.

Creating Snapshots

Basic Snapshot Creation

# Create a snapshot of the current state
rivellum-node snapshot create --output ./snapshots/snapshot-2024-01-15

# With a description
rivellum-node snapshot create \
  --output ./snapshots/mainnet-snapshot-height-1000000 \
  --description "Mainnet snapshot at height 1M"

Output Structure

snapshot-2024-01-15/
ā”œā”€ā”€ state.db/           # Copy of RocksDB/sled state database
│   ā”œā”€ā”€ CURRENT
│   ā”œā”€ā”€ MANIFEST-*
│   └── *.sst
└── snapshot_meta.json  # Metadata file

Snapshot Metadata

The snapshot_meta.json file contains:

{
  "height": 1000000,
  "state_root": "0x1234...",
  "created_at_ms": 1705334400000,
  "ledger_path": "/data/ledger.log",
  "chain_id": "rivellum-mainnet",
  "version": 1,
  "description": "Mainnet snapshot at height 1M"
}
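
Before relying on a snapshot, it can be worth sanity-checking its metadata programmatically. A minimal sketch in Python: the field names match the snapshot_meta.json example above, but treating exactly this set as required is an assumption of the sketch, not a documented contract.

```python
import json

# Fields from the snapshot_meta.json example above; requiring all of
# them is this sketch's assumption, not a documented contract.
REQUIRED_FIELDS = {"height", "state_root", "created_at_ms", "chain_id", "version"}

def validate_snapshot_meta(meta, expected_chain_id):
    """Return a list of problems found in snapshot metadata (empty = OK)."""
    problems = []
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if meta.get("chain_id") != expected_chain_id:
        problems.append(
            f"chain ID mismatch: snapshot has {meta.get('chain_id')!r}, "
            f"expected {expected_chain_id!r}"
        )
    if not isinstance(meta.get("height"), int) or meta.get("height", -1) < 0:
        problems.append("height must be a non-negative integer")
    return problems

meta = json.loads("""{
  "height": 1000000,
  "state_root": "0x1234...",
  "created_at_ms": 1705334400000,
  "chain_id": "rivellum-mainnet",
  "version": 1
}""")
print(validate_snapshot_meta(meta, "rivellum-mainnet"))  # empty list: metadata is OK
print(validate_snapshot_meta(meta, "rivellum-testnet"))  # reports the chain ID mismatch
```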

Listing Snapshots

# List all snapshots in default directory (./snapshots)
rivellum-node snapshot list

# List snapshots in a specific directory
rivellum-node snapshot list --dir /backups/rivellum/snapshots

Example Output:

╔═══════════════════════════════════════════════════════════╗
ā•‘          rivellum AVAILABLE SNAPSHOTS                    ā•‘
ā•šā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•ā•

Found 2 snapshot(s) in: /backups/rivellum/snapshots

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Path:        /backups/rivellum/snapshots/snapshot-2024-01-15
Height:      1000000
State Root:  StateRoot(0x1234...)
Chain ID:    rivellum-mainnet
Created:     1705334400000 ms since epoch
Description: Mainnet snapshot at height 1M

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Path:        /backups/rivellum/snapshots/snapshot-2024-01-20
Height:      1050000
State Root:  StateRoot(0x5678...)
Chain ID:    rivellum-mainnet
Created:     1705766400000 ms since epoch

Restoring from Snapshots

Basic Restore

# Restore from a snapshot (with chain ID verification)
rivellum-node snapshot restore --input ./snapshots/snapshot-2024-01-15

What happens during restore:

  1. Loads snapshot metadata
  2. Verifies chain ID matches node config (unless --no-verify)
  3. Creates backup of existing state: data_dir/state_backup_{timestamp}/
  4. Copies snapshot state.db to data_dir/state.db
  5. Reports success with snapshot details
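
The steps above can be sketched as a small script. The paths and snapshot_meta.json layout follow the examples in this guide; the function itself is an illustration of the sequence, not the node's actual implementation.

```python
import json
import shutil
import time
from pathlib import Path

def restore_snapshot(snapshot_dir, data_dir, expected_chain_id, verify=True):
    """Sketch of the restore sequence: load metadata, verify chain ID,
    back up the existing state.db, then copy the snapshot state in."""
    snap = Path(snapshot_dir)
    data = Path(data_dir)

    # 1. Load snapshot metadata
    meta = json.loads((snap / "snapshot_meta.json").read_text())

    # 2. Verify chain ID matches (skipped with --no-verify)
    if verify and meta["chain_id"] != expected_chain_id:
        raise ValueError(f"chain ID mismatch: snapshot has {meta['chain_id']!r}")

    # 3. Back up existing state: data_dir/state_backup_{timestamp}/
    state_db = data / "state.db"
    if state_db.exists():
        backup = data / f"state_backup_{int(time.time())}"
        shutil.move(str(state_db), str(backup))

    # 4. Copy snapshot state.db into the data directory
    shutil.copytree(snap / "state.db", state_db)
    return state_db
```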

Restore Without Verification (DANGEROUS)

# Skip chain ID verification (for testing or cross-chain recovery)
rivellum-node snapshot restore \
  --input ./snapshots/snapshot-2024-01-15 \
  --no-verify

āš ļø Warning: Only use --no-verify if you understand the risks. Restoring a snapshot from a different chain can lead to inconsistent state.

Post-Restore Steps

After restoration, you may need to:

  1. Replay ledger - If you have a ledger.log file, replay it to catch up to current height
  2. Sync from peers - Connect to network peers to download missing blocks
  3. Verify state root - Check that the state root matches your expectations

Pruning Configuration

Overview

Pruning automatically removes old ledger entries to manage disk usage. This is critical for long-running nodes that would otherwise accumulate hundreds of GB of transaction history.

Configuration

Add to your config/default.toml:

[pruning]
# Enable automatic pruning
enabled = true

# Keep last N ledger entries (default: 10000)
# This determines how much history is retained
keep_last_entries = 10000

# Prune every N seconds (default: 3600 = 1 hour)
pruning_interval_secs = 3600

# Require snapshot before pruning (default: true)
# Safety feature: ensures you have a snapshot before deleting history
require_snapshot = true
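
keep_last_entries defines a sliding retention window over the ledger. A simplified model of the cutoff (an illustration, not the node's internal logic):

```python
def prune_cutoff(total_entries, keep_last_entries):
    """Index of the first ledger entry retained after pruning.
    Entries [0, cutoff) are deleted; [cutoff, total_entries) are kept."""
    return max(0, total_entries - keep_last_entries)

# With 25,000 entries and keep_last_entries = 10000, entries 0..14999
# are pruned and the most recent 10,000 remain.
print(prune_cutoff(25_000, 10_000))  # 15000
print(prune_cutoff(5_000, 10_000))   # 0 (nothing pruned yet)
```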

Pruning Modes

Conservative (Recommended)

[pruning]
enabled = true
keep_last_entries = 10000        # ~1 day of history at 1 intent/10s
pruning_interval_secs = 3600     # Prune hourly
require_snapshot = true          # Safety on

Best for: Production mainnet nodes
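
The "~1 day" figure follows from the entry rate: 10,000 entries at one intent every 10 seconds is roughly 28 hours. A quick calculator (the 10-second interval is the assumption stated in the comment above; substitute your network's actual rate):

```python
def retention_hours(keep_last_entries, secs_per_entry):
    """Approximate hours of ledger history retained after pruning."""
    return keep_last_entries * secs_per_entry / 3600

print(round(retention_hours(10_000, 10), 1))  # 27.8 -> roughly a day
print(round(retention_hours(1_000, 10), 1))   # 2.8  -> a few hours
```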

Aggressive (High Throughput)

[pruning]
enabled = true
keep_last_entries = 1000         # ~2 hours of history
pruning_interval_secs = 600      # Prune every 10 minutes
require_snapshot = true

Best for: Testnet nodes, nodes with limited disk space

Archive Node (No Pruning)

[pruning]
enabled = false

Best for: Explorers, auditing, research nodes

Pruning Safety Features

  1. Require Snapshot - If require_snapshot = true, pruning will fail if no recent snapshot exists
  2. Retention Window - keep_last_entries ensures you always have recent history
  3. Atomic Operations - Pruning operations are atomic per entry
  4. Logging - All pruning operations are logged for auditing
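
The require_snapshot check can be pictured as a guard that runs before each prune cycle. A sketch of that guard; the 24-hour maximum snapshot age here is illustrative, not a documented node parameter:

```python
import time

def may_prune(last_snapshot_at_ms, require_snapshot=True,
              max_snapshot_age_secs=86_400):
    """Guard mirroring require_snapshot: refuse to prune unless a
    sufficiently recent snapshot exists. The 24h max age is illustrative."""
    if not require_snapshot:
        return True
    if last_snapshot_at_ms is None:
        return False  # no snapshot ever taken: pruning is blocked
    age_secs = time.time() - last_snapshot_at_ms / 1000
    return age_secs <= max_snapshot_age_secs

# No snapshot yet: pruning is blocked until one is created
print(may_prune(None))  # False
```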

Backup Strategies

Recommended Backup Schedule

Production Mainnet

| Frequency | Type     | Retention | Storage      |
|-----------|----------|-----------|--------------|
| Daily     | Snapshot | 30 days   | S3/GCS       |
| Weekly    | Snapshot | 1 year    | S3 Glacier   |
| Monthly   | Snapshot | Permanent | Cold storage |

Testnet

| Frequency | Type     | Retention | Storage    |
|-----------|----------|-----------|------------|
| Weekly    | Snapshot | 4 weeks   | Local disk |

Automated Snapshot Creation

Using Cron (Linux)

# Add to crontab: Daily snapshot at 2 AM
0 2 * * * /usr/local/bin/rivellum-node snapshot create \
  --config /etc/rivellum/config.toml \
  --output /backups/snapshot-$(date +\%Y-\%m-\%d) \
  --description "Daily automated snapshot"

Using Task Scheduler (Windows)

# PowerShell script: daily_snapshot.ps1
$date = Get-Date -Format "yyyy-MM-dd"
$output = "C:\backups\rivellum\snapshot-$date"
& "C:\Program Files\rivellum\rivellum-node.exe" snapshot create `
  --output $output `
  --description "Daily automated snapshot"

Schedule via Task Scheduler:

  • Trigger: Daily at 2:00 AM
  • Action: Run PowerShell script
  • Run whether user is logged on or not

Cloud Storage Integration

Upload to AWS S3

#!/bin/bash
# backup_to_s3.sh

SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
S3_BUCKET="s3://my-rivellum-backups"

# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"

# Upload to S3
aws s3 sync "$SNAPSHOT_DIR" "$S3_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"

# Clean up old local snapshots (keep last 7 days)
find /backups -maxdepth 1 -name "snapshot-*" -type d -mtime +7 -exec rm -rf {} +

Upload to Google Cloud Storage

#!/bin/bash
# backup_to_gcs.sh

SNAPSHOT_DIR="/backups/snapshot-$(date +%Y-%m-%d)"
GCS_BUCKET="gs://my-rivellum-backups"

# Create snapshot
rivellum-node snapshot create --output "$SNAPSHOT_DIR"

# Upload to GCS
gsutil -m rsync -r "$SNAPSHOT_DIR" "$GCS_BUCKET/snapshots/$(basename $SNAPSHOT_DIR)"

Storage Recommendations

Snapshot Size Estimation

  • Empty State: ~10 MB
  • Small Network (<10k accounts): ~100 MB
  • Medium Network (~1M accounts): ~5-10 GB
  • Large Network (>10M accounts): ~50-100 GB

Plan for 2-3x growth over 1 year.
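
That growth factor compounds if you plan further out. A quick projection helper; the growth multiple and horizon are your own planning assumptions, and constant-rate growth is itself a simplification:

```python
def projected_size_gb(current_gb, yearly_growth, years):
    """Project snapshot size assuming a constant yearly growth multiple."""
    return current_gb * (yearly_growth ** years)

# A 10 GB snapshot growing 2.5x/year reaches 25 GB in one year
print(round(projected_size_gb(10, 2.5, 1), 1))  # 25.0
print(round(projected_size_gb(10, 2.5, 2), 1))  # 62.5
```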

Storage Providers

| Provider        | Use Case                      | Cost (approx)   |
|-----------------|-------------------------------|-----------------|
| Local Disk      | Fast access, recent snapshots | Hardware cost   |
| AWS S3 Standard | Active snapshots (30 days)    | $0.023/GB/month |
| AWS S3 Glacier  | Long-term archives            | $0.004/GB/month |
| GCS Standard    | Active snapshots              | $0.020/GB/month |
| GCS Nearline    | Monthly archives              | $0.010/GB/month |
| Backblaze B2    | Budget option                 | $0.005/GB/month |
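
Monthly storage cost for a backup set follows directly from these per-GB rates. A small helper; the rate used below mirrors the table and is approximate:

```python
def monthly_cost_usd(total_gb, rate_per_gb):
    """Monthly storage cost at a flat per-GB rate."""
    return total_gb * rate_per_gb

# 30 daily 10 GB snapshots kept on S3 Standard (~$0.023/GB/month)
print(round(monthly_cost_usd(30 * 10, 0.023), 2))  # 6.9
```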

Disaster Recovery

Scenarios and Solutions

Scenario 1: Corrupted State Database

Symptoms:

  • Node crashes on startup
  • RocksDB corruption errors
  • Inconsistent state root

Recovery:

# 1. Stop the node (if running)
systemctl stop rivellum-node

# 2. Backup corrupted state (for forensics)
mv /data/rivellum/state.db /data/rivellum/state.db.corrupted

# 3. Restore from most recent snapshot
rivellum-node snapshot restore \
  --input /backups/snapshot-2024-01-20

# 4. Replay ledger to catch up (if available)
rivellum-node run --replay-ledger

# 5. Restart node
systemctl start rivellum-node

Scenario 2: Disk Failure

Symptoms:

  • Disk I/O errors
  • Data directory inaccessible

Recovery:

# 1. Replace failed disk and mount at /data

# 2. Download latest snapshot from cloud storage
aws s3 sync s3://my-backups/snapshot-2024-01-20 /backups/snapshot-2024-01-20

# 3. Restore snapshot
rivellum-node snapshot restore \
  --input /backups/snapshot-2024-01-20

# 4. Rejoin network and sync
rivellum-node run --config /etc/rivellum/config.toml

Scenario 3: Accidental Deletion

Symptoms:

  • State directory deleted
  • Ledger missing

Recovery:

# 1. Check for automatic backups (created during restore)
ls -la /data/rivellum/state_backup_*

# 2. Restore from latest backup
mv /data/rivellum/state_backup_1705766400 /data/rivellum/state.db

# 3. Restart node
systemctl restart rivellum-node

Scenario 4: Wrong Chain Restored

Symptoms:

  • Chain ID mismatch errors
  • Genesis doesn't match network

Recovery:

# 1. Identify correct snapshot for your chain
rivellum-node snapshot list --dir /backups

# 2. Restore with correct snapshot
rivellum-node snapshot restore --input /backups/mainnet-snapshot-X

# 3. Verify chain ID in config matches
grep chain_id /etc/rivellum/config.toml
# Should output: chain_id = "rivellum-mainnet"

Recovery Time Objectives (RTO)

| Scenario                         | RTO           | Notes                    |
|----------------------------------|---------------|--------------------------|
| Corrupted state (local snapshot) | 5-10 minutes  | Restore + verify         |
| Disk failure (cloud snapshot)    | 30-60 minutes | Download + restore       |
| Complete node rebuild            | 2-4 hours     | Install + restore + sync |

Testing Disaster Recovery

Monthly DR Test:

# 1. Create test environment
mkdir -p /tmp/dr-test/data

# 2. Restore production snapshot to test location
rivellum_DATA_DIR=/tmp/dr-test/data \
  rivellum-node snapshot restore --input /backups/latest

# 3. Verify state
rivellum_DATA_DIR=/tmp/dr-test/data \
  rivellum-node validate-genesis /etc/rivellum/genesis.json

# 4. Clean up
rm -rf /tmp/dr-test

Best Practices

Snapshot Management

  1. Label Descriptively - Use meaningful descriptions with height and date

    rivellum-node snapshot create \
      --output ./snapshot-mainnet-h1000000-2024-01-15 \
      --description "Mainnet snapshot at height 1M before upgrade"
    
  2. Verify After Creation - Always check metadata after creating a snapshot

    cat ./snapshot-*/snapshot_meta.json | jq .
    
  3. Test Restores - Periodically test restoration in a non-production environment

  4. Automate - Use cron/task scheduler for regular snapshots

Pruning

  1. Start Conservative - Begin with large keep_last_entries (10000+)
  2. Monitor Disk Usage - Track disk growth and adjust pruning accordingly
  3. Keep Snapshots - Always set require_snapshot = true in production
  4. Log Pruning - Review logs to ensure pruning runs as expected

Storage

  1. 3-2-1 Rule - 3 copies, 2 different media, 1 offsite

    • Copy 1: Local snapshots (fast access)
    • Copy 2: Network-attached storage (NAS)
    • Copy 3: Cloud storage (S3/GCS)
  2. Encrypt Backups - Encrypt snapshots before uploading to cloud

    tar -czf - ./snapshot-2024-01-15 | \
      gpg --encrypt --recipient ops@rivellum.io > snapshot.tar.gz.gpg
    
  3. Version Snapshots - Keep multiple versions for rollback options
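
Alongside encryption, recording a SHA-256 checksum before upload lets you confirm an archive survived the round trip intact. A minimal sketch:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MB chunks so large snapshot
    archives do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Record the digest next to the archive before upload; after download,
# recompute and compare to detect corruption in transit.
```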

Operational

  1. Document Procedures - Maintain runbooks for common scenarios
  2. Alert on Failures - Monitor snapshot creation and alert on failures
  3. Capacity Planning - Estimate storage needs 6-12 months ahead
  4. Access Control - Restrict snapshot restore to authorized operators

Troubleshooting

Common Issues

"Chain ID mismatch" Error

Problem:

Error: Chain ID mismatch! Snapshot has 'rivellum-testnet' but config expects 'rivellum-mainnet'

Solution:

  • Verify you have the correct snapshot for your network
  • Use --no-verify only if intentionally switching chains (testing only)
  • Check chain_id in config/default.toml

"Source state database not found"

Problem:

Error: Source state database not found: /data/rivellum/state.db

Solution:

  • Ensure node is stopped before creating snapshot
  • Verify data_dir in config points to correct location
  • Check if state database was moved or deleted

Snapshot Restoration Hangs

Problem: Restore command appears frozen

Solution:

  • Large snapshots take time (10+ GB can take 5-10 minutes)
  • Check disk I/O with iostat -x 1 (Linux) or Task Manager (Windows)
  • Ensure destination has enough free space (2x snapshot size)
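
The free-space rule of thumb can be checked programmatically before kicking off a restore; the 2x multiplier below mirrors the guidance above, and the function is a sketch rather than a built-in check:

```python
import shutil

def has_room_for_restore(snapshot_bytes, dest_path, multiplier=2.0):
    """True if dest_path's filesystem has at least multiplier x the
    snapshot size free, per the 2x rule of thumb."""
    free = shutil.disk_usage(dest_path).free
    return free >= snapshot_bytes * multiplier

# A 1 MB snapshot against the current directory's filesystem
print(has_room_for_restore(1 << 20, "."))
```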

Pruning Not Running

Problem: Ledger keeps growing despite pruning enabled

Solution:

  • Check logs for pruning errors: grep -i prune /var/log/rivellum/node.log
  • Verify enabled = true in [pruning] config section
  • Ensure require_snapshot is satisfied (create a snapshot if needed)
  • Check that pruning_interval_secs has elapsed

Debugging Commands

# Check current state database size
du -sh /data/rivellum/state.db

# Check ledger size (entry count; one entry per line)
wc -l /data/rivellum/ledger.log

# Verify snapshot metadata
cat /backups/snapshot-*/snapshot_meta.json | jq .

# Check disk space
df -h /data

# Monitor pruning in real-time (Linux)
tail -f /var/log/rivellum/node.log | grep -i prune

Advanced Topics

Cross-Chain Snapshots

Snapshots can be used to bootstrap testnets from mainnet state:

# 1. Create mainnet snapshot
rivellum-node --config mainnet.toml snapshot create --output /tmp/mainnet-snap

# 2. Restore to testnet (with verification disabled)
rivellum-node --config testnet.toml snapshot restore \
  --input /tmp/mainnet-snap \
  --no-verify

# 3. Modify chain_id in metadata if needed
# Edit testnet config to match

Incremental Snapshots (Future Feature)

Planned for future releases: incremental snapshots that only capture state changes since last full snapshot.

Snapshot Compression

To save storage space:

# Create and compress
rivellum-node snapshot create --output /tmp/snapshot
tar -czf snapshot-$(date +%Y-%m-%d).tar.gz /tmp/snapshot

# Restore from compressed
tar -xzf snapshot-2024-01-15.tar.gz
rivellum-node snapshot restore --input ./snapshot-2024-01-15

Last Updated: 2024-01-26
Version: 1.0.0