Beginner

Detecting File Corruption with Checksums

Protect your data from silent corruption using cryptographic checksums for backup verification and integrity monitoring.

What is File Corruption?

File corruption occurs when data is unintentionally modified, making files unusable or incorrect. Common causes:

  • Bit rot: Magnetic decay on hard drives causes random bit flips over time
  • Hardware failures: Failing RAM, disk controllers, or cables introduce errors
  • Software bugs: Application crashes during writes leave partial or corrupted files
  • Power loss: Sudden shutdowns interrupt write operations
  • Network errors: Packet loss or transmission errors during file transfers

Silent Data Corruption

The most dangerous corruption is silent: files appear normal but contain incorrect data. A 2013 study found that 8% of disks develop silent corruption within 4 years. Without checksums, you won't know until it's too late.

How Checksums Detect Corruption

A checksum is a short, fixed-size fingerprint computed from a file's contents. Two different files are overwhelmingly unlikely to share one, and when the file changes (even by one bit), the checksum changes completely:

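Example: One bit flip in a 1GB file (illustrative values)
Original file SHA-256:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
After 1 bit flip:
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
Completely different hash → corruption detected
The two digests shown are placeholders (the SHA-256 of empty input and of the string "test"), not the hashes of a real 1GB file. A real bit flip changes the hash just as thoroughly.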
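
The effect can be reproduced locally. The following is a minimal sketch using Python's standard hashlib module; the helper names flip_bit and differing_bits, the sample data, and the bit position are all made up for illustration.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two hex digests of equal length."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Stand-in for a real file: any bytes object works the same way.
original = b"This is my important document" * 1000
corrupted = flip_bit(original, bit_index=12345)

hash_before = hashlib.sha256(original).hexdigest()
hash_after = hashlib.sha256(corrupted).hexdigest()

print("before:", hash_before)
print("after: ", hash_after)
print("digest bits that changed:", differing_bits(hash_before, hash_after), "of 256")
```

For a well-behaved hash function, roughly half of the 256 digest bits should change, even though only a single input bit was altered.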

Backup Verification Workflow

Protect your backups with this three-step process:

Step 1: Generate Checksums Before Backup

Create a checksum file for all important files:

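Linux/macOS - Generate checksums:
cd /path/to/data && find . -type f -exec sha256sum {} + > /path/to/checksums.txt
Creates checksums for all files in the directory, recorded with relative paths so the same list can later be checked against the backup copy. Write checksums.txt outside the data directory so it does not list itself. If sha256sum is not available on macOS, shasum -a 256 produces the same format.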
Windows PowerShell:
Get-ChildItem -Path C:\path\to\data -Recurse -File | ForEach-Object {
  Get-FileHash $_.FullName -Algorithm SHA256
} | Export-Csv C:\path\to\checksums.csv -NoTypeInformation

Store Checksums Separately

Save checksums.txt in a different location than your backup. If the backup drive fails, you still have the checksums to verify a restore from another source.
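
For a single approach that works on Linux, macOS, and Windows, the checksum list can also be generated with a short Python script. This is a sketch, not a replacement for sha256sum: the function names sha256_of_file and write_manifest are hypothetical, the line format is intended to mirror sha256sum's text-mode output, and file names containing newlines or backslashes are not handled.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest_path: Path) -> int:
    """Write one '<hash>  <relative path>' line per file; return the file count."""
    data_dir = data_dir.resolve()
    manifest_path = manifest_path.resolve()
    count = 0
    with manifest_path.open("w", encoding="utf-8") as out:
        for path in sorted(data_dir.rglob("*")):
            # Skip directories, and the manifest itself if it lives in data_dir.
            if not path.is_file() or path.resolve() == manifest_path:
                continue
            relative = path.relative_to(data_dir).as_posix()
            out.write(f"{sha256_of_file(path)}  {relative}\n")
            count += 1
    return count
```

A call such as write_manifest(Path("/path/to/data"), Path("/somewhere/else/checksums.txt")) keeps the list outside the data directory, in line with the advice above. Paths are stored relative to the data directory so the list can be checked against a copy in a different location.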

Step 2: Perform the Backup

Copy files to your backup destination using your preferred method:

rsync (Linux/macOS)
rsync -av --checksum /source/ /backup/

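The --checksum flag makes rsync decide what to transfer by comparing file checksums instead of its default quick check of size and modification time. This is slower, because every file is read in full on both sides, but it catches files whose contents differ even though size and timestamp match.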
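
The difference between the two comparison modes can be sketched in Python. This is a conceptual model only: it is not rsync's actual code, it uses SHA-256 rather than rsync's own checksum, and the names file_digest and needs_copy are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(src: Path, dst: Path, use_checksum: bool = False) -> bool:
    """Decide whether src should be copied over dst (conceptual model)."""
    if not dst.exists():
        return True
    src_stat, dst_stat = src.stat(), dst.stat()
    if src_stat.st_size != dst_stat.st_size:
        return True  # different sizes can never have identical content
    if use_checksum:
        # Content comparison: reads both files in full, so it is slower.
        return file_digest(src) != file_digest(dst)
    # Quick check: same size and same modification time counts as unchanged.
    return int(src_stat.st_mtime) != int(dst_stat.st_mtime)
```

The quick check treats a destination file with the right size and timestamp as up to date even if its bytes have been damaged; the checksum mode would flag it for re-transfer.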

robocopy (Windows)
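robocopy C:\source D:\backup /MIR /R:3 /W:10

/MIR mirrors the source, which also deletes files from the destination that no longer exist in the source. /R:3 /W:10 retries a failed copy up to 3 times, waiting 10 seconds between attempts.

Cloud backup tools

Most cloud backup services (Backblaze, CrashPlan, etc.) automatically verify checksums during upload.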

Step 3: Verify After Backup

Immediately verify the backup completed successfully:

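Verify all files match:
cd /backup
sha256sum -c /path/to/checksums.txt
Output shows "OK" for each file that matches and "FAILED" for each mismatch. Run the command from the backup root with a checksum list that uses relative paths; a list containing absolute paths would re-check the originals instead of the backup.

✓ All Files OK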

Backup is complete and verified.

Safe to delete originals (if that's your plan).

✗ Some Files Failed

Corruption detected during backup.

Re-copy failed files and verify again.
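
The re-copy step can be scripted so that only damaged files are transferred again. The sketch below assumes the expected hashes have already been loaded into a dictionary that maps relative paths to SHA-256 digests; the function names recopy_failed and sha256_of_file are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recopy_failed(source_root: Path, backup_root: Path,
                  expected: dict[str, str]) -> list[str]:
    """Re-copy files whose backup hash is wrong; return names still failing.

    expected maps a relative path to the hash recorded before the backup.
    """
    still_failing = []
    for name, expected_hash in expected.items():
        backup_file = backup_root / name
        if backup_file.is_file() and sha256_of_file(backup_file) == expected_hash:
            continue  # this copy is already good
        source_file = source_root / name
        # Only re-copy when the source itself still matches the recorded hash;
        # otherwise the original is damaged too and copying would not help.
        if source_file.is_file() and sha256_of_file(source_file) == expected_hash:
            backup_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source_file, backup_file)
        if not backup_file.is_file() or sha256_of_file(backup_file) != expected_hash:
            still_failing.append(name)
    return still_failing
```

Any name left in the returned list needs manual attention: either the source no longer matches its recorded checksum, or the destination keeps corrupting the file, which may point to failing hardware.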

Periodic Integrity Checks

Don't wait for a restore to discover corruption. Check your backups regularly:

Monthly Verification

Run checksum verification on your entire backup once per month.

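sha256sum -c --quiet checksums.txt

Prints only the files that fail (empty output = all good). The --quiet option is more reliable than filtering with grep -v OK, which would also hide any failed file whose name happens to contain "OK".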

Automated Monitoring

Set up a cron job (Linux/macOS) or Task Scheduler (Windows) to verify checksums automatically.

0 2 1 * * cd /backup && sha256sum -c checksums.txt | mail -s "Backup Check" you@example.com

Runs on the 1st of each month at 2 AM, emails results

ZFS/Btrfs Scrubbing

Modern filesystems have built-in checksum verification. Enable periodic scrubs:

zpool scrub tank  # ZFS
btrfs scrub start /mnt/backup  # Btrfs

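A scrub detects corruption automatically. It can repair the damage only when the filesystem has a redundant copy to restore from (for example a mirror or RAID-Z on ZFS, or a RAID1 or DUP profile on Btrfs).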

Real-World Scenarios

Scenario 1: Photo Archive

You have 10 years of family photos (500GB). Generate checksums, back up to external drive, verify immediately.

Result: 3 files failed verification. Re-copied those files. All photos safe.

Scenario 2: Research Data

Lab generates 2TB of experimental data. Checksums created daily, verified before analysis.

Result: Detected corruption in 1 file after 6 months. Re-ran that experiment instead of publishing bad data.

Scenario 3: Software Distribution

Company distributes software updates. Checksums published on website, users verify downloads.

Result: Users detected corrupted downloads from a CDN node. CDN provider fixed the issue.

Scenario 4: Long-Term Archive

Legal documents stored on tape for 7 years. Checksums verified annually.

Result: Year 5 verification found 12 corrupted files. Restored from redundant backup.

Choosing the Right Algorithm

SHA-256 (Recommended)

Best balance of speed and security. Use for all new systems. The chance that two given different files share the same hash is about 1 in 2^256 (roughly 10^77), which is negligible in practice.
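
To put that figure in context for a whole archive rather than a single pair of files, the birthday bound gives an upper limit on the chance of any accidental collision among n files: n(n-1)/2 divided by 2^256. The sketch below assumes the hash behaves like a uniformly random function; the function name collision_probability_bound is hypothetical.

```python
from fractions import Fraction


def collision_probability_bound(num_files: int, hash_bits: int = 256) -> Fraction:
    """Upper bound on the chance that any two of num_files distinct files collide.

    Uses the birthday (union) bound: number of file pairs divided by 2**bits.
    """
    pairs = num_files * (num_files - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


for n in (10**6, 10**9, 10**12):
    bound = float(collision_probability_bound(n))
    print(f"{n:>20,d} files: p <= {bound:.2e}")
```

Even for a trillion files the bound stays far below 1 in 10^50, so an accidental SHA-256 mismatch or match can be treated as a reliable signal.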

SHA-512 (High Security)

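Larger 512-bit digest and a higher security margin. Speed is often comparable to SHA-256 on 64-bit CPUs, though it tends to be slower where SHA-256 has hardware acceleration. Use for critical data or long-term archives (10+ years).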

MD5 (Legacy Only)

Fast but cryptographically broken. OK for detecting accidental corruption, NOT for security. Use only if you must maintain compatibility with old systems.

BLAKE3 (Fastest)

Modern algorithm, much faster than SHA-256 on modern CPUs. Great for large files but less widely supported by tools.
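
Relative speed depends heavily on the CPU and the library build, so it is worth measuring on the machine that will do the hashing. The sketch below times the algorithms available in Python's standard hashlib module. BLAKE3 is not part of the standard library, so BLAKE2b is used here purely as a stand-in; the function name throughput_mb_s and the 8 MiB test payload are arbitrary choices.

```python
import hashlib
import time


def throughput_mb_s(algorithm: str, payload: bytes, rounds: int = 3) -> float:
    """Return the best-of-rounds throughput in MB/s for one hashlib algorithm."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, payload).digest()
        best = min(best, time.perf_counter() - start)
    return len(payload) / best / 1_000_000


payload = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes as a stand-in workload
for name in ("md5", "sha256", "sha512", "blake2b"):
    try:
        print(f"{name:8s} {throughput_mb_s(name, payload):10.1f} MB/s")
    except ValueError:
        # Some hardened Python builds disable MD5; skip anything unavailable.
        print(f"{name:8s} unavailable in this Python build")
```

For backups, disk read speed is often the real bottleneck, in which case the choice of algorithm has little effect on total verification time.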

Tools and Scripts

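Bash script for automated verification:
#!/bin/bash
BACKUP_DIR="/mnt/backup"
CHECKSUM_FILE="$BACKUP_DIR/checksums.txt"
LOG_FILE="$BACKUP_DIR/verify.log"

echo "Starting verification: $(date)" >> "$LOG_FILE"
# Stop if the backup directory is unavailable, rather than checking the wrong place
cd "$BACKUP_DIR" || { echo "Cannot enter $BACKUP_DIR" >> "$LOG_FILE"; exit 1; }

# sha256sum -c exits non-zero if any file fails verification or cannot be read
if sha256sum -c "$CHECKSUM_FILE" >> "$LOG_FILE" 2>&1; then
  echo "✓ All files verified successfully" | mail -s "Backup OK" admin@example.com
else
  echo "✗ Corruption detected! See $LOG_FILE" | mail -s "BACKUP FAILURE" admin@example.com
fi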
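
Python script with progress counter:
import hashlib

def verify_file(filepath, expected_hash):
    """Return True if the file's SHA-256 digest matches expected_hash."""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        # Read in chunks so large files do not have to fit in memory
        for chunk in iter(lambda: f.read(8192), b''):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_hash.lower()

# Load checksums (sha256sum format: "<hash>  <path>"), skipping blank lines
with open('checksums.txt') as f:
    entries = [line.rstrip('\n').split(maxsplit=1) for line in f if line.strip()]

# Verify each file, printing a running [current/total] counter
for i, (hash_val, filepath) in enumerate(entries, start=1):
    filepath = filepath.lstrip('*')  # binary-mode marker written by sha256sum
    try:
        ok = verify_file(filepath, hash_val)
    except OSError:
        ok = False  # missing or unreadable files count as failures
    if ok:
        print(f"[{i}/{len(entries)}] ✓ {filepath}")
    else:
        print(f"[{i}/{len(entries)}] ✗ {filepath} CORRUPTED OR UNREADABLE!")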

Try It Yourself

Practice with our interactive Hash Calculator:

Exercise: Simulate Corruption Detection
  1. Go to the Hash Calculator
  2. Enter some text: This is my important document
  3. Copy the SHA-256 hash (this is your "checksum")
  4. Change one character: This is my important documant
  5. Notice the hash is completely different → corruption would be detected
  6. Change it back to the original → hash matches again
