
[0/4] e2scrub: online fsck for ext4

Message ID 151267250365.1350.7799938074608034424.stgit@magnolia

Message

Darrick Wong Dec. 7, 2017, 6:48 p.m. UTC
Hi all,

This patch series develops the e2croncheck contrib script more fully.
We start with an e2scrub command that, given an ext4 filesystem on an LVM
volume, creates a snapshot if there's more than 256M free in the LVM
volume group, runs e2fsck on the snapshot, and deletes the snapshot.  If
the fsck runs cleanly, the fs last-check timestamp is updated and fstrim
is run.  If corruption is found, we mark the fs as needing an fsck and
advise a reboot.  A udev rule file is used to prevent the creation of
/dev/disk symlinks to the snapshot.
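
The workflow above can be sketched roughly as follows.  This is not the
e2scrub script from the patches, only an illustration of the sequence
described: the function names, the ".e2scrub" snapshot suffix, and the
exact choice of flags are assumptions.  The code only builds the command
lines and does not execute anything.

```python
from typing import Dict, List

MIN_FREE_MIB = 256  # threshold taken from the description above


def enough_free_space(vg_free_mib: int) -> bool:
    """Only snapshot when the volume group has more than 256M free."""
    return vg_free_mib > MIN_FREE_MIB


def scrub_plan(vg: str, lv: str, mountpoint: str,
               snap_size: str = "256M") -> Dict[str, List[List[str]]]:
    """Build (but do not run) the commands for one scrub pass.

    The snapshot name suffix is hypothetical; the real script may use
    a different naming scheme and different e2fsck options.
    """
    snap = f"{lv}.e2scrub"
    dev = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap}"
    return {
        # Create a snapshot of the live volume.
        "create": [["lvcreate", "-s", "-L", snap_size, "-n", snap, f"{vg}/{lv}"]],
        # Force a read-only check of the snapshot.
        "check": [["e2fsck", "-f", "-n", snap_dev]],
        # Always delete the snapshot, whatever the check reported.
        "cleanup": [["lvremove", "-f", f"{vg}/{snap}"]],
        # Clean result: reset mount count, update last-check time, trim.
        "on_clean": [["tune2fs", "-C", "0", "-T", "now", dev],
                     ["fstrim", mountpoint]],
        # Corruption: flag the real filesystem for fsck at next boot.
        "on_corrupt": [["tune2fs", "-E", "force_fsck", dev]],
    }
```

A caller would run "create", then "check", then "cleanup" unconditionally,
and pick "on_clean" or "on_corrupt" based on the e2fsck exit status.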

Next we create an e2scrub_all command that finds all ext4 filesystems
living in LVM volumes and iteratively calls e2scrub on each of them.
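
One possible way to do the discovery step, shown only as a sketch: parse
the JSON output of "lsblk -J -p -o NAME,TYPE,FSTYPE" and keep the nodes
whose type is "lvm" and whose fstype is "ext4".  The function name is
hypothetical, and the e2scrub_all script in the patches may find its
targets quite differently.

```python
import json
from typing import List


def ext4_lvm_devices(lsblk_json: str) -> List[str]:
    """Return device paths of ext4 filesystems that live on LVM volumes.

    Expects the text produced by `lsblk -J -p -o NAME,TYPE,FSTYPE`.
    """
    found: List[str] = []

    def walk(nodes):
        for node in nodes:
            if node.get("type") == "lvm" and node.get("fstype") == "ext4":
                found.append(node["name"])
            # Logical volumes appear as children of their physical volumes.
            walk(node.get("children") or [])

    walk(json.loads(lsblk_json).get("blockdevices", []))
    # An LV spanning several PVs shows up more than once; keep first seen.
    return list(dict.fromkeys(found))
```

Each returned path would then be handed to e2scrub in turn.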

The third patch creates a weekly cron job for automatic invocation as
well as systemd service files so that we can (try to) sandbox the scrub
process and run it with idle priority to reduce latency spikes in the
main filesystem.
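
For the cron case, where systemd's scheduling controls are not available,
a comparable effect could be approximated by wrapping the command in
ionice and nice.  The helper below is a hypothetical illustration and is
not taken from the patches; it only builds the argument list.

```python
from typing import List, Sequence


def idle_priority(cmd: Sequence[str]) -> List[str]:
    """Prefix a command so it runs in the idle I/O class at lowest CPU priority."""
    # ionice -c 3 selects the idle I/O scheduling class;
    # nice -n 19 selects the lowest CPU scheduling priority.
    return ["ionice", "-c", "3", "nice", "-n", "19", *cmd]
```

For example, idle_priority(["e2scrub", "/dev/vg0/root"]) yields a command
line suitable for subprocess.run(); the device path here is made up.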

The fourth patch teaches the build system to build with LTO enabled, and
enables it for the Debian package.  This reduces the size of the static
e2fsck binary by 30%, though the static libraries are now significantly
larger because we ship the LTO information.

Missing from this is a boot-time service to remove stale fsck snapshots.

--D

Comments

Andreas Dilger Dec. 7, 2017, 10:50 p.m. UTC | #1
On Dec 7, 2017, at 11:48 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> Hi all,
> 
> This patch series develops more fully the e2croncheck contrib script.
> We start with a e2scrub command that, given an ext4 filesystem on a LVM
> volume, creates a snapshot if there's more than 256M free in the LVM
> group, runs e2fsck on the snapshot, and deletes the snapshot.  If the
> fsck ran cleanly, the fs last-check timestamp is updated and fstrim is
> run.  If corruption is found we mark the fs as needing a fsck and advise
> a reboot.  A udev rule file is used to prevent the creation of /dev/disk
> symlinks to the snapshot.
> 
> Next we create an e2scrub_all command that finds all ext4 filesystems
> living in LVM volumes and iteratively calls e2scrub on each of them.
> 
> The third patch creates a weekly cron job for automatic invocation as
> well as systemd service files so that we can (try to) sandbox the scrub
> process and run it with idle priority to reduce latency spikes in the
> main filesystem.

For reference, here is the lvcheck script that I've been using for the
past 10 years or so, occasionally sending it to the list (most recently
in July 2017).  To use it, just drop the script into /etc/cron.weekly,
or drop in a simple wrapper like "exec /usr/local/sbin/lvcheck".

It automatically scans the LVs looking for filesystems it understands,
creates a snapshot, checks the snapshot, and updates the last check time
and mount count (for ext2/3/4) if the check is clean.

Caveat emptor: I have only used it with ext3/4; the XFS/JFS/Reiserfs
checking was contributed by others.  Might be good to get Btrfs in there...

Cheers, Andreas