Message ID | 647cc60d-18d4-ab53-6c91-52c1f6d29c3a@huawei.com |
---|---|
State | Rejected |
Series | e2fsck: do not skip deeper checkers when s_last_orphan list has truncated inodes |
On Tue, Mar 15, 2022 at 04:01:45PM +0800, zhanchengbin wrote:
> If the system crashes when a file is being truncated, we will get a
> problematic inode, and it will be added into fs->super->s_last_orphan.
> When we run `e2fsck -a img`, the s_last_orphan list will be traversed and
> deleted.
> During this period, orphan inodes in the s_last_orphan list with
> i_links_count==0 can be deleted, and orphan inodes with i_links_count !=0
> (ex. the truncated inode) cannot be deleted. However, when there are some
> orphan inodes with i_links_count !=0, the EXT2_VALID_FS is still assigned
> to fs->super->s_state, the deeper checkers are skipped with some
> inconsistency problems.

That's not supposed to happen. We regularly put inodes on the orphan
list when they are being truncated so that if we crash, the truncation
operation can be completed as part of the journal recovery and remount
operation. This is true regardless of whether the recovery is done by
e2fsck or by the kernel.

If a crash during a truncate leads to an inconsistent file system
after the file system is mounted, or after e2fsck does the journal
replay and orphan inode list processing, that's a kernel bug, and we
should fix the bug in the kernel.

Do you have a reliable reproducer for this situation?

Thanks,

- Ted
On 2022/3/16 1:54, Theodore Ts'o wrote:
> On Tue, Mar 15, 2022 at 04:01:45PM +0800, zhanchengbin wrote:
>> If the system crashes when a file is being truncated, we will get a
>> problematic inode, and it will be added into fs->super->s_last_orphan.
>> When we run `e2fsck -a img`, the s_last_orphan list will be traversed and
>> deleted.
>> During this period, orphan inodes in the s_last_orphan list with
>> i_links_count==0 can be deleted, and orphan inodes with i_links_count !=0
>> (ex. the truncated inode) cannot be deleted. However, when there are some
>> orphan inodes with i_links_count !=0, the EXT2_VALID_FS is still assigned
>> to fs->super->s_state, the deeper checkers are skipped with some
>> inconsistency problems.
>
> That's not supposed to happen. We regularly put inodes on the orphan
> list when they are being truncated so that if we crash, the truncation
> operation can be completed as part of the journal recovery and remount
> operation. This is true regardles sof whether the recovery is done by
> e2fsck or by the kernel.

Yes, you are right. The truncation has been completed, and the file ACL
has been set to zero in release_inode_blocks(), but the ACL block was not
subtracted from i_blocks, so i_blocks is inconsistent.
Li Jinlin sent a patch yesterday to fix it.

> If a crash during a truncate leads to an inconsistent file system
> after the file system is mounted, or after e2fsck does the journal
> replay and orphan inode list processing, that's a kernel bug, and we
> should fix the bug in the kernel.
>
> Do you have a reliable reproducer for this situation?
I have a reproducer, but it does not trigger the problem every time:

#!/bin/bash
disk_list=$(multipath -ll | grep filedisk | awk '{print $1}')

for disk in ${disk_list}
do
    mkfs.ext4 -F /dev/mapper/$disk
    mkdir ${disk}
done

# Log the iSCSI sessions out and back in to inject I/O errors.
function err_inject()
{
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
    iscsiadm -m node -p 127.0.0.1 -u &> /dev/null
    iscsiadm -m node -p 127.0.0.1 -l &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -u &> /dev/null
    iscsiadm -m node -p 9.82.236.206 -l &> /dev/null
    sleep 1
}

count=0
while true
do
    ((count=count+1))

    # Mount every disk and start fsstress on it in the background.
    for disk in ${disk_list}
    do
        while true
        do
            mount -o data_err=abort,errors=remount-ro /dev/mapper/$disk $disk && break
            sleep 0.1
        done
        nohup fsstress -d $(pwd)/$disk -l 10 -n 1000 -p 10 &>/dev/null &
    done
    sleep 5

    # Stop if any file system has already been remounted read-only.
    for disk in ${disk_list}
    do
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        #if [ "${aqu_sz}" == "0.00" -o "$util" == "0.00" ];then
        #    iostat -x 1 -d 2
        #    exit 1
        #fi
        mount | grep $disk | grep '(ro' && exit 1
    done

    err_inject

    # Wait for all fsstress processes to exit.
    while [ -n "`pidof fsstress`" ]
    do
        sleep 1
    done

    for disk in ${disk_list}
    do
        umount $disk
        dm=$(multipath -ll | grep -w $disk | awk '{print $2}')
        aqu_sz=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $(NF-1)}')
        util=$(iostat -x 1 -d 2 | grep -w $dm | tail -1 | awk '{print $NF}')
        if [ "${aqu_sz}" != "0.00" -o "$util" != "0.00" ];then
            iostat -x 1 -d 2
            exit 1
        fi

        # Keep a copy of the image, then run the preen check.
        dd bs=1M if=/dev/mapper/$disk of=/root/dockerback
        fsck.ext4 -a /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 -a $ret -ne 1 ]; then
            exit 1
        fi

        # A forced read-only check must find nothing left to fix.
        fsck.ext4 -fn /dev/mapper/$disk
        ret=$?
        if [ $ret -ne 0 ]; then
            exit 1
        fi
    done

    if [ $count -gt 5 ];then
        echo 3 > /proc/sys/vm/drop_caches
        sleep 1
        cat /proc/meminfo >> mem.txt
        echo "" >> mem.txt
        slabtop -o >> slab.txt
        echo "" >> slab.txt
        count=0
    fi
done

>
> Thanks,
>
> - Ted
> .
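The i_blocks mismatch described in this reply can be illustrated with a small standalone model. This is only a sketch and not the e2fsprogs code or Li Jinlin's patch: `toy_inode`, `toy_release_acl_unaccounted()`, `toy_release_acl_accounted()` and `toy_expected_i_blocks()` are hypothetical names. It assumes i_blocks is counted in 512-byte units and that the extended attribute (ACL) block occupies exactly one file system block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified inode: only the two fields discussed above. */
struct toy_inode {
    uint32_t i_file_acl; /* block number of the ACL/xattr block, 0 if none */
    uint32_t i_blocks;   /* assumed to be in 512-byte units */
};

/* Models the reported behavior: the ACL pointer is cleared, but the
 * block it occupied stays counted in i_blocks. */
static void toy_release_acl_unaccounted(struct toy_inode *inode)
{
    inode->i_file_acl = 0;
}

/* Models the consistent behavior: subtract the ACL block from i_blocks
 * before clearing the pointer. */
static void toy_release_acl_accounted(struct toy_inode *inode, uint32_t blocksize)
{
    uint32_t units = blocksize / 512;

    if (inode->i_file_acl == 0)
        return;
    inode->i_blocks = (inode->i_blocks >= units) ? inode->i_blocks - units : 0;
    inode->i_file_acl = 0;
}

/* What a full check would expect i_blocks to be for an inode that owns
 * data_blocks data blocks plus an optional ACL block. */
static uint32_t toy_expected_i_blocks(const struct toy_inode *inode,
                                      uint32_t data_blocks, uint32_t blocksize)
{
    uint32_t blocks = data_blocks + (inode->i_file_acl ? 1u : 0u);

    return blocks * (blocksize / 512);
}
```

With an assumed 1 KiB block size, a truncated file that owned only an ACL block starts with i_blocks == 2. After the unaccounted release it still reports 2 while the expected value is 0; after the accounted release both are 0.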
diff --git a/e2fsck/super.c b/e2fsck/super.c
index 9495e029..f4a414b7 100644
--- a/e2fsck/super.c
+++ b/e2fsck/super.c
@@ -351,6 +351,7 @@ static int release_orphan_inode(e2fsck_t ctx, ext2_ino_t *ino, char *block_buf)
 		inode.i_dtime = ctx->now;
 	} else {
 		inode.i_dtime = 0;
+		fs->super->s_state &= ~EXT2_VALID_FS;
 	}
 	e2fsck_write_inode_full(ctx, *ino, EXT2_INODE(&inode),
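The effect of this one-line change can be modeled in isolation. The sketch below is not e2fsck code: `toy_super`, `toy_inode`, `toy_release_orphan()` and `toy_full_check_needed()` are hypothetical stand-ins, and the real decision to skip a full check depends on more than this flag. `VALID_FS` is redefined locally with the same value as `EXT2_VALID_FS` (0x0001) so that the example compiles on its own.

```c
#include <assert.h>

/* Same value as EXT2_VALID_FS; redefined so the sketch is standalone. */
#define VALID_FS 0x0001

/* Hypothetical, simplified superblock and inode. */
struct toy_super {
    unsigned short s_state;
};

struct toy_inode {
    unsigned short i_links_count;
    unsigned int   i_dtime;
};

/* Stand-in for the tail of release_orphan_inode() with the proposed
 * line applied. */
static void toy_release_orphan(struct toy_super *sb, struct toy_inode *inode,
                               unsigned int now)
{
    if (inode->i_links_count == 0) {
        inode->i_dtime = now;      /* unlinked orphan: inode is deleted */
    } else {
        inode->i_dtime = 0;        /* truncated orphan: inode is kept */
        sb->s_state &= ~VALID_FS;  /* proposed: no longer marked clean */
    }
}

/* Simplified assumption: a preen run does the full check only when the
 * file system is not marked valid. */
static int toy_full_check_needed(const struct toy_super *sb)
{
    return !(sb->s_state & VALID_FS);
}
```

In this model, processing only unlinked orphans leaves the flag set, so a later preen run still skips the full check, while a single truncated orphan clears the flag and forces the full check.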
If the system crashes when a file is being truncated, we will get a
problematic inode, and it will be added into fs->super->s_last_orphan.
When we run `e2fsck -a img`, the s_last_orphan list will be traversed and
deleted.
During this period, orphan inodes in the s_last_orphan list with
i_links_count==0 can be deleted, and orphan inodes with i_links_count !=0
(e.g. the truncated inode) cannot be deleted. However, when there are some
orphan inodes with i_links_count !=0, EXT2_VALID_FS is still assigned
to fs->super->s_state, so the deeper checkers are skipped and some
inconsistency problems remain.

Here, we clear the EXT2_VALID_FS flag when there are orphan inodes with
i_links_count !=0, so that the deeper checkers run.

Problems with truncated files:

[root@localhost ~]# e2fsck -a img
img: recovering journal
img: Truncating orphaned inode 188 (uid=0, gid=0, mode=0100666, size=0)
img: Truncating orphaned inode 174 (uid=0, gid=0, mode=0100666, size=0)
img: clean, 484/128016 files, 118274/512000 blocks

[root@localhost ~]# e2fsck -fn img
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Inode 174, i_blocks is 2, should be 0. Fix? no

Inode 188, i_blocks is 2, should be 0. Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

img: ********** WARNING: Filesystem still has errors **********

img: 484/128016 files (24.6% non-contiguous), 118274/512000 blocks

[root@localhost ~]# e2fsck -a img
img: clean, 484/128016 files, 118274/512000 blocks

However, running `e2fsck -f img` clears the EXT2_VALID_FS flag, so running
`e2fsck -a img` again afterwards fixes the problem:

[root@localhost ~]# e2fsck -f img
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Inode 174, i_blocks is 2, should be 0. Fix<y>? no

Inode 188, i_blocks is 2, should be 0. Fix<y>? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

img: ********** WARNING: Filesystem still has errors **********

img: 484/128016 files (24.6% non-contiguous), 118274/512000 blocks

[root@localhost ~]# e2fsck -a img
img was not cleanly unmounted, check forced.
img: Inode 174, i_blocks is 2, should be 0. FIXED.
img: Inode 188, i_blocks is 2, should be 0. FIXED.
img: 484/128016 files (24.6% non-contiguous), 118274/512000 blocks

Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
---
 e2fsck/super.c | 1 +
 1 file changed, 1 insertion(+)

 				sizeof(inode), "delete_file");