From patchwork Tue Jan 23 08:37:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eryu Guan X-Patchwork-Id: 864654 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3zQhX14x4vz9s7n for ; Tue, 23 Jan 2018 19:38:09 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751359AbeAWIiI (ORCPT ); Tue, 23 Jan 2018 03:38:08 -0500 Received: from mx1.redhat.com ([209.132.183.28]:36294 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751173AbeAWIiH (ORCPT ); Tue, 23 Jan 2018 03:38:07 -0500 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A61D2C058EBC; Tue, 23 Jan 2018 08:38:07 +0000 (UTC) Received: from localhost (unknown [10.66.12.147]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1F1E465606; Tue, 23 Jan 2018 08:38:06 +0000 (UTC) From: Eryu Guan To: linux-ext4@vger.kernel.org Cc: Theodore Ts'o , Eryu Guan Subject: [PATCH] ext4: update i_disksize if direct write past ondisk size Date: Tue, 23 Jan 2018 16:37:23 +0800 Message-Id: <20180123083723.24908-1-eguan@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 23 Jan 2018 08:38:07 +0000 (UTC) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Currently in ext4 direct write path, we update i_disksize only when new eof is greater than i_size, and don't update it even when new eof is greater than i_disksize but less than i_size. This doesn't work well with delalloc buffer write, which updates i_size and i_disksize only when delalloc blocks are resolved (at writeback time), the i_disksize from direct write can be lost if a previous buffer write succeeded at write time but failed at writeback time, then results in corrupted ondisk inode size. Consider this case, first buffer write 4k data to a new file at offset 16k with delayed allocation, then direct write 4k data to the same file at offset 4k before delalloc blocks are resolved, which doesn't update i_disksize because it writes within i_size(20k), but the extent tree metadata has been committed in journal. Then writeback of the delalloc blocks fails (due to device error etc.), and i_size/i_disksize from buffer write can't be written to disk (still zero). A subsequent umount/mount cycle recovers journal and writes extent tree metadata from direct write to disk, but with i_disksize being zero. Fix it by updating i_disksize too in direct write path when new eof is greater than i_disksize but less than i_size, so i_disksize is always consistent with direct write. This fixes occasional i_size corruption in fstests generic/475. Signed-off-by: Eryu Guan --- I think this matches what XFS does in direct write too. I've tested it by looping generic/475 200 times without hitting a corruption, usually it fails within 5 iterations for me. Also tested by full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations and all results looked good. fs/ext4/inode.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 534a9130f625..2a75b0aafd31 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3668,7 +3668,7 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) int orphan = 0; handle_t *handle; - if (final_size > inode->i_size) { + if (final_size > inode->i_size || final_size > ei->i_disksize) { /* Credits for sb + inode write */ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2); if (IS_ERR(handle)) { @@ -3780,9 +3780,10 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter) ext4_orphan_del(handle, inode); if (ret > 0) { loff_t end = offset + ret; - if (end > inode->i_size) { + if (end > inode->i_size || end > ei->i_disksize) { ei->i_disksize = end; - i_size_write(inode, end); + if (end > inode->i_size) + i_size_write(inode, end); /* * We're going to return a positive `ret' * here due to non-zero-length I/O, so there's