diff mbox

[v4,15/20] ext4: use ext4_zero_partial_blocks in punch_hole

Message ID 1368549454-8930-16-git-send-email-lczerner@redhat.com
State Accepted, archived
Headers show

Commit Message

Lukas Czerner May 14, 2013, 4:37 p.m. UTC
We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.

We also introduce function ext4_prepare_punch_hole() to do come common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
v4: Use start-len arguments in ext4_zero_partial_blocks()

 fs/ext4/ext4.h  |    2 +
 fs/ext4/inode.c |  118 +++++++++++++++++++++---------------------------------
 2 files changed, 48 insertions(+), 72 deletions(-)

Comments

Theodore Ts'o June 14, 2013, 3:01 a.m. UTC | #1
On Tue, May 14, 2013 at 06:37:29PM +0200, Lukas Czerner wrote:
> We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> duplicating some code and also partially duplicating work of
> truncate_pagecache_range(), moreover the old implementation was much
> clearer.
> 
> Now when the truncate_inode_pages_range() can handle truncating non page
> aligned regions we can use this to invalidate and zero out block aligned
> region of the punched out range and then use ext4_block_truncate_page()
> to zero the unaligned blocks on the start and end of the range. This
> will greatly simplify the punch hole code. Moreover after this commit we
> can get rid of the ext4_discard_partial_page_buffers() completely.
> 
> We also introduce function ext4_prepare_punch_hole() to do come common
> operations before we attempt to do the actual punch hole on
> indirect or extent file which saves us some code duplication.
> 
> This has been tested on ppc64 with 1k block size with fsx and xfstests
> without any problems.
> 
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>

Hi Lukas,

I've been seeing xfstests failures on test generic/300 in nojournal
mode.

BEGIN TEST: Ext4 4k block w/ no journal Thu Jun 13 22:38:47 EDT 2013
Device: /dev/vdb
mk2fs options: -q -O ^has_journal
mount options: -o block_validity,noload
FSTYP         -- ext4
PLATFORM      -- Linux/i686 candygram 3.10.0-rc2-00477-g1e1cad7
MKFS_OPTIONS  -- -q -O ^has_journal /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc

generic/300		[20:42:18][  116.877278] fio (3320) used greatest stack depth: 5580 bytes left
[  116.967122] fio (3321) used greatest stack depth: 5560 bytes left
[  117.573861] fio (3325) used greatest stack depth: 5504 bytes left
 [20:44:01] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/300.out.bad)
    --- tests/generic/300.out	 2013-06-04 22:42:55.000000000 -0400
    +++ /root/xfstests/results/generic/300.out.bad	       2013-06-13 20:44:01.306666665 -0400
    @@ -2,3 +2,4 @@
     
     Run fio with random aio-dio pattern
     
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
     ...
     (Run 'diff -u tests/generic/300.out /root/xfstests/results/generic/300.out.bad' to see the entire diff)

It bisects down to this patch, and if I take the dev branch, and
revert patches #15 through #19 in this series, the problem goes away.

Can you investigate and recommend a better fix?

Thanks,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner June 14, 2013, 10:16 a.m. UTC | #2
On Thu, 13 Jun 2013, Theodore Ts'o wrote:

> Date: Thu, 13 Jun 2013 23:01:54 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
>     linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
>     akpm@linux-foundation.org, hughd@google.com
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
> 
> On Tue, May 14, 2013 at 06:37:29PM +0200, Lukas Czerner wrote:
> > We're doing to get rid of ext4_discard_partial_page_buffers() since it is
> > duplicating some code and also partially duplicating work of
> > truncate_pagecache_range(), moreover the old implementation was much
> > clearer.
> > 
> > Now when the truncate_inode_pages_range() can handle truncating non page
> > aligned regions we can use this to invalidate and zero out block aligned
> > region of the punched out range and then use ext4_block_truncate_page()
> > to zero the unaligned blocks on the start and end of the range. This
> > will greatly simplify the punch hole code. Moreover after this commit we
> > can get rid of the ext4_discard_partial_page_buffers() completely.
> > 
> > We also introduce function ext4_prepare_punch_hole() to do come common
> > operations before we attempt to do the actual punch hole on
> > indirect or extent file which saves us some code duplication.
> > 
> > This has been tested on ppc64 with 1k block size with fsx and xfstests
> > without any problems.
> > 
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> 
> Hi Lukas,
> 
> I've been seeing xfstests failures on test generic/300 in nojournal
> mode.
> 
> BEGIN TEST: Ext4 4k block w/ no journal Thu Jun 13 22:38:47 EDT 2013
> Device: /dev/vdb
> mk2fs options: -q -O ^has_journal
> mount options: -o block_validity,noload
> FSTYP         -- ext4
> PLATFORM      -- Linux/i686 candygram 3.10.0-rc2-00477-g1e1cad7
> MKFS_OPTIONS  -- -q -O ^has_journal /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity,noload /dev/vdc /vdc
> 
> generic/300		[20:42:18][  116.877278] fio (3320) used greatest stack depth: 5580 bytes left
> [  116.967122] fio (3321) used greatest stack depth: 5560 bytes left
> [  117.573861] fio (3325) used greatest stack depth: 5504 bytes left
>  [20:44:01] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/300.out.bad)
>     --- tests/generic/300.out	 2013-06-04 22:42:55.000000000 -0400
>     +++ /root/xfstests/results/generic/300.out.bad	       2013-06-13 20:44:01.306666665 -0400
>     @@ -2,3 +2,4 @@
>      
>      Run fio with random aio-dio pattern
>      
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
>      ...
>      (Run 'diff -u tests/generic/300.out /root/xfstests/results/generic/300.out.bad' to see the entire diff)
> 
> It bisects down to this patch, and if I take the dev branch, and
> revert patches #15 through #19 in this series, the problem goes away.
> 
> Can you investigate and recommend a better fix?

Hi Ted,

I'll take a look at this. Thanks for letting me know.

-Lukas

> 
> Thanks,
> 
> 					- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o June 14, 2013, 1:39 p.m. UTC | #3
On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > It bisects down to this patch, and if I take the dev branch, and
> > revert patches #15 through #19 in this series, the problem goes away.

Correction...  reverting patches #15 through #19 (which is what I did in
the dev-with-revert branch found on ext4.git) causes the problem to go
away in the nojournal case, but it causes a huge number of other
problems.  Some of the reverts weren't clean, so it's possible I
screwed up one of the reverts.  It's also possible that only applying
part of this series leaves the tree in an unstable state.

I'd much rather figure out how to fix the problem on the dev branch,
so thank you for looking into this!

						- Ted

BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013

BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
generic/270 69s ...     [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
 offset=0(0), inode=3739147998, rec_len=57054, name_len=222
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013

BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
generic/270 69s ...     [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
[10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
_len=15934, name_len=62
[10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
[10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
len=15934, name_len=62
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013

BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013

BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013

BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013

BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
_check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
Failures: generic/013
END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013

BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
    +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
[29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013

BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
[33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner June 17, 2013, 9:08 a.m. UTC | #4
On Fri, 14 Jun 2013, Ted Ts'o wrote:

> Date: Fri, 14 Jun 2013 09:39:39 -0400
> From: Ted Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
> 
> On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > > It bisects down to this patch, and if I take the dev branch, and
> > > revert patches #15 through #19 in this series, the problem goes away.
> 
> Correction...  reverting patches #15 through #19 (which is what I did in
> the dev-with-revert branch found on ext4.git) causes the problem to go
> away in the nojournal case, but it causes a huge number of other
> problems.  Some of the reverts weren't clean, so it's possible I
> screwed up one of the reverts.  It's also possible that only applying
> part of this series leaves the tree in an unstable state.
> 
> I'd much rather figure out how to fix the problem on the dev branch,
> so thank you for looking into this!

Wow, this looks bad. Theoretically reverting patches %15 through
#19 should not have any real impact. So far I do not see what is
causing that, but I am looking into this.

I see that there are problems in other mode, not just nojournal. Are
those caused by this as well, or are you seeing those even without
the patchset ?

Thanks!
-Lukas

> 
> 						- Ted
> 
> BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ...     [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
>  offset=0(0), inode=3739147998, rec_len=57054, name_len=222
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
> END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ...     [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
> han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> [10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
> _len=15934, name_len=62
> [10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
> END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013
> 
> BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
> END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
> END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
> Failures: generic/013
> END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> [29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
> END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
> [33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
> Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
> END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Theodore Ts'o June 17, 2013, 12:25 p.m. UTC | #5
On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > Correction...  reverting patches #15 through #19 (which is what I did in
> > the dev-with-revert branch found on ext4.git) causes the problem to go
> > away in the nojournal case, but it causes a huge number of other
> > problems.  Some of the reverts weren't clean, so it's possible I
> > screwed up one of the reverts.  It's also possible that only applying
> > part of this series leaves the tree in an unstable state.
> > 
> > I'd much rather figure out how to fix the problem on the dev branch,
> > so thank you for looking into this!
> 
> Wow, this looks bad. Theoretically reverting patches %15 through
> #19 should not have any real impact. So far I do not see what is
> causing that, but I am looking into this.

I've been looking into this more intensively over the weekend.  I'm
now beginning to think we have had a pre-existing race, and the
changes in question has simply changed the timing.  I tried a version
of the dev branch (you can find it as the branch dev2 in my
kernel.org's ext4.git tree) which only had patches 1 through 10 of the
invalidate page range patches (dropping patches 11 through 20), and I
found that generic/300 was failing in the configuration ext3 (a file
system with nodelalloc, no flex_bg, and no extents).  I also found
the same failure with a 3.10-rc2 configuration.

The your changes seem to make generic/300 failure consistently for me
using the nojournal configuration, but looking at patches in question,
I don't think they could have directly caused the problem.  Instead, I
think they just changed the timing to unmask the problem.

Given that I've seen generic/300 test failures in various different
baselines going all the way back to 3.9-rc4, this isn't a recent
regression.  And given that it does seem to be timing sensitive,
bisecting it is going to be difficult.  On the other hand, given that
using the dev (or master) branch, generic/300 is failing with a
greater than 70% probability using kvm with 2 cpu's, 2 megs of RAM and
5400 rpm laptop drives in nojournal mode, the fact that it's
reproducing relatively reliably hopefully will make it easier to find
the problem.

> I see that there are problems in other mode, not just nojournal. Are
> those caused by this as well, or are you seeing those even without
> the patchset ?

I think the other problems in my dev-with-revert branch was caused by
some screw up on my part when did the revert using git.  I found that
dropping the patches from a copy of the guilt patch stack, and then
applying all of the patches except for the last half of the invalidate
page range patch series, resulted in a clean branch that didn't have
any of these failures.  It's what I should have done late last week,
instead of trying to use "git revert".

Cheers,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner June 17, 2013, 12:30 p.m. UTC | #6
On Fri, 14 Jun 2013, Ted Ts'o wrote:

> Date: Fri, 14 Jun 2013 09:39:39 -0400
> From: Ted Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
> 
> On Fri, Jun 14, 2013 at 12:16:53PM +0200, Lukáš Czerner wrote:
> > > It bisects down to this patch, and if I take the dev branch, and
> > > revert patches #15 through #19 in this series, the problem goes away.
> 
> Correction...  reverting patches #15 through #19 (which is what I did in
> the dev-with-revert branch found on ext4.git) causes the problem to go
> away in the nojournal case, but it causes a huge number of other
> problems.  Some of the reverts weren't clean, so it's possible I
> screwed up one of the reverts.  It's also possible that only applying
> part of this series leaves the tree in an unstable state.
> 
> I'd much rather figure out how to fix the problem on the dev branch,
> so thank you for looking into this!

Ok so it seem like the problems you're seeing here after the revert
might be cause wrong revert. Simply applying patches #1 through #14
on top of the c7788792a5e7b0d5d7f96d0766b4cb6112d47d75 (Linux
3.10-rc2 - that seem to be what is ext4 branch based on?) does not
yield any errors without journal.

FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 rhel6_vm1 3.10.0-rc2-debug+
MKFS_OPTIONS  -- -q -F -b4096 -O ^has_journal /dev/sdb
MOUNT_OPTIONS -- -o acl,user_xattr /dev/sdb /mnt/test1

generic/075 10s ... 10s
generic/091 21s ... 18s
generic/112      10s
generic/127 286s ... 248s
generic/231 206s ... 77s
generic/255 1s ... 1s
generic/263 16s ... 11s
generic/269 29s ... 30s
generic/270 32s ... 32s
generic/300 8s ... 7s
shared/218       4s
shared/298 33s ... 21s
Ran: generic/075 generic/091 generic/112 generic/127 generic/231
generic/255 generic/263 generic/269 generic/270 generic/300
shared/218 shared/298
Passed all 12 tests

If required I will do the revert myself to make sure that nothing
breaks. However I certainly hope it would not be necessary.

I am still working to figure out what's going on. I'll keep you
posted.

Thanks!
-Lukas


> 
> 						- Ted
> 
> BEGIN TEST: Ext4 4k block Thu Jun 13 23:25:45 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block Fri Jun 14 00:29:17 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 00:29:22 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ...     [01:34:24][ 8102.435986] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #25735: block 99331: comm quotacheck: bad entry in directory: rec_len % 4 != 0 -\
>  offset=0(0), inode=3739147998, rec_len=57054, name_len=222
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/263 generic/269 generic/270 generic/285 generic/300 shared/218
> END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Jun 14 01:49:59 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/ no journal Fri Jun 14 01:50:00 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> generic/270 69s ...     [02:20:21][10531.911437] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9894: block 6526: comm quotacheck: bad entry in directory: rec_len is smaller t\
> han minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> [10532.535861] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9692: block 6534: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.266775] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9906: block 6530: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10534.697885] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #24673: block 6531: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec\
> _len=15934, name_len=62
> [10535.157126] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9898: block 6532: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10536.395838] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9905: block 6529: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.029470] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9899: block 6533: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> [10537.259601] EXT4-fs error (device vdc): htree_dirblock_to_tree:920: inode #9929: block 6527: comm quotacheck: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1044266558, rec_\
> len=15934, name_len=62
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/269 generic/270 shared/218 shared/298
> END TEST: Ext4 4k block w/ no journal Fri Jun 14 02:32:14 EDT 2013
> 
> BEGIN TEST: Ext4 1k block Fri Jun 14 02:32:18 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 generic/285 shared/218
> END TEST: Ext4 1k block Fri Jun 14 04:00:17 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 04:00:20 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 shared/218
> END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Jun 14 05:16:10 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/metadata_csum Fri Jun 14 05:16:12 EDT 2013
> Failures: generic/075 generic/091 generic/112 generic/127 generic/231 generic/255 generic/263 shared/218
> END TEST: Ext4 4k block w/metadata_csum Fri Jun 14 06:16:31 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:16:31 EDT 2013
> _check_generic_filesystem: filesystem on /dev/vdb is inconsistent (see /root/xfstests/results/generic/013.full)
> Failures: generic/013
> END TEST: Ext4 4k block w/dioread_nolock Fri Jun 14 06:21:37 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/data=journal Fri Jun 14 06:21:41 EDT 2013
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/269.full)
>     +_check_generic_filesystem: filesystem on /dev/vdc is inconsistent (see /root/xfstests/results/generic/300.full)
> [29242.456266] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/buffer.c:1120 mark_buffer_dirty+0x54/0x1ff()
> Failures: generic/075 generic/091 generic/112 generic/127 generic/223 generic/231 generic/255 generic/263 generic/269 generic/270 generic/300 shared/218
> END TEST: Ext4 4k block w/data=journal Fri Jun 14 07:33:10 EDT 2013
> 
> BEGIN TEST: Ext4 4k block w/bigalloc Fri Jun 14 07:33:16 EDT 2013
> [33544.485801] WARNING: at /usr/local/google/home/tytso/linux/ext4/fs/quota/dquot.c:1090 dquot_claim_space_nodirty+0xf1/0x1e3()
> Failures: generic/204 generic/219 generic/233 generic/235 generic/273 generic/275 generic/300 shared/218
> END TEST: Ext4 4k block w/bigalloc Fri Jun 14 08:56:56 EDT 2013
> 
>
Lukas Czerner June 17, 2013, 12:46 p.m. UTC | #7
On Mon, 17 Jun 2013, Theodore Ts'o wrote:

> Date: Mon, 17 Jun 2013 08:25:18 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-ext4@vger.kernel.org
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
> 
> On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > > Correction...  reverting patches #15 through #19 (which is what I did in
> > > the dev-with-revert branch found on ext4.git) causes the problem to go
> > > away in the nojournal case, but it causes a huge number of other
> > > problems.  Some of the reverts weren't clean, so it's possible I
> > > screwed up one of the reverts.  It's also possible that only applying
> > > part of this series leaves the tree in an unstable state.
> > > 
> > > I'd much rather figure out how to fix the problem on the dev branch,
> > > so thank you for looking into this!
> > 
> > Wow, this looks bad. Theoretically reverting patches %15 through
> > #19 should not have any real impact. So far I do not see what is
> > causing that, but I am looking into this.
> 
> I've been looking into this more intensively over the weekend.  I'm
> now beginning to think we have had a pre-existing race, and the
> changes in question has simply changed the timing.  I tried a version
> of the dev branch (you can find it as the branch dev2 in my
> kernel.org's ext4.git tree) which only had patches 1 through 10 of the
> invalidate page range patches (dropping patches 11 through 20), and I
> found that generic/300 was failing in the configuration ext3 (a file
> system with nodelalloc, no flex_bg, and no extents).  I also found
> the same failure with a 3.10-rc2 configuration.
> 
> The your changes seem to make generic/300 failure consistently for me
> using the nojournal configuration, but looking at patches in question,
> I don't think they could have directly caused the problem.  Instead, I
> think they just changed the timing to unmask the problem.

Ok, I though that there is something weird because patches #1-#14
should not cause anything like that and from my testing (see my
previous mail) it really seems it does not cause it, at least not
directly.

> 
> Given that I've seen generic/300 test failures in various different
> baselines going all the way back to 3.9-rc4, this isn't a recent
> regression.  And given that it does seem to be timing sensitive,
> bisecting it is going to be difficult.  On the other hand, given that
> using the dev (or master) branch, generic/300 is failing with a
> greater than 70% probability using kvm with 2 cpu's, 2 megs of RAM and
> 5400 rpm laptop drives in nojournal mode, the fact that it's
> reproducing relatively reliably hopefully will make it easier to find
> the problem.

As mentioned in previous email test generic/300 runs without any
problems (even in the loop) without journal with patches #1 through
#14 applied on 3.10-rc2 (c7788792a5e7b0d5d7f96d0766b4cb6112d47d75).
This is on kvm with 24 cpu's, 8GB of RAM (I suppose you're not using
2MB of ram in your setup, but rather 2GB :) and server drives with
linear lvm on top of it.

-Lukas

> 
> > I see that there are problems in other mode, not just nojournal. Are
> > those caused by this as well, or are you seeing those even without
> > the patchset ?
> 
> I think the other problems in my dev-with-revert branch was caused by
> some screw up on my part when did the revert using git.  I found that
> dropping the patches from a copy of the guilt patch stack, and then
> applying all of the patches except for the last half of the invalidate
> page range patch series, resulted in a clean branch that didn't have
> any of these failures.  It's what I should have done late last week,
> instead of trying to use "git revert".
> 
> Cheers,
> 
> 					- Ted
>
diff mbox

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9f9719f..2d4b0aa 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2100,6 +2100,8 @@  extern int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from);
 extern int ext4_block_zero_page_range(handle_t *handle,
 		struct address_space *mapping, loff_t from, loff_t length);
+extern int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+			     loff_t lstart, loff_t lend);
 extern int ext4_discard_partial_page_buffers(handle_t *handle,
 		struct address_space *mapping, loff_t from,
 		loff_t length, int flags);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 34ebb62..44333a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3692,6 +3692,41 @@  unlock:
 	return err;
 }
 
+int ext4_zero_partial_blocks(handle_t *handle, struct inode *inode,
+			     loff_t lstart, loff_t length)
+{
+	struct super_block *sb = inode->i_sb;
+	struct address_space *mapping = inode->i_mapping;
+	unsigned partial = lstart & (sb->s_blocksize - 1);
+	ext4_fsblk_t start, end;
+	loff_t byte_end = (lstart + length - 1);
+	int err = 0;
+
+	start = lstart >> sb->s_blocksize_bits;
+	end = byte_end >> sb->s_blocksize_bits;
+
+	/* Handle partial zero within the single block */
+	if (start == end) {
+		err = ext4_block_zero_page_range(handle, mapping,
+						 lstart, length);
+		return err;
+	}
+	/* Handle partial zero out on the start of the range */
+	if (partial) {
+		err = ext4_block_zero_page_range(handle, mapping,
+						 lstart, sb->s_blocksize);
+		if (err)
+			return err;
+	}
+	/* Handle partial zero out on the end of the range */
+	partial = byte_end & (sb->s_blocksize - 1);
+	if (partial != sb->s_blocksize - 1)
+		err = ext4_block_zero_page_range(handle, mapping,
+						 byte_end - partial,
+						 partial + 1);
+	return err;
+}
+
 int ext4_can_truncate(struct inode *inode)
 {
 	if (S_ISREG(inode->i_mode))
@@ -3720,8 +3755,7 @@  int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	struct super_block *sb = inode->i_sb;
 	ext4_lblk_t first_block, stop_block;
 	struct address_space *mapping = inode->i_mapping;
-	loff_t first_page, last_page, page_len;
-	loff_t first_page_offset, last_page_offset;
+	loff_t first_block_offset, last_block_offset;
 	handle_t *handle;
 	unsigned int credits;
 	int ret = 0;
@@ -3772,17 +3806,13 @@  int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 		   offset;
 	}
 
-	first_page = (offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	last_page = (offset + length) >> PAGE_CACHE_SHIFT;
+	first_block_offset = round_up(offset, sb->s_blocksize);
+	last_block_offset = round_down((offset + length), sb->s_blocksize) - 1;
 
-	first_page_offset = first_page << PAGE_CACHE_SHIFT;
-	last_page_offset = last_page << PAGE_CACHE_SHIFT;
-
-	/* Now release the pages */
-	if (last_page_offset > first_page_offset) {
-		truncate_pagecache_range(inode, first_page_offset,
-					 last_page_offset - 1);
-	}
+	/* Now release the pages and zero block aligned part of pages*/
+	if (last_block_offset > first_block_offset)
+		truncate_pagecache_range(inode, first_block_offset,
+					 last_block_offset);
 
 	/* Wait all existing dio workers, newcomers will block on i_mutex */
 	ext4_inode_block_unlocked_dio(inode);
@@ -3802,66 +3832,10 @@  int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 		goto out_dio;
 	}
 
-	/*
-	 * Now we need to zero out the non-page-aligned data in the
-	 * pages at the start and tail of the hole, and unmap the
-	 * buffer heads for the block aligned regions of the page that
-	 * were completely zeroed.
-	 */
-	if (first_page > last_page) {
-		/*
-		 * If the file space being truncated is contained
-		 * within a page just zero out and unmap the middle of
-		 * that page
-		 */
-		ret = ext4_discard_partial_page_buffers(handle,
-			mapping, offset, length, 0);
-
-		if (ret)
-			goto out_stop;
-	} else {
-		/*
-		 * zero out and unmap the partial page that contains
-		 * the start of the hole
-		 */
-		page_len = first_page_offset - offset;
-		if (page_len > 0) {
-			ret = ext4_discard_partial_page_buffers(handle, mapping,
-						offset, page_len, 0);
-			if (ret)
-				goto out_stop;
-		}
-
-		/*
-		 * zero out and unmap the partial page that contains
-		 * the end of the hole
-		 */
-		page_len = offset + length - last_page_offset;
-		if (page_len > 0) {
-			ret = ext4_discard_partial_page_buffers(handle, mapping,
-					last_page_offset, page_len, 0);
-			if (ret)
-				goto out_stop;
-		}
-	}
-
-	/*
-	 * If i_size is contained in the last page, we need to
-	 * unmap and zero the partial page after i_size
-	 */
-	if (inode->i_size >> PAGE_CACHE_SHIFT == last_page &&
-	   inode->i_size % PAGE_CACHE_SIZE != 0) {
-		page_len = PAGE_CACHE_SIZE -
-			(inode->i_size & (PAGE_CACHE_SIZE - 1));
-
-		if (page_len > 0) {
-			ret = ext4_discard_partial_page_buffers(handle,
-					mapping, inode->i_size, page_len, 0);
-
-			if (ret)
-				goto out_stop;
-		}
-	}
+	ret = ext4_zero_partial_blocks(handle, inode, offset,
+				       length);
+	if (ret)
+		goto out_stop;
 
 	first_block = (offset + sb->s_blocksize - 1) >>
 		EXT4_BLOCK_SIZE_BITS(sb);