[v4,1/2] dax: dax_layout_busy_page() warn on !exceptional

Message ID 20180710191031.17919-2-ross.zwisler@linux.intel.com
State Superseded, archived
Headers show
Series
  • ext4: fix DAX dma vs truncate/hole-punch
Related show

Commit Message

Ross Zwisler July 10, 2018, 7:10 p.m.
Inodes using DAX should only ever have exceptional entries in their page
caches.  Make this clear by warning if the iteration in
dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
comment for the pagevec_release() call which only deals with struct page
pointers.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Eric Sandeen Aug. 10, 2018, 7:52 p.m. | #1
On 7/10/18 2:10 PM, Ross Zwisler wrote:
> Inodes using DAX should only ever have exceptional entries in their page
> caches.  Make this clear by warning if the iteration in
> dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
> comment for the pagevec_release() call which only deals with struct page
> pointers.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Hi Ted, hadn't seem feedback from you on this by the time it gathered reviews,
is this something you plan to merge for realz?  (I see it's on your dev
branch now, just not sure of its permanence at this point.)

Thanks,
-Eric

> ---
>  fs/dax.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 641192808bb6..897b51e41d8f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -566,7 +566,8 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  			if (index >= end)
>  				break;
>  
> -			if (!radix_tree_exceptional_entry(pvec_ent))
> +			if (WARN_ON_ONCE(
> +			     !radix_tree_exceptional_entry(pvec_ent)))
>  				continue;
>  
>  			xa_lock_irq(&mapping->i_pages);
> @@ -578,6 +579,13 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  			if (page)
>  				break;
>  		}
> +
> +		/*
> +		 * We don't expect normal struct page entries to exist in our
> +		 * tree, but we keep these pagevec calls so that this code is
> +		 * consistent with the common pattern for handling pagevecs
> +		 * throughout the kernel.
> +		 */
>  		pagevec_remove_exceptionals(&pvec);
>  		pagevec_release(&pvec);
>  		index++;
>
Theodore Y. Ts'o Aug. 10, 2018, 8:33 p.m. | #2
On Fri, Aug 10, 2018 at 02:52:53PM -0500, Eric Sandeen wrote:
> 
> Hi Ted, hadn't seem feedback from you on this by the time it gathered reviews,
> is this something you plan to merge for realz?  (I see it's on your dev
> branch now, just not sure of its permanence at this point.)

Yes, the dev branch is pretty much locked down since I assume the
merge window is openning over the weekend.

The reason why I've been silent is because I haven't had a chance to
do was a test run with DAX, because up until recently I wasn't able to
run a DAX regression test run.  (That's because of the CONFIG_KASAN
incompatibility with CONFIG_ZONE_DEVICE that caused the kernel to
instantly blow up on boot if I tried to enable emulated /dev/pmem
devices.)

I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
CONFIG_KASAN disabled, and I expect it shouldn't show up anything
concerning.  So assuming nothing surprising pops up, yes it should be
merged at the next merge window.

	     		    	     	  - Ted
Theodore Y. Ts'o Aug. 11, 2018, 2:10 a.m. | #3
On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> concerning.  So assuming nothing surprising pops up, yes it should be
> merged at the next merge window.

... and here are the results.  The first is 4.17, and the second is
the ext4 git tree:

ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
  Failures: ext4/033 generic/344 generic/491 generic/503

ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
  Failures: generic/081 generic/344 generic/388

The generic/388 failure is a known flake (shutdown stress test).

The generic/081 regression appears to be a device-mapper issue:

generic/081		[22:06:33][   15.079661] run fstests generic/081 at 2018-08-10 22:06:33
[   15.795745] device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.
[failed, exit status 1] [22:06:36]- output mismatch (see /results/ext4/results-dax/generic/081.out.bad)
    --- tests/generic/081.out	2018-08-09 18:00:42.000000000 -0400
    +++ /results/ext4/results-dax/generic/081.out.bad	2018-08-10 22:06:36.440005460 -0400
    @@ -1,2 +1,4 @@
     QA output created by 081
     Silence is golden
    +Failed to create snapshot
    +(see /results/ext4/results-dax/generic/081.full for details)
    ...
    (Run 'diff -u tests/generic/081.out /results/ext4/results-dax/generic/081.out.bad'  to see the entire diff)

The generic/344 failure seems to be caused by a WARNING triggered in
the nvdimm code:

generic/344		[22:06:36][   18.126280] run fstests generic/344 at 2018-08-10 22:06:36
[   18.303113] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[   18.456988] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[   97.375912] WARNING: CPU: 2 PID: 1712 at /usr/projects/linux/ext4/mm/memory.c:1801 insert_pfn+0x15a/0x170
[   97.377261] CPU: 2 PID: 1712 Comm: holetest Not tainted 4.18.0-rc4-xfstests-00039-g863c37fcb14f #497
[   97.378486] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[   97.379516] RIP: 0010:insert_pfn+0x15a/0x170
[   97.380064] Code: 19 1b 01 eb dd 48 85 d2 74 07 48 23 1d 3f 19 1b 01 48 09 df 48 89 f8 0f 1f 40 00 48 b9 00 02 00 00 00 00 00 04 48 09 c1 eb c8 <0f> 0b e9 5e ff ff ff bb f4 ff ff ff e9 5e ff ff ff e8 80 7b ec ff 
[   97.382469] RSP: 0000:ffffacfd0457fc60 EFLAGS: 00010206
[   97.383123] RAX: 0000000000179e3b RBX: 00000000fffffff0 RCX: 0000000000000002
[   97.384062] RDX: 000fffffffffffff RSI: 8f5c28f5c28f5c29 RDI: 8000000179e3b225
[   97.385134] RBP: ffff923761654558 R08: 0000000000000001 R09: 0000000000000001
[   97.386213] R10: ffff92376f415cc0 R11: 0000000000000001 R12: ffff92377e880cc0
[   97.387264] R13: 00007fab98aab000 R14: 00000000003e860d R15: ffff923761560000
[   97.388327] FS:  00007fab9049c700(0000) GS:ffff92377f200000(0000) knlGS:0000000000000000
[   97.389514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   97.390367] CR2: 00007fab98aabc00 CR3: 00000006a165a004 CR4: 0000000000360ee0
[   97.391432] Call Trace:
[   97.391798]  __vm_insert_mixed+0x7e/0xc0
[   97.392376]  vmf_insert_mixed_mkwrite+0xf/0x30
[   97.393048]  dax_iomap_pte_fault+0xb8b/0xe40
[   97.393691]  ext4_dax_huge_fault+0x145/0x200
[   97.394268]  do_wp_page+0x175/0x5b0
[   97.394710]  __handle_mm_fault+0x587/0xbb0
[   97.395228]  __do_page_fault+0x20c/0x490
[   97.395729]  ? async_page_fault+0x8/0x30
[   97.396251]  async_page_fault+0x1e/0x30
[   97.396719] RIP: 0033:0x5598144275ea
[   97.397187] Code: 0f 85 8a 00 00 00 31 d2 48 85 db 4b 8d 04 34 7e 1f 0f 1f 80 00 00 00 00 48 89 d1 48 83 c2 01 48 0f af 0d 71 1b 20 00 48 39 d3 <48> 89 2c 08 75 e8 8b 0d 36 1b 20 00 31 c0 85 c9 74 0a 8b 15 2e 1b 
[   97.399752] RSP: 002b:00007fab9049bf20 EFLAGS: 00010212
[   97.400541] RAX: 00007fab90c9ec00 RBX: 0000000000010000 RCX: 0000000007e0d000
[   97.401603] RDX: 0000000000007e0e RSI: 0000000000000000 RDI: 0000000000000001
[   97.402673] RBP: 00007fab9049c700 R08: 00007fab9049c700 R09: 00007fab9049c700
[   97.403755] R10: 00007fab9049c9d0 R11: 0000000000000202 R12: 00007fab90c9e000
[   97.404851] R13: 00007ffc4608c9e0 R14: 0000000000000c00 R15: 000055981608e250
[   97.405892] irq event stamp: 1111968
[   97.406460] hardirqs last  enabled at (1111967): [<ffffffffa36c4409>] _raw_spin_unlock_irq+0x29/0x40
[   97.407826] hardirqs last disabled at (1111968): [<ffffffffa380118f>] error_entry+0x7f/0x100
[   97.409080] softirqs last  enabled at (1111390): [<ffffffffa3a00319>] __do_softirq+0x319/0x4d9
[   97.410363] softirqs last disabled at (1111383): [<ffffffffa2e8c9e1>] irq_exit+0xc1/0xd0
[   97.411400] ---[ end trace 69669a34a73c1a49 ]---
[  117.726077] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[  117.727671] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity,dax
[  117.796623] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[  117.798546] EXT4-fs (pmem1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity,dax
_check_dmesg: something found in dmesg (see /results/ext4/results-dax/generic/344.dmesg)

		      		      	       - Ted
Jan Kara Aug. 13, 2018, 10:12 a.m. | #4
On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> > I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> > CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> > concerning.  So assuming nothing surprising pops up, yes it should be
> > merged at the next merge window.
> 
> ... and here are the results.  The first is 4.17, and the second is
> the ext4 git tree:
> 
> ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
>   Failures: ext4/033 generic/344 generic/491 generic/503
> 
> ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
>   Failures: generic/081 generic/344 generic/388
> 
> The generic/388 failure is a known flake (shutdown stress test).
> 
> The generic/081 regression appears to be a device-mapper issue:
> 
> generic/081		[22:06:33][   15.079661] run fstests generic/081 at 2018-08-10 22:06:33
> [   15.795745] device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.
> [failed, exit status 1] [22:06:36]- output mismatch (see /results/ext4/results-dax/generic/081.out.bad)
>     --- tests/generic/081.out	2018-08-09 18:00:42.000000000 -0400
>     +++ /results/ext4/results-dax/generic/081.out.bad	2018-08-10 22:06:36.440005460 -0400
>     @@ -1,2 +1,4 @@
>      QA output created by 081
>      Silence is golden
>     +Failed to create snapshot
>     +(see /results/ext4/results-dax/generic/081.full for details)
>     ...
>     (Run 'diff -u tests/generic/081.out /results/ext4/results-dax/generic/081.out.bad'  to see the entire diff)

I'll see if this reproduces for me. Doesn't seem to be related to the DAX
patches you caary though.

> The generic/344 failure seems to be caused by a WARNING triggered in
> the nvdimm code:

OK, apparently this is nothing new for you as generic/344 fails for you
even with 3.17. But it should not :). I'll try to see if I can reproduce
this in my test setup during more test runs (I don't remember seeing it
during occasional runs I do) and debug it further.

								Honza

> generic/344		[22:06:36][   18.126280] run fstests generic/344 at 2018-08-10 22:06:36
> [   18.303113] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> [   18.456988] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> [   97.375912] WARNING: CPU: 2 PID: 1712 at /usr/projects/linux/ext4/mm/memory.c:1801 insert_pfn+0x15a/0x170
> [   97.377261] CPU: 2 PID: 1712 Comm: holetest Not tainted 4.18.0-rc4-xfstests-00039-g863c37fcb14f #497
> [   97.378486] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
> [   97.379516] RIP: 0010:insert_pfn+0x15a/0x170
> [   97.380064] Code: 19 1b 01 eb dd 48 85 d2 74 07 48 23 1d 3f 19 1b 01 48 09 df 48 89 f8 0f 1f 40 00 48 b9 00 02 00 00 00 00 00 04 48 09 c1 eb c8 <0f> 0b e9 5e ff ff ff bb f4 ff ff ff e9 5e ff ff ff e8 80 7b ec ff 
> [   97.382469] RSP: 0000:ffffacfd0457fc60 EFLAGS: 00010206
> [   97.383123] RAX: 0000000000179e3b RBX: 00000000fffffff0 RCX: 0000000000000002
> [   97.384062] RDX: 000fffffffffffff RSI: 8f5c28f5c28f5c29 RDI: 8000000179e3b225
> [   97.385134] RBP: ffff923761654558 R08: 0000000000000001 R09: 0000000000000001
> [   97.386213] R10: ffff92376f415cc0 R11: 0000000000000001 R12: ffff92377e880cc0
> [   97.387264] R13: 00007fab98aab000 R14: 00000000003e860d R15: ffff923761560000
> [   97.388327] FS:  00007fab9049c700(0000) GS:ffff92377f200000(0000) knlGS:0000000000000000
> [   97.389514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   97.390367] CR2: 00007fab98aabc00 CR3: 00000006a165a004 CR4: 0000000000360ee0
> [   97.391432] Call Trace:
> [   97.391798]  __vm_insert_mixed+0x7e/0xc0
> [   97.392376]  vmf_insert_mixed_mkwrite+0xf/0x30
> [   97.393048]  dax_iomap_pte_fault+0xb8b/0xe40
> [   97.393691]  ext4_dax_huge_fault+0x145/0x200
> [   97.394268]  do_wp_page+0x175/0x5b0
> [   97.394710]  __handle_mm_fault+0x587/0xbb0
> [   97.395228]  __do_page_fault+0x20c/0x490
> [   97.395729]  ? async_page_fault+0x8/0x30
> [   97.396251]  async_page_fault+0x1e/0x30
> [   97.396719] RIP: 0033:0x5598144275ea
> [   97.397187] Code: 0f 85 8a 00 00 00 31 d2 48 85 db 4b 8d 04 34 7e 1f 0f 1f 80 00 00 00 00 48 89 d1 48 83 c2 01 48 0f af 0d 71 1b 20 00 48 39 d3 <48> 89 2c 08 75 e8 8b 0d 36 1b 20 00 31 c0 85 c9 74 0a 8b 15 2e 1b 
> [   97.399752] RSP: 002b:00007fab9049bf20 EFLAGS: 00010212
> [   97.400541] RAX: 00007fab90c9ec00 RBX: 0000000000010000 RCX: 0000000007e0d000
> [   97.401603] RDX: 0000000000007e0e RSI: 0000000000000000 RDI: 0000000000000001
> [   97.402673] RBP: 00007fab9049c700 R08: 00007fab9049c700 R09: 00007fab9049c700
> [   97.403755] R10: 00007fab9049c9d0 R11: 0000000000000202 R12: 00007fab90c9e000
> [   97.404851] R13: 00007ffc4608c9e0 R14: 0000000000000c00 R15: 000055981608e250
> [   97.405892] irq event stamp: 1111968
> [   97.406460] hardirqs last  enabled at (1111967): [<ffffffffa36c4409>] _raw_spin_unlock_irq+0x29/0x40
> [   97.407826] hardirqs last disabled at (1111968): [<ffffffffa380118f>] error_entry+0x7f/0x100
> [   97.409080] softirqs last  enabled at (1111390): [<ffffffffa3a00319>] __do_softirq+0x319/0x4d9
> [   97.410363] softirqs last disabled at (1111383): [<ffffffffa2e8c9e1>] irq_exit+0xc1/0xd0
> [   97.411400] ---[ end trace 69669a34a73c1a49 ]---
> [  117.726077] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> [  117.727671] EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity,dax
> [  117.796623] EXT4-fs (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> [  117.798546] EXT4-fs (pmem1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,block_validity,dax
> _check_dmesg: something found in dmesg (see /results/ext4/results-dax/generic/344.dmesg)
> 
> 		      		      	       - Ted
Theodore Y. Ts'o Aug. 13, 2018, 12:46 p.m. | #5
On Mon, Aug 13, 2018 at 12:12:52PM +0200, Jan Kara wrote:
> > The generic/081 regression appears to be a device-mapper issue...
> 
> I'll see if this reproduces for me. Doesn't seem to be related to the DAX
> patches you caary though.

It does seem to be a DAX-specific failure though.

> > The generic/344 failure seems to be caused by a WARNING triggered in
> > the nvdimm code:
> 
> OK, apparently this is nothing new for you as generic/344 fails for you
> even with 3.17. But it should not :). I'll try to see if I can reproduce
> this in my test setup during more test runs (I don't remember seeing it
> during occasional runs I do) and debug it further.

Thanks!

In case it wasn't clear, I wasn't planning on letting these failures
prevent the patches from going upstream.  As you say, the generic/081
failure looks unrelated to ext4, and the generic/344 isn't a
regression.

						- Ted
Jan Kara Aug. 24, 2018, 3:44 p.m. | #6
On Mon 13-08-18 12:12:52, Jan Kara wrote:
> On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> > The generic/344 failure seems to be caused by a WARNING triggered in
> > the nvdimm code:
> 
> OK, apparently this is nothing new for you as generic/344 fails for you
> even with 3.17. But it should not :). I'll try to see if I can reproduce
> this in my test setup during more test runs (I don't remember seeing it
> during occasional runs I do) and debug it further.

I could reproduce this relatively easily but it took me quite a while to
figure out how to best fix this. I'll send the fix shortly (mm: Fix warning
in insert_pfn()).

								Honza
Jan Kara Aug. 27, 2018, 4:09 p.m. | #7
On Mon 13-08-18 12:12:52, Jan Kara wrote:
> On Fri 10-08-18 22:10:53, Theodore Y. Ts'o wrote:
> > On Fri, Aug 10, 2018 at 04:33:49PM -0400, Theodore Y. Ts'o wrote:
> > > I just kicked off a DAX test ("gce-xfstests -c dax -g auto") with
> > > CONFIG_KASAN disabled, and I expect it shouldn't show up anything
> > > concerning.  So assuming nothing surprising pops up, yes it should be
> > > merged at the next merge window.
> > 
> > ... and here are the results.  The first is 4.17, and the second is
> > the ext4 git tree:
> > 
> > ext4/dax: 488 tests, 4 failures, 97 skipped, 2647 seconds
> >   Failures: ext4/033 generic/344 generic/491 generic/503
> > 
> > ext4/dax: 488 tests, 3 failures, 97 skipped, 2637 seconds
> >   Failures: generic/081 generic/344 generic/388
> > 
> > The generic/388 failure is a known flake (shutdown stress test).
> > 
> > The generic/081 regression appears to be a device-mapper issue:
> > 
> > generic/081		[22:06:33][   15.079661] run fstests generic/081 at 2018-08-10 22:06:33
> > [   15.795745] device-mapper: ioctl: can't change device type (old=4 vs new=1) after initial table load.
> > [failed, exit status 1] [22:06:36]- output mismatch (see /results/ext4/results-dax/generic/081.out.bad)
> >     --- tests/generic/081.out	2018-08-09 18:00:42.000000000 -0400
> >     +++ /results/ext4/results-dax/generic/081.out.bad	2018-08-10 22:06:36.440005460 -0400
> >     @@ -1,2 +1,4 @@
> >      QA output created by 081
> >      Silence is golden
> >     +Failed to create snapshot
> >     +(see /results/ext4/results-dax/generic/081.full for details)
> >     ...
> >     (Run 'diff -u tests/generic/081.out /results/ext4/results-dax/generic/081.out.bad'  to see the entire diff)
> 
> I'll see if this reproduces for me. Doesn't seem to be related to the DAX
> patches you caary though.

This is caused by commit dbc626597c39b "dm: prevent DAX mounts if not
supported". I've started discussion with relevant developers how to fix
this.

								Honza

Patch

diff --git a/fs/dax.c b/fs/dax.c
index 641192808bb6..897b51e41d8f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -566,7 +566,8 @@  struct page *dax_layout_busy_page(struct address_space *mapping)
 			if (index >= end)
 				break;
 
-			if (!radix_tree_exceptional_entry(pvec_ent))
+			if (WARN_ON_ONCE(
+			     !radix_tree_exceptional_entry(pvec_ent)))
 				continue;
 
 			xa_lock_irq(&mapping->i_pages);
@@ -578,6 +579,13 @@  struct page *dax_layout_busy_page(struct address_space *mapping)
 			if (page)
 				break;
 		}
+
+		/*
+		 * We don't expect normal struct page entries to exist in our
+		 * tree, but we keep these pagevec calls so that this code is
+		 * consistent with the common pattern for handling pagevecs
+		 * throughout the kernel.
+		 */
 		pagevec_remove_exceptionals(&pvec);
 		pagevec_release(&pvec);
 		index++;