diff mbox series

fanotify23: Make evictable marks tests more reliable

Message ID 20230810131012.13026-1-jack@suse.cz
State Superseded
Headers show
Series fanotify23: Make evictable marks tests more reliable | expand

Commit Message

Jan Kara Aug. 10, 2023, 1:10 p.m. UTC
It has been observed that when fanotify23 test is run in a loop, it
eventually fails as:

fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
fanotify23.c:156: TPASS: Got no events as expected
fanotify23.c:81: TFAIL: FAN_MARK_REMOVE did not fail with ENOENT as expected after drop_caches: SUCCESS (0)

This is because repeated evictions of caches done by the test reclaim
all freeable slab objects in the system. So when the test calls drop
caches, only the dentry and inode of the test file are there to reclaim.
But for inode to be reclaimed, dentry (which holds inode reference) has
to be freed first and for dentry to be freed it has to first cycle
through the LRU which takes two slab reclaim calls.

Call drop caches twice to make sure dentry has chance to pass through
the LRU and be reclaimed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 testcases/kernel/syscalls/fanotify/fanotify23.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Amir Goldstein Aug. 10, 2023, 1:30 p.m. UTC | #1
On Thu, Aug 10, 2023 at 4:10 PM Jan Kara <jack@suse.cz> wrote:
>
> It has been observed that when fanotify23 test is run in a loop, it
> eventually fails as:
>
> fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
> fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
> fanotify23.c:156: TPASS: Got no events as expected
> fanotify23.c:81: TFAIL: FAN_MARK_REMOVE did not fail with ENOENT as expected after drop_caches: SUCCESS (0)
>
> This is because repeated evictions of caches done by the test reclaim
> all freeable slab objects in the system. So when the test calls drop
> caches, only the dentry and inode of the test file are there to reclaim.
> But for inode to be reclaimed, dentry (which holds inode reference) has
> to be freed first and for dentry to be freed it has to first cycle
> through the LRU which takes two slab reclaim calls.
>
> Call drop caches twice to make sure dentry has chance to pass through
> the LRU and be reclaimed.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  testcases/kernel/syscalls/fanotify/fanotify23.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/testcases/kernel/syscalls/fanotify/fanotify23.c b/testcases/kernel/syscalls/fanotify/fanotify23.c
> index 89fd4f36a09b..2d50f70585b7 100644
> --- a/testcases/kernel/syscalls/fanotify/fanotify23.c
> +++ b/testcases/kernel/syscalls/fanotify/fanotify23.c
> @@ -160,10 +160,15 @@ static void test_fanotify(void)
>         }
>
>         /*
> -        * drop_caches should evict inode from cache and remove evictable mark
> +        * drop_caches should evict inode from cache and remove evictable mark.
> +        * We call drop_caches twice as once the dentries will just cycle
> +        * through the LRU without being reclaimed and if there are no other
> +        * objects to reclaim, the slab reclaim will just stop instead of
> +        * retrying.
>          */
>         fsync_file(TEST_FILE);
>         SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
> +       SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
>
>         verify_mark_removed(TEST_FILE, "after drop_caches");

If this improves the reliability of the test, I have no problem with
the solution,
but I am a bit uneasy with the fact that fanotiy10 and fanotify23 have two
different mitigations.

Anyway, I think that the explanation above is true for some fs but xfs
inode lifetime and shrinkers for example and more complex, so it does
not hold. Right?
Meaning that the explanation is true because fanotify10 has:

        /* Shrinkers on other fs do not work reliably enough to
guarantee mark eviction on drop_caches */
        .dev_fs_type = "ext2",

Maybe that should be spelled out?

Thanks,
Amir.
Jan Kara Aug. 10, 2023, 2:25 p.m. UTC | #2
On Thu 10-08-23 16:30:32, Amir Goldstein wrote:
> On Thu, Aug 10, 2023 at 4:10 PM Jan Kara <jack@suse.cz> wrote:
> >
> > It has been observed that when fanotify23 test is run in a loop, it
> > eventually fails as:
> >
> > fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
> > fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
> > fanotify23.c:156: TPASS: Got no events as expected
> > fanotify23.c:81: TFAIL: FAN_MARK_REMOVE did not fail with ENOENT as expected after drop_caches: SUCCESS (0)
> >
> > This is because repeated evictions of caches done by the test reclaim
> > all freeable slab objects in the system. So when the test calls drop
> > caches, only the dentry and inode of the test file are there to reclaim.
> > But for inode to be reclaimed, dentry (which holds inode reference) has
> > to be freed first and for dentry to be freed it has to first cycle
> > through the LRU which takes two slab reclaim calls.
> >
> > Call drop caches twice to make sure dentry has chance to pass through
> > the LRU and be reclaimed.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  testcases/kernel/syscalls/fanotify/fanotify23.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/testcases/kernel/syscalls/fanotify/fanotify23.c b/testcases/kernel/syscalls/fanotify/fanotify23.c
> > index 89fd4f36a09b..2d50f70585b7 100644
> > --- a/testcases/kernel/syscalls/fanotify/fanotify23.c
> > +++ b/testcases/kernel/syscalls/fanotify/fanotify23.c
> > @@ -160,10 +160,15 @@ static void test_fanotify(void)
> >         }
> >
> >         /*
> > -        * drop_caches should evict inode from cache and remove evictable mark
> > +        * drop_caches should evict inode from cache and remove evictable mark.
> > +        * We call drop_caches twice as once the dentries will just cycle
> > +        * through the LRU without being reclaimed and if there are no other
> > +        * objects to reclaim, the slab reclaim will just stop instead of
> > +        * retrying.
> >          */
> >         fsync_file(TEST_FILE);
> >         SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
> > +       SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
> >
> >         verify_mark_removed(TEST_FILE, "after drop_caches");
> 
> If this improves the reliability of the test, I have no problem with
> the solution,

Yes, it fixes the problem at least for me :)

> but I am a bit uneasy with the fact that fanotiy10 and fanotify23 have two
> different mitigations.

I know and I was thinking about that a bit. I could implement something
similar as fanotify10 does for fanotify23 but it seemed to be a bit of an
overkill so I went for the one-liner.

> Anyway, I think that the explanation above is true for some fs but xfs
> inode lifetime and shrinkers for example and more complex, so it does
> not hold. Right?

So the explanation is using internal knowledge of prune_icache_sb()
implementation. XFS has its own rules for aging inodes so it will
not hold there - it has this "inactivation" action which releases the last
inode reference and AFAICS drop_caches results only in pinging background
threads to do work but doesn't really wait for inodes to be inactivated,
let alone freed.

> Meaning that the explanation is true because fanotify10 has:
> 
>         /* Shrinkers on other fs do not work reliably enough to
> guarantee mark eviction on drop_caches */
>         .dev_fs_type = "ext2",
> 
> Maybe that should be spelled out?

I guess you speak about fanotify23 now but yes, this patch depends on the
fact that we are limited to ext2. Perhaps I can add that to the comment but
it already is mentioned in the test as you show above...

								Honza
Amir Goldstein Aug. 10, 2023, 2:38 p.m. UTC | #3
On Thu, Aug 10, 2023 at 5:25 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 10-08-23 16:30:32, Amir Goldstein wrote:
> > On Thu, Aug 10, 2023 at 4:10 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > It has been observed that when fanotify23 test is run in a loop, it
> > > eventually fails as:
> > >
> > > fanotify23.c:112: TPASS: FAN_MARK_ADD failed with EEXIST as expected when trying to downgrade to evictable mark
> > > fanotify23.c:75: TPASS: FAN_MARK_REMOVE failed with ENOENT as expected after empty mask
> > > fanotify23.c:156: TPASS: Got no events as expected
> > > fanotify23.c:81: TFAIL: FAN_MARK_REMOVE did not fail with ENOENT as expected after drop_caches: SUCCESS (0)
> > >
> > > This is because repeated evictions of caches done by the test reclaim
> > > all freeable slab objects in the system. So when the test calls drop
> > > caches, only the dentry and inode of the test file are there to reclaim.
> > > But for inode to be reclaimed, dentry (which holds inode reference) has
> > > to be freed first and for dentry to be freed it has to first cycle
> > > through the LRU which takes two slab reclaim calls.
> > >
> > > Call drop caches twice to make sure dentry has chance to pass through
> > > the LRU and be reclaimed.
> > >
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  testcases/kernel/syscalls/fanotify/fanotify23.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/testcases/kernel/syscalls/fanotify/fanotify23.c b/testcases/kernel/syscalls/fanotify/fanotify23.c
> > > index 89fd4f36a09b..2d50f70585b7 100644
> > > --- a/testcases/kernel/syscalls/fanotify/fanotify23.c
> > > +++ b/testcases/kernel/syscalls/fanotify/fanotify23.c
> > > @@ -160,10 +160,15 @@ static void test_fanotify(void)
> > >         }
> > >
> > >         /*
> > > -        * drop_caches should evict inode from cache and remove evictable mark
> > > +        * drop_caches should evict inode from cache and remove evictable mark.
> > > +        * We call drop_caches twice as once the dentries will just cycle
> > > +        * through the LRU without being reclaimed and if there are no other
> > > +        * objects to reclaim, the slab reclaim will just stop instead of
> > > +        * retrying.
> > >          */
> > >         fsync_file(TEST_FILE);
> > >         SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
> > > +       SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
> > >
> > >         verify_mark_removed(TEST_FILE, "after drop_caches");
> >
> > If this improves the reliability of the test, I have no problem with
> > the solution,
>
> Yes, it fixes the problem at least for me :)
>
> > but I am a bit uneasy with the fact that fanotiy10 and fanotify23 have two
> > different mitigations.
>
> I know and I was thinking about that a bit. I could implement something
> similar as fanotify10 does for fanotify23 but it seemed to be a bit of an
> overkill so I went for the one-liner.
>
> > Anyway, I think that the explanation above is true for some fs but xfs
> > inode lifetime and shrinkers for example and more complex, so it does
> > not hold. Right?
>
> So the explanation is using internal knowledge of prune_icache_sb()
> implementation. XFS has its own rules for aging inodes so it will
> not hold there - it has this "inactivation" action which releases the last
> inode reference and AFAICS drop_caches results only in pinging background
> threads to do work but doesn't really wait for inodes to be inactivated,
> let alone freed.
>
> > Meaning that the explanation is true because fanotify10 has:
> >
> >         /* Shrinkers on other fs do not work reliably enough to
> > guarantee mark eviction on drop_caches */
> >         .dev_fs_type = "ext2",
> >
> > Maybe that should be spelled out?
>
> I guess you speak about fanotify23 now but yes, this patch depends on the
> fact that we are limited to ext2. Perhaps I can add that to the comment but
> it already is mentioned in the test as you show above...

Yes, that is all I meant, so a reader won't see this explanation and
try to copy the same mitigation to other tests that do not limit
themselves to ext2, better make that fact clear in the comment.

Thanks,
Amir.
Jan Kara Aug. 10, 2023, 5:59 p.m. UTC | #4
On Thu 10-08-23 17:38:15, Amir Goldstein wrote:
> On Thu, Aug 10, 2023 at 5:25 PM Jan Kara <jack@suse.cz> wrote:
> > > Meaning that the explanation is true because fanotify10 has:
> > >
> > >         /* Shrinkers on other fs do not work reliably enough to
> > > guarantee mark eviction on drop_caches */
> > >         .dev_fs_type = "ext2",
> > >
> > > Maybe that should be spelled out?
> >
> > I guess you speak about fanotify23 now but yes, this patch depends on the
> > fact that we are limited to ext2. Perhaps I can add that to the comment but
> > it already is mentioned in the test as you show above...
> 
> Yes, that is all I meant, so a reader won't see this explanation and
> try to copy the same mitigation to other tests that do not limit
> themselves to ext2, better make that fact clear in the comment.

OK, I've added a note to the comment that this is specific to the fs and
the test is limited to ext2. I'll send v2 shortly.

								Honza
diff mbox series

Patch

diff --git a/testcases/kernel/syscalls/fanotify/fanotify23.c b/testcases/kernel/syscalls/fanotify/fanotify23.c
index 89fd4f36a09b..2d50f70585b7 100644
--- a/testcases/kernel/syscalls/fanotify/fanotify23.c
+++ b/testcases/kernel/syscalls/fanotify/fanotify23.c
@@ -160,10 +160,15 @@  static void test_fanotify(void)
 	}
 
 	/*
-	 * drop_caches should evict inode from cache and remove evictable mark
+	 * drop_caches should evict inode from cache and remove evictable mark.
+	 * We call drop_caches twice as once the dentries will just cycle
+	 * through the LRU without being reclaimed and if there are no other
+	 * objects to reclaim, the slab reclaim will just stop instead of
+	 * retrying.
 	 */
 	fsync_file(TEST_FILE);
 	SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
+	SAFE_FILE_PRINTF(DROP_CACHES_FILE, "3");
 
 	verify_mark_removed(TEST_FILE, "after drop_caches");