[RFC] sync_file_range02: remove the toplimit of write back
diff mbox series

Message ID 20191218082826.25083-1-liwang@redhat.com
State Accepted
Headers show
Series
  • [RFC] sync_file_range02: remove the toplimit of write back
Related show

Commit Message

Li Wang Dec. 18, 2019, 8:28 a.m. UTC
Test tries to sync a range of $BYTES bytes, and then makes sure that
between $BYTES and $BYTES+10% was written to disk. But sometimes, more
than $BYTES+10% hit the disk: "Synced 39843840, expected 33554432" so
it failed as below.

  tst_test.c:1179: INFO: Testing on ext4
  sync_file_range02.c:74: FAIL: Sync equals write: Synced 39843840,
                          expected 33554432
  sync_file_range02.c:74: FAIL: Sync inside of write: Synced 18612224,
                          expected 16777216

From FS dev's view:

" The test's assumptions are fundamentally false; it thinks it can look
at IO counters (tst_dev_bytes_written) for a disk before and after a
system call, and attribute all of the IO seen to the system call that
was made - this isn't necessarily correct. Other processes may generate
IO in the background.

ext4 defers a lot of IO on a freshly made filesystem to the kernel -
for example it will initialize the journal and inode tables after the
mount, and this will cause extra IO.

Creating ext4 filesystems with the options: "-E lazy_itable_init=0,
lazy_journal_init=0" might help.

Another option would be to raise the threshold. Essentially, the report
here is that the test is failing because the filesystem wrote "too much"
as a result of the sync. How much is "too much?" ..."

Let's remove the toplimit of write back, and think as long as we synced
at least the expected amount, the test passes. The +10% limit seems arbitrary.

Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
---
 testcases/kernel/syscalls/sync_file_range/sync_file_range02.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Jan Stancek Dec. 18, 2019, 9:24 a.m. UTC | #1
----- Original Message -----
> " The test's assumptions are fundamentally false; it thinks it can look
> at IO counters (tst_dev_bytes_written) for a disk before and after a
> system call, and attribute all of the IO seen to the system call that
> was made - this isn't necessarily correct. Other processes may generate
> IO in the background.

We create our own block device, so there shouldn't be other processes
writing to it.

> ext4 defers a lot of IO on a freshly made filesystem to the kernel -
> for example it will initialize the journal and inode tables after the
> mount

Journal was my guess as well.

> Let's remove the toplimit of write back, and think as long as we synced
> at least the expected amount, the test passes. The +10% limit seems
> arbitrary.

I think this is reasonable approach until we find better way
to measure what was synced.

Acked-by: Jan Stancek <jstancek@redhat.com>
Sumit Garg Dec. 18, 2019, 10:13 a.m. UTC | #2
+ Cyril (who originally proposed upper bound check)

On Wed, 18 Dec 2019 at 14:55, Jan Stancek <jstancek@redhat.com> wrote:
>
>
> ----- Original Message -----
> > " The test's assumptions are fundamentally false; it thinks it can look
> > at IO counters (tst_dev_bytes_written) for a disk before and after a
> > system call, and attribute all of the IO seen to the system call that
> > was made - this isn't necessarily correct. Other processes may generate
> > IO in the background.
>
> We create our own block device, so there shouldn't be other processes
> writing to it.
>
> > ext4 defers a lot of IO on a freshly made filesystem to the kernel -
> > for example it will initialize the journal and inode tables after the
> > mount
>
> Journal was my guess as well.

To avoid similar scenarios, I suggested to add a "sync()" call just
prior to test here [1]. And I couldn't reproduce the failure in
1000-times run with 4.19 kernel.

Also, I think having a "sync()" call becomes more important in case we
remove the upper bound otherwise test might not provide reliable
results.

[1] http://lists.linux.it/pipermail/ltp/2019-October/014157.html

-Sumit

>
> > Let's remove the toplimit of write back, and think as long as we synced
> > at least the expected amount, the test passes. The +10% limit seems
> > arbitrary.
>
> I think this is reasonable approach until we find better way
> to measure what was synced.
>
> Acked-by: Jan Stancek <jstancek@redhat.com>
>
>
> --
> Mailing list info: https://lists.linux.it/listinfo/ltp
Li Wang Dec. 19, 2019, 7:10 a.m. UTC | #3
Hi Sumit,

On Wed, Dec 18, 2019 at 6:14 PM Sumit Garg <sumit.garg@linaro.org> wrote:

> + Cyril (who originally proposed upper bound check)
>
> On Wed, 18 Dec 2019 at 14:55, Jan Stancek <jstancek@redhat.com> wrote:
> >
> >
> > ----- Original Message -----
> > > " The test's assumptions are fundamentally false; it thinks it can look
> > > at IO counters (tst_dev_bytes_written) for a disk before and after a
> > > system call, and attribute all of the IO seen to the system call that
> > > was made - this isn't necessarily correct. Other processes may generate
> > > IO in the background.
> >
> > We create our own block device, so there shouldn't be other processes
> > writing to it.
> >
> > > ext4 defers a lot of IO on a freshly made filesystem to the kernel -
> > > for example it will initialize the journal and inode tables after the
> > > mount
> >
> > Journal was my guess as well.
>
> To avoid similar scenarios, I suggested to add a "sync()" call just
> prior to test here [1]. And I couldn't reproduce the failure in
> 1000-times run with 4.19 kernel.
>

Yes, that makes sense to me.


>
> Also, I think having a "sync()" call becomes more important in case we
> remove the upper bound otherwise test might not provide reliable
> results.
>
> [1] http://lists.linux.it/pipermail/ltp/2019-October/014157.html
>
> -Sumit
>
> >
> > > Let's remove the toplimit of write back, and think as long as we synced
> > > at least the expected amount, the test passes. The +10% limit seems
> > > arbitrary.
> >
> > I think this is reasonable approach until we find better way
> > to measure what was synced.
> >
> > Acked-by: Jan Stancek <jstancek@redhat.com>
> >
> >
> > --
> > Mailing list info: https://lists.linux.it/listinfo/ltp
>
>
Sumit Garg Dec. 19, 2019, 8:51 a.m. UTC | #4
Hi Li,

On Thu, 19 Dec 2019 at 12:40, Li Wang <liwang@redhat.com> wrote:
>
> Hi Sumit,
>
> On Wed, Dec 18, 2019 at 6:14 PM Sumit Garg <sumit.garg@linaro.org> wrote:
>>
>> + Cyril (who originally proposed upper bound check)
>>
>> On Wed, 18 Dec 2019 at 14:55, Jan Stancek <jstancek@redhat.com> wrote:
>> >
>> >
>> > ----- Original Message -----
>> > > " The test's assumptions are fundamentally false; it thinks it can look
>> > > at IO counters (tst_dev_bytes_written) for a disk before and after a
>> > > system call, and attribute all of the IO seen to the system call that
>> > > was made - this isn't necessarily correct. Other processes may generate
>> > > IO in the background.
>> >
>> > We create our own block device, so there shouldn't be other processes
>> > writing to it.
>> >
>> > > ext4 defers a lot of IO on a freshly made filesystem to the kernel -
>> > > for example it will initialize the journal and inode tables after the
>> > > mount
>> >
>> > Journal was my guess as well.
>>
>> To avoid similar scenarios, I suggested to add a "sync()" call just
>> prior to test here [1]. And I couldn't reproduce the failure in
>> 1000-times run with 4.19 kernel.
>
>
> Yes, that makes sense to me.
>

Would you mind to give it a try and check if you could reproduce the failure?

-Sumit

>>
>>
>> Also, I think having a "sync()" call becomes more important in case we
>> remove the upper bound otherwise test might not provide reliable
>> results.
>>
>> [1] http://lists.linux.it/pipermail/ltp/2019-October/014157.html
>>
>> -Sumit
>>
>> >
>> > > Let's remove the toplimit of write back, and think as long as we synced
>> > > at least the expected amount, the test passes. The +10% limit seems
>> > > arbitrary.
>> >
>> > I think this is reasonable approach until we find better way
>> > to measure what was synced.
>> >
>> > Acked-by: Jan Stancek <jstancek@redhat.com>
>> >
>> >
>> > --
>> > Mailing list info: https://lists.linux.it/listinfo/ltp
>>
>
>
> --
> Regards,
> Li Wang
Li Wang Dec. 19, 2019, 9:15 a.m. UTC | #5
On Thu, Dec 19, 2019 at 4:52 PM Sumit Garg <sumit.garg@linaro.org> wrote:

> ...
> >> To avoid similar scenarios, I suggested to add a "sync()" call just
> >> prior to test here [1]. And I couldn't reproduce the failure in
> >> 1000-times run with 4.19 kernel.
> >
> >
> > Yes, that makes sense to me.
> >
>
> Would you mind to give it a try and check if you could reproduce the
> failure?
>

Actually, I haven't had one time to reproduce it. This failure was reported
by CI jobs and very low frequency occurred. The fix is based on a
discussion with FS developer analysis.

But I would have a try with this patch(+ sync()).
Li Wang Dec. 31, 2019, 5:25 a.m. UTC | #6
Hi,

On Thu, Dec 19, 2019 at 5:15 PM Li Wang <liwang@redhat.com> wrote:

>
>
> On Thu, Dec 19, 2019 at 4:52 PM Sumit Garg <sumit.garg@linaro.org> wrote:
>
>> ...
>> >> To avoid similar scenarios, I suggested to add a "sync()" call just
>> >> prior to test here [1]. And I couldn't reproduce the failure in
>> >> 1000-times run with 4.19 kernel.
>> >
>> >
>> > Yes, that makes sense to me.
>> >
>>
>> Would you mind to give it a try and check if you could reproduce the
>> failure?
>>
>
> Actually, I haven't had one time to reproduce it. This failure was
> reported by CI jobs and very low frequency occurred. The fix is based on a
> discussion with FS developer analysis.
>
> But I would have a try with this patch(+ sync()).
>

I have tried this for many different systems, and it doesn't hit the
failure at least one time. So if no more comments, I would merge the patch
as below in the next step.

--- a/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
+++ b/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
@@ -48,6 +48,8 @@ static void verify_sync_file_range(struct testcase *tc)

        lseek(fd, tc->write_off, SEEK_SET);

+       sync();
+
        tst_dev_bytes_written(tst_device->dev);

        tst_fill_fd(fd, 0, TST_MB, tc->write_size_mb);
@@ -66,8 +68,7 @@ static void verify_sync_file_range(struct testcase *tc)

        SAFE_CLOSE(fd);

-       if ((written >= tc->exp_sync_size) &&
-           (written <= (tc->exp_sync_size + tc->exp_sync_size/10)))
+       if (written >= tc->exp_sync_size)
                tst_res(TPASS, "%s", tc->desc);
        else
                tst_res(TFAIL, "%s: Synced %li, expected %li", tc->desc,
Sumit Garg Dec. 31, 2019, 5:31 a.m. UTC | #7
On Tue, 31 Dec 2019 at 10:55, Li Wang <liwang@redhat.com> wrote:
>
> Hi,
>
> On Thu, Dec 19, 2019 at 5:15 PM Li Wang <liwang@redhat.com> wrote:
>>
>>
>>
>> On Thu, Dec 19, 2019 at 4:52 PM Sumit Garg <sumit.garg@linaro.org> wrote:
>>>
>>> ...
>>> >> To avoid similar scenarios, I suggested to add a "sync()" call just
>>> >> prior to test here [1]. And I couldn't reproduce the failure in
>>> >> 1000-times run with 4.19 kernel.
>>> >
>>> >
>>> > Yes, that makes sense to me.
>>> >
>>>
>>> Would you mind to give it a try and check if you could reproduce the failure?
>>
>>
>> Actually, I haven't had one time to reproduce it. This failure was reported by CI jobs and very low frequency occurred. The fix is based on a discussion with FS developer analysis.
>>
>> But I would have a try with this patch(+ sync()).
>
>
> I have tried this for many different systems, and it doesn't hit the failure at least one time. So if no more comments, I would merge the patch as below in the next step.
>
> --- a/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
> +++ b/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
> @@ -48,6 +48,8 @@ static void verify_sync_file_range(struct testcase *tc)
>
>         lseek(fd, tc->write_off, SEEK_SET);
>
> +       sync();
> +
>         tst_dev_bytes_written(tst_device->dev);
>
>         tst_fill_fd(fd, 0, TST_MB, tc->write_size_mb);
> @@ -66,8 +68,7 @@ static void verify_sync_file_range(struct testcase *tc)
>
>         SAFE_CLOSE(fd);
>
> -       if ((written >= tc->exp_sync_size) &&
> -           (written <= (tc->exp_sync_size + tc->exp_sync_size/10)))
> +       if (written >= tc->exp_sync_size)
>                 tst_res(TPASS, "%s", tc->desc);
>         else
>                 tst_res(TFAIL, "%s: Synced %li, expected %li", tc->desc,
>

Reviewed-by: Sumit Garg <sumit.garg@linaro.org>

> --
> Regards,
> Li Wang
Yang Xu Dec. 31, 2019, 5:49 a.m. UTC | #8
Hi Li

on 2019/12/31 13:25, Li Wang wrote:
> Hi,
>
> On Thu, Dec 19, 2019 at 5:15 PM Li Wang <liwang@redhat.com 
> <mailto:liwang@redhat.com>> wrote:
>
>
>
>     On Thu, Dec 19, 2019 at 4:52 PM Sumit Garg <sumit.garg@linaro.org
>     <mailto:sumit.garg@linaro.org>> wrote:
>
>         ...
>         >> To avoid similar scenarios, I suggested to add a "sync()"
>         call just
>         >> prior to test here [1]. And I couldn't reproduce the failure in
>         >> 1000-times run with 4.19 kernel.
>         >
>         >
>         > Yes, that makes sense to me.
>         >
>
>         Would you mind to give it a try and check if you could
>         reproduce the failure?
>
>
>     Actually, I haven't had one time to reproduce it. This failure was
>     reported by CI jobs and very low frequency occurred. The fix is
>     based on a discussion with FS developer analysis.
>
>     But I would have a try with this patch(+ sync()).
>
>
> I have tried this for many different systems, and it doesn't hit the 
> failure at least one time. So if no more comments, I would merge the 
> patch as below in the next step.

I have a question.we must call sync()? I think syncfs is more accurate.

Best Regards
Yang Xu

>
> --- a/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
> +++ b/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
> @@ -48,6 +48,8 @@ static void verify_sync_file_range(struct testcase *tc)
>
>         lseek(fd, tc->write_off, SEEK_SET);
>
> +       sync();
> +
>         tst_dev_bytes_written(tst_device->dev);
>
>         tst_fill_fd(fd, 0, TST_MB, tc->write_size_mb);
> @@ -66,8 +68,7 @@ static void verify_sync_file_range(struct testcase *tc)
>
>         SAFE_CLOSE(fd);
>
> -       if ((written >= tc->exp_sync_size) &&
> -           (written <= (tc->exp_sync_size + tc->exp_sync_size/10)))
> +       if (written >= tc->exp_sync_size)
>                 tst_res(TPASS, "%s", tc->desc);
>         else
>                 tst_res(TFAIL, "%s: Synced %li, expected %li", tc->desc,
>
> -- 
> Regards,
> Li Wang
>
Li Wang Dec. 31, 2019, 6:03 a.m. UTC | #9
Hi Yang,

Yang Xu <xuyang2018.jy@cn.fujitsu.com> wrote:

> ...
>
>
> I have tried this for many different systems, and it doesn't hit the
> failure at least one time. So if no more comments, I would merge the patch
> as below in the next step.
>
> I have a question.we must call sync()? I think syncfs is more accurate.
>
>
Here we use sync() is to make fs metadata/cache being written back before
testing because there is no obtainable file descriptor 'fd' for the ext4
deferred IO (e.g. initialize the journal and inode tables).
Yang Xu Dec. 31, 2019, 7:05 a.m. UTC | #10
Hi Li
> Hi Yang,
> 
> Yang Xu <xuyang2018.jy@cn.fujitsu.com 
> <mailto:xuyang2018.jy@cn.fujitsu.com>> wrote:
> 
>     ...
> 
>>
>>     I have tried this for many different systems, and it doesn't hit
>>     the failure at least one time. So if no more comments, I would
>>     merge the patch as below in the next step.
> 
>     I have a question.we must call sync()? I think syncfs is more accurate.
> 
> 
> Here we use sync() is to make fs metadata/cache being written back 
> before testing because there is no obtainable file descriptor 'fd' for 
> the ext4 deferred IO (e.g. initialize the journal and inode tables).
I see. For other test cases using tst_dev_bytes_written api such as 
fdatasync03.c,  we also need to call sync() to make this written value 
more accurate.  What do you think about it?

Best Regards
Yang Xu
> 
> -- 
> Regards,
> Li Wang
Li Wang Dec. 31, 2019, 7:32 a.m. UTC | #11
On Tue, Dec 31, 2019 at 3:05 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com>
wrote:

> ...
> >>     I have tried this for many different systems, and it doesn't hit
> >>     the failure at least one time. So if no more comments, I would
> >>     merge the patch as below in the next step.
> >
> >     I have a question.we must call sync()? I think syncfs is more
> accurate.
> >
> >
> > Here we use sync() is to make fs metadata/cache being written back
> > before testing because there is no obtainable file descriptor 'fd' for
> > the ext4 deferred IO (e.g. initialize the journal and inode tables).
>


> I see. For other test cases using tst_dev_bytes_written api such as
> fdatasync03.c,  we also need to call sync() to make this written value
> more accurate.  What do you think about it?
>

Good point. I think we should take care of them as we do for
sync_file_range02 too. It will more easily report fail in the case of a
situation that synced data is less than expected.

$ git grep tst_dev_bytes_written
kernel/syscalls/fdatasync/fdatasync03.c:
 tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/fdatasync/fdatasync03.c:        written =
tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/fsync/fsync04.c:
 tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/fsync/fsync04.c:        written =
tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/sync/sync03.c:  tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/sync/sync03.c:  written =
tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/sync_file_range/sync_file_range02.c:
 tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/sync_file_range/sync_file_range02.c:    written =
tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/syncfs/syncfs01.c:
 tst_dev_bytes_written(tst_device->dev);
kernel/syscalls/syncfs/syncfs01.c:      written =
tst_dev_bytes_written(tst_device->dev);
Sumit Garg Dec. 31, 2019, 7:41 a.m. UTC | #12
On Tue, 31 Dec 2019 at 13:02, Li Wang <liwang@redhat.com> wrote:
>
>
>
> On Tue, Dec 31, 2019 at 3:05 PM Yang Xu <xuyang2018.jy@cn.fujitsu.com> wrote:
>>
>> ...
>> >>     I have tried this for many different systems, and it doesn't hit
>> >>     the failure at least one time. So if no more comments, I would
>> >>     merge the patch as below in the next step.
>> >
>> >     I have a question.we must call sync()? I think syncfs is more accurate.
>> >
>> >
>> > Here we use sync() is to make fs metadata/cache being written back
>> > before testing because there is no obtainable file descriptor 'fd' for
>> > the ext4 deferred IO (e.g. initialize the journal and inode tables).
>
>
>>
>> I see. For other test cases using tst_dev_bytes_written api such as
>> fdatasync03.c,  we also need to call sync() to make this written value
>> more accurate.  What do you think about it?
>
>
> Good point. I think we should take care of them as we do for sync_file_range02 too. It will more easily report fail in the case of a situation that synced data is less than expected.
>
> $ git grep tst_dev_bytes_written
> kernel/syscalls/fdatasync/fdatasync03.c:        tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/fdatasync/fdatasync03.c:        written = tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/fsync/fsync04.c:        tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/fsync/fsync04.c:        written = tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/sync/sync03.c:  tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/sync/sync03.c:  written = tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/sync_file_range/sync_file_range02.c:    tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/sync_file_range/sync_file_range02.c:    written = tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/syncfs/syncfs01.c:      tst_dev_bytes_written(tst_device->dev);
> kernel/syscalls/syncfs/syncfs01.c:      written = tst_dev_bytes_written(tst_device->dev);
>

Agree.

-Sumit

> --
> Regards,
> Li Wang
Li Wang Jan. 2, 2020, 1:45 a.m. UTC | #13
On Tue, Dec 31, 2019 at 1:31 PM Sumit Garg <sumit.garg@linaro.org> wrote:

> ...
>
> Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
>

Pushed.

Patch
diff mbox series

diff --git a/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c b/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
index eb08143c3..643be14fa 100644
--- a/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
+++ b/testcases/kernel/syscalls/sync_file_range/sync_file_range02.c
@@ -66,8 +66,7 @@  static void verify_sync_file_range(struct testcase *tc)
 
 	SAFE_CLOSE(fd);
 
-	if ((written >= tc->exp_sync_size) &&
-	    (written <= (tc->exp_sync_size + tc->exp_sync_size/10)))
+	if ((written >= tc->exp_sync_size))
 		tst_res(TPASS, "%s", tc->desc);
 	else
 		tst_res(TFAIL, "%s: Synced %li, expected %li", tc->desc,