Message ID | 20220420114720.1463473-1-liwang@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [1/2] ioctl09: wait for udevd complete the uevent handling | expand |
Hi! > This is quite necessary to wait for uevent handing settled before > checking the partition is taken effect. > > `udevadm monitor` listening from another terminal: > > KERNEL[447.077683] change /devices/virtual/block/loop0 (block) > KERNEL[447.077939] add /devices/virtual/block/loop0/loop0p1 (block) > KERNEL[447.077999] add /devices/virtual/block/loop0/loop0p2 (block) > UDEV [447.081138] change /devices/virtual/block/loop0 (block) > UDEV [447.083490] add /devices/virtual/block/loop0/loop0p1 (block) > UDEV [447.086898] add /devices/virtual/block/loop0/loop0p2 (block) > > Which helps to get rid of such boring failures: > > ioctl09.c:44: TPASS: access /sys/block/loop0/loop0p1 succeeds > ioctl09.c:52: TPASS: access /dev/loop0p1 succeeds > ioctl09.c:47: TFAIL: access /sys/block/loop0/loop0p2 fails > ioctl09.c:55: TFAIL: access /dev/loop0p2 fails > > I manually confirmed those failures disappeared from a reproducible system > even with running 10000 times. > > Signed-off-by: Li Wang <liwang@redhat.com> > --- > testcases/kernel/syscalls/ioctl/ioctl09.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/testcases/kernel/syscalls/ioctl/ioctl09.c b/testcases/kernel/syscalls/ioctl/ioctl09.c > index 9728ecb9c..533a4a5d0 100644 > --- a/testcases/kernel/syscalls/ioctl/ioctl09.c > +++ b/testcases/kernel/syscalls/ioctl/ioctl09.c > @@ -58,6 +58,10 @@ static void check_partition(int part_num, bool value) > > static void verify_ioctl(void) > { > + char cmd[BUFSIZ]; > + > + sprintf(cmd, "udevadm settle --exit-if-exists=%s", dev_path); > + > const char *const cmd_parted_old[] = {"parted", "-s", "test.img", > "mklabel", "msdos", "mkpart", > "primary", "ext4", "1M", "10M", > @@ -75,11 +79,13 @@ static void verify_ioctl(void) > > loopinfo.lo_flags = LO_FLAGS_PARTSCAN; > SAFE_IOCTL(dev_fd, LOOP_SET_STATUS, &loopinfo); > + tst_system(cmd); > check_partition(1, true); > check_partition(2, false); > > change_partition(cmd_parted_new); > TST_RETRY_FUNC(ioctl(dev_fd, BLKRRPART, 0), TST_RETVAL_EQ0); > + tst_system(cmd); > check_partition(1, true); > check_partition(2, true); We allready use TST_RETRY_FN_EXP_BACKOFF() in there with 30 second timeout isn't that enough? And if it isn't wouldn't simply increasing the timeout to a minute or two fix the issue?
Cyril Hrubis <chrubis@suse.cz> wrote: > > > static void verify_ioctl(void) > > { > > + char cmd[BUFSIZ]; > > + > > + sprintf(cmd, "udevadm settle --exit-if-exists=%s", dev_path); > > + > > const char *const cmd_parted_old[] = {"parted", "-s", "test.img", > > "mklabel", "msdos", "mkpart", > > "primary", "ext4", "1M", > "10M", > > @@ -75,11 +79,13 @@ static void verify_ioctl(void) > > > > loopinfo.lo_flags = LO_FLAGS_PARTSCAN; > > SAFE_IOCTL(dev_fd, LOOP_SET_STATUS, &loopinfo); > > + tst_system(cmd); > > check_partition(1, true); > > check_partition(2, false); > > > > change_partition(cmd_parted_new); > > TST_RETRY_FUNC(ioctl(dev_fd, BLKRRPART, 0), TST_RETVAL_EQ0); > > + tst_system(cmd); > > check_partition(1, true); > > check_partition(2, true); > > We allready use TST_RETRY_FN_EXP_BACKOFF() in there with 30 second > I have to say, for most systems 30 sec is long enough, but we definitely got sporadically fails in larger-scale automation testing. timeout isn't that enough? And if it isn't wouldn't simply increasing > the timeout to a minute or two fix the issue? > That should be better, I just have a try on my reproducible system, but it does not work with enlarged to 180 seconds. ioctl09.c:52: TPASS: access /dev/loop0p1 succeeds octl09.c:47: TFAIL: access /sys/block/loop0/loop0p2 fails Test timeouted, sending SIGKILL! tst_test.c:1509: TINFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1511: TBROK: Test killed! (timeout?) Note, the `udevadm settle` uses 180s as default timeout as well, but it can work, I will look into udevadm.c to see if that does something additional besides the waiting. If so, we might need to remove the TST_RETRY_FN_EXP_BACKOFF from this test.
Hi! > timeout isn't that enough? And if it isn't wouldn't simply increasing > > the timeout to a minute or two fix the issue? > > > > > That should be better, I just have a try on my reproducible system, > but it does not work with enlarged to 180 seconds. You have to set the test timeout to be four times of this timeout otherwise it will report that the test had timeouted sporadically. > ioctl09.c:52: TPASS: access /dev/loop0p1 succeeds > octl09.c:47: TFAIL: access /sys/block/loop0/loop0p2 fails > Test timeouted, sending SIGKILL! > tst_test.c:1509: TINFO: If you are running on slow machine, try exporting > LTP_TIMEOUT_MUL > 1 > tst_test.c:1511: TBROK: Test killed! (timeout?) > > > Note, the `udevadm settle` uses 180s as default timeout as well, > but it can work, I will look into udevadm.c to see if that does > something additional besides the waiting. I guess that the difference is that when we wait on a socket actively we return sooner than with the exp backoff. By definition the exp backoff may wait for twice as long as the udevadm and because of that the test may actually timeout because we waited just a little bit longer. > If so, we might need to remove the TST_RETRY_FN_EXP_BACKOFF > from this test. That would be valid option, remove the exp backoff and wait activelly. Btw we already have an infrastructure for matching kernel event uevents in the kernel/uevent/ directory. If we split the uevent.h there into a header and libs/ltpuevent/ we could simply wait for the event by filling in the uevent_desc structure and calling wait_for_uevents().
On Wed, Apr 20, 2022 at 8:50 PM Cyril Hrubis <chrubis@suse.cz> wrote: > > > Note, the `udevadm settle` uses 180s as default timeout as well, > > but it can work, I will look into udevadm.c to see if that does > > something additional besides the waiting. > > I guess that the difference is that when we wait on a socket actively we > return sooner than with the exp backoff. By definition the exp backoff > may wait for twice as long as the udevadm and because of that the test > may actually timeout because we waited just a little bit longer. > That's right, I overlooked that exp backoff time is longer than the set value. > > > If so, we might need to remove the TST_RETRY_FN_EXP_BACKOFF > > from this test. > > That would be valid option, remove the exp backoff and wait activelly. > Yes, wait actively and confirm event handling is completed. (`udevadm settle` is the direct way I can think of now) > Btw we already have an infrastructure for matching kernel event uevents > in the kernel/uevent/ directory. If we split the uevent.h there into a > header and libs/ltpuevent/ we could simply wait for the event by filling > in the uevent_desc structure and calling wait_for_uevents(). > Good to know this, I have looked through that but am afraid it can't fully satisfy our requirements. Because the wait_for_uevents is only falling into an infinite loop and returns if detecting the expected UEVENT (which was sent from the kernel function kobject_uevent_env). The key point here is, we have to confirm `udevd` completed the UEVENT handling, otherwise only detecting event exists is not enough, there is still potentially get the race condition again if the event queue there.
diff --git a/testcases/kernel/syscalls/ioctl/ioctl09.c b/testcases/kernel/syscalls/ioctl/ioctl09.c index 9728ecb9c..533a4a5d0 100644 --- a/testcases/kernel/syscalls/ioctl/ioctl09.c +++ b/testcases/kernel/syscalls/ioctl/ioctl09.c @@ -58,6 +58,10 @@ static void check_partition(int part_num, bool value) static void verify_ioctl(void) { + char cmd[BUFSIZ]; + + sprintf(cmd, "udevadm settle --exit-if-exists=%s", dev_path); + const char *const cmd_parted_old[] = {"parted", "-s", "test.img", "mklabel", "msdos", "mkpart", "primary", "ext4", "1M", "10M", @@ -75,11 +79,13 @@ static void verify_ioctl(void) loopinfo.lo_flags = LO_FLAGS_PARTSCAN; SAFE_IOCTL(dev_fd, LOOP_SET_STATUS, &loopinfo); + tst_system(cmd); check_partition(1, true); check_partition(2, false); change_partition(cmd_parted_new); TST_RETRY_FUNC(ioctl(dev_fd, BLKRRPART, 0), TST_RETVAL_EQ0); + tst_system(cmd); check_partition(1, true); check_partition(2, true);
This is quite necessary to wait for uevent handing settled before checking the partition is taken effect. `udevadm monitor` listening from another terminal: KERNEL[447.077683] change /devices/virtual/block/loop0 (block) KERNEL[447.077939] add /devices/virtual/block/loop0/loop0p1 (block) KERNEL[447.077999] add /devices/virtual/block/loop0/loop0p2 (block) UDEV [447.081138] change /devices/virtual/block/loop0 (block) UDEV [447.083490] add /devices/virtual/block/loop0/loop0p1 (block) UDEV [447.086898] add /devices/virtual/block/loop0/loop0p2 (block) Which helps to get rid of such boring failures: ioctl09.c:44: TPASS: access /sys/block/loop0/loop0p1 succeeds ioctl09.c:52: TPASS: access /dev/loop0p1 succeeds ioctl09.c:47: TFAIL: access /sys/block/loop0/loop0p2 fails ioctl09.c:55: TFAIL: access /dev/loop0p2 fails I manually confirmed those failures disappeared from a reproducible system even with running 10000 times. Signed-off-by: Li Wang <liwang@redhat.com> --- testcases/kernel/syscalls/ioctl/ioctl09.c | 6 ++++++ 1 file changed, 6 insertions(+)