msgstress03: "Fork failed (may be OK if under stress)" problem observed on qemu.

Message ID CAKWYkK3a-Qp5bZNyL67JkTtzD=_55c0tk7eb69rsikYr+r=QqA@mail.gmail.com
State Rejected

Commit Message

Kautuk Consul Jan. 21, 2022, 8:23 a.m. UTC
Hi All,

I am running a RISC-V kernel on QEMU, and the msgstress03 testcase
fails there with the following log:
msgstress03    0  TINFO  :  Cannot read session user limits from '/sys/fs/cgroup/user.slice/user-0.slice/pids.max'
msgstress03    0  TINFO  :  Found limit of processes 10178 (from /sys/fs/cgroup/pids/user.slice/user-0.slice/pids.max)
msgstress03    0  TINFO  :  Requested number of processes higher than limit (10000 > 9991), setting to 9991
msgstress03    1  TFAIL  :  msgstress03.c:163:  Fork failed (may be OK if under stress)

The kernel dmesg log shows:
[ 3731.980951] cgroup: fork rejected by pids controller in /user.slice/user-0.slice/session-c1.scope

I added some logging to the kernel and confirmed that msgstress03
exceeds the cgroup pids limit (10178), so the fork() failure is
legitimate.
Looking at the msgstress03 code, the test assumes that only "nprocs"
forks are done, and nprocs is correctly capped to the limit, 9991.
However, the total number of forks is much larger (2*nprocs), because
each of the nprocs children does an additional fork within do_test().
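
The arithmetic behind this can be sketched in a few lines of C. This is
only an illustration: peak_pids() and fits_limit() are hypothetical
helper names, not functions from LTP, and the sketch assumes the worst
case in which no child has been reaped yet and nothing else in the
cgroup is using pids.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of LTP): worst-case number of pids in
 * use at once when each of the nprocs first-level children forks one
 * more process and none of them has been reaped yet.
 */
static long peak_pids(long nprocs)
{
	assert(nprocs >= 0);
	return 2 * nprocs;
}

/*
 * Hypothetical helper: true if that worst case still fits under the
 * cgroup pids limit. Assumes no other tasks are charged to the cgroup.
 */
static bool fits_limit(long nprocs, long pids_limit)
{
	return peak_pids(nprocs) <= pids_limit;
}
```

With the values from the log above, peak_pids(9991) is 19982, well
above the limit of 10178, so fits_limit(9991, 10178) is false.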

Because of this, the testcase can fail on slower machines, where the
children do not run fast enough and the parent does not call wait()
fast enough. The initial children may even reach exit(), but they
remain defunct because the parent will not necessarily be able to
call wait() on all of them fast enough to free their pids for reuse.
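
The defunct-child behaviour described above can be observed with a small
standalone C sketch. It is not taken from msgstress03; the function name
zombie_holds_pid() is hypothetical. It uses waitid() with WNOWAIT to
wait for the child to exit without reaping it, then uses kill() with
signal 0 as an existence check before and after the real waitpid().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo (not LTP code). Returns 1 if an exited but unreaped
 * child still occupies its pid and the pid is released only after the
 * parent reaps it, 0 if that was not observed, -1 on error.
 */
static int zombie_holds_pid(void)
{
	siginfo_t info;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child exits immediately */

	/* Block until the child has exited, but leave it unreaped. */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;

	/* Signal 0 only checks existence; the zombie still has its pid. */
	int held_before = (kill(pid, 0) == 0);

	/* Reaping the child is what finally releases the pid. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	int gone_after = (kill(pid, 0) == -1 && errno == ESRCH);

	return held_before && gone_after;
}
```

The second check assumes the pid is not reused by an unrelated process
in the short window after waitpid(), which is very unlikely in practice.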

I made the following changes (see the patch below) and the test-case
passed.

The reason why other test-cases like msgstress04 don't fail is because
the nprocs value is set with a different calculation.
Specifically, I observe that the msgstress04 testcase uses only
free_pids / 2 pids instead of the full free_pids number of processes.

Can someone confirm my findings? If required, I can send out a
patch with the nprocs/2 changes above.
If there is a better fix or another opinion, kindly reply.

Thanks and Regards.

Comments

Cyril Hrubis Jan. 21, 2022, 9:11 a.m. UTC | #1
Hi!
> The reason why other test-cases like msgstress04 dont fail is because
> the nprocs value is set with a different calculation.
> Specifically, I observe that the msgstress04 testcase uses only
> free_pids / 2 pids instead of the full free_pids number of processes.
> 
> Can someone confirm my findings ? If needed I can also send out a
> patch with my above nprocs/2 changes if required.
> Or, if there is any better fix or opinion kindly reply back to us.

Actually these tests are broken in many more ways than this; they need
to be redesigned and rewritten properly. There is even a
work-in-progress patchset, but unfortunately it hasn't been updated
for nearly a year, see:

https://patchwork.ozlabs.org/project/ltp/list/?series=233661

https://github.com/linux-test-project/ltp/issues/509
Kautuk Consul Jan. 21, 2022, 9:13 a.m. UTC | #2
Thanks for the quick response. Any idea when the changes to these
test-cases will be merged into mainline?

Cyril Hrubis Jan. 21, 2022, 9:20 a.m. UTC | #3
Hi!
> Thanks for the quick response. Any idea when changes to these
> test-cases is merged in the mainline ?

The current patchset is not finished, so unless somebody picks it up
and finishes it, it's not going to get merged at all...

Patch

diff --git a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
index 3cb70ab18..75cfc109d 100644
--- a/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
+++ b/testcases/kernel/syscalls/ipc/msgstress/msgstress03.c
@@ -131,7 +131,7 @@  int main(int argc, char **argv)
        /* Set up array of unique keys for use in allocating message
         * queues
         */
-       for (i = 0; i < nprocs; i++) {
+       for (i = 0; i < nprocs/2; i++) {
                ok = 1;
                do {
                        /* Get random key */
@@ -157,7 +157,7 @@  int main(int argc, char **argv)
         * of random length messages with specific values.
         */

-       for (i = 0; i < nprocs; i++) {
+       for (i = 0; i < nprocs/2; i++) {
                fflush(stdout);
                if ((pid = FORK_OR_VFORK()) < 0) {
                        tst_brkm(TFAIL,
@@ -191,11 +191,11 @@  int main(int argc, char **argv)
                }
        }
        /* Make sure proper number of children exited */
-       if (count != nprocs) {
+       if (count != nprocs/2) {
                tst_brkm(TFAIL,
                         NULL,
                         "Wrong number of children exited, Saw %d, Expected %d",
-                        count, nprocs);
+                        count, nprocs/2);
        }
