Message ID | 20250510205326.1353857-1-wegao@suse.com |
---|---|
State | Accepted |
Headers | show |
Series | [v1] Fix: dirtyc0w_shmem child process exit with error due to uninitialized lib_pid | expand |
Context | Check | Description |
---|---|---|
ltpci/debian_stable_s390x-linux-gnu-gcc_s390x | success | success |
ltpci/debian_stable_aarch64-linux-gnu-gcc_arm64 | success | success |
ltpci/debian_stable_powerpc64le-linux-gnu-gcc_ppc64el | success | success |
ltpci/debian_stable_gcc | fail | failure |
ltpci/debian_stable_gcc | fail | failure |
ltpci/ubuntu_jammy_gcc | fail | failure |
ltpci/ubuntu_bionic_gcc | fail | failure |
ltpci/debian_testing_clang | fail | failure |
ltpci/opensuse-archive_42-2_gcc | fail | failure |
ltpci/alpine_latest_gcc | fail | failure |
ltpci/debian_testing_gcc | fail | failure |
ltpci/opensuse-leap_latest_gcc | fail | failure |
ltpci/fedora_latest_clang | fail | failure |
ltpci/quay-io-centos-centos_stream9_gcc | fail | failure |
ltpci/debian_oldstable_gcc | fail | failure |
ltpci/debian_oldstable_clang | fail | failure |
On Sat, May 10, 2025 at 4:54 PM Wei Gao via ltp <ltp@lists.linux.it> wrote: > > The dirtyc0w_shmem test spawns a child process using execlp. This child process then calls tst_brk(), which exits early > with a non-zero status because execlp does not inherit the parent's lib_pid variable. Consequently, the parent process > incorrectly reports an "Invalid child exit value". > > This commit addresses this by ensuring the child process has access to the necessary lib_pid and main_pid by passing > them through a shared results structure. This prevents the premature exit in the child and the subsequent error report > in the parent. > > Related commit: > commit a1f82704c28d9e027ca899f5ca2841e7fe49de72 > lib/tst_test.c: Fix tst_brk() handling > > Detail failure log: > tst_tmpdir.c:317: TINFO: Using /tmp/LTP_dirSOGVBC as tmpdir (btrfs filesystem) > tst_test.c:1938: TINFO: LTP version: 20250507.4a0e3a8fa > tst_test.c:1942: TINFO: Tested kernel: 6.4.0-150700.51-default #1 SMP Wed Apr 30 21:35:43 UTC 2025 (6930611) s390x > tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' > tst_kconfig.c:678: TINFO: CONFIG_FAULT_INJECTION kernel option detected which might slow the execution > tst_test.c:1760: TINFO: Overall timeout per run is 0h 04m 00s > dirtyc0w_shmem.c:54: TINFO: Mounting tmp_dirtyc0w_shmem to /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem fstyp=tmpfs flags=0 > dirtyc0w_shmem_child.c:160: TCONF: System does not have userfaultfd minor fault support for shmem <<<<<<<<<< 1 > tst_test.c:481: TBROK: Invalid child (8163) exit value 32 <<<<<<<< 2 > dirtyc0w_shmem.c:102: TINFO: Umounting /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem > > tmp_dirtyc0w_shmem.c call execlp to create new process run dirtyc0w_shmem_child bin. > > SAFE_EXECLP("dirtyc0w_shmem_child", "dirtyc0w_shmem_child", NULL) > > Within dirtyc0w_shmem_child.c trigger > > tst_brk(TCONF, "System does not have userfaultfd minor fault support for shmem") > > Since execlp does not inherit the parent process's variables lib_pid, so it will return TCONF(32) directly. > > void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, > va_list va) > { > ... > if (!lib_pid) > exit(TTYPE_RESULT(ttype)); <<<<< > ... > } > > So finally captured by check_child_status report an error. > > static void check_child_status(pid_t pid, int status) > { > ... > if (WEXITSTATUS(status)) > tst_brk(TBROK, "Invalid child (%i) exit value %i", pid, WEXITSTATUS(status)); <<<< > } > > Signed-off-by: Wei Gao <wegao@suse.com> > --- > lib/tst_test.c | 25 ++++++++++++++----------- > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/lib/tst_test.c b/lib/tst_test.c > index 2bb4519dd..b666715b9 100644 > --- a/lib/tst_test.c > +++ b/lib/tst_test.c > @@ -59,7 +59,7 @@ static const char *tid; > static int iterations = 1; > static float duration = -1; > static float timeout_mul = -1; > -static pid_t main_pid, lib_pid; > +static pid_t lib_pid; > static int mntpoint_mounted; > static int ovl_mounted; > static struct timespec tst_start_time; /* valid only for test pid */ > @@ -78,6 +78,8 @@ struct results { > int abort_flag; > unsigned int runtime; > unsigned int overall_time; > + pid_t lib_pid; > + pid_t main_pid; > }; Can we avoid polluting the struct results with main_pid and lib_pid? From a design separation standpoint, results should only store test outcome data, not test infrastructure state. As LTP library already provides a tst_reinit() function for child processes spawned via exec()/execlp() to restore their test context. We could consider two approaches: 1. Create a separate IPC region to store infrastructure state(e.g., main_pid, lib_pid) in a new struct tst_meta_info. The child can then access this data via tst_reinit() without modifying the struct results. 2. Simply pass main_pid and lib_pid through environment variables in the ltp library, and retrieve them from tst_reinit() in the child. Or, maybe others have simpler options. Cc'ing them. In either case, we should set 'tst_test->child_need_reinit' in the parent. @Cyril, @Petr, I support merging this fix before the May release, as I’ve also encountered the same failure during my pre-release testing.
> On Sat, May 10, 2025 at 4:54 PM Wei Gao via ltp <ltp@lists.linux.it> wrote: > > The dirtyc0w_shmem test spawns a child process using execlp. This child process then calls tst_brk(), which exits early > > with a non-zero status because execlp does not inherit the parent's lib_pid variable. Consequently, the parent process > > incorrectly reports an "Invalid child exit value". > > This commit addresses this by ensuring the child process has access to the necessary lib_pid and main_pid by passing > > them through a shared results structure. This prevents the premature exit in the child and the subsequent error report > > in the parent. > > Related commit: > > commit a1f82704c28d9e027ca899f5ca2841e7fe49de72 > > lib/tst_test.c: Fix tst_brk() handling > > Detail failure log: > > tst_tmpdir.c:317: TINFO: Using /tmp/LTP_dirSOGVBC as tmpdir (btrfs filesystem) > > tst_test.c:1938: TINFO: LTP version: 20250507.4a0e3a8fa > > tst_test.c:1942: TINFO: Tested kernel: 6.4.0-150700.51-default #1 SMP Wed Apr 30 21:35:43 UTC 2025 (6930611) s390x > > tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' > > tst_kconfig.c:678: TINFO: CONFIG_FAULT_INJECTION kernel option detected which might slow the execution > > tst_test.c:1760: TINFO: Overall timeout per run is 0h 04m 00s > > dirtyc0w_shmem.c:54: TINFO: Mounting tmp_dirtyc0w_shmem to /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem fstyp=tmpfs flags=0 > > dirtyc0w_shmem_child.c:160: TCONF: System does not have userfaultfd minor fault support for shmem <<<<<<<<<< 1 > > tst_test.c:481: TBROK: Invalid child (8163) exit value 32 <<<<<<<< 2 > > dirtyc0w_shmem.c:102: TINFO: Umounting /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem > > tmp_dirtyc0w_shmem.c call execlp to create new process run dirtyc0w_shmem_child bin. > > SAFE_EXECLP("dirtyc0w_shmem_child", "dirtyc0w_shmem_child", NULL) > > Within dirtyc0w_shmem_child.c trigger > > tst_brk(TCONF, "System does not have userfaultfd minor fault support for shmem") > > Since execlp does not inherit the parent process's variables lib_pid, so it will return TCONF(32) directly. > > void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, > > va_list va) > > { > > ... > > if (!lib_pid) > > exit(TTYPE_RESULT(ttype)); <<<<< > > ... > > } > > So finally captured by check_child_status report an error. > > static void check_child_status(pid_t pid, int status) > > { > > ... > > if (WEXITSTATUS(status)) > > tst_brk(TBROK, "Invalid child (%i) exit value %i", pid, WEXITSTATUS(status)); <<<< > > } > > Signed-off-by: Wei Gao <wegao@suse.com> > > --- > > lib/tst_test.c | 25 ++++++++++++++----------- > > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/lib/tst_test.c b/lib/tst_test.c > > index 2bb4519dd..b666715b9 100644 > > --- a/lib/tst_test.c > > +++ b/lib/tst_test.c > > @@ -59,7 +59,7 @@ static const char *tid; > > static int iterations = 1; > > static float duration = -1; > > static float timeout_mul = -1; > > -static pid_t main_pid, lib_pid; > > +static pid_t lib_pid; Wei, you probably noticed tests fail (thanks to Andrea's CI improvements). It broke at least tst_needs_cmds02.c I guess due missing main_pid: $ gdb ./tst_needs_cmds02 ... (gdb) run ... tst_cmd.c:268: TCONF: Couldn't find 'mkfs.ext45' in $PATH Program received signal SIGSEGV, Segmentation fault. 0x000055555555bea4 in tst_vbrk_ (file=<optimized out>, lineno=<optimized out>, ttype=<optimized out>, fmt=<optimized out>, va=<optimized out>) at tst_test.c:397 397 if (!results->lib_pid) (gdb) bt #0 0x000055555555bea4 in tst_vbrk_ (file=<optimized out>, lineno=<optimized out>, ttype=<optimized out>, fmt=<optimized out>, va=<optimized out>) at tst_test.c:397 #1 0x000055555555b5be in tst_brk_ (file=file@entry=0x55555557de1c "tst_cmd.c", lineno=lineno@entry=268, ttype=ttype@entry=32, fmt=fmt@entry=0x55555557deed "Couldn't find '%s' in $PATH") at tst_test.c:460 #2 0x000055555556bc57 in tst_check_cmd (cmd=0x55555557c643 "mkfs.ext45 >= 1.43.0", brk_nosupp=brk_nosupp@entry=1) at tst_cmd.c:268 #3 0x000055555555e641 in do_setup (argc=1, argv=0x7fffffffcaa8) at tst_test.c:1363 #4 tst_run_tcases (argc=1, argv=0x7fffffffcaa8, self=self@entry=0x55555558a720 <test>) at tst_test.c:1935 #5 0x000055555555b044 in main (argc=<optimized out>, argv=<optimized out>) at ../../include/tst_test.h:729 > > static int mntpoint_mounted; > > static int ovl_mounted; > > static struct timespec tst_start_time; /* valid only for test pid */ > > @@ -78,6 +78,8 @@ struct results { > > int abort_flag; > > unsigned int runtime; > > unsigned int overall_time; > > + pid_t lib_pid; > > + pid_t main_pid; > > }; > Can we avoid polluting the struct results with main_pid and lib_pid? > From a design separation standpoint, results should only store test > outcome data, not test infrastructure state. +1 > As LTP library already provides a tst_reinit() function for child processes > spawned via exec()/execlp() to restore their test context. > We could consider two approaches: > 1. Create a separate IPC region to store infrastructure state(e.g., > main_pid, lib_pid) > in a new struct tst_meta_info. The child can then access this data via > tst_reinit() > without modifying the struct results. > 2. Simply pass main_pid and lib_pid through environment variables in the > ltp library, and retrieve them from tst_reinit() in the child. I suppose using env variables would be simpler but IPC more robust. Better question to Cyril and Jan. > Or, maybe others have simpler options. Cc'ing them. > In either case, we should set 'tst_test->child_need_reinit' in the parent. > @Cyril, @Petr, I support merging this fix before the May release, as I’ve also > encountered the same failure during my pre-release testing. +1 Kind regards, Petr
Hi! > > diff --git a/lib/tst_test.c b/lib/tst_test.c > > index 2bb4519dd..b666715b9 100644 > > --- a/lib/tst_test.c > > +++ b/lib/tst_test.c > > @@ -59,7 +59,7 @@ static const char *tid; > > static int iterations = 1; > > static float duration = -1; > > static float timeout_mul = -1; > > -static pid_t main_pid, lib_pid; > > +static pid_t lib_pid; > > static int mntpoint_mounted; > > static int ovl_mounted; > > static struct timespec tst_start_time; /* valid only for test pid */ > > @@ -78,6 +78,8 @@ struct results { > > int abort_flag; > > unsigned int runtime; > > unsigned int overall_time; > > + pid_t lib_pid; > > + pid_t main_pid; > > }; > > Can we avoid polluting the struct results with main_pid and lib_pid? > From a design separation standpoint, results should only store test > outcome data, not test infrastructure state. We do have the abort_flag and runtime there and checkpoints use the memory after this structure as well, so I wouldn't be so strict. > As LTP library already provides a tst_reinit() function for child processes > spawned via exec()/execlp() to restore their test context. > > We could consider two approaches: > > 1. Create a separate IPC region to store infrastructure state(e.g., > main_pid, lib_pid) > in a new struct tst_meta_info. The child can then access this data via > tst_reinit() > without modifying the struct results. I would like to keep a single IPC region because we have to pass a path to it to all children too. Also given that the minimal amount of memory we can allocate for IPC is a page we can as well have two structures, one for data and one for library infrastructure bits and place these structures in there manually. All we need to do is to make sure that we place them at aligned offsets. > 2. Simply pass main_pid and lib_pid through environment variables in the > ltp library, and retrieve them from tst_reinit() in the child. That would work too, but this adds more complexity. > Or, maybe others have simpler options. Cc'ing them. > > In either case, we should set 'tst_test->child_need_reinit' in the parent. > > @Cyril, @Petr, I support merging this fix before the May release, as I’ve also > encountered the same failure during my pre-release testing. Yes, please, it's good enough even if it's going to be a temporary solution. Reviewed-by: Cyril Hrubis <chrubis@suse.cz>
On Mon, May 12, 2025 at 4:38 PM Cyril Hrubis <chrubis@suse.cz> wrote: > Hi! > > > diff --git a/lib/tst_test.c b/lib/tst_test.c > > > index 2bb4519dd..b666715b9 100644 > > > --- a/lib/tst_test.c > > > +++ b/lib/tst_test.c > > > @@ -59,7 +59,7 @@ static const char *tid; > > > static int iterations = 1; > > > static float duration = -1; > > > static float timeout_mul = -1; > > > -static pid_t main_pid, lib_pid; > > > +static pid_t lib_pid; > > > static int mntpoint_mounted; > > > static int ovl_mounted; > > > static struct timespec tst_start_time; /* valid only for test pid */ > > > @@ -78,6 +78,8 @@ struct results { > > > int abort_flag; > > > unsigned int runtime; > > > unsigned int overall_time; > > > + pid_t lib_pid; > > > + pid_t main_pid; > > > }; > > > > Can we avoid polluting the struct results with main_pid and lib_pid? > > From a design separation standpoint, results should only store test > > outcome data, not test infrastructure state. > > We do have the abort_flag and runtime there and checkpoints use the > memory after this structure as well, so I wouldn't be so strict. > I think we'd better move them into a new structure dedicated to storing library infrastructure. > > > As LTP library already provides a tst_reinit() function for child > processes > > spawned via exec()/execlp() to restore their test context. > > > > We could consider two approaches: > > > > 1. Create a separate IPC region to store infrastructure state(e.g., > > main_pid, lib_pid) > > in a new struct tst_meta_info. The child can then access this data via > > tst_reinit() > > without modifying the struct results. > > I would like to keep a single IPC region because we have to pass a > path to it to all children too. > > Also given that the minimal amount of memory we can allocate for IPC is > a page we can as well have two structures, one for data and one for > library infrastructure bits and place these structures in there > manually. All we need to do is to make sure that we place them at > aligned offsets. > Yes, that was my thought as well. Something maybe like: #define LTP_MAGIC 0x4C54504D struct context { pid_t main_pid; pid_t lib_pid; struct timespec tst_start_time; /* * This is set by a call to tst_brk() with TBROK parameter and means * that the test should exit immediately. */ unsigned int runtime; unsigned int overall_time; int abort_flag; int mntpoint_mounted; int ovl_mounted; int tdebug; }; struct results { int passed; int skipped; int failed; int warnings; int broken; }; static struct ipc_block { unsigned int magic; struct context context; struct results results; futex_t futexes[]; }; static struct ipc_block *ipc = NULL; static struct context *context = NULL; static struct results *results = NULL; #define TST_CONTEXT(ipc) ((struct context *)&(ipc)->context) #define TST_RESULTS(ipc) ((struct results *)&(ipc)->results) #define TST_FUTEXES(ipc) ((futex_t *)(ipc)->futexes) ... > > > 2. Simply pass main_pid and lib_pid through environment variables in the > > ltp library, and retrieve them from tst_reinit() in the child. > > That would work too, but this adds more complexity. > > > Or, maybe others have simpler options. Cc'ing them. > > > > In either case, we should set 'tst_test->child_need_reinit' in the > parent. > > > > @Cyril, @Petr, I support merging this fix before the May release, as > I’ve also > > encountered the same failure during my pre-release testing. > > Yes, please, it's good enough even if it's going to be a temporary > solution. > Sure, we can merge this one. And later, I will work on a new patch to refactor the structure like above. Reviewed-by: Li Wang <liwang@redhat.com> > > Reviewed-by: Cyril Hrubis <chrubis@suse.cz> > > -- > Cyril Hrubis > chrubis@suse.cz > >
Hi! > I think we'd better move them into a new structure dedicated to storing > library infrastructure. Sounds good. > > > As LTP library already provides a tst_reinit() function for child > > processes > > > spawned via exec()/execlp() to restore their test context. > > > > > > We could consider two approaches: > > > > > > 1. Create a separate IPC region to store infrastructure state(e.g., > > > main_pid, lib_pid) > > > in a new struct tst_meta_info. The child can then access this data via > > > tst_reinit() > > > without modifying the struct results. > > > > I would like to keep a single IPC region because we have to pass a > > path to it to all children too. > > > > Also given that the minimal amount of memory we can allocate for IPC is > > a page we can as well have two structures, one for data and one for > > library infrastructure bits and place these structures in there > > manually. All we need to do is to make sure that we place them at > > aligned offsets. > > > > Yes, that was my thought as well. Something maybe like: > > #define LTP_MAGIC 0x4C54504D I suppose that we will check this in the tst_reinit() right? That is nice improvement. > struct context { > pid_t main_pid; > pid_t lib_pid; > struct timespec tst_start_time; > /* > * This is set by a call to tst_brk() with TBROK parameter and means > * that the test should exit immediately. > */ > unsigned int runtime; > unsigned int overall_time; > int abort_flag; > int mntpoint_mounted; > int ovl_mounted; > int tdebug; > }; > > struct results { > int passed; > int skipped; > int failed; > int warnings; > int broken; > }; > > static struct ipc_block { > unsigned int magic; > struct context context; > struct results results; > futex_t futexes[]; > }; Maybe it would make sense to switch to the uint32_t and int32_t here when we are doing the cleanup. Currently we we call tst_reinit() from a 32bit process and the parent was 64bit it wouldn't work at all. We do not have tests like that at the moment but we may possibly end up in this situation later on. > static struct ipc_block *ipc = NULL; > static struct context *context = NULL; > static struct results *results = NULL; > > #define TST_CONTEXT(ipc) ((struct context *)&(ipc)->context) > #define TST_RESULTS(ipc) ((struct results *)&(ipc)->results) > #define TST_FUTEXES(ipc) ((futex_t *)(ipc)->futexes) Is there a reason for these macros? The types should be correct, so there should be no need for the casts.
On Mon, May 12, 2025 at 7:13 PM Cyril Hrubis <chrubis@suse.cz> wrote: > Hi! > > I think we'd better move them into a new structure dedicated to storing > > library infrastructure. > > Sounds good. > > > > > As LTP library already provides a tst_reinit() function for child > > > processes > > > > spawned via exec()/execlp() to restore their test context. > > > > > > > > We could consider two approaches: > > > > > > > > 1. Create a separate IPC region to store infrastructure state(e.g., > > > > main_pid, lib_pid) > > > > in a new struct tst_meta_info. The child can then access this data > via > > > > tst_reinit() > > > > without modifying the struct results. > > > > > > I would like to keep a single IPC region because we have to pass a > > > path to it to all children too. > > > > > > Also given that the minimal amount of memory we can allocate for IPC is > > > a page we can as well have two structures, one for data and one for > > > library infrastructure bits and place these structures in there > > > manually. All we need to do is to make sure that we place them at > > > aligned offsets. > > > > > > > Yes, that was my thought as well. Something maybe like: > > > > #define LTP_MAGIC 0x4C54504D > > I suppose that we will check this in the tst_reinit() right? That is > nice improvement. > Right, to validate the shared memory region. And the magic number 0x4C54504D is actually means "LTPM". > > struct context { > > pid_t main_pid; > > pid_t lib_pid; > > struct timespec tst_start_time; > > /* > > * This is set by a call to tst_brk() with TBROK parameter and > means > > * that the test should exit immediately. > > */ > > unsigned int runtime; > > unsigned int overall_time; > > int abort_flag; > > int mntpoint_mounted; > > int ovl_mounted; > > int tdebug; > > }; > > > > struct results { > > int passed; > > int skipped; > > int failed; > > int warnings; > > int broken; > > }; > > > > struct ipc_block { > > unsigned int magic; > > struct context context; > > struct results results; > > futex_t futexes[]; > > }; > > Maybe it would make sense to switch to the uint32_t and int32_t here > when we are doing the cleanup. Currently we we call tst_reinit() from a > 32bit process and the parent was 64bit it wouldn't work at all. We do > not have tests like that at the moment but we may possibly end up in > this situation later on. > Good point. > > static struct ipc_block *ipc = NULL; > > static struct context *context = NULL; > > static struct results *results = NULL; > > > > #define TST_CONTEXT(ipc) ((struct context *)&(ipc)->context) > > #define TST_RESULTS(ipc) ((struct results *)&(ipc)->results) > > #define TST_FUTEXES(ipc) ((futex_t *)(ipc)->futexes) > > Is there a reason for these macros? The types should be correct, so > there should be no need for the casts. > Yes, just to quickly get context/result address, but I will remove it if it's not necessary. New refactor patch (based on Wei's fix) is on the way.
Hi All, I have just pushed Wei's patch with a refined description to make it get rid of the failure before the May release. NOTE: I'm almost done with the new refactoring work on IPC and will send out patches after my testing.
On Mon, May 12, 2025 at 2:34 PM Petr Vorel <pvorel@suse.cz> wrote: > > On Sat, May 10, 2025 at 4:54 PM Wei Gao via ltp <ltp@lists.linux.it> > wrote: > > > > The dirtyc0w_shmem test spawns a child process using execlp. This > child process then calls tst_brk(), which exits early > > > with a non-zero status because execlp does not inherit the parent's > lib_pid variable. Consequently, the parent process > > > incorrectly reports an "Invalid child exit value". > > > > This commit addresses this by ensuring the child process has access to > the necessary lib_pid and main_pid by passing > > > them through a shared results structure. This prevents the premature > exit in the child and the subsequent error report > > > in the parent. > > > > Related commit: > > > commit a1f82704c28d9e027ca899f5ca2841e7fe49de72 > > > lib/tst_test.c: Fix tst_brk() handling > > > > Detail failure log: > > > tst_tmpdir.c:317: TINFO: Using /tmp/LTP_dirSOGVBC as tmpdir (btrfs > filesystem) > > > tst_test.c:1938: TINFO: LTP version: 20250507.4a0e3a8fa > > > tst_test.c:1942: TINFO: Tested kernel: 6.4.0-150700.51-default #1 SMP > Wed Apr 30 21:35:43 UTC 2025 (6930611) s390x > > > tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' > > > tst_kconfig.c:678: TINFO: CONFIG_FAULT_INJECTION kernel option > detected which might slow the execution > > > tst_test.c:1760: TINFO: Overall timeout per run is 0h 04m 00s > > > dirtyc0w_shmem.c:54: TINFO: Mounting tmp_dirtyc0w_shmem to > /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem fstyp=tmpfs flags=0 > > > dirtyc0w_shmem_child.c:160: TCONF: System does not have userfaultfd > minor fault support for shmem <<<<<<<<<< 1 > > > tst_test.c:481: TBROK: Invalid child (8163) exit value 32 <<<<<<<< 2 > > > dirtyc0w_shmem.c:102: TINFO: Umounting > /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem > > > > tmp_dirtyc0w_shmem.c call execlp to create new process run > dirtyc0w_shmem_child bin. > > > > SAFE_EXECLP("dirtyc0w_shmem_child", "dirtyc0w_shmem_child", NULL) > > > > Within dirtyc0w_shmem_child.c trigger > > > > tst_brk(TCONF, "System does not have userfaultfd minor fault support > for shmem") > > > > Since execlp does not inherit the parent process's variables lib_pid, > so it will return TCONF(32) directly. > > > > void tst_vbrk_(const char *file, const int lineno, int ttype, const > char *fmt, > > > va_list va) > > > { > > > ... > > > if (!lib_pid) > > > exit(TTYPE_RESULT(ttype)); <<<<< > > > ... > > > } > > > > So finally captured by check_child_status report an error. > > > > static void check_child_status(pid_t pid, int status) > > > { > > > ... > > > if (WEXITSTATUS(status)) > > > tst_brk(TBROK, "Invalid child (%i) exit value %i", > pid, WEXITSTATUS(status)); <<<< > > > } > > > > Signed-off-by: Wei Gao <wegao@suse.com> > > > --- > > > lib/tst_test.c | 25 ++++++++++++++----------- > > > 1 file changed, 14 insertions(+), 11 deletions(-) > > > > diff --git a/lib/tst_test.c b/lib/tst_test.c > > > index 2bb4519dd..b666715b9 100644 > > > --- a/lib/tst_test.c > > > +++ b/lib/tst_test.c > > > @@ -59,7 +59,7 @@ static const char *tid; > > > static int iterations = 1; > > > static float duration = -1; > > > static float timeout_mul = -1; > > > -static pid_t main_pid, lib_pid; > > > +static pid_t lib_pid; > > Wei, you probably noticed tests fail (thanks to Andrea's CI improvements). > It broke at least tst_needs_cmds02.c I guess due missing main_pid: > > $ gdb ./tst_needs_cmds02 > ... > (gdb) run > ... > tst_cmd.c:268: TCONF: Couldn't find 'mkfs.ext45' in $PATH > Program received signal SIGSEGV, Segmentation fault. > 0x000055555555bea4 in tst_vbrk_ (file=<optimized out>, lineno=<optimized > out>, ttype=<optimized out>, fmt=<optimized out>, va=<optimized out>) at > tst_test.c:397 > 397 if (!results->lib_pid) > (gdb) bt > #0 0x000055555555bea4 in tst_vbrk_ (file=<optimized out>, > lineno=<optimized out>, ttype=<optimized out>, fmt=<optimized out>, > va=<optimized out>) at tst_test.c:397 > #1 0x000055555555b5be in tst_brk_ (file=file@entry=0x55555557de1c > "tst_cmd.c", lineno=lineno@entry=268, ttype=ttype@entry=32, fmt=fmt@entry=0x55555557deed > "Couldn't find '%s' in $PATH") > at tst_test.c:460 > #2 0x000055555556bc57 in tst_check_cmd (cmd=0x55555557c643 "mkfs.ext45 >= > 1.43.0", brk_nosupp=brk_nosupp@entry=1) at tst_cmd.c:268 > #3 0x000055555555e641 in do_setup (argc=1, argv=0x7fffffffcaa8) at > tst_test.c:1363 > #4 tst_run_tcases (argc=1, argv=0x7fffffffcaa8, self=self@entry=0x55555558a720 > <test>) at tst_test.c:1935 > #5 0x000055555555b044 in main (argc=<optimized out>, argv=<optimized > out>) at ../../include/tst_test.h:729 > Oops, I neglected that comment before merging, and it caused CI to fail. No time to fix it today though (too late). I'll try to look at the details tomorrow. But hopefully it'll be fixed by the time I wake up tomorrow :). Sorry for the careless push.
It is worth mentioning that our kernel CI report shows that this commit also introduced an issue that incorrectly identified TCONF as TFAIL. This caused quite a few test cases (TCONF) to be reported as failures. I confirmed that the below patch can fix this as well. https://lists.linux.it/pipermail/ltp/2025-May/043493.html
diff --git a/lib/tst_test.c b/lib/tst_test.c index 2bb4519dd..b666715b9 100644 --- a/lib/tst_test.c +++ b/lib/tst_test.c @@ -59,7 +59,7 @@ static const char *tid; static int iterations = 1; static float duration = -1; static float timeout_mul = -1; -static pid_t main_pid, lib_pid; +static pid_t lib_pid; static int mntpoint_mounted; static int ovl_mounted; static struct timespec tst_start_time; /* valid only for test pid */ @@ -78,6 +78,8 @@ struct results { int abort_flag; unsigned int runtime; unsigned int overall_time; + pid_t lib_pid; + pid_t main_pid; }; static struct results *results; @@ -141,6 +143,7 @@ static void setup_ipc(void) tst_futexes = (char *)results + sizeof(struct results); tst_max_futexes = (size - sizeof(struct results))/sizeof(futex_t); } + results->lib_pid = lib_pid; } static void cleanup_ipc(void) @@ -391,7 +394,7 @@ void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, * If tst_brk() is called from some of the C helpers even before the * library was initialized, just exit. */ - if (!lib_pid) + if (!results->lib_pid) exit(TTYPE_RESULT(ttype)); update_results(TTYPE_RESULT(ttype)); @@ -402,13 +405,13 @@ void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, * specified but CLONE_THREAD is not. Use direct syscall to avoid * cleanup running in the child. */ - if (tst_getpid() == main_pid) + if (tst_getpid() == results->main_pid) do_test_cleanup(); /* * The test library process reports result statistics and exits. */ - if (getpid() == lib_pid) + if (getpid() == results->lib_pid) do_exit(TTYPE_RESULT(ttype)); /* @@ -426,9 +429,9 @@ void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, * the main test process. That in turn triggers the code that * kills leftover children once the main test process did exit. */ - if (main_pid && tst_getpid() != main_pid) { + if (results->main_pid && tst_getpid() != results->main_pid) { tst_res(TINFO, "Child process reported TBROK killing the test"); - kill(main_pid, SIGKILL); + kill(results->main_pid, SIGKILL); } } @@ -1501,7 +1504,7 @@ static void do_setup(int argc, char *argv[]) static void do_test_setup(void) { - main_pid = getpid(); + results->main_pid = getpid(); if (!tst_test->all_filesystems && tst_test->skip_filesystems) { long fs_type = tst_fs_type("."); @@ -1521,7 +1524,7 @@ static void do_test_setup(void) if (tst_test->setup) tst_test->setup(); - if (main_pid != tst_getpid()) + if (results->main_pid != tst_getpid()) tst_brk(TBROK, "Runaway child in setup()!"); if (tst_test->caps) @@ -1584,7 +1587,7 @@ static void run_tests(void) heartbeat(); tst_test->test_all(); - if (tst_getpid() != main_pid) + if (tst_getpid() != results->main_pid) exit(0); tst_reap_children(); @@ -1600,7 +1603,7 @@ static void run_tests(void) heartbeat(); tst_test->test(i); - if (tst_getpid() != main_pid) + if (tst_getpid() != results->main_pid) exit(0); tst_reap_children(); @@ -1930,7 +1933,7 @@ void tst_run_tcases(int argc, char *argv[], struct tst_test *self) tst_test = self; do_setup(argc, argv); - tst_enable_oom_protection(lib_pid); + tst_enable_oom_protection(results->lib_pid); SAFE_SIGNAL(SIGALRM, alarm_handler); SAFE_SIGNAL(SIGUSR1, heartbeat_handler);
The dirtyc0w_shmem test spawns a child process using execlp. This child process then calls tst_brk(), which exits early with a non-zero status because execlp does not inherit the parent's lib_pid variable. Consequently, the parent process incorrectly reports an "Invalid child exit value". This commit addresses this by ensuring the child process has access to the necessary lib_pid and main_pid by passing them through a shared results structure. This prevents the premature exit in the child and the subsequent error report in the parent. Related commit: commit a1f82704c28d9e027ca899f5ca2841e7fe49de72 lib/tst_test.c: Fix tst_brk() handling Detail failure log: tst_tmpdir.c:317: TINFO: Using /tmp/LTP_dirSOGVBC as tmpdir (btrfs filesystem) tst_test.c:1938: TINFO: LTP version: 20250507.4a0e3a8fa tst_test.c:1942: TINFO: Tested kernel: 6.4.0-150700.51-default #1 SMP Wed Apr 30 21:35:43 UTC 2025 (6930611) s390x tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz' tst_kconfig.c:678: TINFO: CONFIG_FAULT_INJECTION kernel option detected which might slow the execution tst_test.c:1760: TINFO: Overall timeout per run is 0h 04m 00s dirtyc0w_shmem.c:54: TINFO: Mounting tmp_dirtyc0w_shmem to /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem fstyp=tmpfs flags=0 dirtyc0w_shmem_child.c:160: TCONF: System does not have userfaultfd minor fault support for shmem <<<<<<<<<< 1 tst_test.c:481: TBROK: Invalid child (8163) exit value 32 <<<<<<<< 2 dirtyc0w_shmem.c:102: TINFO: Umounting /tmp/LTP_dirSOGVBC/tmp_dirtyc0w_shmem tmp_dirtyc0w_shmem.c call execlp to create new process run dirtyc0w_shmem_child bin. SAFE_EXECLP("dirtyc0w_shmem_child", "dirtyc0w_shmem_child", NULL) Within dirtyc0w_shmem_child.c trigger tst_brk(TCONF, "System does not have userfaultfd minor fault support for shmem") Since execlp does not inherit the parent process's variables lib_pid, so it will return TCONF(32) directly. void tst_vbrk_(const char *file, const int lineno, int ttype, const char *fmt, va_list va) { ... if (!lib_pid) exit(TTYPE_RESULT(ttype)); <<<<< ... } So finally captured by check_child_status report an error. static void check_child_status(pid_t pid, int status) { ... if (WEXITSTATUS(status)) tst_brk(TBROK, "Invalid child (%i) exit value %i", pid, WEXITSTATUS(status)); <<<< } Signed-off-by: Wei Gao <wegao@suse.com> --- lib/tst_test.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)