Allow graceful subtest cleanup in shell tests

Message ID 20221110135442.14501-1-mdoucha@suse.cz
State Accepted

Commit Message

Martin Doucha Nov. 10, 2022, 1:54 p.m. UTC
The new shell test timeout code sends SIGTERM to any subprocesses when
the main script hits its timeout. SIGTERM isn't handled by the LTP library,
which means that tools like netstress will be killed instantly without
performing any cleanup. Handle SIGTERM like SIGINT in the LTP library
to allow graceful cleanup.

Signed-off-by: Martin Doucha <mdoucha@suse.cz>
---

Note: The current lack of graceful cleanup causes random failures in shell
tests which run the same tool many times (e.g. netstress). When the PID
counter wraps around and the tool accidentally gets the same PID as another
process that got killed by SIGTERM, the new test process will fail during IPC
setup.

 lib/tst_test.c | 3 +++
 1 file changed, 3 insertions(+)
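
As an illustration of the pattern the commit message describes, here is a
minimal standalone sketch, not LTP's actual code. It assumes a parent process
that installs one handler for both SIGINT and SIGTERM, a child that resets
both signals to their default disposition, and a parent-side cleanup step that
runs once the child has been reaped. All names (stop_child_handler,
set_handler, cleanup, run_child_with_cleanup) are hypothetical, and the
signal is self-delivered with raise() only to simulate the shell timeout.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All names below are hypothetical; this is not LTP's actual code. */
static volatile pid_t child_pid;

/* Shared by SIGINT and SIGTERM: take down the child's process group. */
static void stop_child_handler(int sig)
{
	(void)sig;

	if (child_pid > 0)
		kill(-child_pid, SIGKILL); /* kill() is async-signal-safe */
}

/* Small sigaction() wrapper so that interrupted syscalls are restarted. */
static int set_handler(int sig, void (*handler)(int))
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;

	return sigaction(sig, &sa, NULL);
}

/* Stand-in for removing temporary files, IPC objects and so on. */
static void cleanup(void)
{
	puts("cleanup done");
}

/*
 * Fork an idle child, deliver sig (SIGINT or SIGTERM) to ourselves to
 * simulate the timeout, reap the child and run cleanup.
 * Returns 1 if the child was killed by SIGKILL, 0 if not, -1 on error.
 */
int run_child_with_cleanup(int sig)
{
	int status;

	set_handler(SIGINT, stop_child_handler);
	set_handler(SIGTERM, stop_child_handler);

	child_pid = fork();
	if (child_pid < 0)
		return -1;

	if (child_pid == 0) {
		/* The child goes back to the default dispositions. */
		set_handler(SIGTERM, SIG_DFL);
		set_handler(SIGINT, SIG_DFL);
		setpgid(0, 0);
		for (;;)
			pause();
	}

	/* Also set the group from the parent so kill(-pid) cannot race. */
	setpgid(child_pid, child_pid);

	raise(sig);

	while (waitpid(child_pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	set_handler(SIGTERM, SIG_DFL);
	set_handler(SIGINT, SIG_DFL);
	child_pid = 0;
	cleanup();

	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling run_child_with_cleanup(SIGTERM) and run_child_with_cleanup(SIGINT)
should behave the same way in this sketch. Without the SIGTERM handler, the
default action would terminate the parent before cleanup() is reached.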

Comments

Petr Vorel Nov. 11, 2022, 2:02 p.m. UTC | #1
Hi Martin,

Reviewed-by: Petr Vorel <pvorel@suse.cz>
Thanks!

> The new shell test timeout code sends SIGTERM to any subprocesses when
> the main script hits its timeout. SIGTERM isn't handled by the LTP library,
> which means that tools like netstress will be killed instantly without
> performing any cleanup. Handle SIGTERM like SIGINT in the LTP library
> to allow graceful cleanup.

Besides this, some time ago Cyril suggested defining TST_NO_DEFAULT_MAIN in
nfs05_make_tree.c [1], which is also a helper like netstress.c.

Looking at what this would require for netstress.c: implementing its own
tst_brk(), which would call the cleanup() function before calling the library's
tst_brk(), parsing getopt parameters, calling setup() in main(), etc.
The only things that work are tst_res() and tst_brk() printing.
I'm not sure this is worth it just to avoid a problematic timeout.

Kind regards,
Petr

[1] https://lore.kernel.org/ltp/YqxFo1iFzHatNRIl@yuki/

Richard Palethorpe Nov. 14, 2022, 11:56 a.m. UTC | #2
Hello,

Martin Doucha <mdoucha@suse.cz> writes:

> The new shell test timeout code sends SIGTERM to any subprocesses when
> the main script hits its timeout. SIGTERM isn't handled by the LTP library,
> which means that tools like netstress will be killed instantly without
> performing any cleanup. Handle SIGTERM like SIGINT in the LTP library
> to allow graceful cleanup.
>
> Signed-off-by: Martin Doucha <mdoucha@suse.cz>

Merged with Petr's tag, thanks!

Possibly we should also print the signal that we received somehow.
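
One possible way to report the received signal, sketched as standalone code
rather than a proposed LTP change: the handler only stores the signal number
in a volatile sig_atomic_t, and the message is formatted later, outside the
handler, where snprintf() and strsignal() are safe to call. The names
last_signal, record_signal_handler and catch_and_describe are hypothetical,
and the signal is self-delivered with raise() purely for demonstration.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; not taken from LTP. */
static volatile sig_atomic_t last_signal;

/* Only record the signal here; formatting happens outside the handler. */
static void record_signal_handler(int sig)
{
	last_signal = sig;
}

/*
 * Install the handler for a catchable signal, deliver that signal to
 * ourselves, and describe what was received in buf.
 * Returns the recorded signal number, or 0 if nothing was recorded.
 */
int catch_and_describe(int sig, char *buf, size_t len)
{
	int caught;

	last_signal = 0;
	signal(sig, record_signal_handler);
	raise(sig); /* returns only after the handler has run */
	signal(sig, SIG_DFL);

	caught = last_signal;
	if (caught)
		snprintf(buf, len, "Received signal %d (%s)",
			 caught, strsignal(caught));
	else
		snprintf(buf, len, "No signal received");

	return caught;
}
```

In a shared SIGINT/SIGTERM handler, the same idea would let the later
reporting code say which of the two signals triggered the cleanup.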


Patch

diff --git a/lib/tst_test.c b/lib/tst_test.c
index b225ba082..1732fd058 100644
--- a/lib/tst_test.c
+++ b/lib/tst_test.c
@@ -1568,6 +1568,7 @@  static int fork_testrun(void)
 	int status;
 
 	SAFE_SIGNAL(SIGINT, sigint_handler);
+	SAFE_SIGNAL(SIGTERM, sigint_handler);
 
 	alarm(results->timeout);
 
@@ -1579,6 +1580,7 @@  static int fork_testrun(void)
 		tst_disable_oom_protection(0);
 		SAFE_SIGNAL(SIGALRM, SIG_DFL);
 		SAFE_SIGNAL(SIGUSR1, SIG_DFL);
+		SAFE_SIGNAL(SIGTERM, SIG_DFL);
 		SAFE_SIGNAL(SIGINT, SIG_DFL);
 		SAFE_SETPGID(0, 0);
 		testrun();
@@ -1586,6 +1588,7 @@  static int fork_testrun(void)
 
 	SAFE_WAITPID(test_pid, &status, 0);
 	alarm(0);
+	SAFE_SIGNAL(SIGTERM, SIG_DFL);
 	SAFE_SIGNAL(SIGINT, SIG_DFL);
 
 	if (tst_test->taint_check && tst_taint_check()) {