
[1/1] netstress: Workaround race between SETSID() and exit(0)

Message ID 20220218160035.4121-1-pvorel@suse.cz
State Rejected

Commit Message

Petr Vorel Feb. 18, 2022, 4 p.m. UTC
There is a race between the SETSID() and exit(0) in move_to_background()
caused by "Killed the leftover descendant processes" introduced in
72b172867 ("Terminate leftover subprocesses when main test process
crashes").

If the main test process calls exit(0) before the newly forked child has
managed to call SETSID(), the child is killed by the test library because
it is still in the old process group. Therefore make the parent pause()
until the child sends it SIGUSR1 after SETSID(), and install a dummy
SIGUSR1 handler to avoid heartbeat_handler() doing the cleanup.

Link: https://lore.kernel.org/ltp/Yg+RXbUTOxK56iZa@pevik/

Suggested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
 testcases/network/netstress/netstress.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
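
The process-group point in the commit message can be checked outside LTP. The following is a minimal standalone sketch, not code from netstress or the test library; the helper name child_shares_pgrp() is hypothetical. It shows that a freshly forked child stays in the caller's process group (and is therefore reachable by a group-wide kill) until it has called setsid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child and report whether it is still in the
 * caller's process group when it checks. If call_setsid is non-zero the
 * child calls setsid() first.
 * Returns 1 if the child shares our group, 0 if not, -1 on error.
 */
int child_shares_pgrp(int call_setsid)
{
	pid_t parent_pgrp = getpgrp();
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Exit code 2 is reserved for a failed setsid(). */
		if (call_setsid && setsid() < 0)
			_exit(2);
		_exit(getpgrp() == parent_pgrp ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}
```

Calling child_shares_pgrp(0) models the window before SETSID(), and child_shares_pgrp(1) the state after it.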

Comments

Petr Vorel Feb. 18, 2022, 4:10 p.m. UTC | #1
Hm, on one of the machines it blocked after 12 runs:

tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times

root     22529  0.0  0.0   4812   792 pts/2    S+   11:02   0:00 ns_exec 3181 net mnt sh -c  netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ  || echo RTERR
root     22530  0.0  0.2  18216  2880 pts/2    S+   11:02   0:00 sh -c  netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ  || echo RTERR
root     22531  0.0  0.0   9072   920 pts/2    S+   11:02   0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ
root     22532  0.0  0.0   9072   160 pts/2    S    11:02   0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ
root     22533  0.0  0.0   9072   168 ?        Ss   11:02   0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ

# strace -p 22531
strace: Process 22531 attached
wait4(22532, strace: Process 22531 detached
 <detached ...>

# strace -p 22532
strace: Process 22532 attached
pause(strace: Process 22532 detached
 <detached ...>

# strace -p 22533
strace: Process 22533 attached
accept(5,  <unfinished ...>)            = ?

But maybe it was just caused by running in a loop without any sleep (it was OK the next time):
i=0; while true; do i=$((i+1)); echo "=== $i ==="; ./tcp_ipsec.sh -s 100:1000:65535:R65535 || break; done

Kind regards,
Petr

Cyril Hrubis Feb. 18, 2022, 4:14 p.m. UTC | #2
Hi!
> Hm, on one of the machines it blocked after 12 runs:

Ah, there is another race there. The new thread may send the signal
before the parent is sleeping in pause()...

Just use the checkpoint WAKE() WAIT() pair instead, these are
RaceFree(tm).
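
The lost-wakeup problem described above comes from pause() only seeing signals that arrive while it is already sleeping. As a rough standalone illustration of a handshake that does not depend on ordering, the sketch below uses a pipe instead of pause()/SIGUSR1. It is not the LTP checkpoint API and not the fix that was applied to netstress; the helper name fork_and_wait_for_setsid() is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, let it call setsid(), and return only
 * after the child has confirmed that setsid() is done. On success returns 0
 * and stores the child's new session ID in *sid; returns -1 on error.
 */
int fork_and_wait_for_setsid(pid_t *sid)
{
	int fds[2];
	pid_t pid, reported = -1;
	ssize_t n;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		close(fds[0]);
		reported = setsid();
		/*
		 * The data stays in the pipe until the parent reads it, so it
		 * does not matter which side runs first.
		 */
		if (reported < 0 ||
		    write(fds[1], &reported, sizeof(reported)) != sizeof(reported))
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	/* Blocks until the child has written; returns 0 (EOF) if it died. */
	n = read(fds[0], &reported, sizeof(reported));
	close(fds[0]);
	waitpid(pid, NULL, 0);

	if (n != (ssize_t)sizeof(reported))
		return -1;

	*sid = reported;
	return 0;
}
```

In a real daemonizing path the parent would call exit(0) after the read() returns instead of reaping the child; the waitpid() here only keeps the sketch tidy.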

Patch

diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c
index 0914c65bd4..51daa72c6d 100644
--- a/testcases/network/netstress/netstress.c
+++ b/testcases/network/netstress/netstress.c
@@ -38,6 +38,10 @@  static int rand_r(LTP_ATTRIBUTE_UNUSED unsigned int *seed)
 }
 #endif
 
+static void sig_handler(int sig LTP_ATTRIBUTE_UNUSED)
+{
+}
+
 static const int max_msg_len = (1 << 16) - 1;
 static const int min_msg_len = 5;
 
@@ -713,11 +717,15 @@  static void server_cleanup(void)
 
 static void move_to_background(void)
 {
-	if (SAFE_FORK())
+	if (SAFE_FORK()) {
+		pause();
 		exit(0);
+	}
 
 	SAFE_SETSID();
 
+	SAFE_KILL(getppid(), SIGUSR1);
+
 	close(STDIN_FILENO);
 	SAFE_OPEN("/dev/null", O_RDONLY);
 	close(STDOUT_FILENO);
@@ -843,6 +851,8 @@  static void set_protocol_type(void)
 
 static void setup(void)
 {
+	SAFE_SIGNAL(SIGUSR1, sig_handler);
+
 	if (tst_parse_int(aarg, &clients_num, 1, INT_MAX))
 		tst_brk(TBROK, "Invalid client number '%s'", aarg);
 	if (tst_parse_int(rarg, &client_max_requests, 1, INT_MAX))