Message ID | 20220218160035.4121-1-pvorel@suse.cz |
---|---|
State | Rejected |
Series | [1/1] netstress: Workaround race between SETSID() and exit(0) |
Hm, on one of the machines it blocked after 12 runs:

tcp_ipsec 1 TINFO: run server 'netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ'
tcp_ipsec 1 TINFO: run client 'netstress -l -H 10.0.0.1 -n 100 -N 100 -D ltp_ns_veth2 -a 2 -r 100 -d tst_netload.res' 5 times

root 22529 0.0 0.0 4812 792 pts/2 S+ 11:02 0:00 ns_exec 3181 net mnt sh -c netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ || echo RTERR
root 22530 0.0 0.2 18216 2880 pts/2 S+ 11:02 0:00 sh -c netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ || echo RTERR
root 22531 0.0 0.0 9072 920 pts/2 S+ 11:02 0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ
root 22532 0.0 0.0 9072 160 pts/2 S 11:02 0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ
root 22533 0.0 0.0 9072 168 ? Ss 11:02 0:00 netstress -D ltp_ns_veth1 -R 10 -B /tmp/LTP_tcp_ipsec.txQv6HznbZ

# strace -p 22531
strace: Process 22531 attached
wait4(22532, strace: Process 22531 detached
 <detached ...>

# strace -p 22532
strace: Process 22532 attached
pause(strace: Process 22532 detached
 <detached ...>

# strace -p 22533
strace: Process 22533 attached
accept(5, <unfinished ...>) = ?

But maybe just caused by running in loop without any sleep (was ok next time):

i=0; while true; do i=$((i+1)); echo "=== $i ==="; ./tcp_ipsec.sh -s 100:1000:65535:R65535 || break; done

Kind regards,
Petr

Hi!
> Hm, on one of the machines it blocked after 12 runs:
Ah, there is another race there. The newly forked child may send the signal
before the parent is sleeping in pause(), in which case the wakeup is lost
and pause() keeps blocking.
Just use the TST_CHECKPOINT_WAKE()/TST_CHECKPOINT_WAIT() pair instead, these
are RaceFree(tm).
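
To illustrate why a stateful handshake avoids the lost wakeup described above, here is a minimal sketch in plain POSIX C. It does not use the LTP checkpoint API; a pipe stands in for it, on the assumption that the relevant property is the same: a wakeup posted before the waiter arrives is stored rather than dropped. The names `fork_detached()` and `wait_for_signal()` are hypothetical and do not appear in netstress.c.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the server workload: sleep until a signal. */
static void wait_for_signal(void)
{
	pause();
}

/*
 * Fork a child that becomes a session leader and then runs child_fn().
 * Returns in the parent only after the child's setsid() has completed.
 * Returns the child's PID, or -1 on error (including the child dying
 * before it could report success).
 */
static pid_t fork_detached(void (*child_fn)(void))
{
	int fds[2];
	char byte;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		close(fds[0]);
		if (setsid() < 0)
			_exit(1);
		/* The byte stays in the pipe even if the parent is not reading yet. */
		if (write(fds[1], "x", 1) != 1)
			_exit(1);
		close(fds[1]);
		child_fn();
		_exit(0);
	}

	close(fds[1]);
	/* Blocks until the byte arrives; returns 0 (EOF) if the child died early. */
	do {
		n = read(fds[0], &byte, 1);
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	return n == 1 ? pid : -1;
}
```

Unlike pause(), the read() cannot miss the notification: whichever side runs first, the byte is still waiting in the pipe when the parent gets there. In this sketch the parent would call exit(0) only after `fork_detached()` has returned a valid PID.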
diff --git a/testcases/network/netstress/netstress.c b/testcases/network/netstress/netstress.c
index 0914c65bd4..51daa72c6d 100644
--- a/testcases/network/netstress/netstress.c
+++ b/testcases/network/netstress/netstress.c
@@ -38,6 +38,10 @@ static int rand_r(LTP_ATTRIBUTE_UNUSED unsigned int *seed)
 }
 #endif
 
+static void sig_handler(int sig LTP_ATTRIBUTE_UNUSED)
+{
+}
+
 static const int max_msg_len = (1 << 16) - 1;
 static const int min_msg_len = 5;
 
@@ -713,11 +717,15 @@ static void server_cleanup(void)
 
 static void move_to_background(void)
 {
-	if (SAFE_FORK())
+	if (SAFE_FORK()) {
+		pause();
 		exit(0);
+	}
 
 	SAFE_SETSID();
 
+	SAFE_KILL(getppid(), SIGUSR1);
+
 	close(STDIN_FILENO);
 	SAFE_OPEN("/dev/null", O_RDONLY);
 	close(STDOUT_FILENO);
@@ -843,6 +851,8 @@ static void set_protocol_type(void)
 
 static void setup(void)
 {
+	SAFE_SIGNAL(SIGUSR1, sig_handler);
+
 	if (tst_parse_int(aarg, &clients_num, 1, INT_MAX))
 		tst_brk(TBROK, "Invalid client number '%s'", aarg);
 	if (tst_parse_int(rarg, &client_max_requests, 1, INT_MAX))

There is a race between the SETSID() and exit(0) in move_to_background(),
caused by "Killed the leftover descendant processes" introduced in
72b172867 ("Terminate leftover subprocesses when main test process
crashes").

If the main test process calls exit(0) before the newly forked child
manages to call SETSID(), the child is killed by the test library because
it is still in the old process group.

Therefore make the parent pause() until the child sends it SIGUSR1 after
SETSID(), and mask SIGUSR1 with a dummy handler to avoid
heartbeat_handler() doing the cleanup.

Link: https://lore.kernel.org/ltp/Yg+RXbUTOxK56iZa@pevik/
Suggested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
 testcases/network/netstress/netstress.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
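
The window described in the commit message can be observed with a small standalone sketch: a forked child reports its process group before and after setsid(). This is plain POSIX C, not LTP code; `struct pgid_report` and `probe_child_pgid()` are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Process group IDs seen by the child around its setsid() call. */
struct pgid_report {
	pid_t before_setsid;
	pid_t after_setsid;
};

/*
 * Fork a child, let it record its process group before and after
 * setsid(), and read the result back through a pipe.
 * Returns the (already reaped) child's PID, or -1 on error.
 */
static pid_t probe_child_pgid(struct pgid_report *out)
{
	int fds[2];
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		struct pgid_report r;

		close(fds[0]);
		/* Right after fork() the child shares the parent's group. */
		r.before_setsid = getpgrp();
		if (setsid() < 0)
			_exit(1);
		/* setsid() makes the child leader of a new group and session. */
		r.after_setsid = getpgrp();
		n = write(fds[1], &r, sizeof(r));
		_exit(n == (ssize_t)sizeof(r) ? 0 : 1);
	}

	close(fds[1]);
	do {
		n = read(fds[0], out, sizeof(*out));
	} while (n < 0 && errno == EINTR);
	close(fds[0]);

	/* Reap the child so it does not linger as a zombie. */
	while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
		;

	return n == (ssize_t)sizeof(*out) ? pid : -1;
}
```

Before setsid() the child reports the parent's process group, so anything that signals that group also reaches the child; afterwards the group ID equals the child's own PID. That is why the parent has to wait for setsid() to finish before exiting.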