| Message ID | c2818c4a-3afe-e691-6589-1c24651b43a5@linux.ibm.com |
|---|---|
| State | New |
| Series | Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests. |
On 9/12/19 8:15 AM, Stefan Liebler wrote:
> Hi,
>
> each of these tests - tst-cancel20, tst-cancelx20, tst-cancel21,
> tst-cancelx21 and tst-cancel21-static - runs for 8s.
>
> do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
> or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
>
> This patch reduces the sleep time. Using usleep (5ms) leads to a
> runtime of one tst-cancel... invocation of roughly 40ms instead of 8s.
> As the nptl tests run in sequence, this patch saves roughly 39s of
> runtime for "make check".
>
> Bye,
> Stefan
>
> ChangeLog:
>
> 	* nptl/tst-cancel20.c (do_one_test): Reduce sleep times.
> 	* nptl/tst-cancel21.c (tf): Likewise.

I see two possible solutions:

(a) Explain in detail why the sleep is needed, and why reducing
    the sleep is safe.

(b) Remove the sleep and use proper barriers to provide the
    synchronization the test is using sleep to accomplish.

Your current patch does (a) without explaining why it is safe
to reduce the sleep period, and that the test continues to test
what it was written to test.

Is it safe to reduce the sleep timing?

Why is it there?
On 9/12/19 4:07 PM, Carlos O'Donell wrote:
> On 9/12/19 8:15 AM, Stefan Liebler wrote:
>> Hi,
>>
>> each of these tests - tst-cancel20, tst-cancelx20, tst-cancel21,
>> tst-cancelx21 and tst-cancel21-static - runs for 8s.
>>
>> do_one_test is called 4 times. Either in do_one_test (tst-cancel20.c)
>> or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.
>>
>> This patch reduces the sleep time. Using usleep (5ms) leads to a
>> runtime of one tst-cancel... invocation of roughly 40ms instead of 8s.
>> As the nptl tests run in sequence, this patch saves roughly 39s of
>> runtime for "make check".
>>
>> Bye,
>> Stefan
>>
>> ChangeLog:
>>
>> 	* nptl/tst-cancel20.c (do_one_test): Reduce sleep times.
>> 	* nptl/tst-cancel21.c (tf): Likewise.
>
> I see two possible solutions:
>
> (a) Explain in detail why the sleep is needed, and why reducing
>     the sleep is safe.
>
> (b) Remove the sleep and use proper barriers to provide the
>     synchronization the test is using sleep to accomplish.
>
> Your current patch does (a) without explaining why it is safe
> to reduce the sleep period, and that the test continues to test
> what it was written to test.
>
> Is it safe to reduce the sleep timing?
>
> Why is it there?

Hi,

tst-cancel20.c cancels a thread and checks if all cleanup-handlers were
called in the correct order:

  if (cleanups != 0x1234L)
    {
      printf ("called cleanups %lx\n", cleanups);
      return 1;
    }

These numbers belong to the cleanup-handlers called in the following
functions:
1 => sh_body (sets in_sh_body=1 and blocks in read)
2 => sh (SIGHUP signal-handler; is received after pthread_barrier_wait)
3 => tf_body (waits with pthread_barrier_wait and then blocks in read)
4 => tf (started via pthread_create in do_one_test)

do_one_test starts tf, sends SIGHUP after waiting for the barrier and
cancels the thread as soon as in_sh_body has been set to 1.

tst-cancel21.c differs in that there a thread sends SIGHUP and cancels
the main thread.

I've tried to just remove the sleeps.
This works fine with tst-cancel20. But with tst-cancelx20 (same code as
tst-cancel20.c but compiled with -fexceptions
-fasynchronous-unwind-tables) it sometimes fails with:
called cleanups 124 => cleanup-handler in tf_body was not called.

In the "good" case, SIGHUP is received while tf_body is blocking in read.
In the "error" case, SIGHUP is received while tf_body is waiting in
pthread_barrier_wait.
(This occurs on s390x in the same way as on e.g. x86_64; I've used gcc
version 9.1.1)

I've compiled the test with gcc -S -fverbose-asm. Here is the shortened
output:

tf_body:
.LFB28:
	...
	# tst-cancel20.c:75:   int r = pthread_barrier_wait (&b);
	brasl	%r14,pthread_barrier_wait	#,
	...
	# tst-cancel20.c:82:   if (read (fd[0], &c, 1) == 1)
.LEHB4:
	brasl	%r14,read	#,
	...
.LEHE4:
	...
.L25:
	# ../sysdeps/nptl/pthread.h:612:   __frame->__cancel_routine (__frame->__cancel_arg);
	lghi	%r2,3	#,
	brasl	%r14,cl	#,
	...
	brasl	%r14,_Unwind_Resume	#,
	.cfi_endproc
	.section	.gcc_except_table
.LLSDA28:
	.byte	0xff
	.byte	0xff
	.byte	0x1
	.uleb128 .LLSDACSE28-.LLSDACSB28
.LLSDACSB28:
	.uleb128 .LEHB4-.LFB28
	.uleb128 .LEHE4-.LEHB4
	.uleb128 .L25-.LFB28
	.uleb128 0
	.uleb128 .LEHB5-.LFB28
	.uleb128 .LEHE5-.LEHB5
	.uleb128 0
	.uleb128 0
.LLSDACSE28:

According to the .gcc_except_table, the region with our cleanup-handler
starts - after pthread_barrier_wait - directly before the read call,
even though pthread_barrier_wait is within pthread_cleanup_push / pop:

tf_body (void)
{
  ...
  pthread_cleanup_push (cl, (void *) 3L);
  int r = pthread_barrier_wait (&b);
  ...
  if (read (fd[0], &c, 1) == 1)
    ...
  read (fd[0], &c, 1);
  pthread_cleanup_pop (0);
}

Is this behaviour intended?

The difference between those calls is "nothrow":
extern ssize_t read (int __fd, void *__buf, size_t __nbytes);
vs
extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
  __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));

Placing e.g. a printf or read call just before pthread_barrier_wait
leads to the inclusion of pthread_barrier_wait, and the cleanup-handler
is called!

Now, I've removed the sleeps and also removed the pthread_barrier_wait.
Instead, a write to and read from a pipe is used to synchronize tf_body
and do_one_test. Please have a look at the attached patch.

Bye,
Stefan
On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
...
> I've tried to just remove the sleeps. This works fine with tst-cancel20.
> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
> called cleanups 124 => cleanup-handler in tf_body was not called.
>
> In the "good" case, SIGHUP is received while tf_body is blocking in read.
> In the "error" case, SIGHUP is received while tf_body is waiting in
> pthread_barrier_wait.
> (This occurs on s390x in the same way as on e.g. x86_64; I've used gcc
> version 9.1.1)
...
> According to the .gcc_except_table, the region with our cleanup-handler
> starts - after pthread_barrier_wait - directly before the read call,
> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
...
> Is this behaviour intended?
>
> The difference between those calls is "nothrow":
> extern ssize_t read (int __fd, void *__buf, size_t __nbytes);
> vs
> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>   __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));

This looks like GCC is assuming it can hoist a call to a nothrow
function out of a cleanup region that lexically contains it. That's
not true when an exception can be thrown by a signal handler. I can
see an argument that this is an incorrect optimization when
-fasynchronous-unwind-tables is in use, but the documentation makes it
sound like -fasynchronous-unwind-tables is only intended to enable
*stack tracing* from an arbitrary instruction, not exceptions.
(See the documentation for -fnon-call-exceptions as well as for
-fasynchronous-unwind-tables.)

It is not entirely clear to me what this test is supposed to test, but
I think the intent is for the SIGHUP to be delivered *only* after we
are sure tf_body is blocked on read.
If it's delivered at any other point, the test might not be testing
the right thing.

Instead of your change to use a second pipe, therefore, I suggest use
of pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
pthread_sigmask in do_test (this will inherit to all child threads);
then change tf_body to something like this:

static void __attribute__((noinline))
tf_body (void)
{
  char c;

  pthread_cleanup_push (cl, (void *) 3L);

  sigset_t unblock_SIGHUP;
  sigemptyset (&unblock_SIGHUP);

  struct pollfd pfds[1];
  pfds[0].fd = fd[0];
  pfds[0].events = POLLIN;

  // this call is expected to be interrupted by a SIGHUP
  // before anything is written to the pipe
  if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
    {
      puts ("read succeeded");
      exit (1);
    }

  // drain the pipe
  read (fd[0], &c, 1);

  pthread_cleanup_pop (0);
}

ppoll _atomically_ sets the signal mask before waiting, and restores
it afterward, so this should ensure that the SIGHUP is delivered at
exactly the right point.

zw
* Zack Weinberg:

> On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> ...
>> I've tried to just remove the sleeps. This works fine with tst-cancel20.
>> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
>> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
>> called cleanups 124 => cleanup-handler in tf_body was not called.
>>
>> In the "good" case, SIGHUP is received while tf_body is blocking in read.
>> In the "error" case, SIGHUP is received while tf_body is waiting in
>> pthread_barrier_wait.
>> (This occurs on s390x in the same way as on e.g. x86_64; I've used gcc
>> version 9.1.1)
> ...
>> According to the .gcc_except_table, the region with our cleanup-handler
>> starts - after pthread_barrier_wait - directly before the read call,
>> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
> ...
>> Is this behaviour intended?
>>
>> The difference between those calls is "nothrow":
>> extern ssize_t read (int __fd, void *__buf, size_t __nbytes);
>> vs
>> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>>   __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
>
> This looks like GCC is assuming it can hoist a call to a nothrow
> function out of a cleanup region that lexically contains it. That's
> not true when an exception can be thrown by a signal handler. I can
> see an argument that this is an incorrect optimization when
> -fasynchronous-unwind-tables is in use, but the documentation makes it
> sound like -fasynchronous-unwind-tables is only intended to enable
> *stack tracing* from an arbitrary instruction, not exceptions.
> (See the documentation for -fnon-call-exceptions as well as for
> -fasynchronous-unwind-tables.)

I would have expected this to be controlled by -fnon-call-exceptions,
but the GCC 8 documentation says this:

| '-fnon-call-exceptions'
|      Generate code that allows trapping instructions to throw
|      exceptions.
|      Note that this requires platform-specific runtime
|      support that does not exist everywhere.  Moreover, it only allows
|      _trapping_ instructions to throw exceptions, i.e. memory references
|      or floating-point instructions.  It does not allow exceptions to be
|      thrown from arbitrary signal handlers such as 'SIGALRM'.

So it does not seem to be what we are looking for.

Anyway, based on the posted analysis, the tst-cancelx20 test case is
invalid.

Thanks,
Florian
On 9/20/19 5:17 PM, Zack Weinberg wrote:
> On Fri, Sep 20, 2019 at 8:59 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> ...
>> I've tried to just remove the sleeps. This works fine with tst-cancel20.
>> But with tst-cancelx20 (same code as tst-cancel20.c but compiled with
>> -fexceptions -fasynchronous-unwind-tables) it sometimes fails with:
>> called cleanups 124 => cleanup-handler in tf_body was not called.
>>
>> In the "good" case, SIGHUP is received while tf_body is blocking in read.
>> In the "error" case, SIGHUP is received while tf_body is waiting in
>> pthread_barrier_wait.
>> (This occurs on s390x in the same way as on e.g. x86_64; I've used gcc
>> version 9.1.1)
> ...
>> According to the .gcc_except_table, the region with our cleanup-handler
>> starts - after pthread_barrier_wait - directly before the read call,
>> even though pthread_barrier_wait is within pthread_cleanup_push / pop:
> ...
>> Is this behaviour intended?
>>
>> The difference between those calls is "nothrow":
>> extern ssize_t read (int __fd, void *__buf, size_t __nbytes);
>> vs
>> extern int pthread_barrier_wait (pthread_barrier_t *__barrier)
>>   __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
>
> This looks like GCC is assuming it can hoist a call to a nothrow
> function out of a cleanup region that lexically contains it. That's
> not true when an exception can be thrown by a signal handler. I can
> see an argument that this is an incorrect optimization when
> -fasynchronous-unwind-tables is in use, but the documentation makes it
> sound like -fasynchronous-unwind-tables is only intended to enable
> *stack tracing* from an arbitrary instruction, not exceptions.
> (See the documentation for -fnon-call-exceptions as well as for
> -fasynchronous-unwind-tables.)
>
> It is not entirely clear to me what this test is supposed to test, but
> I think the intent is for the SIGHUP to be delivered *only* after we
> are sure tf_body is blocked on read.
> If it's delivered at any other point, the test might not be testing
> the right thing.
>
> Instead of your change to use a second pipe, therefore, I suggest use
> of pthread_sigmask and ppoll: remove the barriers; block SIGHUP using
> pthread_sigmask in do_test (this will inherit to all child threads);
> then change tf_body to something like this:
>
> static void __attribute__((noinline))
> tf_body (void)
> {
>   char c;
>
>   pthread_cleanup_push (cl, (void *) 3L);
>
>   sigset_t unblock_SIGHUP;
>   sigemptyset (&unblock_SIGHUP);
>
>   struct pollfd pfds[1];
>   pfds[0].fd = fd[0];
>   pfds[0].events = POLLIN;
>
>   // this call is expected to be interrupted by a SIGHUP
>   // before anything is written to the pipe
>   if (ppoll (pfds, 1, 0, &unblock_SIGHUP) != -1)
>     {
>       puts ("read succeeded");
>       exit (1);
>     }
>
>   // drain the pipe
>   read (fd[0], &c, 1);
>
>   pthread_cleanup_pop (0);
> }
>
> ppoll _atomically_ sets the signal mask before waiting, and restores
> it afterward, so this should ensure that the SIGHUP is delivered at
> exactly the right point.
>
> zw

Sorry for the long delay. I've given it a try.

The signal handler is now called while we are in the syscall in ppoll
called in tf_body. Before, the signal handler arrived while we were in
the syscall in read called in tf_body. In both cases (read or ppoll),
we are in a syscall surrounded by __libc_enable_asynccancel and
__libc_disable_asynccancel.

If that's okay, I'll proceed with the patch.

Bye,
Stefan
commit e89d0d6819799737a6133d3994e6f38d00c745e4
Author: Stefan Liebler <stli@linux.ibm.com>
Date:   Wed Sep 11 09:43:08 2019 +0200

    Speedup various nptl/tst-cancel20 and nptl/tst-cancel21 tests.

    Each of these tests - tst-cancel20, tst-cancelx20, tst-cancel21,
    tst-cancelx21 and tst-cancel21-static - runs for 8s.

    do_one_test is called 4 times.  Either in do_one_test (tst-cancel20.c)
    or in tf (tst-cancel21.c) "sleep (1)" is called 2 times.

    This patch reduces the sleep time.  Using usleep (5ms) leads to a
    runtime of one tst-cancel... invocation of roughly 40ms instead of 8s.
    As the nptl tests run in sequence, this patch saves roughly 39s of
    runtime for "make check".

    ChangeLog:

    	* nptl/tst-cancel20.c (do_one_test): Reduce sleep times.
    	* nptl/tst-cancel21.c (tf): Likewise.

diff --git a/nptl/tst-cancel20.c b/nptl/tst-cancel20.c
index 96dfb71fa9..abf250d508 100644
--- a/nptl/tst-cancel20.c
+++ b/nptl/tst-cancel20.c
@@ -126,7 +126,7 @@ do_one_test (void)
       return 1;
     }
 
-  sleep (1);
+  usleep (5000);
 
   r = pthread_kill (th, SIGHUP);
   if (r)
@@ -137,7 +137,7 @@ do_one_test (void)
     }
 
   while (in_sh_body == 0)
-    sleep (1);
+    usleep (5000);
 
   if (pthread_cancel (th) != 0)
     {
diff --git a/nptl/tst-cancel21.c b/nptl/tst-cancel21.c
index 361510b027..371759c551 100644
--- a/nptl/tst-cancel21.c
+++ b/nptl/tst-cancel21.c
@@ -104,7 +104,7 @@ tf (void *arg)
       exit (1);
     }
 
-  sleep (1);
+  usleep (5000);
 
   r = pthread_kill (th, SIGHUP);
   if (r)
@@ -115,7 +115,7 @@ tf (void *arg)
     }
 
   while (in_sh_body == 0)
-    sleep (1);
+    usleep (5000);
 
   if (pthread_cancel (th) != 0)
     {