Message ID | 20190104095256.12266-1-liwang@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [RFC] fzsync: tst_fzsync_pair_wait exit when parent hit accidental break | expand |
Hello, Li Wang <liwang@redhat.com> writes: > For system(rhel7.6, s390x) without __NR_recvmmsg supported, run > cve-2016-7117 result in timeout and killed by LTP framework. The > root reason is tst_syscall break with cleanup() function calling > in this trace path: > > tst_syscall(__NR_recvmmsg, ...) > tst_brk() > cleanup() > tst_fzsync_pair_cleanup() > SAFE_PTHREAD_JOIN(pair->thread_b, NULL); > > cve-2016-7117 hung at here to wait for thread_b send_and_close() finishing. > But thread_b fall into infinite loop because of tst_fzsync_wait_b without > an extra condition to exit. Eventually, test get timeout error like: > > cve-2016-7117.c:145: CONF: syscall(-1) __NR_recvmmsg not supported > Test timeouted, sending SIGKILL! > tst_test.c:1125: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 > tst_test.c:1126: BROK: Test killed! (timeout?) > > Signed-off-by: Li Wang <liwang@redhat.com> > Cc: Richard Palethorpe <rpalethorpe@suse.com> > --- > include/tst_fuzzy_sync.h | 20 ++++++++++++-------- > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h > index de0402c9b..7e4d48f0a 100644 > --- a/include/tst_fuzzy_sync.h > +++ b/include/tst_fuzzy_sync.h > @@ -517,7 +517,8 @@ static void tst_fzsync_pair_update(struct tst_fzsync_pair *pair) > * @return A non-zero value if the thread should continue otherwise the > * calling thread should exit. > */ > -static inline void tst_fzsync_pair_wait(int *our_cntr, > +static inline void tst_fzsync_pair_wait(struct tst_fzsync_pair *pair, > + int *our_cntr, > int *other_cntr, > int *spins) > { > @@ -530,7 +531,8 @@ static inline void tst_fzsync_pair_wait(int *our_cntr, > * then our counter may already have been set to zero. > */ > while (tst_atomic_load(our_cntr) > 0 > - && tst_atomic_load(our_cntr) < INT_MAX) { > + && tst_atomic_load(our_cntr) < INT_MAX > + && !tst_atomic_load(&pair->exit)) { > if (spins) > (*spins)++; > } > @@ -540,14 +542,16 @@ static inline void tst_fzsync_pair_wait(int *our_cntr, > * Once both counters have been set to zero the invariant > * is restored and we can continue. > */ > - while (tst_atomic_load(our_cntr) > 1) > + while (tst_atomic_load(our_cntr) > 1 > + && !tst_atomic_load(&pair->exit)) > ; > } else { > /* > * If our counter is less than the other thread's we are ahead > * of it and need to wait. > */ > - while (tst_atomic_load(our_cntr) < tst_atomic_load(other_cntr)) { > + while (tst_atomic_load(our_cntr) < tst_atomic_load(other_cntr) > + && !tst_atomic_load(&pair->exit)) { > if (spins) > (*spins)++; > } This is how it worked before, so it is fairly safe. However I don't like atomically checking for the exit value on every spin of the delay loop. Also because setting exit just causes it to drop through there is still the (theoretical) risk of it getting stuck on another operation before breaking out of thread B's main loop. Also removing the exit variable makes formal verification a bit easier. Another option might be to use pthread_kill with a realtime signal and a signal handler which immediately exits the current thread. I am not sure how much complexity that will introduce though? -- Thank you, Richard.
Richard Palethorpe <rpalethorpe@suse.de> wrote: > > This is how it worked before, so it is fairly safe. However I don't like > atomically checking for the exit value on every spin of the delay > loop. Also because setting exit just causes it to drop through there is > still the (theoretical) risk of it getting stuck on another operation > before breaking out of thread B's main loop. Yes, and I noticed you removed the exit checking in last update, but I didn't realize that thread B will fall into infinite loop when parent is break abnormally. > > Also removing the exit variable makes formal verification a bit easier. Good point. > > Another option might be to use pthread_kill with a realtime signal and > a signal handler which immediately exits the current thread. I am not > sure how much complexity that will introduce though? Well we can have a try, seems the only disadvantage of this method is thread_B sets signal handler at each loop start in tst_fzsync_run_b repeatedly. Not sure if I understand correctly, what drafted in my mind is: diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h index de0402c9b..6ef6bee01 100644 --- a/include/tst_fuzzy_sync.h +++ b/include/tst_fuzzy_sync.h @@ -63,6 +63,8 @@ #include <time.h> #include <math.h> #include <stdlib.h> +#include <pthread.h> +#include <errno.h> #include "tst_atomic.h" #include "tst_timer.h" #include "tst_safe_pthread.h" @@ -156,8 +158,6 @@ struct tst_fzsync_pair { int a_cntr; /** Internal; Atomic counter used by fzsync_pair_wait() */ int b_cntr; - /** Internal; Used by tst_fzsync_pair_exit() and fzsync_pair_wait() */ - int exit; /** * The maximum desired execution time as a proportion of the timeout * @@ -217,13 +217,28 @@ static void tst_fzsync_pair_init(struct tst_fzsync_pair *pair) */ static void tst_fzsync_pair_cleanup(struct tst_fzsync_pair *pair) { + int kill_ret; + if (pair->thread_b) { - tst_atomic_store(1, &pair->exit); - SAFE_PTHREAD_JOIN(pair->thread_b, NULL); - pair->thread_b = 0; + kill_ret = pthread_kill(pair->thread_b, SIGUSR1); + + if(kill_ret == 0) { + SAFE_PTHREAD_JOIN(pair->thread_b, NULL); + pair->thread_b = 0; + } else if (kill_ret == EINVAL) { + tst_res(TINFO, "Invalid signal was specified"); + } else if (kill_ret == ESRCH) { + tst_res(TINFO, "thread_b is not exist"); + } } } +static void sighandler(int sig) +{ + if (sig == SIGUSR1) + pthread_exit(NULL); +} + /** * Zero some stat fields * @@ -270,7 +285,6 @@ static void tst_fzsync_pair_reset(struct tst_fzsync_pair *pair, pair->a_cntr = 0; pair->b_cntr = 0; - pair->exit = 0; if (run_b) SAFE_PTHREAD_CREATE(&pair->thread_b, 0, run_b, 0); @@ -613,7 +627,6 @@ static inline int tst_fzsync_run_a(struct tst_fzsync_pair *pair) exit = 1; } - tst_atomic_store(exit, &pair->exit); tst_fzsync_wait_a(pair); if (exit) { @@ -632,8 +645,9 @@ static inline int tst_fzsync_run_a(struct tst_fzsync_pair *pair) */ static inline int tst_fzsync_run_b(struct tst_fzsync_pair *pair) { + SAFE_SIGNAL(SIGUSR1, sighandler); tst_fzsync_wait_b(pair); - return !tst_atomic_load(&pair->exit); + return 1; } /**
Hello Li, Li Wang <liwang@redhat.com> writes: > Richard Palethorpe <rpalethorpe@suse.de> wrote: >> >> This is how it worked before, so it is fairly safe. However I don't like >> atomically checking for the exit value on every spin of the delay >> loop. Also because setting exit just causes it to drop through there is >> still the (theoretical) risk of it getting stuck on another operation >> before breaking out of thread B's main loop. > > Yes, and I noticed you removed the exit checking in last update, but I > didn't realize that thread B will fall into infinite loop when parent > is break abnormally. Me too :-) > >> >> Also removing the exit variable makes formal verification a bit easier. > > Good point. > >> >> Another option might be to use pthread_kill with a realtime signal and >> a signal handler which immediately exits the current thread. I am not >> sure how much complexity that will introduce though? > > Well we can have a try, seems the only disadvantage of this method is > thread_B sets signal handler at each loop start in tst_fzsync_run_b > repeatedly. We could wrap thread B's main function 'run_b', which is passed to tst_fzsync_pair_reset, in another function which sets the singal handler at the start of the thread. > > Not sure if I understand correctly, what drafted in my mind is: > > diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h > index de0402c9b..6ef6bee01 100644 > --- a/include/tst_fuzzy_sync.h > +++ b/include/tst_fuzzy_sync.h > @@ -63,6 +63,8 @@ > #include <time.h> > #include <math.h> > #include <stdlib.h> > +#include <pthread.h> > +#include <errno.h> > #include "tst_atomic.h" > #include "tst_timer.h" > #include "tst_safe_pthread.h" > @@ -156,8 +158,6 @@ struct tst_fzsync_pair { > int a_cntr; > /** Internal; Atomic counter used by fzsync_pair_wait() */ > int b_cntr; > - /** Internal; Used by tst_fzsync_pair_exit() and fzsync_pair_wait() */ > - int exit; I was thinking of keeping the exit variable and using the kill signal as a backup. The reason being it should allow thread B to exit gracefully in most scenarious. In theory this should not matter because the test writer should not do any setup in thread B, but it might result in some wierd error/warning messages being printed for some tests. Unfortunately pthread_join has no timeout and pthread_timedjoin_np is non-standard. Another option might be to spin-wait for 'exit' to be incremented to 2 by thread B and send the signal after some arbitrarily large number of spins. What do you think? -- Thank you, Richard.
Hi Richard, Richard Palethorpe <rpalethorpe@suse.de> wrote: > > Well we can have a try, seems the only disadvantage of this method is > > thread_B sets signal handler at each loop start in tst_fzsync_run_b > > repeatedly. > > We could wrap thread B's main function 'run_b', which is passed to > tst_fzsync_pair_reset, in another function which sets the singal handler > at the start of the thread. Good suggestion! This make sense to me. > > - /** Internal; Used by tst_fzsync_pair_exit() and fzsync_pair_wait() */ > > - int exit; > > I was thinking of keeping the exit variable and using the kill signal as > a backup. The reason being it should allow thread B to exit gracefully > in most scenarious. In theory this should not matter because the test > writer should not do any setup in thread B, but it might result in some > wierd error/warning messages being printed for some tests. Yes, that's not a bad solution, but I was a little worried before is that would make things a bit mixed for the thread exiting. However, if we use pair->exit only for normal exiting and signal for unexpected abort, that's also accessible I guess. > > Unfortunately pthread_join has no timeout and pthread_timedjoin_np is > non-standard. or maybe we could achieve a LTP private pthread_timedjoin_np version? I haven't look into more about that so have no idea for the detail/complexity. > > Another option might be to spin-wait for 'exit' to be incremented to 2 > by thread B and send the signal after some arbitrarily large number of > spins. What do you think? Hmm, what's the best value for arbitrarily large number? it seems hard to decide. Comparing the above approaches, currently it's hard to say which one is better. If I have to make a choice, I'd like to try the first method: pair->exit (for normal exiting) + signal(for unexpected abort).
Hello Li, Li Wang <liwang@redhat.com> writes: > Hi Richard, > > Richard Palethorpe <rpalethorpe@suse.de> wrote: > >> > Well we can have a try, seems the only disadvantage of this method is >> > thread_B sets signal handler at each loop start in tst_fzsync_run_b >> > repeatedly. >> >> We could wrap thread B's main function 'run_b', which is passed to >> tst_fzsync_pair_reset, in another function which sets the singal handler >> at the start of the thread. > > Good suggestion! This make sense to me. > >> > - /** Internal; Used by tst_fzsync_pair_exit() and fzsync_pair_wait() */ >> > - int exit; >> >> I was thinking of keeping the exit variable and using the kill signal as >> a backup. The reason being it should allow thread B to exit gracefully >> in most scenarious. In theory this should not matter because the test >> writer should not do any setup in thread B, but it might result in some >> wierd error/warning messages being printed for some tests. > > Yes, that's not a bad solution, but I was a little worried before is > that would make things a bit mixed for the thread exiting. However, if > we use pair->exit only for normal exiting and signal for unexpected > abort, that's also accessible I guess. > >> >> Unfortunately pthread_join has no timeout and pthread_timedjoin_np is >> non-standard. > > or maybe we could achieve a LTP private pthread_timedjoin_np version? I haven't > look into more about that so have no idea for the detail/complexity. > >> >> Another option might be to spin-wait for 'exit' to be incremented to 2 >> by thread B and send the signal after some arbitrarily large number of >> spins. What do you think? > > Hmm, what's the best value for arbitrarily large number? it seems hard > to decide. > > Comparing the above approaches, currently it's hard to say which one > is better. If I have to make a choice, I'd like to try the first > method: pair->exit (for normal exiting) + signal(for unexpected > abort). Yes, this is better than adding another arbitrary constant IMO. -- Thank you, Richard.
Hi! > For system(rhel7.6, s390x) without __NR_recvmmsg supported, run > cve-2016-7117 result in timeout and killed by LTP framework. The > root reason is tst_syscall break with cleanup() function calling > in this trace path: > > tst_syscall(__NR_recvmmsg, ...) > tst_brk() > cleanup() > tst_fzsync_pair_cleanup() > SAFE_PTHREAD_JOIN(pair->thread_b, NULL); > > cve-2016-7117 hung at here to wait for thread_b send_and_close() finishing. > But thread_b fall into infinite loop because of tst_fzsync_wait_b without > an extra condition to exit. Eventually, test get timeout error like: > > cve-2016-7117.c:145: CONF: syscall(-1) __NR_recvmmsg not supported > Test timeouted, sending SIGKILL! > tst_test.c:1125: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 > tst_test.c:1126: BROK: Test killed! (timeout?) Looks like the complete solution will be more complex, so what about we do a simple solution that would make it to the release? We can change the cve-2016-7117 to check if __NR_recvmmsg() is supported in the test setup(), then we can avoid this problem to begin with.
On Tue, Jan 8, 2019, 21:57 Cyril Hrubis <chrubis@suse.cz wrote: > Hi! > > For system(rhel7.6, s390x) without __NR_recvmmsg supported, run > > cve-2016-7117 result in timeout and killed by LTP framework. The > > root reason is tst_syscall break with cleanup() function calling > > in this trace path: > > > > tst_syscall(__NR_recvmmsg, ...) > > tst_brk() > > cleanup() > > tst_fzsync_pair_cleanup() > > SAFE_PTHREAD_JOIN(pair->thread_b, NULL); > > > > cve-2016-7117 hung at here to wait for thread_b send_and_close() > finishing. > > But thread_b fall into infinite loop because of tst_fzsync_wait_b without > > an extra condition to exit. Eventually, test get timeout error like: > > > > cve-2016-7117.c:145: CONF: syscall(-1) __NR_recvmmsg not supported > > Test timeouted, sending SIGKILL! > > tst_test.c:1125: INFO: If you are running on slow machine, try > exporting LTP_TIMEOUT_MUL > 1 > > tst_test.c:1126: BROK: Test killed! (timeout?) > > Looks like the complete solution will be more complex, so what about we > do a simple solution that would make it to the release? > Sure, I'm OK to delay the solution. > We can change the cve-2016-7117 to check if __NR_recvmmsg() is supported > in the test setup(), then we can avoid this problem to begin with. > Feel free to fix in that, thanks! Regards, Li Wang <div dir="auto"><div><div data-smartmail="gmail_signature"><br></div><div class="gmail_quote"><div dir="ltr">On Tue, Jan 8, 2019, 21:57 Cyril Hrubis <<a href="mailto:chrubis@suse.cz">chrubis@suse.cz</a> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi!<br> > For system(rhel7.6, s390x) without __NR_recvmmsg supported, run<br> > cve-2016-7117 result in timeout and killed by LTP framework. The<br> > root reason is tst_syscall break with cleanup() function calling<br> > in this trace path:<br> > <br> > tst_syscall(__NR_recvmmsg, ...)<br> > tst_brk()<br> > cleanup()<br> > tst_fzsync_pair_cleanup()<br> > SAFE_PTHREAD_JOIN(pair->thread_b, NULL);<br> > <br> > cve-2016-7117 hung at here to wait for thread_b send_and_close() finishing.<br> > But thread_b fall into infinite loop because of tst_fzsync_wait_b without<br> > an extra condition to exit. Eventually, test get timeout error like:<br> > <br> > cve-2016-7117.c:145: CONF: syscall(-1) __NR_recvmmsg not supported<br> > Test timeouted, sending SIGKILL!<br> > tst_test.c:1125: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1<br> > tst_test.c:1126: BROK: Test killed! (timeout?)<br> <br> Looks like the complete solution will be more complex, so what about we<br> do a simple solution that would make it to the release?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Sure, I'm OK to delay the solution.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br> We can change the cve-2016-7117 to check if __NR_recvmmsg() is supported<br> in the test setup(), then we can avoid this problem to begin with.<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Feel free to fix in that, thanks!</div><div dir="auto"><br></div><div dir="auto">Regards,</div><div dir="auto">Li Wang </div></div>
diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h index de0402c9b..7e4d48f0a 100644 --- a/include/tst_fuzzy_sync.h +++ b/include/tst_fuzzy_sync.h @@ -517,7 +517,8 @@ static void tst_fzsync_pair_update(struct tst_fzsync_pair *pair) * @return A non-zero value if the thread should continue otherwise the * calling thread should exit. */ -static inline void tst_fzsync_pair_wait(int *our_cntr, +static inline void tst_fzsync_pair_wait(struct tst_fzsync_pair *pair, + int *our_cntr, int *other_cntr, int *spins) { @@ -530,7 +531,8 @@ static inline void tst_fzsync_pair_wait(int *our_cntr, * then our counter may already have been set to zero. */ while (tst_atomic_load(our_cntr) > 0 - && tst_atomic_load(our_cntr) < INT_MAX) { + && tst_atomic_load(our_cntr) < INT_MAX + && !tst_atomic_load(&pair->exit)) { if (spins) (*spins)++; } @@ -540,14 +542,16 @@ static inline void tst_fzsync_pair_wait(int *our_cntr, * Once both counters have been set to zero the invariant * is restored and we can continue. */ - while (tst_atomic_load(our_cntr) > 1) + while (tst_atomic_load(our_cntr) > 1 + && !tst_atomic_load(&pair->exit)) ; } else { /* * If our counter is less than the other thread's we are ahead * of it and need to wait. */ - while (tst_atomic_load(our_cntr) < tst_atomic_load(other_cntr)) { + while (tst_atomic_load(our_cntr) < tst_atomic_load(other_cntr) + && !tst_atomic_load(&pair->exit)) { if (spins) (*spins)++; } @@ -562,7 +566,7 @@ static inline void tst_fzsync_pair_wait(int *our_cntr, */ static inline void tst_fzsync_wait_a(struct tst_fzsync_pair *pair) { - tst_fzsync_pair_wait(&pair->a_cntr, &pair->b_cntr, NULL); + tst_fzsync_pair_wait(pair, &pair->a_cntr, &pair->b_cntr, NULL); } /** @@ -573,7 +577,7 @@ static inline void tst_fzsync_wait_a(struct tst_fzsync_pair *pair) */ static inline void tst_fzsync_wait_b(struct tst_fzsync_pair *pair) { - tst_fzsync_pair_wait(&pair->b_cntr, &pair->a_cntr, NULL); + tst_fzsync_pair_wait(pair, &pair->b_cntr, &pair->a_cntr, NULL); } /** @@ -678,7 +682,7 @@ static inline void tst_fzsync_start_race_a(struct tst_fzsync_pair *pair) static inline void tst_fzsync_end_race_a(struct tst_fzsync_pair *pair) { tst_fzsync_time(&pair->a_end); - tst_fzsync_pair_wait(&pair->a_cntr, &pair->b_cntr, &pair->spins); + tst_fzsync_pair_wait(pair, &pair->a_cntr, &pair->b_cntr, &pair->spins); } /** @@ -709,7 +713,7 @@ static inline void tst_fzsync_start_race_b(struct tst_fzsync_pair *pair) static inline void tst_fzsync_end_race_b(struct tst_fzsync_pair *pair) { tst_fzsync_time(&pair->b_end); - tst_fzsync_pair_wait(&pair->b_cntr, &pair->a_cntr, &pair->spins); + tst_fzsync_pair_wait(pair, &pair->b_cntr, &pair->a_cntr, &pair->spins); } /**
For system(rhel7.6, s390x) without __NR_recvmmsg supported, run cve-2016-7117 result in timeout and killed by LTP framework. The root reason is tst_syscall break with cleanup() function calling in this trace path: tst_syscall(__NR_recvmmsg, ...) tst_brk() cleanup() tst_fzsync_pair_cleanup() SAFE_PTHREAD_JOIN(pair->thread_b, NULL); cve-2016-7117 hung at here to wait for thread_b send_and_close() finishing. But thread_b fall into infinite loop because of tst_fzsync_wait_b without an extra condition to exit. Eventually, test get timeout error like: cve-2016-7117.c:145: CONF: syscall(-1) __NR_recvmmsg not supported Test timeouted, sending SIGKILL! tst_test.c:1125: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1126: BROK: Test killed! (timeout?) Signed-off-by: Li Wang <liwang@redhat.com> Cc: Richard Palethorpe <rpalethorpe@suse.com> --- include/tst_fuzzy_sync.h | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)