
[v2,4/4] fzsync: Limit execution time to prevent test timeouts

Message ID 20180813135908.30752-4-rpalethorpe@suse.com
State Changes Requested
Series [v2,1/4] lib: Allow user to easily get LTP_TIMEOUT_MUL value

Commit Message

Richard Palethorpe Aug. 13, 2018, 1:59 p.m. UTC
Use the tst_timer library to check how long the main test loop has been
running for and break the loop if it exceeds 60 seconds. This prevents an
overall test timeout which is reported as a failure.

Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
---
 include/tst_fuzzy_sync.h                      | 47 ++++++++++++-------
 lib/newlib_tests/test16.c                     |  3 +-
 testcases/cve/cve-2014-0196.c                 |  3 +-
 testcases/cve/cve-2016-7117.c                 |  3 +-
 testcases/cve/cve-2017-2671.c                 |  3 +-
 testcases/kernel/syscalls/inotify/inotify09.c |  3 +-
 6 files changed, 41 insertions(+), 21 deletions(-)

Comments

Li Wang Aug. 14, 2018, 9:24 a.m. UTC | #1
Hi Richard,

Richard Palethorpe <rpalethorpe@suse.com> wrote:

> Use the tst_timer library to check how long the main test loop has been
> running for and break the loop if it exceeds 60 seconds. This prevents an

This summary needs a minor adjustment for v2 too. Otherwise the patch set
looks good to me.

Btw, I think the whole fuzzy synchronisation library may be a bit
complicated for LTP users; I read all of the fuzzy_sync functions and spent
a lot of time learning the code logic :). Of course, if we want to do some
improvement work, it would be better to do that in a separate patch set.

> overall test timeout which is reported as a failure.
>
> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
>

Reviewed-by: Li Wang <liwang@redhat.com>
Richard Palethorpe Aug. 14, 2018, 9:58 a.m. UTC | #2
Li Wang <liwang@redhat.com> writes:

> Hi Richard,
>
> Richard Palethorpe <rpalethorpe@suse.com> wrote:
>
> Use the tst_timer library to check how long the main test loop has been
>> running for and break the loop if it exceeds 60 seconds. This prevents an
>>
>
> This summary needs a minor adjustment for v2 too. Otherwise patch set looks
> good to me.
>
> Btw, I think the whole fuzzy synchronisation maybe a bit complicated to LTP
> users, I read all of the fuzzy_sync functions and spent much time to learn
> the code logic :). Of course, if we want to do some improvement work,
> better in new divided patch then.

Yes, I agree and have some ideas about that (see comments in V1 patches
covering letter). Thanks for the feedback.

Maybe the implementation can detect some common usage mistakes, which I
think is the main issue here. For example, if the user has a different
number of waits in one thread than in the other, we should be able to
detect that. Then people can learn how to use it through experimentation,
which is much easier than reading the implementation or even the
documentation.

>
> overall test timeout which is reported as a failure.
>>
>> Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
>>
>
> Reviewed-by: Li Wang <liwang@redhat.com>


--
Thank you,
Richard.
Richard Palethorpe Aug. 15, 2018, 9:05 a.m. UTC | #3
Hello,

Richard Palethorpe <rpalethorpe@suse.com> writes:

>  /**
>   * tst_fzsync_pair_wait_update_{a,b} - Wait and then recalculate
>   *
> @@ -301,10 +311,26 @@ static inline int tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
>  static inline int tst_fzsync_wait_update_a(struct tst_fzsync_pair *pair)
>  {
>  	static int loop_index;
> +	int timer_state = tst_timer_state_ms(60000);
> +	int exit = 0;
> +
> +	if (!(timer_state & TST_TIMER_STARTED)) {
> +		tst_timer_start(CLOCK_MONOTONIC_RAW);
> +	} else if (timer_state & TST_TIMER_EXPIRED) {
> +		tst_res(TINFO,
> +			"Exceeded fuzzy sync time limit, requesting exit");
> +		exit = 1;

This is not going to work with the -i argument or if the test author
also wants to use tst_timer. I'm going to have to do something
different.

--
Thank you,
Richard.
Li Wang Aug. 16, 2018, 9:27 a.m. UTC | #4
Richard Palethorpe <rpalethorpe@suse.de> wrote:

> Hello,
>
> Richard Palethorpe <rpalethorpe@suse.com> writes:
>
> >  /**
> >   * tst_fzsync_pair_wait_update_{a,b} - Wait and then recalculate
> >   *
> > @@ -301,10 +311,26 @@ static inline int tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
> >  static inline int tst_fzsync_wait_update_a(struct tst_fzsync_pair *pair)
> >  {
> >       static int loop_index;
> > +     int timer_state = tst_timer_state_ms(60000);
> > +     int exit = 0;
> > +
> > +     if (!(timer_state & TST_TIMER_STARTED)) {
> > +             tst_timer_start(CLOCK_MONOTONIC_RAW);
> > +     } else if (timer_state & TST_TIMER_EXPIRED) {
> > +             tst_res(TINFO,
> > +                     "Exceeded fuzzy sync time limit, requesting exit");
> > +             exit = 1;
>
> This is not going to work with the -i argument or if the test author
> also wants to use tst_timer. I'm going to have to do something
> different.


Yes, good catch. I didn't realize that when reviewing the patch.

Maybe we can keep the expired-time check in the fzsync library itself,
without using any LTP timer API. But with the -i parameter, it seems hard
to reset pair->exit to 0 at the proper time :(.

Now I'm wondering why not redesign/improve fzsync first and then solve the
timeout issue. I guess that would be easier than the current situation.
Richard Palethorpe Aug. 16, 2018, 11:40 a.m. UTC | #5
Hello,

Li Wang <liwang@redhat.com> writes:

> Richard Palethorpe <rpalethorpe@suse.de> wrote:
>
> Hello,
>>
>> Richard Palethorpe <rpalethorpe@suse.com> writes:
>>
>> >  /**
>> >   * tst_fzsync_pair_wait_update_{a,b} - Wait and then recalculate
>> >   *
>> > @@ -301,10 +311,26 @@ static inline int tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
>> >  static inline int tst_fzsync_wait_update_a(struct tst_fzsync_pair *pair)
>> >  {
>> >       static int loop_index;
>> > +     int timer_state = tst_timer_state_ms(60000);
>> > +     int exit = 0;
>> > +
>> > +     if (!(timer_state & TST_TIMER_STARTED)) {
>> > +             tst_timer_start(CLOCK_MONOTONIC_RAW);
>> > +     } else if (timer_state & TST_TIMER_EXPIRED) {
>> > +             tst_res(TINFO,
>> > +                     "Exceeded fuzzy sync time limit, requesting exit");
>> > +             exit = 1;
>>
>> This is not going to work with the -i argument or if the test author
>> also wants to use tst_timer. I'm going to have to do something
>> different.
>
>
> Yes, good catch. I didn't realized that when reviewing the patch.
>
> Maybe we can try to leave the expired time judgement in fzync library and
> without using any LTP timer API. But for the -i parameter, it seems hard to
> set pair->exit to 0 at a proper time :(.

I have another patch set which allows multiple timers to be used with
tst_timer; I will post it soon.

>
> Now I'm thinking that why not to redesign/improve the fzync first and then
> solve the timeout issue. It will be easier than current situation I guess.

Redesigning fzsync is much more difficult in my opinion. At least
cve-2016-7117[1] relies on the delay feature, so I can't simply remove it
as I previously suggested.

The race condition happens on the exit from the recvmmsg and close
syscalls. close takes a fraction of the time recvmmsg does to execute, so
synchronising the two threads before making these syscalls does not work.
They need to be synchronised just as the syscalls are both exiting, which
requires the close call to be delayed.

We currently do this by taking time stamps at the exit of each system call
and using the delay function to try to synchronise these two points, so
that we end up with a timeline which looks like the following (assuming
everything worked perfectly):

wait_update        time_stamp
   ^                       ^
   | +--------+ +-------+  |
---|-|  delay |-| close |--|-- - -
   | +--------+ +-------+  |
   | +------------------+  |
---|-|     recvmmsg     |--|-- - -
   | +------------------+  |

With the updated API which I proposed, we would get something like the
diagram below, where the starts of close and recvmmsg are synced. Because
recvmmsg takes so much longer to reach its exit code than close, the race
has an extremely low probability of happening.

start_race     end_race
   ^            ^
   | +-------+  +-------+ |
---|-| close |--| wait  |-|-- - -
   | +-------+  +-------+ |
   | +------------------+ |
---|-|     recvmmsg     |-|-- - -
   | +------------------+ |
                          ^
                         end_race

However, this doesn't mean the new API is bad; it just means it has to
detect that close happens much more quickly than recvmmsg and introduce
some random delay before the close. This is non-trivial because we have to
calculate the delay (in spin cycles) from timestamps (in nanoseconds)
taken during start_race and end_race, plus the spin cycles spent waiting
in end_race. On some CPUs the clock's resolution may not be good enough to
get an accurate delta between any of the timestamps, so we may just have
the number of cycles spent during the wait to work with.

[1] This test is currently quite unreliable, but it at least works
sometimes.

--
Thank you,
Richard.

Patch

diff --git a/include/tst_fuzzy_sync.h b/include/tst_fuzzy_sync.h
index bcc625e24..75866d73c 100644
--- a/include/tst_fuzzy_sync.h
+++ b/include/tst_fuzzy_sync.h
@@ -42,6 +42,7 @@ 
 #include <time.h>
 #include <math.h>
 #include "tst_atomic.h"
+#include "tst_timer.h"
 
 #ifndef CLOCK_MONOTONIC_RAW
 # define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC
@@ -265,9 +266,7 @@  static inline int tst_fzsync_pair_wait(struct tst_fzsync_pair *pair,
 			;
 	}
 
-	/* Only exit if we have done the same amount of work as the other thread */
-	return !tst_atomic_load(&pair->exit) ||
-	  tst_atomic_load(other_cntr) <= tst_atomic_load(our_cntr);
+	return !tst_atomic_load(&pair->exit);
 }
 
 static inline int tst_fzsync_wait_a(struct tst_fzsync_pair *pair)
@@ -280,6 +279,17 @@  static inline int tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
 	return tst_fzsync_pair_wait(pair, &pair->b_cntr, &pair->a_cntr);
 }
 
+/**
+ * tst_fzsync_pair_exit - Signal that the other thread should exit
+ *
+ * Causes tst_fzsync_pair_wait() and tst_fzsync_pair_wait_update() to return
+ * 0.
+ */
+static inline void tst_fzsync_pair_exit(struct tst_fzsync_pair *pair)
+{
+	tst_atomic_store(1, &pair->exit);
+}
+
 /**
  * tst_fzsync_pair_wait_update_{a,b} - Wait and then recalculate
  *
@@ -301,10 +311,26 @@  static inline int tst_fzsync_wait_b(struct tst_fzsync_pair *pair)
 static inline int tst_fzsync_wait_update_a(struct tst_fzsync_pair *pair)
 {
 	static int loop_index;
+	int timer_state = tst_timer_state_ms(60000);
+	int exit = 0;
+
+	if (!(timer_state & TST_TIMER_STARTED)) {
+		tst_timer_start(CLOCK_MONOTONIC_RAW);
+	} else if (timer_state & TST_TIMER_EXPIRED) {
+		tst_res(TINFO,
+			"Exceeded fuzzy sync time limit, requesting exit");
+		exit = 1;
+	}
 
 	tst_fzsync_pair_wait(pair, &pair->a_cntr, &pair->b_cntr);
-	loop_index++;
-	tst_fzsync_pair_update(loop_index, pair);
+
+	if (exit) {
+		tst_fzsync_pair_exit(pair);
+	} else {
+		loop_index++;
+		tst_fzsync_pair_update(loop_index, pair);
+	}
+
 	return tst_fzsync_pair_wait(pair, &pair->a_cntr, &pair->b_cntr);
 }
 
@@ -313,14 +339,3 @@  static inline int tst_fzsync_wait_update_b(struct tst_fzsync_pair *pair)
 	tst_fzsync_pair_wait(pair, &pair->b_cntr, &pair->a_cntr);
 	return tst_fzsync_pair_wait(pair, &pair->b_cntr, &pair->a_cntr);
 }
-
-/**
- * tst_fzsync_pair_exit - Signal that the other thread should exit
- *
- * Causes tst_fzsync_pair_wait() and tst_fzsync_pair_wait_update() to return
- * 0.
- */
-static inline void tst_fzsync_pair_exit(struct tst_fzsync_pair *pair)
-{
-	tst_atomic_store(1, &pair->exit);
-}
diff --git a/lib/newlib_tests/test16.c b/lib/newlib_tests/test16.c
index d80bd5369..bb3d86e2d 100644
--- a/lib/newlib_tests/test16.c
+++ b/lib/newlib_tests/test16.c
@@ -65,7 +65,8 @@  static void run(void)
 	unsigned int i, j, fail = 0;
 
 	for (i = 0; i < LOOPS; i++) {
-		tst_fzsync_wait_update_a(&pair);
+		if (!tst_fzsync_wait_update_a(&pair))
+			break;
 		tst_fzsync_delay_a(&pair);
 		seq[seq_n] = 'A';
 		seq_n = i * 2 + 1;
diff --git a/testcases/cve/cve-2014-0196.c b/testcases/cve/cve-2014-0196.c
index d18108897..8d15909b7 100644
--- a/testcases/cve/cve-2014-0196.c
+++ b/testcases/cve/cve-2014-0196.c
@@ -118,7 +118,8 @@  static void run(void)
 		t.c_lflag |= ECHO;
 		tcsetattr(master_fd, TCSANOW, &t);
 
-		tst_fzsync_wait_update_a(&fzsync_pair);
+		if (!tst_fzsync_wait_update_a(&fzsync_pair))
+			break;
 
 		tst_fzsync_delay_a(&fzsync_pair);
 		tst_fzsync_time_a(&fzsync_pair);
diff --git a/testcases/cve/cve-2016-7117.c b/testcases/cve/cve-2016-7117.c
index 3cb7efcdf..d68f62ecc 100644
--- a/testcases/cve/cve-2016-7117.c
+++ b/testcases/cve/cve-2016-7117.c
@@ -140,7 +140,8 @@  static void run(void)
 		if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, (int *)socket_fds))
 			tst_brk(TBROK | TERRNO, "Socket creation failed");
 
-		tst_fzsync_wait_update_a(&fzsync_pair);
+		if (!tst_fzsync_wait_update_a(&fzsync_pair))
+			break;
 		tst_fzsync_delay_a(&fzsync_pair);
 
 		stat = tst_syscall(__NR_recvmmsg,
diff --git a/testcases/cve/cve-2017-2671.c b/testcases/cve/cve-2017-2671.c
index b0471bfff..21dd2a754 100644
--- a/testcases/cve/cve-2017-2671.c
+++ b/testcases/cve/cve-2017-2671.c
@@ -115,7 +115,8 @@  static void run(void)
 		SAFE_CONNECT(sockfd,
 			     (struct sockaddr *)&iaddr, sizeof(iaddr));
 
-		tst_fzsync_wait_update_a(&fzsync_pair);
+		if (!tst_fzsync_wait_update_a(&fzsync_pair))
+			break;
 		tst_fzsync_delay_a(&fzsync_pair);
 		connect(sockfd, (struct sockaddr *)&uaddr, sizeof(uaddr));
 		tst_fzsync_time_a(&fzsync_pair);
diff --git a/testcases/kernel/syscalls/inotify/inotify09.c b/testcases/kernel/syscalls/inotify/inotify09.c
index 475411311..52df38f24 100644
--- a/testcases/kernel/syscalls/inotify/inotify09.c
+++ b/testcases/kernel/syscalls/inotify/inotify09.c
@@ -102,7 +102,8 @@  static void verify_inotify(void)
 		if (wd < 0)
 			tst_brk(TBROK | TERRNO, "inotify_add_watch() failed.");
 
-		tst_fzsync_wait_update_a(&fzsync_pair);
+		if (!tst_fzsync_wait_update_a(&fzsync_pair))
+			break;
 		tst_fzsync_delay_a(&fzsync_pair);
 
 		wd = myinotify_rm_watch(inotify_fd, wd);