[v2,6/6] zram: Increase timeout according to used devices

Message ID	20210129194144.31299-7-pvorel@suse.cz
State	Changes Requested
Headers	show Return-Path: <ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it> From: Petr Vorel <pvorel@suse.cz> To: ltp@lists.linux.it Date: Fri, 29 Jan 2021 20:41:44 +0100 Message-Id: <20210129194144.31299-7-pvorel@suse.cz> In-Reply-To: <20210129194144.31299-1-pvorel@suse.cz> References: <20210129194144.31299-1-pvorel@suse.cz> MIME-Version: 1.0 Subject: [LTP] [PATCH v2 6/6] zram: Increase timeout according to used devices Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it Sender: "ltp" <ltp-bounces+incoming=patchwork.ozlabs.org@lists.linux.it>
Series	zram cleanup \| expand [v2,0/6] zram cleanup [v2,1/6] zram: Calculate dev_num variable [v2,2/6] zram01.sh: Generate test setup variables in setup [v2,3/6] zram: Move zram_compress_alg() to zram02.sh [v2,4/6] zram: Move test specific functions out of zram_lib.sh [RFC,v2,5/6] tst_test.sh: Run _tst_setup_timer after $TST_SETUP [v2,6/6] zram: Increase timeout according to used devices

Petr Vorel Jan. 29, 2021, 7:41 p.m. UTC

to avoid unexpected timeout, which occurred even on just 4 devices.
Default setup is 300, using 200 should be safe.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
New in v2.

 testcases/kernel/device-drivers/zram/zram_lib.sh | 2 ++
 1 file changed, 2 insertions(+)

Petr Vorel Jan. 29, 2021, 8:10 p.m. UTC | #1

Hi,

> +	TST_TIMEOUT=$((dev_num*200))
Actually on heavy loaded machine this is not enough due BTRFS.
I can add something like dev_num*600 or even -1 (then previous commit would not
be needed, but IMHO still useful).

Kind regards,
Petr

Petr Vorel Jan. 29, 2021, 8:15 p.m. UTC | #2

> Hi,

> > +	TST_TIMEOUT=$((dev_num*200))
> Actually on heavy loaded machine this is not enough due BTRFS.
> I can add something like dev_num*600 or even -1 (then previous commit would not
> be needed, but IMHO still useful).
And bad thing is that it breaks other zram tests, because the timer probably
does not allow to run the cleanup:

_tst_setup_timer()
{
...
	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &

I'm not sure if shell allow us to do it better. Maybe sent different signal than
SIGKILL and define 'trap _tst_do_exit' for that signal?

Kind regards,
Petr

Li Wang Jan. 30, 2021, 8:26 a.m. UTC | #3

On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:

> > Hi,
>
> > > +   TST_TIMEOUT=$((dev_num*200))
> > Actually on heavy loaded machine this is not enough due BTRFS.
> > I can add something like dev_num*600 or even -1 (then previous commit
> would not
> > be needed, but IMHO still useful).
>

I personally think -1 is better.



> And bad thing is that it breaks other zram tests, because the timer
> probably
> does not allow to run the cleanup:
>
> _tst_setup_timer()
> {
> ...
>         sleep $sec && tst_res TBROK "test killed, timeout! If you are
> running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9
> -$pid &
>
> I'm not sure if shell allow us to do it better. Maybe sent different
> signal than
> SIGKILL and define 'trap _tst_do_exit' for that signal?


Sounds practicable. I guess sending SIGINT could make more sense, since
sometimes we use CTRL+C stop test in debugging by manual, test should
do cleanup work for that behavior too.

Petr Vorel Feb. 2, 2021, 7:38 a.m. UTC | #4

Hi Li,

> On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:

> > > Hi,

> > > > +   TST_TIMEOUT=$((dev_num*200))
> > > Actually on heavy loaded machine this is not enough due BTRFS.
> > > I can add something like dev_num*600 or even -1 (then previous commit
> > would not
> > > be needed, but IMHO still useful).


> I personally think -1 is better.
OK. In that case we might avoid now unneeded previous commit.



> > And bad thing is that it breaks other zram tests, because the timer
> > probably
> > does not allow to run the cleanup:

> > _tst_setup_timer()
> > {
> > ...
> >         sleep $sec && tst_res TBROK "test killed, timeout! If you are
> > running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9
> > -$pid &

> > I'm not sure if shell allow us to do it better. Maybe sent different
> > signal than
> > SIGKILL and define 'trap _tst_do_exit' for that signal?


> Sounds practicable. I guess sending SIGINT could make more sense, since
> sometimes we use CTRL+C stop test in debugging by manual, test should
> do cleanup work for that behavior too.
We have already SIGINT defined for main shell process:
trap "tst_brk TBROK 'test interrupted'" INT

so CTRL+C is covered. So maybe run first SIGINT and then SIGKILL for safety
reasons?

Kind regards,
Petr

diff --git testcases/lib/tst_test.sh testcases/lib/tst_test.sh
index 4fa1674cd..61a6fbcdd 100644
--- testcases/lib/tst_test.sh
+++ testcases/lib/tst_test.sh
@@ -459,7 +459,7 @@ _tst_setup_timer()
 
 	tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"
 
-	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &
+	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && { kill -2 -$pid; sleep 5; kill -9 -$pid; } &
 
 	_tst_setup_timer_pid=$!
 }

Li Wang Feb. 2, 2021, 11:25 a.m. UTC | #5

Hi Petr,

On Tue, Feb 2, 2021 at 3:38 PM Petr Vorel <pvorel@suse.cz> wrote:

> Hi Li,
>
> > On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:
>
> > > > Hi,
>
> > > > > +   TST_TIMEOUT=$((dev_num*200))
> > > > Actually on heavy loaded machine this is not enough due BTRFS.
> > > > I can add something like dev_num*600 or even -1 (then previous commit
> > > would not
> > > > be needed, but IMHO still useful).
>
>
> > I personally think -1 is better.
> OK. In that case we might avoid now unneeded previous commit.
>
>
>
> > > And bad thing is that it breaks other zram tests, because the timer
> > > probably
> > > does not allow to run the cleanup:
>
> > > _tst_setup_timer()
> > > {
> > > ...
> > >         sleep $sec && tst_res TBROK "test killed, timeout! If you are
> > > running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9
> > > -$pid &
>
> > > I'm not sure if shell allow us to do it better. Maybe sent different
> > > signal than
> > > SIGKILL and define 'trap _tst_do_exit' for that signal?
>
>
> > Sounds practicable. I guess sending SIGINT could make more sense, since
> > sometimes we use CTRL+C stop test in debugging by manual, test should
> > do cleanup work for that behavior too.
> We have already SIGINT defined for main shell process:
> trap "tst_brk TBROK 'test interrupted'" INT
>
> so CTRL+C is covered. So maybe run first SIGINT and then SIGKILL for safety
> reasons?
>

I'm fine with using SIGINT + SIGKILL.

Or, maybe another choice is to catch a signal SIGTERM?  Once we
cancel the specified signal in KILL command, it will try with SIGTERM
by default, if the test process can not be terminated, then a SIGKILL
may be sent.

--- a/testcases/lib/tst_test.sh
+++ b/testcases/lib/tst_test.sh
@@ -22,6 +22,7 @@ export TST_LIB_LOADED=1

 # default trap function
 trap "tst_brk TBROK 'test interrupted'" INT
+trap "_tst_do_exit" TERM

 _tst_do_exit()
 {
@@ -459,7 +460,7 @@ _tst_setup_timer()

        tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"

-       sleep $sec && tst_res TBROK "test killed, timeout! If you are
running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9
-$pid &
+       sleep $sec && tst_res TBROK "test terminated, timeout! If you are
running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill --
-$pid &

        _tst_setup_timer_pid=$!

Cyril Hrubis March 1, 2021, 3:14 p.m. UTC | #6

Hi!
> > +	TST_TIMEOUT=$((dev_num*200))
> Actually on heavy loaded machine this is not enough due BTRFS.
> I can add something like dev_num*600 or even -1 (then previous commit would not
> be needed, but IMHO still useful).

I would still prefer if we had a timeout there, -1 is for something that
cannot be predicted.

Also we do not expect machine to be heavily loaded, in that case half of
LTP tests would time out.

So I would just measure how long the test takes, then multiply it by 5
or something like that and put that in as a timeout.

Petr Vorel March 1, 2021, 3:41 p.m. UTC | #7

> Hi!
> > > +	TST_TIMEOUT=$((dev_num*200))
> > Actually on heavy loaded machine this is not enough due BTRFS.
> > I can add something like dev_num*600 or even -1 (then previous commit would not
> > be needed, but IMHO still useful).

> I would still prefer if we had a timeout there, -1 is for something that
> cannot be predicted.

> Also we do not expect machine to be heavily loaded, in that case half of
> LTP tests would time out.

> So I would just measure how long the test takes, then multiply it by 5
> or something like that and put that in as a timeout.
Do you mean to use high enough static timeout defined before startup (working
for maximum possible filesystems)? And create tst_set_timeout() for shell as
independent effort?

Kind regards,
Petr

Petr Vorel March 1, 2021, 3:45 p.m. UTC | #8

> > Hi!
> > > > +	TST_TIMEOUT=$((dev_num*200))
> > > Actually on heavy loaded machine this is not enough due BTRFS.
> > > I can add something like dev_num*600 or even -1 (then previous commit would not
> > > be needed, but IMHO still useful).

> > I would still prefer if we had a timeout there, -1 is for something that
> > cannot be predicted.

> > Also we do not expect machine to be heavily loaded, in that case half of
> > LTP tests would time out.

> > So I would just measure how long the test takes, then multiply it by 5
> > or something like that and put that in as a timeout.
> Do you mean to use high enough static timeout defined before startup (working
> for maximum possible filesystems)? And create tst_set_timeout() for shell as
> independent effort?

BTW looking at the docs for C API tst_set_timeout() is expected to be run in the
setup function. Thus this patchset can be reused (just extended).

Kind regards,
Petr

> Kind regards,
> Petr

Cyril Hrubis March 1, 2021, 3:48 p.m. UTC | #9

Hi!
> > I would still prefer if we had a timeout there, -1 is for something that
> > cannot be predicted.
> 
> > Also we do not expect machine to be heavily loaded, in that case half of
> > LTP tests would time out.
> 
> > So I would just measure how long the test takes, then multiply it by 5
> > or something like that and put that in as a timeout.
> Do you mean to use high enough static timeout defined before startup (working
> for maximum possible filesystems)? And create tst_set_timeout() for shell as
> independent effort?

I would do:

* Add tst_set_timeout for shell, so that it mirrors the C API

* Measure runtime of the test divide it by the number of supported
  filesystems, that way we would get mean runtime per filesystem

  now I would multiply this number with arbitrary constantm, e.g. 5 or
  even more, this is now timeout per iteration

  with this number the actuall timeout would be:

  number_of_filesystems * mean_max_per_run


Does this sound sane?

I guess that in the end we will end up with something similar what you
had there, but we would have some data that supports this decision.

Petr Vorel March 1, 2021, 4:18 p.m. UTC | #10

> Hi!
> > > I would still prefer if we had a timeout there, -1 is for something that
> > > cannot be predicted.

> > > Also we do not expect machine to be heavily loaded, in that case half of
> > > LTP tests would time out.

> > > So I would just measure how long the test takes, then multiply it by 5
> > > or something like that and put that in as a timeout.
> > Do you mean to use high enough static timeout defined before startup (working
> > for maximum possible filesystems)? And create tst_set_timeout() for shell as
> > independent effort?

> I would do:

> * Add tst_set_timeout for shell, so that it mirrors the C API
+1. BTW C has .all_filesystems for timeout for each run, which allows not to
bother with timeout depending on number of filesystems (unlike fuzzy sync, which
also sometimes needs tweak fzsync_pair.exec_time_p). I'm for ignoring this fact,
just to let know that shell API is far behind C API.

> * Measure runtime of the test divide it by the number of supported
>   filesystems, that way we would get mean runtime per filesystem

>   now I would multiply this number with arbitrary constantm, e.g. 5 or
>   even more, this is now timeout per iteration

>   with this number the actuall timeout would be:

>   number_of_filesystems * mean_max_per_run


> Does this sound sane?
+1, thanks!

> I guess that in the end we will end up with something similar what you
> had there, but we would have some data that supports this decision.
+1

Kind regards,
Petr

[v2,6/6] zram: Increase timeout according to used devices

Commit Message

Comments

Patch