Message ID: 20210129194144.31299-7-pvorel@suse.cz
State: Changes Requested
Series: zram cleanup
Hi,
> + TST_TIMEOUT=$((dev_num*200))
Actually on a heavily loaded machine this is not enough, due to BTRFS.
I can add something like dev_num*600 or even -1 (then the previous commit would
not be needed, but IMHO still useful).
Kind regards,
Petr
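[Editor's note] The arithmetic behind such a per-device budget is easy to sketch; the example below is hypothetical (dev_num=4 is an invented device count, 200 is the per-device figure from the quoted hunk, and LTP_TIMEOUT_MUL is LTP's existing multiplier for slow machines):

```shell
#!/bin/sh
# Sketch only: how the proposed per-device timeout scales.
# dev_num=4 is a made-up device count; 200 is the per-device budget
# from the hunk quoted above; LTP_TIMEOUT_MUL defaults to 1 when unset.
dev_num=4
per_dev=200
mul=${LTP_TIMEOUT_MUL:-1}
TST_TIMEOUT=$((dev_num * per_dev * mul))
echo "TST_TIMEOUT=$TST_TIMEOUT"
```

With the default multiplier this yields 800 seconds for four devices, which is the kind of budget the thread is debating against the BTRFS worst case.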
> Hi,
>
> > + TST_TIMEOUT=$((dev_num*200))
> Actually on heavy loaded machine this is not enough due BTRFS.
> I can add something like dev_num*600 or even -1 (then previous commit would not
> be needed, but IMHO still useful).

And the bad thing is that it breaks other zram tests, because the timer
probably does not allow the cleanup to run:

_tst_setup_timer()
{
	...
	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &

I'm not sure if shell allows us to do it better. Maybe send a different signal
than SIGKILL and define 'trap _tst_do_exit' for that signal?

Kind regards,
Petr
On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:

> Hi,
>
> > + TST_TIMEOUT=$((dev_num*200))
> Actually on heavy loaded machine this is not enough due BTRFS.
> I can add something like dev_num*600 or even -1 (then previous commit would not
> be needed, but IMHO still useful).

I personally think -1 is better.

> And bad thing is that it breaks other zram tests, because the timer probably
> does not allow to run the cleanup:
>
> _tst_setup_timer()
> {
> 	...
> 	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &
>
> I'm not sure if shell allow us to do it better. Maybe sent different signal than
> SIGKILL and define 'trap _tst_do_exit' for that signal?

Sounds practicable. I guess sending SIGINT could make more sense, since
sometimes we use CTRL+C to stop a test when debugging manually; the test
should do its cleanup work for that behavior too.
Hi Li,

> On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:
>
> > Hi,
> >
> > > + TST_TIMEOUT=$((dev_num*200))
> > Actually on heavy loaded machine this is not enough due BTRFS.
> > I can add something like dev_num*600 or even -1 (then previous commit would not
> > be needed, but IMHO still useful).
>
> I personally think -1 is better.

OK. In that case we might avoid the now unneeded previous commit.

> > And bad thing is that it breaks other zram tests, because the timer probably
> > does not allow to run the cleanup:
> >
> > _tst_setup_timer()
> > {
> > 	...
> > 	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &
> >
> > I'm not sure if shell allow us to do it better. Maybe sent different signal than
> > SIGKILL and define 'trap _tst_do_exit' for that signal?
>
> Sounds practicable. I guess sending SIGINT could make more sense, since
> sometimes we use CTRL+C stop test in debugging by manual, test should
> do cleanup work for that behavior too.

We already have SIGINT defined for the main shell process:

trap "tst_brk TBROK 'test interrupted'" INT

so CTRL+C is covered. So maybe run SIGINT first and then SIGKILL, for safety
reasons?

Kind regards,
Petr

diff --git testcases/lib/tst_test.sh testcases/lib/tst_test.sh
index 4fa1674cd..61a6fbcdd 100644
--- testcases/lib/tst_test.sh
+++ testcases/lib/tst_test.sh
@@ -459,7 +459,7 @@ _tst_setup_timer()
 
 	tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"
 
-	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &
+	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && { kill -2 -$pid; sleep 5; kill -9 -$pid; } &
 
 	_tst_setup_timer_pid=$!
 }
Hi Petr,

On Tue, Feb 2, 2021 at 3:38 PM Petr Vorel <pvorel@suse.cz> wrote:

> Hi Li,
>
> > On Sat, Jan 30, 2021 at 4:16 AM Petr Vorel <pvorel@suse.cz> wrote:
> > > > + TST_TIMEOUT=$((dev_num*200))
> > > Actually on heavy loaded machine this is not enough due BTRFS.
> > > I can add something like dev_num*600 or even -1 (then previous commit
> > would not
> > > be needed, but IMHO still useful).
>
> > I personally think -1 is better.
>
> OK. In that case we might avoid now unneeded previous commit.
>
> > > And bad thing is that it breaks other zram tests, because the timer
> > probably
> > > does not allow to run the cleanup:
> >
> > > _tst_setup_timer()
> > > {
> > > 	...
> > > 	sleep $sec && tst_res TBROK "test killed, timeout! If you are
> > running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9
> > -$pid &
> >
> > > I'm not sure if shell allow us to do it better. Maybe sent different
> > signal than
> > > SIGKILL and define 'trap _tst_do_exit' for that signal?
>
> > Sounds practicable. I guess sending SIGINT could make more sense, since
> > sometimes we use CTRL+C stop test in debugging by manual, test should
> > do cleanup work for that behavior too.
>
> We have already SIGINT defined for main shell process:
> trap "tst_brk TBROK 'test interrupted'" INT
>
> so CTRL+C is covered. So maybe run first SIGINT and then SIGKILL for safety
> reasons?

I'm fine with using SIGINT + SIGKILL. Or, maybe another choice is to catch
SIGTERM? If we omit the signal in the kill command, it tries SIGTERM by
default; if the test process still cannot be terminated, a SIGKILL may then
be sent.
--- a/testcases/lib/tst_test.sh
+++ b/testcases/lib/tst_test.sh
@@ -22,6 +22,7 @@ export TST_LIB_LOADED=1
 
 # default trap function
 trap "tst_brk TBROK 'test interrupted'" INT
+trap "_tst_do_exit" TERM
 
 _tst_do_exit()
 {
@@ -459,7 +460,7 @@ _tst_setup_timer()
 
 	tst_res TINFO "timeout per run is ${h}h ${m}m ${s}s"
 
-	sleep $sec && tst_res TBROK "test killed, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -9 -$pid &
+	sleep $sec && tst_res TBROK "test terminated, timeout! If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1" && kill -- -$pid &
 
 	_tst_setup_timer_pid=$!
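[Editor's note] The mechanism this patch relies on can be exercised with a tiny standalone sketch. This is not LTP library code; do_cleanup, the delays, and the watchdog subshell are all made up, standing in for _tst_do_exit and the library's timer:

```shell
#!/bin/sh
# Sketch of the graceful-then-forceful kill pattern: the default
# signal of kill is SIGTERM, which a trap can catch to run cleanup;
# SIGKILL is kept only as a last resort. All names are illustrative.
cleanup_ran=0
do_cleanup()
{
	cleanup_ran=1	# a real cleanup would remove devices, temp files, ...
}
trap do_cleanup TERM

(
	sleep 1			# stand-in for "sleep $sec" (the timeout)
	kill $$ 2>/dev/null	# no signal given: kill sends SIGTERM
	sleep 2
	kill -9 $$ 2>/dev/null	# last resort if cleanup never finished
) &
watchdog=$!

# stand-in for the test body: loop until the watchdog "times us out"
while [ "$cleanup_ran" -eq 0 ]; do
	sleep 1
done
kill "$watchdog" 2>/dev/null	# cleanup ran in time, disarm the SIGKILL
echo "cleanup ran: $cleanup_ran"
```

Because SIGTERM is catchable, the trap runs and the process exits cleanly before the SIGKILL fallback ever fires; with `kill -9` alone, the cleanup function would never execute.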
Hi!

> > + TST_TIMEOUT=$((dev_num*200))
> Actually on heavy loaded machine this is not enough due BTRFS.
> I can add something like dev_num*600 or even -1 (then previous commit would not
> be needed, but IMHO still useful).

I would still prefer if we had a timeout there; -1 is for something that
cannot be predicted.

Also we do not expect the machine to be heavily loaded; in that case half of
the LTP tests would time out.

So I would just measure how long the test takes, then multiply it by 5
or something like that and put that in as a timeout.
> Hi!
>
> > > + TST_TIMEOUT=$((dev_num*200))
> > Actually on heavy loaded machine this is not enough due BTRFS.
> > I can add something like dev_num*600 or even -1 (then previous commit would not
> > be needed, but IMHO still useful).
>
> I would still prefer if we had a timeout there, -1 is for something that
> cannot be predicted.
>
> Also we do not expect machine to be heavily loaded, in that case half of
> LTP tests would time out.
>
> So I would just measure how long the test takes, then multiply it by 5
> or something like that and put that in as a timeout.

Do you mean to use a high enough static timeout defined before startup
(working for the maximum possible filesystems)? And create tst_set_timeout()
for shell as an independent effort?

Kind regards,
Petr
> > Hi!
> >
> > > > + TST_TIMEOUT=$((dev_num*200))
> > > Actually on heavy loaded machine this is not enough due BTRFS.
> > > I can add something like dev_num*600 or even -1 (then previous commit would not
> > > be needed, but IMHO still useful).
>
> > I would still prefer if we had a timeout there, -1 is for something that
> > cannot be predicted.
>
> > Also we do not expect machine to be heavily loaded, in that case half of
> > LTP tests would time out.
>
> > So I would just measure how long the test takes, then multiply it by 5
> > or something like that and put that in as a timeout.
>
> Do you mean to use high enough static timeout defined before startup (working
> for maximum possible filesystems)? And create tst_set_timeout() for shell as
> independent effort?

BTW looking at the docs for the C API, tst_set_timeout() is expected to be run
in the setup function. Thus this patchset can be reused (just extended).

Kind regards,
Petr

> Kind regards,
> Petr
Hi!

> > I would still prefer if we had a timeout there, -1 is for something that
> > cannot be predicted.
>
> > Also we do not expect machine to be heavily loaded, in that case half of
> > LTP tests would time out.
>
> > So I would just measure how long the test takes, then multiply it by 5
> > or something like that and put that in as a timeout.
>
> Do you mean to use high enough static timeout defined before startup (working
> for maximum possible filesystems)? And create tst_set_timeout() for shell as
> independent effort?

I would do:

* Add tst_set_timeout for shell, so that it mirrors the C API

* Measure the runtime of the test and divide it by the number of supported
  filesystems; that way we get the mean runtime per filesystem. Then I would
  multiply this number by an arbitrary constant, e.g. 5 or even more; this is
  the timeout per iteration. With this number the actual timeout would be:

  number_of_filesystems * mean_max_per_run

Does this sound sane?

I guess that in the end we will end up with something similar to what you
had there, but we would have some data that supports this decision.
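[Editor's note] The recipe above, spelled out as shell arithmetic. The measured numbers below are invented purely for illustration; none of them come from the thread:

```shell
#!/bin/sh
# Sketch of the proposed timeout calculation. measured_total and
# n_filesystems are made-up example inputs, not real measurements.
measured_total=240	# hypothetical runtime of one full run, in seconds
n_filesystems=6		# hypothetical number of supported filesystems
safety_factor=5		# the arbitrary constant suggested above

mean_per_fs=$((measured_total / n_filesystems))		# mean runtime per fs
timeout_per_iter=$((mean_per_fs * safety_factor))	# padded per-iteration budget
TST_TIMEOUT=$((n_filesystems * timeout_per_iter))	# overall timeout
echo "TST_TIMEOUT=$TST_TIMEOUT"
```

With these example inputs: 240/6 = 40s mean per filesystem, padded to 200s per iteration, giving a 1200s overall timeout, so the measurement backs the final number rather than it being picked blind.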
> Hi!
>
> > > I would still prefer if we had a timeout there, -1 is for something that
> > > cannot be predicted.
>
> > > Also we do not expect machine to be heavily loaded, in that case half of
> > > LTP tests would time out.
>
> > > So I would just measure how long the test takes, then multiply it by 5
> > > or something like that and put that in as a timeout.
>
> > Do you mean to use high enough static timeout defined before startup (working
> > for maximum possible filesystems)? And create tst_set_timeout() for shell as
> > independent effort?
>
> I would do:
>
> * Add tst_set_timeout for shell, so that it mirrors the C API

+1. BTW C has .all_filesystems with a timeout for each run, which avoids
having to bother with a timeout that depends on the number of filesystems
(unlike fuzzy sync, which also sometimes needs tweaking
fzsync_pair.exec_time_p). I'm for ignoring this fact, just noting that the
shell API is far behind the C API.

> * Measure runtime of the test divide it by the number of supported
> filesystems, that way we would get mean runtime per filesystem
> now I would multiply this number with arbitrary constantm, e.g. 5 or
> even more, this is now timeout per iteration
> with this number the actuall timeout would be:
> number_of_filesystems * mean_max_per_run
>
> Does this sound sane?

+1, thanks!

> I guess that in the end we will end up with something similar what you
> had there, but we would have some data that supports this decision.

+1

Kind regards,
Petr
diff --git a/testcases/kernel/device-drivers/zram/zram_lib.sh b/testcases/kernel/device-drivers/zram/zram_lib.sh
index 26ed1846b..c24b83cfc 100755
--- a/testcases/kernel/device-drivers/zram/zram_lib.sh
+++ b/testcases/kernel/device-drivers/zram/zram_lib.sh
@@ -47,6 +47,8 @@ zram_load()
 		tst_brk TBROK "dev_num must be > 0"
 	fi
 
+	TST_TIMEOUT=$((dev_num*200))
+
 	tst_res TINFO "create '$dev_num' zram device(s)"
 
 	modprobe zram num_devices=$dev_num || \
to avoid unexpected timeout, which occurred even on just 4 devices.
Default setup is 300, using 200 should be safe.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
New in v2.

 testcases/kernel/device-drivers/zram/zram_lib.sh | 2 ++
 1 file changed, 2 insertions(+)