Message ID | 1544690160-13900-2-git-send-email-yangx.jy@cn.fujitsu.com |
---|---|
State | Accepted |
Headers | show |
Series | [v3,1/3] lib: Introduce tst_strttype() | expand |
Hi! > 1) Catch and report the TFAIL exit status of child process. Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...") to exit the child process, which sort of works but it's incorrect. The tst_brk() always meant "unrecoverable failure have happened, exit the current process as fast as possible". Looking over our codebase most of the tst_brk(TFAIL, "...") should not actually cause the main test process to exit, these were only meant to exit the child and report the result in one call. It will for instance break the test with -i option on the first failure, which is incorrect. So if we ever want to have a function to exit child process with a result we should implement tst_ret() that would be equivalent to tst_res() followed by exit(0). It could be even implemented as: #define tst_ret(ttype, fmt, ...) \ do { \ tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \ exit(0); \ } while (0) This function has one big advantage, it increments the results counters before the child process exits. Actually one of the big points of the new test library was that the results counters are atomically increased, because passing the results in exit values is nightmare that cannot be done correclty. > 2) Only update result counters in library process and main test > process because the exit status of child can be reported by > main test process. Actually after I spend some time on it I think that the best solution is to update the results in the piece of shared memory as fast as possible, anything else is prone to various races and corner cases. > 3) Print TCONF message and increase skipped when calling tst_brk(TCONF). > Print TBROK message and increase broken when calling tst_brk(TBROK). > Print TFAIL message and increase failed when calling tst_brk(TFAIL). > 4) Remove duplicate update_results() in run_tcases_per_fs(). I've been thinking about this and the problem is more complex, and I'm even not sure that it's possible to write the library so that the counters are consistent at the time we exit the test if something unexpected happened and we called tst_brk(). Consider for instance this example: #include "tst_test.h" static void do_test(void) { if (!SAFE_FORK()) tst_brk(TBROK, "child"); tst_brk(TBROK, "parent"); } static struct tst_test test = { .test_all = do_test, .forks_child = 1, }; When tst_brk() is called both in parent and child the counter would be incremented only once because the child is not waited for by the main test. We can close this special case by changing the main test pid to wait for the children before it calls exit() in the tst_brk() but that may cause the main process to get stuck undefinitely if the child processes get stuck, so we would have to be careful. Also from the very definition of the TBROK return status the test results would be incomplete at best, since TBROK really means "unrecoverable error happened during the test" which would mostly means that something as low level as filesystem got corrupted and there is no point in presenting the results in that case, so I guess that the best we could do in the case of TBROK is to print big message that says "things went horribly wrong!" or something similar. All in all I would like to avoid applying patches to the test library before we finalize the release, since there is not much time for testing now.
----- Original Message ----- > Hi! > > 1) Catch and report the TFAIL exit status of child process. > > Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...") > to exit the child process, which sort of works but it's incorrect. The > tst_brk() always meant "unrecoverable failure have happened, exit the > current process as fast as possible". Looking over our codebase most of > the tst_brk(TFAIL, "...") should not actually cause the main test > process to exit, these were only meant to exit the child and report the > result in one call. It will for instance break the test with -i option > on the first failure, which is incorrect. Nice example, would you care to add that to docs? > > So if we ever want to have a function to exit child process with a result we > should implement tst_ret() that would be equivalent to tst_res() followed by > exit(0). > > It could be even implemented as: > > #define tst_ret(ttype, fmt, ...) \ > do { \ > tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \ > exit(0); \ > } while (0) > > This function has one big advantage, it increments the results counters > before the child process exits. If all call-sites switch to tst_ret(), we could add TFAIL to tst_brk compile time check. > > Actually one of the big points of the new test library was that the > results counters are atomically increased, because passing the results > in exit values is nightmare that cannot be done correclty. > > > 2) Only update result counters in library process and main test > > process because the exit status of child can be reported by > > main test process. > > Actually after I spend some time on it I think that the best solution is > to update the results in the piece of shared memory as fast as possible, > anything else is prone to various races and corner cases. I was thinking this too. If your parent process happens to wait for the child itself, then library will never get to see retcode. Regards, Jan > > > 3) Print TCONF message and increase skipped when calling tst_brk(TCONF). > > Print TBROK message and increase broken when calling tst_brk(TBROK). > > Print TFAIL message and increase failed when calling tst_brk(TFAIL). > > 4) Remove duplicate update_results() in run_tcases_per_fs(). > > I've been thinking about this and the problem is more complex, and I'm > even not sure that it's possible to write the library so that the > counters are consistent at the time we exit the test if something > unexpected happened and we called tst_brk(). > > Consider for instance this example: > > #include "tst_test.h" > > static void do_test(void) > { > if (!SAFE_FORK()) > tst_brk(TBROK, "child"); > tst_brk(TBROK, "parent"); > } > > static struct tst_test test = { > .test_all = do_test, > .forks_child = 1, > }; > > When tst_brk() is called both in parent and child the counter would be > incremented only once because the child is not waited for by the main > test. > > We can close this special case by changing the main test pid to wait for the > children before it calls exit() in the tst_brk() but that may cause the > main process to get stuck undefinitely if the child processes get stuck, > so we would have to be careful. > > Also from the very definition of the TBROK return status the test > results would be incomplete at best, since TBROK really means > "unrecoverable error happened during the test" which would mostly means > that something as low level as filesystem got corrupted and there is no > point in presenting the results in that case, so I guess that the best > we could do in the case of TBROK is to print big message that says > "things went horribly wrong!" or something similar. > > All in all I would like to avoid applying patches to the test library > before we finalize the release, since there is not much time for > testing now. > > -- > Cyril Hrubis > chrubis@suse.cz >
Hi! > > Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...") > > to exit the child process, which sort of works but it's incorrect. The > > tst_brk() always meant "unrecoverable failure have happened, exit the > > current process as fast as possible". Looking over our codebase most of > > the tst_brk(TFAIL, "...") should not actually cause the main test > > process to exit, these were only meant to exit the child and report the > > result in one call. It will for instance break the test with -i option > > on the first failure, which is incorrect. > > Nice example, would you care to add that to docs? Good idea. > > So if we ever want to have a function to exit child process with a result we > > should implement tst_ret() that would be equivalent to tst_res() followed by > > exit(0). > > > > It could be even implemented as: > > > > #define tst_ret(ttype, fmt, ...) \ > > do { \ > > tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \ > > exit(0); \ > > } while (0) > > > > This function has one big advantage, it increments the results counters > > before the child process exits. > > If all call-sites switch to tst_ret(), we could add TFAIL to tst_brk > compile time check. Actually I've started to work on that and all call sites can either be converted to tst_ret(TFAIL, ...) or tst_brk(TBROK, ...). So it really looks like we should go this way. Also I guess that this may be even safe enough to go in before the release. I will try to finish and post it tomorrow.
On 2019/01/07 23:06, Cyril Hrubis wrote: > Hi! >> 1) Catch and report the TFAIL exit status of child process. > Looking at the codebase we do have a few usages of tst_brk(TFAIL, "...") > to exit the child process, which sort of works but it's incorrect. The > tst_brk() always meant "unrecoverable failure have happened, exit the > current process as fast as possible". Looking over our codebase most of > the tst_brk(TFAIL, "...") should not actually cause the main test > process to exit, these were only meant to exit the child and report the > result in one call. It will for instance break the test with -i option > on the first failure, which is incorrect. Hi Cyril, Detailed explanation, and i got it. > So if we ever want to have a function to exit child process with a result we > should implement tst_ret() that would be equivalent to tst_res() followed by > exit(0). > > It could be even implemented as: > > #define tst_ret(ttype, fmt, ...) \ > do { \ > tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \ > exit(0); \ > } while (0) > > This function has one big advantage, it increments the results counters > before the child process exits. > > Actually one of the big points of the new test library was that the > results counters are atomically increased, because passing the results > in exit values is nightmare that cannot be done correclty. Agreed. All of tst_brk(TFAIL, ...) can be converted to tst_ret(TFAIL, ...) or tst_brk(TBROK, ...) in this way and then add TFAIL to tst_brk compile time check as Jan replied, so that only TCONF and TBROK can be passed into tst_brk(). >> 2) Only update result counters in library process and main test >> process because the exit status of child can be reported by >> main test process. > Actually after I spend some time on it I think that the best solution is > to update the results in the piece of shared memory as fast as possible, > anything else is prone to various races and corner cases. ... >> 3) Print TCONF message and increase skipped when calling tst_brk(TCONF). >> Print TBROK message and increase broken when calling tst_brk(TBROK). >> Print TFAIL message and increase failed when calling tst_brk(TFAIL). >> 4) Remove duplicate update_results() in run_tcases_per_fs(). > I've been thinking about this and the problem is more complex, and I'm > even not sure that it's possible to write the library so that the > counters are consistent at the time we exit the test if something > unexpected happened and we called tst_brk(). > > Consider for instance this example: > > #include "tst_test.h" > > static void do_test(void) > { > if (!SAFE_FORK()) > tst_brk(TBROK, "child"); > tst_brk(TBROK, "parent"); > } > > static struct tst_test test = { > .test_all = do_test, > .forks_child = 1, > }; > > When tst_brk() is called both in parent and child the counter would be > incremented only once because the child is not waited for by the main > test. > > We can close this special case by changing the main test pid to wait for the > children before it calls exit() in the tst_brk() but that may cause the > main process to get stuck undefinitely if the child processes get stuck, > so we would have to be careful. > > Also from the very definition of the TBROK return status the test > results would be incomplete at best, since TBROK really means > "unrecoverable error happened during the test" which would mostly means > that something as low level as filesystem got corrupted and there is no > point in presenting the results in that case, so I guess that the best > we could do in the case of TBROK is to print big message that says > "things went horribly wrong!" or something similar. Sorry, my patch is too rough becasue some suitations are not taken into account. For tst_brk(TCONF), do you mean to replace the current solution using wait() in check_child_status() with your suggested shared memory? For tst_brk(TBROK), do you mean to just print big message instead of updating test results? > All in all I would like to avoid applying patches to the test library > before we finalize the release, since there is not much time for > testing now. Agreed, drop these patches during the upcoming release. We still need to do future investigation and testing. Best Regards, Xiao Yang
Hi! > > So if we ever want to have a function to exit child process with a result we > > should implement tst_ret() that would be equivalent to tst_res() followed by > > exit(0). > > > > It could be even implemented as: > > > > #define tst_ret(ttype, fmt, ...) \ > > do { \ > > tst_res_(__FILE__, __LINE__, (ttype), (fmt), ##__VA_ARGS__); \ > > exit(0); \ > > } while (0) > > > > This function has one big advantage, it increments the results counters > > before the child process exits. > > If all call-sites switch to tst_ret(), we could add TFAIL to tst_brk > compile time check. And it's even more complicated than this. The problem is that if we add tst_ret() that calls exit(0) it will have slightly different semantics for the main test process and for the children. The subtle difference is that we run the loop that executes the tests in the case of the -i parameter or in the case that we defined tcnt is in the main test process. If we call tst_ret() there it will exit the test before we manage to run rest of the iterations and I would like to avoid adding an API that is slightly confusing easy to misuse. I will try to rethink this, however it looks like there is no silver bullet to solve the problem. And it's certainly late for the release. Also I've opened an issue for the problem in order not to forget: https://github.com/linux-test-project/ltp/issues/462
diff --git a/lib/tst_test.c b/lib/tst_test.c index 661fbbf..e46ab8e 100644 --- a/lib/tst_test.c +++ b/lib/tst_test.c @@ -55,6 +55,7 @@ struct results { int skipped; int failed; int warnings; + int broken; unsigned int timeout; }; @@ -177,6 +178,9 @@ static void update_results(int ttype) case TFAIL: tst_atomic_inc(&results->failed); break; + case TBROK: + tst_atomic_inc(&results->broken); + break; } } @@ -305,11 +309,15 @@ void tst_vbrk_(const char *file, const int lineno, int ttype, * specified but CLONE_THREAD is not. Use direct syscall to avoid * cleanup running in the child. */ - if (syscall(SYS_getpid) == main_pid) + if (syscall(SYS_getpid) == main_pid) { + update_results(TTYPE_RESULT(ttype)); do_test_cleanup(); + } - if (getpid() == lib_pid) + if (getpid() == lib_pid) { + update_results(TTYPE_RESULT(ttype)); do_exit(TTYPE_RESULT(ttype)); + } exit(TTYPE_RESULT(ttype)); } @@ -350,6 +358,7 @@ static void check_child_status(pid_t pid, int status) switch (ret) { case TPASS: break; + case TFAIL: case TBROK: case TCONF: tst_brk(ret, "Reported by child (%i)", pid); @@ -591,6 +600,7 @@ static void do_exit(int ret) printf("failed %d\n", results->failed); printf("skipped %d\n", results->skipped); printf("warnings %d\n", results->warnings); + printf("broken %d\n", results->broken); if (results->passed && ret == TCONF) ret = 0; @@ -603,6 +613,9 @@ static void do_exit(int ret) if (results->warnings) ret |= TWARN; + + if (results->broken) + ret |= TBROK; } do_cleanup(); @@ -1155,12 +1168,7 @@ static int run_tcases_per_fs(void) mntpoint_mounted = 0; } - if (ret == TCONF) { - update_results(ret); - continue; - } - - if (ret == 0) + if (ret == TCONF || ret == 0) continue; do_exit(ret);
1) Catch and report the TFAIL exit status of child process. 2) Only update result counters in library process and main test process because the exit status of child can be reported by main test process. 3) Print TCONF message and increase skipped when calling tst_brk(TCONF). Print TBROK message and increase broken when calling tst_brk(TBROK). Print TFAIL message and increase failed when calling tst_brk(TFAIL). 4) Remove duplicate update_results() in run_tcases_per_fs(). Fixes: #408 Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> --- lib/tst_test.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)