Message ID: 20191106073621.58738-1-lkml@jv-coder.de
State: Changes Requested
Series: memcg_lib/memcg_process: Better synchronization of signal USR1
> diff --git a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> index aadaae4d2..7440e1eee 100755
> --- a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> +++ b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> @@ -124,6 +124,7 @@ signal_memcg_process()
>  	local usage_start=$(cat ${path}memory.usage_in_bytes)
>
>  	kill -s USR1 $pid 2> /dev/null
> +	TST_CHECKPOINT_WAIT 1

Actually this does not work like this, because some of the tests trigger
the OOM killer and TST_CHECKPOINT_WAIT, which calls tst_checkpoint, uses
ROD. Is it ok to directly call

	tst_checkpoint wait 10000 "1"

and ignore the result here?

BTW: Is there no such thing as TST_CHECKPOINT in the new shell test
library?
Hi Joerg,

> > diff --git a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> > index aadaae4d2..7440e1eee 100755
> > --- a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> > +++ b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
> > @@ -124,6 +124,7 @@ signal_memcg_process()
> >  	local usage_start=$(cat ${path}memory.usage_in_bytes)
> >
> >  	kill -s USR1 $pid 2> /dev/null
> > +	TST_CHECKPOINT_WAIT 1
> Actually this does not work like this, because some of the
> tests trigger the oom killer and TST_CHECKPOINT_WAIT calling
> tst_checkpoint uses ROD. Is it ok to directly call
>
> 	tst_checkpoint wait 10000 "1"
>
> and ignore the result here?
>
> BTW: Is there no such thing as TST_CHECKPOINT in the new
> shell test library?

No, there is no support for TST_CHECKPOINT in shell. To be honest I have
no idea how to implement it. It could be done by checking some file
content in a loop and sleeping in the meantime (inefficient), but sync
between the C and shell API is IMHO not possible.

Kind regards,
Petr
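The file-polling fallback sketched in this reply could look roughly like the following. Everything here (the flag file name, the 100ms step, the ~10s limit) is invented for illustration and is not LTP API:

```shell
# Hypothetical sketch of the file-polling idea: the "wake" side touches
# a flag file, the "wait" side polls for it and sleeps in between.
FLAG="checkpoint_flag"

wait_for_flag()
{
	local i=0
	while [ ! -e "$FLAG" ]; do
		i=$((i + 1))
		[ "$i" -ge 100 ] && return 1	# give up after ~10s
		sleep 0.1			# poll every 100ms (inefficient)
	done
	rm -f "$FLAG"	# consume the flag so it can be reused
	return 0
}

# The wake side (here simulated inline) would simply do:
touch "$FLAG"
wait_for_flag && echo "woken"
```

This illustrates why it is called inefficient: every synchronization point burns up to 100ms of latency and a wakeup per poll, unlike a futex which blocks until the exact moment of the wake.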
Hi!
> No, there is no support for TST_CHECKPOINT in shell.
Well there is actually. There is a tst_checkpoint helper that calls
tst_reinit() to map the shared page to be used with futexes and then
just calls the corresponding wake or wait function.
And yes, it has been used in the cgroup testcases to synchronize the C
helper that allocates memory with the shell test that executes it.
Hi!

> Actually this does not work like this, because some of the
> tests trigger the oom killer and TST_CHECKPOINT_WAIT calling
> tst_checkpoint uses ROD. Is it ok to directly call
>
> 	tst_checkpoint wait 10000 "1"
>
> and ignore the result here?

Wouldn't that delay the test for too long?

The default timeout for checkpoints is probably too big.

This problem is quite tricky to get right I guess. Maybe we can watch
/proc/[pid]/statm for an increase in data + stack memory.

> BTW: Is there no such thing as TST_CHECKPOINT in the new
> shell test library?

It does not seem to be there, but these shell functions are just
wrappers that check the tst_checkpoint return value, which would be
fairly easy to add.
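The /proc/[pid]/statm idea above could be sketched like this. The field layout (size, resident, shared, text, lib, data, dt; "data" is data + stack in pages) comes from proc(5); the helper names, poll interval, and iteration limit are made up for illustration:

```shell
# Read the "data" column (data + stack, in pages) of a process.
statm_data()
{
	# statm fields: size resident shared text lib data dt
	awk '{ print $6 }' "/proc/$1/statm"
}

# Poll until the data segment grows past the starting value, with an
# arbitrary 100 x 100ms upper bound. Not LTP API, just the shape of it.
wait_for_growth()
{
	local pid=$1 start=$2 i=0

	while [ "$(statm_data "$pid")" -le "$start" ]; do
		i=$((i + 1))
		[ "$i" -ge 100 ] && return 1
		sleep 0.1
	done
	return 0
}

start=$(statm_data $$)
echo "data+stack pages of this shell: $start"
```

Note this still polls, so it shares the latency problem of any sleep-based scheme; it only avoids the signal race, not the waiting.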
Hi Cyril,

> Hi!
>> Actually this does not work like this, because some of the
>> tests trigger the oom killer and TST_CHECKPOINT_WAIT calling
>> tst_checkpoint uses ROD. Is it ok to directly call
>>
>> 	tst_checkpoint wait 10000 "1"
>>
>> and ignore the result here?
> Wouldn't that delay the test for too long?
>
> The default timeout for checkpoints is probably too big.
>
> This problem is quite tricky to get right I guess. Maybe we can watch
> /proc/[pid]/statm for an increase in data + stack memory.

The timeout is specified on the command line (the 10000), in ms.

We run the test with timeout=1000 now and it works fine. It is simpler
than thinking about any other synchronization technique. The additional
wait adds less than 30 for all tests that use memcg_process.

>> BTW: Is there no such thing as TST_CHECKPOINT in the new
>> shell test library?
> It does not seem to be there, but these shell functions are just
> wrappers that check the tst_checkpoint return value, which would be
> fairly easy to add.

I just wondered if I didn't see it.

Jörg
Hi,

> > No, there is no support for TST_CHECKPOINT in shell.
> Well there is actually. There is a tst_checkpoint helper that calls
> tst_reinit() to map the shared page to be used with futexes and then
> just calls the corresponding wake or wait function.

Great. Joerg, sorry for being wrong, thanks Cyril for the correction.

> And yes, it has been used in the cgroup testcases to synchronize the C
> helper that allocates memory with the shell test that executes it.

Kind regards,
Petr
Hi!

> >> Actually this does not work like this, because some of the
> >> tests trigger the oom killer and TST_CHECKPOINT_WAIT calling
> >> tst_checkpoint uses ROD. Is it ok to directly call
> >>
> >> 	tst_checkpoint wait 10000 "1"
> >>
> >> and ignore the result here?
> > Wouldn't that delay the test for too long?
> >
> > The default timeout for checkpoints is probably too big.
> >
> > This problem is quite tricky to get right I guess. Maybe we can watch
> > /proc/[pid]/statm for an increase in data + stack memory.
> The timeout is specified on the command line (the 10000), in ms.

Ah, sorry, I was blind.

> We run the test with timeout=1000 now and it works fine. It is simpler
> than thinking about any other synchronization technique. The
> additional wait adds less than 30 for all tests that use
> memcg_process.

30 what? Seconds? That is unfortunately not acceptable.

Actually, having a closer look at the code, there is a loop that checks
every 100ms if:

1) the process is still alive
2) there was an increase in usage_in_bytes in the corresponding cgroup

So what is wrong with the original code?
>> We run the test with timeout=1000 now and it works fine. It is simpler
>> than thinking about any other synchronization technique. The
>> additional wait adds less than 30 for all tests that use
>> memcg_process.
> 30 what? Seconds? That is unfortunately not acceptable.

Yes, 30 seconds. Why shouldn't that be acceptable? It is nothing
compared to the runtime of other tests.

> Actually, having a closer look at the code, there is a loop that checks
> every 100ms if:
>
> 1) the process is still alive
> 2) there was an increase in usage_in_bytes in the corresponding cgroup
>
> So what is wrong with the original code?

Please reread the description of my initial post. The problem is the
signal race, not the check. The checkpoint system prevents the race.
There is no way around solid synchronization.
Hi!

> >> We run the test with timeout=1000 now and it works fine. It is simpler
> >> than thinking about any other synchronization technique. The
> >> additional wait adds less than 30 for all tests that use
> >> memcg_process.
> > 30 what? Seconds? That is unfortunately not acceptable.
> Yes, 30 seconds. Why shouldn't that be acceptable? It is nothing
> compared to the runtime of other tests.

I have written a blog post that partly applies to this case, see:

https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests

> > Actually, having a closer look at the code, there is a loop that checks
> > every 100ms if:
> >
> > 1) the process is still alive
> > 2) there was an increase in usage_in_bytes in the corresponding cgroup
> >
> > So what is wrong with the original code?
> Please reread the description of my initial post. The problem is the
> signal race, not the check. The checkpoint system prevents the race.
> There is no way around solid synchronization.

So the problem is that sometimes the program has not finished handling
the first signal and we are sending another, right?

I guess that the proper solution would be avoiding the signals in the
first place. I guess that we can establish two-way communication with
fifos, which would also mean that we would get notified as soon as the
child dies as well.
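A minimal shell sketch of the fifo idea proposed above. The names, the one-word protocol, and the 10s watchdog are all assumptions; in a real implementation the writer side would live in the C helper and the plumbing in the library:

```shell
# Create a private named pipe for helper -> test notifications.
fifo=$(mktemp -u) || exit 1
mkfifo "$fifo" || exit 1

# Stand-in for the C helper: it would write one word after finishing
# each allocation instead of relying on the caller to poll.
( echo done > "$fifo" ) &

# Watchdog: turns a hung helper into a readable "timeout" message,
# since plain sh read(1) has no timeout of its own. If the helper
# answers first, the watchdog is killed during its sleep.
( sleep 10; echo timeout > "$fifo" ) 2>/dev/null &
watchdog=$!

read msg < "$fifo"	# blocks until the helper (or watchdog) writes
kill "$watchdog" 2>/dev/null
echo "helper reported: $msg"
rm -f "$fifo"
```

Unlike a sleep, the read returns the instant the helper writes, so the cost per synchronization point is near zero on fast hardware and the timeout only matters in the failure case.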
>> Yes, 30 seconds. Why shouldn't that be acceptable? It is nothing
>> compared to the runtime of other tests.
> I have written a blog post that partly applies to this case, see:
>
> https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests

I know where you are coming from and it is basically the same as my own
opinion. The difference is: when I look at LTP I see a runtime of more
than 6 hours, and looking at the controller tests alone it is more than
4 hours. This puts 30 seconds into a very different perspective than
looking only at syscall tests (in the test run I looked at, those take
around 13 minutes). That is why I don't care about 30 seconds in this
case.

> So the problem is that sometimes the program has not finished handling
> the first signal and we are sending another, right?
>
> I guess that the proper solution would be avoiding the signals in the
> first place. I guess that we can establish two-way communication with
> fifos, which would also mean that we would get notified as soon as the
> child dies as well.

Correct. Using fifos is probably a viable solution, but it would
require library work, because otherwise the overhead is way too big.

Another thing I can think of is extending tst_checkpoint wait to also
watch a process and stop waiting if that process dies. This would be
the simplest way to get good synchronization and get rid of the sleep.
Hi,

>> So the problem is that sometimes the program has not finished handling
>> the first signal and we are sending another, right?
>>
>> I guess that the proper solution would be avoiding the signals in the
>> first place. I guess that we can establish two-way communication with
>> fifos, which would also mean that we would get notified as soon as the
>> child dies as well.
> Correct. Using fifos is probably a viable solution, but it would
> require library work, because otherwise the overhead is way too big.
> Another thing I can think of is extending tst_checkpoint wait to also
> watch a process and stop waiting if that process dies. This would be
> the simplest way to get good synchronization and get rid of the sleep.

When thinking about this yesterday, I had another idea to fix it
without much work: for the test that expects the process to terminate,
just don't call tst_checkpoint at all. We know it will fail there.
Would this be acceptable for you?

Jörg
Hi!

> > I have written a blog post that partly applies to this case, see:
> >
> > https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests
> I know where you are coming from and it is basically the same as my own
> opinion. The difference is: when I look at LTP I see a runtime of more
> than 6 hours, and looking at the controller tests alone it is more than
> 4 hours. This puts 30 seconds into a very different perspective than
> looking only at syscall tests (in the test run I looked at, those take
> around 13 minutes). That is why I don't care about 30 seconds in this
> case.

The controllers testrun runs for 25 minutes on our servers; it will
probably be reduced to 15 minutes in two or three years with the next
upgrade. The main point is that hardware tends to get faster and
faster, but any sleep in the tests will not scale and ends up being a
problem sooner or later. It also greatly depends on which HW you are
running the tests on.

> So the problem is that sometimes the program has not finished handling
> the first signal and we are sending another, right?
>
> I guess that the proper solution would be avoiding the signals in the
> first place. I guess that we can establish two-way communication with
> fifos, which would also mean that we would get notified as soon as the
> child dies as well.
> Correct. Using fifos is probably a viable solution, but it would
> require library work, because otherwise the overhead is way too big.
> Another thing I can think of is extending tst_checkpoint wait to also
> watch a process and stop waiting if that process dies. This would be
> the simplest way to get good synchronization and get rid of the sleep.

I'm not sure we can implement this without introducing another race
condition. The only way to wake up a futex from sleep before it times
out in a race-free way is sending a signal, in which case we should see
EINTR. But that would mean that the process waking up the futex has to
be a child of the waiting process, unless we reparent that process, but
all that would be too tricky I guess.

If we decide to wake the futex regularly to check if the process is
alive, we can miss the wake. Well, the library tries hard and loops
over the wake syscall for a while, but this could still fail on very
slow devices under load. And if the timing is unfortunate we may miss
more than one wake signal, which would lead to a timeout. Timing
problems like that can easily arise on VMs with a single CPU on an
overbooked host.
Hi Cyril,

> Hi!
>>> I have written a blog post that partly applies to this case, see:
>>>
>>> https://people.kernel.org/metan/why-sleep-is-almost-never-acceptable-in-tests
>> I know where you are coming from and it is basically the same as my own
>> opinion. The difference is: when I look at LTP I see a runtime of more
>> than 6 hours, and looking at the controller tests alone it is more
>> than 4 hours. This puts 30 seconds into a very different perspective
>> than looking only at syscall tests (in the test run I looked at, those
>> take around 13 minutes). That is why I don't care about 30 seconds in
>> this case.
> The controllers testrun runs for 25 minutes on our servers; it will
> probably be reduced to 15 minutes in two or three years with the next
> upgrade. The main point is that hardware tends to get faster and
> faster, but any sleep in the tests will not scale and ends up being a
> problem sooner or later. It also greatly depends on which HW you are
> running the tests on.

Ok, in that case it makes sense.

>> Correct. Using fifos is probably a viable solution, but it would
>> require library work, because otherwise the overhead is way too big.
>> Another thing I can think of is extending tst_checkpoint wait to also
>> watch a process and stop waiting if that process dies. This would be
>> the simplest way to get good synchronization and get rid of the sleep.
> I'm not sure we can implement this without introducing another race
> condition. The only way to wake up a futex from sleep before it times
> out in a race-free way is sending a signal, in which case we should
> see EINTR. But that would mean that the process waking up the futex
> has to be a child of the waiting process, unless we reparent that
> process, but all that would be too tricky I guess.
>
> If we decide to wake the futex regularly to check if the process is
> alive, we can miss the wake. Well, the library tries hard and loops
> over the wake syscall for a while, but this could still fail on very
> slow devices under load. And if the timing is unfortunate we may miss
> more than one wake signal, which would lead to a timeout. Timing
> problems like that can easily arise on VMs with a single CPU on an
> overbooked host.

Ok, so we are back to fifos. I guess this should be part of the
library. I will send a proposal for discussion to the mailing list
later this week or next week.

Jörg
diff --git a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
index aadaae4d2..7440e1eee 100755
--- a/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
+++ b/testcases/kernel/controllers/memcg/functional/memcg_lib.sh
@@ -124,6 +124,7 @@ signal_memcg_process()
 	local usage_start=$(cat ${path}memory.usage_in_bytes)
 
 	kill -s USR1 $pid 2> /dev/null
+	TST_CHECKPOINT_WAIT 1
 
 	if [ -z "$size" ]; then
 		return
diff --git a/testcases/kernel/controllers/memcg/functional/memcg_process.c b/testcases/kernel/controllers/memcg/functional/memcg_process.c
index 0e671f566..8e846879d 100644
--- a/testcases/kernel/controllers/memcg/functional/memcg_process.c
+++ b/testcases/kernel/controllers/memcg/functional/memcg_process.c
@@ -24,7 +24,8 @@ static int fd;
 
 static volatile int flag_exit;
-static volatile int flag_allocated;
+static volatile int flag_do_work;
+static int flag_allocated;
 
 static int opt_mmap_anon;
 static int opt_mmap_file;
@@ -257,22 +258,7 @@ static void sigint_handler(int __attribute__ ((unused)) signo)
  */
 static void sigusr_handler(int __attribute__ ((unused)) signo)
 {
-	if (opt_mmap_anon)
-		mmap_anon();
-
-	if (opt_mmap_file)
-		mmap_file();
-
-	if (opt_mmap_lock1)
-		mmap_lock1();
-
-	if (opt_mmap_lock2)
-		mmap_lock2();
-
-	if (opt_shm)
-		shm();
-
-	flag_allocated = !flag_allocated;
+	flag_do_work++;
 }
 
 int main(int argc, char *argv[])
@@ -302,8 +288,30 @@ int main(int argc, char *argv[])
 
 	TST_CHECKPOINT_WAKE(0);
 
-	while (!flag_exit)
+	while (!flag_exit) {
+		if (flag_do_work) {
+			flag_do_work--;
+			if (opt_mmap_anon)
+				mmap_anon();
+
+			if (opt_mmap_file)
+				mmap_file();
+
+			if (opt_mmap_lock1)
+				mmap_lock1();
+
+			if (opt_mmap_lock2)
+				mmap_lock2();
+
+			if (opt_shm)
+				shm();
+
+			flag_allocated = !flag_allocated;
+
+			TST_CHECKPOINT_WAKE(1);
+		}
 		sleep(1);
+	}
 
 	close(fd);