Message ID | 20210617070730.7699-1-krzysztof.kozlowski@canonical.com |
---|---|
Headers | show |
Series | controllers/memcg: fixes for newer kernels | expand |
Hello, Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> writes: > Hi, > > This is a resend of previous Github pull: > https://github.com/linux-test-project/ltp/pull/830 LGTM (Reviewed on Github). I doubt tests that expect a precise allocation figure are valid. > > The patches fix several test failures we are hitting since months, see: > https://bugs.launchpad.net/bugs/1829979 > https://bugs.launchpad.net/bugs/1829984 > > Best regards, > Krzysztof > > > Krzysztof Kozlowski (3): > controllers/memcg: accept range of max_usage_in_bytes > controllers/memcg: accept range of usage_in_bytes > controllers/memcg: accept non-zero max_usage_in_bytes after reset > > .../controllers/memcg/functional/memcg_lib.sh | 22 ++++++++++++++----- > .../memcg_max_usage_in_bytes_test.sh | 12 ++++++++-- > .../memcg/functional/memcg_stat_rss.sh | 20 ++++++++--------- > .../memcg/functional/memcg_stat_test.sh | 8 +++---- > .../functional/memcg_usage_in_bytes_test.sh | 10 +++++++-- > 5 files changed, 49 insertions(+), 23 deletions(-) > > -- > 2.27.0
On 17/06/2021 09:07, Krzysztof Kozlowski wrote: > Hi, > > This is a resend of previous Github pull: > https://github.com/linux-test-project/ltp/pull/830 > > The patches fix several test failures we are hitting since months, see: > https://bugs.launchpad.net/bugs/1829979 > https://bugs.launchpad.net/bugs/1829984 > > Best regards, > Krzysztof > > > Krzysztof Kozlowski (3): > controllers/memcg: accept range of max_usage_in_bytes > controllers/memcg: accept range of usage_in_bytes > controllers/memcg: accept non-zero max_usage_in_bytes after reset Hi everyone, The patchset got positive LGTM on the Github. Any further comments for it or can it be applied? Best regards, Krzysztof
On Fri, Jun 25, 2021 at 3:31 AM Krzysztof Kozlowski < krzysztof.kozlowski@canonical.com> wrote: > On 17/06/2021 09:07, Krzysztof Kozlowski wrote: > > Hi, > > > > This is a resend of previous Github pull: > > https://github.com/linux-test-project/ltp/pull/830 > > > > The patches fix several test failures we are hitting since months, see: > > https://bugs.launchpad.net/bugs/1829979 > > https://bugs.launchpad.net/bugs/1829984 > > > > Best regards, > > Krzysztof > > > > > > Krzysztof Kozlowski (3): > > controllers/memcg: accept range of max_usage_in_bytes > > controllers/memcg: accept range of usage_in_bytes > > controllers/memcg: accept non-zero max_usage_in_bytes after reset > > > Hi everyone, > > The patchset got positive LGTM on the Github. Any further comments for > it or can it be applied? > I slightly agree with Richard that we need an explanation/investigation on where the 32*PAGE_SIZE comes from. Otherwise, we are very possible to mask a counter bug if only to make the test happy. Forgive me pour cold water on the method though it looks good in coding:).
On 25/06/2021 10:21, Li Wang wrote: > > > On Fri, Jun 25, 2021 at 3:31 AM Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com > <mailto:krzysztof.kozlowski@canonical.com>> wrote: > > Hi everyone, > > The patchset got positive LGTM on the Github. Any further comments for > it or can it be applied? > > > I slightly agree with Richard that we need an explanation/investigation > on where the 32*PAGE_SIZE comes from. Otherwise, we are very possible > to mask a counter bug if only to make the test happy. I don't know where 32*PAGE_SIZE comes from and investigation would require effort/time which I don't have. I don't think we mask current bug as this appears in multiple kernels - I tested from v5.4 up to 5.13.0-rc5-next-20210608. It is possible this will mask future bugs but that's life of a project depending on kernel internals, not on API or ABI. The kernel is allowed to change such details any moment because it is neither part of API nor ABI. Therefore you just have to live with inaccurate limits or keep investigating every x-months. For now the test is simply unreliable. Best regards, Krzysztof
Hi, On 6/25/2021 10:21 AM, Li Wang wrote: > On Fri, Jun 25, 2021 at 3:31 AM Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com > <mailto:krzysztof.kozlowski@canonical.com>> wrote: > > On 17/06/2021 09:07, Krzysztof Kozlowski wrote: > > > The patchset got positive LGTM on the Github. Any further comments for > it or can it be applied? > > > I slightly agree with Richard that we need an explanation/investigation > on where the 32*PAGE_SIZE comes from. Otherwise, we are very possible > to mask a counter bug if only to make the test happy. +1. I think it has something to do with batch processing of counter updates. The memory handling code is heavily batched per cpu and only committed to a global counter after the batch size overflows. At least for the subgroup_charge test I once found that page table cache stored in kmem contributes to the memory counter used in the test. Maybe something in that region was changed in recent kernel version. Maybe you could ask on the kernel mailing list? > > Forgive me pour cold water on the method though it looks good in coding:). I would just suggest to make 32 * PAGESIZE a constant calculated in memcg_lib.sh, instead of calculating (and defining) it everywhere. > > -- > Regards, > Li Wang > Jörg
On 25/06/2021 11:24, Joerg Vehlow wrote: >> Forgive me pour cold water on the method though it looks good in coding:). > I would just suggest to make 32 * PAGESIZE a constant calculated in > memcg_lib.sh, instead of calculating (and defining) it everywhere. I understand that you want to define constant in memcg_lib for a accepted higher bound of some specific memory limit checks? Won't that be extra confusing? Trying to generalize some constant without actually knowing what is it about and to which limits it applies? Now it's this unknown part is quite local and documented in three places. I can move it to bigger scope and pretend it's generic, if you want. Best regards, Krzysztof
On 25/06/2021 10:21, Li Wang wrote: > > > On Fri, Jun 25, 2021 at 3:31 AM Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com > <mailto:krzysztof.kozlowski@canonical.com>> wrote: > > On 17/06/2021 09:07, Krzysztof Kozlowski wrote: > > Hi, > > > > This is a resend of previous Github pull: > > https://github.com/linux-test-project/ltp/pull/830 > <https://github.com/linux-test-project/ltp/pull/830> > > > > The patches fix several test failures we are hitting since months, > see: > > https://bugs.launchpad.net/bugs/1829979 > <https://bugs.launchpad.net/bugs/1829979> > > https://bugs.launchpad.net/bugs/1829984 > <https://bugs.launchpad.net/bugs/1829984> > > > > Best regards, > > Krzysztof > > > > > > Krzysztof Kozlowski (3): > > controllers/memcg: accept range of max_usage_in_bytes > > controllers/memcg: accept range of usage_in_bytes > > controllers/memcg: accept non-zero max_usage_in_bytes after reset > > > Hi everyone, > > The patchset got positive LGTM on the Github. Any further comments for > it or can it be applied? > > > I slightly agree with Richard that we need an explanation/investigation > on where the 32*PAGE_SIZE comes from. Otherwise, we are very possible > to mask a counter bug if only to make the test happy. On newer v5.11 the max_usage_in_bytes go even above 32 pages up to 34. I got some explanation from Michal Hocko [1] from which one can try to conclude: 1. There is significant caching in memory accounting. Not only charging of cgroup is once per MEMCG_CHARGE_BATCH batch (try_charge()), but also statistics are gathered from per-cpu if threshold exceed MEMCG_CHARGE_BATCH (__mod_memcg_state()). 2. Depending on machine (different amount of CPUs), the memory group charging be less or more accurate. 3. The accuracy of group memory accounting is not considered as important thus test relying on it, will be failing from time to time. I'll send a v3 with significantly increased limits and some explanation. [1] https://lore.kernel.org/linux-mm/85b8a4f9-b9e9-a6ca-5d0c-c1ecb8c11ef3@canonical.com/T/#m6459b3be3a86f5eaf2cfc48dd586b6faf949e440 Best regards, Krzysztof