Message ID | 20210115143246.369676-1-egorenar@linux.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | [1/1] swapping01: make test more robust | expand |
On Fri, Jan 15, 2021 at 10:33 PM Alexander Egorenkov <egorenar@linux.ibm.com> wrote: > Increase time waiting for swap memory to settle before > performing checks with occupied swap memory. This decreases the chance > of the test to fail in the case of the swap memory taking longer > than expected to settle. > > This fixes sporadic failures observed on IBM s390x systems such as this: > > swapping01.c:149: TPASS: no heavy swapping detected, 4534 MB swapped. > swapping01.c:106: TINFO: available physical memory: 7348 MB > swapping01.c:109: TINFO: try to allocate: 9552 MB > swapping01.c:112: TINFO: memory allocated: 9552 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 20484 MB > swapping01.c:149: TPASS: no heavy swapping detected, 4534 MB swapped. > swapping01.c:106: TINFO: available physical memory: 7348 MB > swapping01.c:109: TINFO: try to allocate: 9553 MB > swapping01.c:112: TINFO: memory allocated: 9553 MB > swapping01.c:140: TINFO: swap free init: 25018 MB > swapping01.c:141: TINFO: swap free now: 17694 MB > swapping01.c:149: TPASS: no heavy swapping detected, 7323 MB swapped. > swapping01.c:106: TINFO: available physical memory: 7318 MB > swapping01.c:109: TINFO: try to allocate: 9514 MB > swapping01.c:112: TINFO: memory allocated: 9514 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 17697 MB > swapping01.c:145: TFAIL: heavy swapping detected: 7322 MB swapped. > > Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> > --- > testcases/kernel/mem/swapping/swapping01.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/testcases/kernel/mem/swapping/swapping01.c > b/testcases/kernel/mem/swapping/swapping01.c > index 24b8313f3..befb52e91 100644 > --- a/testcases/kernel/mem/swapping/swapping01.c > +++ b/testcases/kernel/mem/swapping/swapping01.c > @@ -127,10 +127,10 @@ static void check_swapping(void) > > /* Still occupying memory, loop for a while */ > i = 0; > - while (i < 10) { > + while (i < 30) { > swap_free_now = SAFE_READ_MEMINFO("SwapFree:"); > sleep(1); > - if (labs(swap_free_now - SAFE_READ_MEMINFO("SwapFree:")) < > 512) > + if (labs(swap_free_now - SAFE_READ_MEMINFO("SwapFree:")) < > 10) > From test error above, if the system's swap-memory taking longer than expected to settle, shouldn't we more even get a swapped consumption (=swap_free_init - swap_free_now) less than mem_over_max? I say this because am imagining the system's SwapFree keeps decreasing during the while loop, as you extend the waiting time then we might get a larger swapped value at the final, which makes the test more easily to FAIL. And the purpose is to detect heavy swapping on s390x but not guarantee running with a purified system, there also possible other daemons have a side effect on system swaping, so I slightly suppose dropping caches before testing may be helpful. Another thing I wondering, do we really need to loop 5 times for this kind of stress testing? Guess one time is enough for testing. mm:swapping01 swapping01 -i 5 > break; > > i++; > -- > 2.26.2 > > > -- > Mailing list info: https://lists.linux.it/listinfo/ltp > >
Hi Alexander, Alexander Egorenkov <egorenar@linux.ibm.com> wrote: > ... > swapping01.c:106: TINFO: available physical memory:* 7348 MB* > swapping01.c:109: TINFO: try to allocate: 9552 MB > swapping01.c:112: TINFO: memory allocated: 9552 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 20484 MB > swapping01.c:149: TPASS: no heavy swapping detected, *4534 MB swapped.* > ... > swapping01.c:106: TINFO: available physical memory:* 7318 MB* > swapping01.c:109: TINFO: try to allocate: 9514 MB > swapping01.c:112: TINFO: memory allocated: 9514 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 17697 MB > swapping01.c:145: TFAIL: heavy swapping detected:* 7322 MB swapped.* > Looking at the memory status from the output, it only detects available memory 7318MB in the fourth time (< 7348MB, second/third time). But memory heavy swapped in the third and fourth time(7323MB, 7322MB swapped). Obviously, the last two tests influenced by the first-three running. How reproducible of this FAIL? And, Could you help verify whether the following cleanup works? (Also, provide /proc/meminfo or kernel version will help to locate the issues). --- a/testcases/kernel/mem/swapping/swapping01.c +++ b/testcases/kernel/mem/swapping/swapping01.c @@ -81,6 +81,7 @@ static void test_swapping(void) static void init_meminfo(void) { + SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "3"); swap_free_init = SAFE_READ_MEMINFO("SwapFree:"); if (FILE_LINES_SCANF("/proc/meminfo", "MemAvailable: %ld", &mem_available_init)) {
Li Wang <liwang@redhat.com> writes: > > Looking at the memory status from the output, it only detects available > memory 7318MB in the fourth time (< 7348MB, second/third time). > But memory heavy swapped in the third and fourth time(7323MB, 7322MB > swapped). > > Obviously, the last two tests influenced by the first-three running. > > How reproducible of this FAIL? And, Could you help verify whether the > following cleanup works? > (Also, provide /proc/meminfo or kernel version will help to locate the > issues). > > --- a/testcases/kernel/mem/swapping/swapping01.c > +++ b/testcases/kernel/mem/swapping/swapping01.c > @@ -81,6 +81,7 @@ static void test_swapping(void) > > static void init_meminfo(void) > { > + SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "3"); > swap_free_init = SAFE_READ_MEMINFO("SwapFree:"); > if (FILE_LINES_SCANF("/proc/meminfo", "MemAvailable: %ld", > &mem_available_init)) { > Thanks for looking into it. I usually need no more than 100 iterations to reproduce a FAIL, but on our CI system it frequently fails immediately, that's why i started tweaking the test. Output from today's CI: tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s swapping01.c:106: TINFO: available physical memory: 5472 MB swapping01.c:109: TINFO: try to allocate: 7114 MB swapping01.c:112: TINFO: memory allocated: 7114 MB swapping01.c:143: TFAIL: heavy swapping detected: 5502 MB swapped. Summary: passed 0 failed 1 skipped 0 warnings 0 With my "fix"" i couldn't reproduce the problem even with more than 1000 iterations but i admit my "fix"" was a lucky shot :( cat /proc/meminfo MemTotal: 8172412 kB MemFree: 87304 kB MemAvailable: 95244 kB Buffers: 1392 kB Cached: 31792 kB SwapCached: 2052 kB Active: 21244 kB Inactive: 6587284 kB Active(anon): 2956 kB Inactive(anon): 6581060 kB Active(file): 18288 kB Inactive(file): 6224 kB Unevictable: 19624 kB Mlocked: 19624 kB SwapTotal: 25720840 kB SwapFree: 25531728 kB Dirty: 136 kB Writeback: 0 kB AnonPages: 6596012 kB Mapped: 31976 kB Shmem: 16 kB KReclaimable: 117076 kB Slab: 284640 kB SReclaimable: 117076 kB SUnreclaim: 167564 kB KernelStack: 10160 kB PageTables: 17900 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 29807044 kB Committed_AS: 9102816 kB VmallocTotal: 534773760 kB VmallocUsed: 171424 kB VmallocChunk: 0 kB Percpu: 261120 kB AnonHugePages: 3451904 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB FilePmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1024 kB Hugetlb: 0 kB DirectMap4k: 6144 kB DirectMap1M: 2091008 kB DirectMap2G: 6291456 kB kernel version: 5.11.0-20210118.rc3.git0.b3a3cbdec55b I'm going to test your patch today extensively and then report back. Regards Alex
Li Wang <liwang@redhat.com> writes: > Hi Alexander, > > Alexander Egorenkov <egorenar@linux.ibm.com> wrote: > >> ... >> swapping01.c:106: TINFO: available physical memory:* 7348 MB* >> swapping01.c:109: TINFO: try to allocate: 9552 MB >> swapping01.c:112: TINFO: memory allocated: 9552 MB >> swapping01.c:140: TINFO: swap free init: 25019 MB >> swapping01.c:141: TINFO: swap free now: 20484 MB >> swapping01.c:149: TPASS: no heavy swapping detected, *4534 MB swapped.* >> ... >> swapping01.c:106: TINFO: available physical memory:* 7318 MB* >> swapping01.c:109: TINFO: try to allocate: 9514 MB >> swapping01.c:112: TINFO: memory allocated: 9514 MB >> swapping01.c:140: TINFO: swap free init: 25019 MB >> swapping01.c:141: TINFO: swap free now: 17697 MB >> swapping01.c:145: TFAIL: heavy swapping detected:* 7322 MB swapped.* >> > > Looking at the memory status from the output, it only detects available > memory 7318MB in the fourth time (< 7348MB, second/third time). > But memory heavy swapped in the third and fourth time(7323MB, 7322MB > swapped). > > Obviously, the last two tests influenced by the first-three running. > > How reproducible of this FAIL? And, Could you help verify whether the > following cleanup works? > (Also, provide /proc/meminfo or kernel version will help to locate the > issues). > > --- a/testcases/kernel/mem/swapping/swapping01.c > +++ b/testcases/kernel/mem/swapping/swapping01.c > @@ -81,6 +81,7 @@ static void test_swapping(void) > > static void init_meminfo(void) > { > + SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "3"); > swap_free_init = SAFE_READ_MEMINFO("SwapFree:"); > if (FILE_LINES_SCANF("/proc/meminfo", "MemAvailable: %ld", > &mem_available_init)) { > > -- > Regards, > Li Wang > > This Message Is From an External Sender > This message came from outside your organization. > Hi Alexander, > > Alexander Egorenkov <egorenar@linux.ibm.com> wrote: > > ... > swapping01.c:106: TINFO: available physical memory: 7348 MB > swapping01.c:109: TINFO: try to allocate: 9552 MB > swapping01.c:112: TINFO: memory allocated: 9552 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 20484 MB > swapping01.c:149: TPASS: no heavy swapping detected, 4534 MB swapped. > ... > swapping01.c:106: TINFO: available physical memory: 7318 MB > swapping01.c:109: TINFO: try to allocate: 9514 MB > swapping01.c:112: TINFO: memory allocated: 9514 MB > swapping01.c:140: TINFO: swap free init: 25019 MB > swapping01.c:141: TINFO: swap free now: 17697 MB > swapping01.c:145: TFAIL: heavy swapping detected: 7322 MB swapped. > > Looking at the memory status from the output, it only detects available > memory 7318MB in the fourth time (< 7348MB, second/third time). > But memory heavy swapped in the third and fourth time(7323MB, 7322MB swapped). > > Obviously, the last two tests influenced by the first-three running. > > How reproducible of this FAIL? And, Could you help verify whether the following cleanup works? > (Also, provide /proc/meminfo or kernel version will help to locate the issues). > > --- a/testcases/kernel/mem/swapping/swapping01.c > +++ b/testcases/kernel/mem/swapping/swapping01.c > @@ -81,6 +81,7 @@ static void test_swapping(void) > > static void init_meminfo(void) > { > + SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "3"); > swap_free_init = SAFE_READ_MEMINFO("SwapFree:"); > if (FILE_LINES_SCANF("/proc/meminfo", "MemAvailable: %ld", > &mem_available_init)) { > > -- > Regards, > Li Wang Hi, i did more tests as promised. Your patch didn't help :( Still seeing failures. With your patch: ---------------- swapping01.c:148: TPASS: no heavy swapping detected, 7282 MB swapped. swapping01.c:107: TINFO: available physical memory: 7303 MB swapping01.c:110: TINFO: try to allocate: 9494 MB swapping01.c:113: TINFO: memory allocated: 9494 MB swapping01.c:148: TPASS: no heavy swapping detected, 6204 MB swapped. swapping01.c:107: TINFO: available physical memory: 7299 MB swapping01.c:110: TINFO: try to allocate: 9488 MB swapping01.c:113: TINFO: memory allocated: 9488 MB swapping01.c:144: TFAIL: heavy swapping detected: 9490 MB swapped. Summary: passed 27 failed 1 skipped 0 warnings 0 W/O your patch: --------------- swapping01.c:147: TPASS: no heavy swapping detected, 2229 MB swapped. swapping01.c:106: TINFO: available physical memory: 7311 MB swapping01.c:109: TINFO: try to allocate: 9504 MB swapping01.c:112: TINFO: memory allocated: 9504 MB swapping01.c:147: TPASS: no heavy swapping detected, 2249 MB swapped. swapping01.c:106: TINFO: available physical memory: 7306 MB swapping01.c:109: TINFO: try to allocate: 9498 MB swapping01.c:112: TINFO: memory allocated: 9498 MB swapping01.c:143: TFAIL: heavy swapping detected: 9496 MB swapped. Summary: passed 23 failed 1 skipped 0 warnings 0 I can easily reproduce the problem inside an LPAR with 8GB RAM and 20GB swap. But could *NOT* reproduce it inside a zVM w/o any patches, might be fluke or some other unknown factors probably. My patch seems to work very well, couldn't reproduce the problem with it applied :/ cat /proc/meminfo MemTotal: 8167132 kB MemFree: 7379376 kB MemAvailable: 7396252 kB Buffers: 14952 kB Cached: 55280 kB SwapCached: 3772 kB Active: 42584 kB Inactive: 58776 kB Active(anon): 2872 kB Inactive(anon): 36224 kB Active(file): 39712 kB Inactive(file): 22552 kB Unevictable: 19248 kB Mlocked: 19248 kB SwapTotal: 25053176 kB SwapFree: 24916556 kB Dirty: 16 kB Writeback: 0 kB AnonPages: 48352 kB Mapped: 33416 kB Shmem: 16 kB KReclaimable: 78404 kB Slab: 195456 kB SReclaimable: 78404 kB SUnreclaim: 117052 kB KernelStack: 12576 kB PageTables: 4612 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 29136740 kB Committed_AS: 1004504 kB VmallocTotal: 534773760 kB VmallocUsed: 171464 kB VmallocChunk: 0 kB Percpu: 167200 kB AnonHugePages: 11264 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB FilePmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1024 kB Hugetlb: 0 kB DirectMap4k: 5120 kB DirectMap1M: 4189184 kB DirectMap2G: 4194304 kB uname -a Fedora 33, 5.11.0-20210118.rc3.git0.0bf60767445f.300.fc33.s390x Regards Alex
Hi Alexander, Li, First I found quite hard to have correct setup to avoid TCONF "Not enough swap space to test". It often requires to add additional swap file via mkswap && swapon and use some RAM with: dd if=/dev/zero of=/dev/shm/fill bs=1k count=14024k It'd be great if test could do required setup itself. And it'd also help at least to print swap_free_init and mem_over_max in that TCONF: if (swap_free_init < mem_over_max) - tst_brk(TCONF, "Not enough swap space to test."); + tst_brk(TCONF, "Not enough swap space to test (%ld < %ld)", swap_free_init, mem_over_max); I tried to debug swapping01 on s390x (without this patch) with using additional swap and eat RAM with dd. and got bus error (core dumped). Is that due too aggressive memory eating? I'm not able to crash it on intel arch. # strace ./swapping01 execve("./swapping01", ["./swapping01"], 0x3ffdd9fdd30 /* 163 vars */) = 0 brk(NULL) = 0x18ce000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ffb41fc000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=51982, ...}) = 0 mmap(NULL, 51982, PROT_READ, MAP_PRIVATE, 3, 0) = 0x3ffb4100000 close(3) = 0 openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\0\205\370"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=160168, ...}) = 0 mmap(NULL, 143720, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3ffb4080000 mprotect(0x3ffb409d000, 4096, PROT_NONE) = 0 mmap(0x3ffb409e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x3ffb409e000 mmap(0x3ffb40a0000, 12648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ffb40a0000 close(3) = 0 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\2\1\3\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\2\263\250"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2012560, ...}) = 0 mmap(NULL, 1765032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3ffb3e80000 mprotect(0x3ffb4025000, 4096, PROT_NONE) = 0 mmap(0x3ffb4026000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a5000) = 0x3ffb4026000 mmap(0x3ffb402c000, 11944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ffb402c000 close(3) = 0 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3ffb41fa000 mprotect(0x3ffb4026000, 12288, PROT_READ) = 0 mprotect(0x3ffb409e000, 4096, PROT_READ) = 0 mprotect(0x1025000, 4096, PROT_READ) = 0 mprotect(0x3ffb41a7000, 4096, PROT_READ) = 0 munmap(0x3ffb4100000, 51982) = 0 set_tid_address(0x3ffb41fb060) = 124658 set_robust_list(0x3ffb41fb070, 24) = 0 rt_sigaction(SIGRTMIN, {sa_handler=0x3ffb4088050, sa_mask=[], sa_flags=SA_SIGINFO}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {sa_handler=0x3ffb4088108, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 getpid() = 124658 geteuid() = 0 openat(AT_FDCWD, "/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3 read(3, "0-1\n", 8192) = 4 close(3) = 0 access("/dev/shm", F_OK) = 0 getpid() = 124658 openat(AT_FDCWD, "/dev/shm/ltp_swapping01_124658", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 chmod("/dev/shm/ltp_swapping01_124658", 0666) = 0 ftruncate(3, 4096) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x3ffb4100000 unlink("/dev/shm/ltp_swapping01_124658") = 0 close(3) = 0 rt_sigaction(SIGALRM, {sa_handler=0x1005a88, sa_mask=[ALRM], sa_flags=SA_RESTART}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 rt_sigaction(SIGUSR1, {sa_handler=0x1005910, sa_mask=[USR1], sa_flags=SA_RESTART}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x3ffb4100000} --- +++ killed by SIGBUS (core dumped) +++ Bus error (core dumped) $ cat /proc/meminfo MemTotal: 12300520 kB MemFree: 3218768 kB MemAvailable: 5748908 kB Buffers: 6956 kB Cached: 8756848 kB SwapCached: 0 kB Active: 6651856 kB Inactive: 2152212 kB Active(anon): 6198384 kB Inactive(anon): 24904 kB Active(file): 453472 kB Inactive(file): 2127308 kB Unevictable: 18292 kB Mlocked: 18292 kB SwapTotal: 2538488 kB SwapFree: 2538488 kB Dirty: 16 kB Writeback: 0 kB AnonPages: 56544 kB Mapped: 32840 kB Shmem: 6175368 kB KReclaimable: 105912 kB Slab: 156604 kB SReclaimable: 105912 kB SUnreclaim: 50692 kB KernelStack: 6672 kB PageTables: 1976 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 8688748 kB Committed_AS: 6360300 kB VmallocTotal: 132120576 kB VmallocUsed: 49520 kB VmallocChunk: 0 kB Percpu: 67680 kB AnonHugePages: 22528 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1024 kB Hugetlb: 0 kB DirectMap4k: 6144 kB DirectMap1M: 4188160 kB DirectMap2G: 8388608 kB Kind regards, Petr
Hi Alexander, On Wed, Jan 20, 2021 at 3:26 PM Alexander Egorenkov <egorenar@linux.ibm.com> wrote: > ... > > Hi, > > i did more tests as promised. > Your patch didn't help :( Still seeing failures. > > With your patch: > ---------------- > swapping01.c:148: TPASS: no heavy swapping detected, 7282 MB swapped. > swapping01.c:107: TINFO: available physical memory: 7303 MB > swapping01.c:110: TINFO: try to allocate: 9494 MB > swapping01.c:113: TINFO: memory allocated: 9494 MB > swapping01.c:148: TPASS: no heavy swapping detected, 6204 MB swapped. > swapping01.c:107: TINFO: available physical memory: 7299 MB > swapping01.c:110: TINFO: try to allocate: 9488 MB > swapping01.c:113: TINFO: memory allocated: 9488 MB > swapping01.c:144: TFAIL: heavy swapping detected: 9490 MB swapped. > > Summary: > passed 27 > failed 1 > skipped 0 > warnings 0 > > > W/O your patch: > --------------- > swapping01.c:147: TPASS: no heavy swapping detected, 2229 MB swapped. > swapping01.c:106: TINFO: available physical memory: 7311 MB > swapping01.c:109: TINFO: try to allocate: 9504 MB > swapping01.c:112: TINFO: memory allocated: 9504 MB > swapping01.c:147: TPASS: no heavy swapping detected, 2249 MB swapped. > swapping01.c:106: TINFO: available physical memory: 7306 MB > swapping01.c:109: TINFO: try to allocate: 9498 MB > swapping01.c:112: TINFO: memory allocated: 9498 MB > swapping01.c:143: TFAIL: heavy swapping detected: 9496 MB swapped. > > Summary: > passed 23 > failed 1 > skipped 0 > warnings 0 > > > > I can easily reproduce the problem inside an LPAR with 8GB RAM and 20GB > swap. But could *NOT* reproduce it inside a zVM w/o any patches, might be > fluke or some other unknown factors probably. > My patch seems to work very well, couldn't reproduce the problem with it > applied :/ > I'm now a little bit doubt the lawful of swapping01 since I can easily get heavy-swapping[1] when increasing the value of COE_SLIGHT_OVER (which means just to request more available memory). The original purpose of swapping01 seems to verify NO heavy-swapping with slightly over-use available memory, but it doesn't have a standard to evaluate/say how much is slightly over-use(the default set 1.3*phy_mem). But I guess that might influence by different config of system, e.g the system you tried with 8GB RAM and 20GB swap, that swap more likely to be overused. Anyway, these are my personal perspective, maybe wrong, we still need more investigation. [1] the conception of heavy-swapping in this test define by: swapped > mem_available_init -------------------------------- The test FAIL(With and W/O your patch) on an aarch64 platform: (allocate 2*mem_available but not 1.3*mem_available) # free -h total used free shared buff/cache available Mem: 5.4Gi 329Mi 5.0Gi 0.0Ki 152Mi 4.6Gi Swap: 6.0Gi 194Mi 5.8Gi # ./swapping01 tst_test.c:1263: TINFO: Timeout per run is 0h 05m 00s swapping01.c:107: TINFO: available physical memory: 4745 MB swapping01.c:109: TINFO: try to allocate: 9491 MB swapping01.c:112: TINFO: memory allocated: 9491 MB swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6107200, swap_free_now = 1339520, swap_consum(swap_free_init - swap_free_now) = 4767680 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6107200, swap_free_now = 1196160, swap_consum(swap_free_init - swap_free_now) = 4911040 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6107200, swap_free_now = 1200256, swap_consum(swap_free_init - swap_free_now) = 4906944 swapping01.c:150: TFAIL: heavy swapping detected: 4791 MB swapped. (Debug-Info shows that 'swap_free_now' dynamically changes not only decrease but also increase during the testing, it looks like the system tries to reclaim some memory and do balance for a while. But the worth saying, no OOM occurred during the above memory overused.)
Hi Petr, Thanks for your debugging. On Wed, Jan 20, 2021 at 8:47 PM Petr Vorel <pvorel@suse.cz> wrote: > Hi Alexander, Li, > > First I found quite hard to have correct setup to avoid TCONF "Not enough > swap > space to test". It often requires to add additional swap file via mkswap > && swapon > and use some RAM with: > > dd if=/dev/zero of=/dev/shm/fill bs=1k count=14024k > > It'd be great if test could do required setup itself. > And it'd also help at least to print swap_free_init and mem_over_max in > that TCONF: > > if (swap_free_init < mem_over_max) > - tst_brk(TCONF, "Not enough swap space to test."); > + tst_brk(TCONF, "Not enough swap space to test (%ld < > %ld)", swap_free_init, mem_over_max); > > > I tried to debug swapping01 on s390x (without this patch) with using > additional > swap and eat RAM with dd. and got bus error (core dumped). Is that due too > aggressive memory eating? I'm not able to crash it on intel arch. > I think it is the correct behavior on your system, noticed that you tried with almost 14G(1k*14024k) on /dev/shm, but the MemFree is only 3218768 kB on your system. Since the size of /dev/shm is limited by excess RAM, so that couldn't provide such a huge ramdisk for your test and get SIGBUS eventually. (you can have a look the size of /dev/shm/fill is far less than 14G, I believe) > > # strace ./swapping01 > execve("./swapping01", ["./swapping01"], 0x3ffdd9fdd30 /* 163 vars */) = 0 > brk(NULL) = 0x18ce000 > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x3ffb41fc000 > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or > directory) > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=51982, ...}) = 0 > mmap(NULL, 51982, PROT_READ, MAP_PRIVATE, 3, 0) = 0x3ffb4100000 > close(3) = 0 > openat(AT_FDCWD, "/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 > read(3, > "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\0\205\370"..., > 832) = 832 > fstat(3, {st_mode=S_IFREG|0755, st_size=160168, ...}) = 0 > mmap(NULL, 143720, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = > 0x3ffb4080000 > mprotect(0x3ffb409d000, 4096, PROT_NONE) = 0 > mmap(0x3ffb409e000, 8192, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x3ffb409e000 > mmap(0x3ffb40a0000, 12648, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ffb40a0000 > close(3) = 0 > openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > read(3, > "\177ELF\2\2\1\3\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\2\263\250"..., > 832) = 832 > fstat(3, {st_mode=S_IFREG|0755, st_size=2012560, ...}) = 0 > mmap(NULL, 1765032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) > = 0x3ffb3e80000 > mprotect(0x3ffb4025000, 4096, PROT_NONE) = 0 > mmap(0x3ffb4026000, 24576, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a5000) = 0x3ffb4026000 > mmap(0x3ffb402c000, 11944, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3ffb402c000 > close(3) = 0 > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > 0x3ffb41fa000 > mprotect(0x3ffb4026000, 12288, PROT_READ) = 0 > mprotect(0x3ffb409e000, 4096, PROT_READ) = 0 > mprotect(0x1025000, 4096, PROT_READ) = 0 > mprotect(0x3ffb41a7000, 4096, PROT_READ) = 0 > munmap(0x3ffb4100000, 51982) = 0 > set_tid_address(0x3ffb41fb060) = 124658 > set_robust_list(0x3ffb41fb070, 24) = 0 > rt_sigaction(SIGRTMIN, {sa_handler=0x3ffb4088050, sa_mask=[], > sa_flags=SA_SIGINFO}, NULL, 8) = 0 > rt_sigaction(SIGRT_1, {sa_handler=0x3ffb4088108, sa_mask=[], > sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 > prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, > rlim_max=RLIM64_INFINITY}) = 0 > getpid() = 124658 > geteuid() = 0 > openat(AT_FDCWD, "/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3 > read(3, "0-1\n", 8192) = 4 > close(3) = 0 > access("/dev/shm", F_OK) = 0 > getpid() = 124658 > openat(AT_FDCWD, "/dev/shm/ltp_swapping01_124658", O_RDWR|O_CREAT|O_EXCL, > 0600) = 3 > chmod("/dev/shm/ltp_swapping01_124658", 0666) = 0 > ftruncate(3, 4096) = 0 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x3ffb4100000 > unlink("/dev/shm/ltp_swapping01_124658") = 0 > close(3) = 0 > rt_sigaction(SIGALRM, {sa_handler=0x1005a88, sa_mask=[ALRM], > sa_flags=SA_RESTART}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > rt_sigaction(SIGUSR1, {sa_handler=0x1005910, sa_mask=[USR1], > sa_flags=SA_RESTART}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x3ffb4100000} --- > +++ killed by SIGBUS (core dumped) +++ > Bus error (core dumped) > > $ cat /proc/meminfo > MemTotal: 12300520 kB > MemFree: 3218768 kB > MemAvailable: 5748908 kB > Buffers: 6956 kB > Cached: 8756848 kB > SwapCached: 0 kB > Active: 6651856 kB > Inactive: 2152212 kB > Active(anon): 6198384 kB > Inactive(anon): 24904 kB > Active(file): 453472 kB > Inactive(file): 2127308 kB > Unevictable: 18292 kB > Mlocked: 18292 kB > SwapTotal: 2538488 kB > SwapFree: 2538488 kB > Dirty: 16 kB > Writeback: 0 kB > AnonPages: 56544 kB > Mapped: 32840 kB > Shmem: 6175368 kB > KReclaimable: 105912 kB > Slab: 156604 kB > SReclaimable: 105912 kB > SUnreclaim: 50692 kB > KernelStack: 6672 kB > PageTables: 1976 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 8688748 kB > Committed_AS: 6360300 kB > VmallocTotal: 132120576 kB > VmallocUsed: 49520 kB > VmallocChunk: 0 kB > Percpu: 67680 kB > AnonHugePages: 22528 kB > ShmemHugePages: 0 kB > ShmemPmdMapped: 0 kB > CmaTotal: 0 kB > CmaFree: 0 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 1024 kB > Hugetlb: 0 kB > DirectMap4k: 6144 kB > DirectMap1M: 4188160 kB > DirectMap2G: 8388608 kB > > Kind regards, > Petr > >
Hi Alex, Petr, I'm going to apply Alex's patch after a round of testing[1]. Because giving more time waiting for swap memory to settle is no harm to test, at least help to get rid of failures observed on LPAR so far. But I still think the swapping01 not very precise to some degree. We probably need to rewrite it someday but currently, I have no better idea. [1] Running a stress-ng to eat some available memory in the background, then swapping01 easily hit the heavy-swap issue. This could be proved that it influenced by deamon allocate/free memory during testing. # stress-ng to eat memory after starting swapping01 # ./swapping01 -i 5 tst_test.c:1263: TINFO: Timeout per run is 0h 05m 00s swapping01.c:107: TINFO: available physical memory: 4660 MB swapping01.c:109: TINFO: try to allocate: 6058 MB swapping01.c:112: TINFO: memory allocated: 6058 MB swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 4738304, swap_consum(swap_free_init - swap_free_now) = 1282816 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 4746496, swap_consum(swap_free_init - swap_free_now) = 1274624 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 4616640, swap_consum(swap_free_init - swap_free_now) = 1404480 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 4191936, swap_consum(swap_free_init - swap_free_now) = 1829184 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 3339776, swap_consum(swap_free_init - swap_free_now) = 2681344 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 3070336, swap_consum(swap_free_init - swap_free_now) = 2950784 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 2763136, swap_consum(swap_free_init - swap_free_now) = 3257984 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 2377280, swap_consum(swap_free_init - swap_free_now) = 3643840 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 1876544, swap_consum(swap_free_init - swap_free_now) = 4144576 swapping01.c:135: TINFO: DEBUG-INFO: swap_free_init = 6021120, swap_free_now = 1274432, swap_consum(swap_free_init - swap_free_now) = 4746688 swapping01.c:150: TFAIL: heavy swapping detected: 5075 MB swapped.
Hi Petr, Date: Fri, 22 Jan 2021 08:16:41 +0100 Message-ID: <874kj9zbja.fsf@oc8242746057.ibm.com> X-Trend-IP-HD: ip=[9.171.79.240]sender=(egorenar@linux.ibm.com)recipient=<ltp@lists.linux.it;liwang@redhat.com;pvorel@suse.cz> Petr Vorel <pvorel@suse.cz> writes: > Hi Alexander, Li, > > First I found quite hard to have correct setup to avoid TCONF "Not enough swap > space to test". It often requires to add additional swap file via mkswap && swapon > and use some RAM with: > > dd if=/dev/zero of=/dev/shm/fill bs=1k count=14024k > > It'd be great if test could do required setup itself. > And it'd also help at least to print swap_free_init and mem_over_max in that TCONF: > > if (swap_free_init < mem_over_max) > - tst_brk(TCONF, "Not enough swap space to test."); > + tst_brk(TCONF, "Not enough swap space to test (%ld < %ld)", swap_free_init, mem_over_max); > yeah, that was a problem for me too when i was new to this test :/ > > I tried to debug swapping01 on s390x (without this patch) with using additional > swap and eat RAM with dd. and got bus error (core dumped). Is that due too > aggressive memory eating? I'm not able to crash it on intel arch. > that's something new for me. If possible please run it in gdb with debug symbols to find out where exactly this signal is raised. Regards Alex
Hi Alex, > Hi Petr, > Date: Fri, 22 Jan 2021 08:16:41 +0100 > Message-ID: <874kj9zbja.fsf@oc8242746057.ibm.com> > X-Trend-IP-HD: ip=[9.171.79.240]sender=(egorenar@linux.ibm.com)recipient=<ltp@lists.linux.it;liwang@redhat.com;pvorel@suse.cz> > Petr Vorel <pvorel@suse.cz> writes: > > Hi Alexander, Li, > > First I found quite hard to have correct setup to avoid TCONF "Not enough swap > > space to test". It often requires to add additional swap file via mkswap && swapon > > and use some RAM with: > > dd if=/dev/zero of=/dev/shm/fill bs=1k count=14024k > > It'd be great if test could do required setup itself. > > And it'd also help at least to print swap_free_init and mem_over_max in that TCONF: > > if (swap_free_init < mem_over_max) > > - tst_brk(TCONF, "Not enough swap space to test."); > > + tst_brk(TCONF, "Not enough swap space to test (%ld < %ld)", swap_free_init, mem_over_max); > yeah, that was a problem for me too when i was new to this test :/ :) > > I tried to debug swapping01 on s390x (without this patch) with using additional > > swap and eat RAM with dd. and got bus error (core dumped). Is that due too > > aggressive memory eating? I'm not able to crash it on intel arch. > that's something new for me. If possible please run it in gdb with debug symbols > to find out where exactly this signal is raised. I'll do. Kind regards, Petr > Regards > Alex
Hi Li, Alex, > Hi Alex, Petr, > I'm going to apply Alex's patch after a round of testing[1]. > Because giving more time waiting for swap memory to > settle is no harm to test, at least help to get rid of failures > observed on LPAR so far. +1, thanks! Reviewed-by: Petr Vorel <pvorel@suse.cz> > But I still think the swapping01 not very precise to some degree. > We probably need to rewrite it someday but currently, I have no better idea. +1 > [1] Running a stress-ng to eat some available memory in the OK, maybe better way for eating memory. Thanks :). Kind regards, Petr > background, then swapping01 easily hit the heavy-swap issue. > This could be proved that it influenced by deamon allocate/free > memory during testing. ...
Hi Li, Date: Fri, 22 Jan 2021 08:56:45 +0100 Message-ID: <87zh11xv42.fsf@oc8242746057.ibm.com> X-Trend-IP-HD: ip=[9.171.79.240]sender=(egorenar@linux.ibm.com)recipient=<ltp@lists.linux.it;pvorel@suse.cz;liwang@redhat.com> Li Wang <liwang@redhat.com> writes: > Hi Alex, Petr, > > I'm going to apply Alex's patch after a round of testing[1]. > Because giving more time waiting for swap memory to > settle is no harm to test, at least help to get rid of failures > observed on LPAR so far. Great, thank you very much :) I also think it shouldn't make the test behave worse. > > But I still think the swapping01 not very precise to some degree. > We probably need to rewrite it someday but currently, I have no better > idea. 100% agree. It is very fragile. I was thinking maybe we could find out how much swap space is assigned to a *particular* process, maybe something in /proc/ can be used ? > > [1] Running a stress-ng to eat some available memory in the > background, then swapping01 easily hit the heavy-swap issue. > This could be proved that it influenced by deamon allocate/free > memory during testing. > > # stress-ng to eat memory after starting swapping01 > I also observed that with stress-ng. Regards Alex
Hi Alex, Petr, Patch applied, thanks! Alexander Egorenkov <egorenar@linux.ibm.com> wrote: ... > > But I still think the swapping01 not very precise to some degree. > > We probably need to rewrite it someday but currently, I have no better > > idea. > > 100% agree. It is very fragile. I was thinking maybe we could find out > how much swap space is assigned to a *particular* process, maybe something > in /proc/ can be used ? Sounds practicable, we can have a try.
diff --git a/testcases/kernel/mem/swapping/swapping01.c b/testcases/kernel/mem/swapping/swapping01.c index 24b8313f3..befb52e91 100644 --- a/testcases/kernel/mem/swapping/swapping01.c +++ b/testcases/kernel/mem/swapping/swapping01.c @@ -127,10 +127,10 @@ static void check_swapping(void) /* Still occupying memory, loop for a while */ i = 0; - while (i < 10) { + while (i < 30) { swap_free_now = SAFE_READ_MEMINFO("SwapFree:"); sleep(1); - if (labs(swap_free_now - SAFE_READ_MEMINFO("SwapFree:")) < 512) + if (labs(swap_free_now - SAFE_READ_MEMINFO("SwapFree:")) < 10) break; i++;
Increase time waiting for swap memory to settle before performing checks with occupied swap memory. This decreases the chance of the test to fail in the case of the swap memory taking longer than expected to settle. This fixes sporadic failures observed on IBM s390x systems such as this: swapping01.c:149: TPASS: no heavy swapping detected, 4534 MB swapped. swapping01.c:106: TINFO: available physical memory: 7348 MB swapping01.c:109: TINFO: try to allocate: 9552 MB swapping01.c:112: TINFO: memory allocated: 9552 MB swapping01.c:140: TINFO: swap free init: 25019 MB swapping01.c:141: TINFO: swap free now: 20484 MB swapping01.c:149: TPASS: no heavy swapping detected, 4534 MB swapped. swapping01.c:106: TINFO: available physical memory: 7348 MB swapping01.c:109: TINFO: try to allocate: 9553 MB swapping01.c:112: TINFO: memory allocated: 9553 MB swapping01.c:140: TINFO: swap free init: 25018 MB swapping01.c:141: TINFO: swap free now: 17694 MB swapping01.c:149: TPASS: no heavy swapping detected, 7323 MB swapped. swapping01.c:106: TINFO: available physical memory: 7318 MB swapping01.c:109: TINFO: try to allocate: 9514 MB swapping01.c:112: TINFO: memory allocated: 9514 MB swapping01.c:140: TINFO: swap free init: 25019 MB swapping01.c:141: TINFO: swap free now: 17697 MB swapping01.c:145: TFAIL: heavy swapping detected: 7322 MB swapped. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> --- testcases/kernel/mem/swapping/swapping01.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)