Message ID | 20190415163210.17463-1-mfo@canonical.com |
---|---|
Series | Fix write()/fsync() deadlock in write_cache_pages() |

On 2019-04-15 13:32:09, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1824827
>
> [Impact]
>
>  * Tasks of a multi-threaded workload doing write() and fsync()
>    might deadlock in write_cache_pages(), preventing progress.
>
>  * The fix addresses a corner case in the range_cyclic
>    implementation of write_cache_pages() that allows the deadlock.
>
>  * Patch:
>    - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7
>      ("mm/page-writeback.c: fix range_cyclic writeback vs
>      writepages deadlock"), present in v4.20-rc1~92^2~19.
>
> [Test Case]
>
>  * This issue was originally hit by the 'perforce' (p4d)
>    tool on an XFS filesystem, but it is difficult/rare to hit.
>
>  * We've written a userspace program and a kprobes-based kernel
>    module to reproduce this problem and verify the test kernel/patch.
>
>  * The kprobes are strictly tied to particular kernel versions
>    because of the assembly instruction offsets. We'll provide
>    updated versions for -updates and -proposed for verification.
>
>  * Steps
>    (see output examples in comments):
>
>    - Userspace part:
>      $ gcc -o test test.c -pthread
>
>    - Kernel part:
>      $ touch Makefile
>      $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
>      $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules
>
>    - Set a shorter hung task timeout and a higher console logging
>      level to notice the deadlocked tasks sooner, and watch progress:
>      $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
>      $ echo 9 | sudo tee /proc/sys/kernel/printk
>
>    - Load module / Run userspace part (logging to kernel log) in XFS:
>      $ sudo insmod kprobe-test.ko
>      $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
>      $ sudo rmmod kprobe-test
>
>      You may need to Ctrl-Z with the original kernel, as 'test' doesn't finish.
>
>    - Check kernel log or watch the system console:
>      $ dmesg
>
>    - Check threads in D state:
>      $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker
>
> [Regression Potential]
>
>  * The patch is small but changes core writeback infrastructure,
>    so there's a chance it may _affect_ (not exactly _break_) some
>    behavior that has not been validated with our regression
>    testing, which is described below.
>
>  * This has been verified with 'xfstests' (not only for the XFS
>    filesystem, despite its original name), used by major Linux
>    filesystems for regression testing during development. It's
>    been tested on systems with 24 and 4 CPUs (to exercise
>    differences in scalability, parallelism, and workload) and on
>    XFS and ext4 (reporter's environment + Ubuntu's default).
>    No regressions were observed (the set of failed tests is
>    the same on each system and the tests failed in the same way).
>
>  * This has also been verified with 'iozone' for write-intensive
>    tests, to exercise the writeback mechanism, and no errors were
>    observed.
>
>  * The reporter has been running the test kernel with the patch
>    for weeks and has not observed any other issues/regressions.
>
> [Other Info]
>
>  * This is only required in Cosmic (for the Bionic HWE kernel),
>    and is already applied in Disco.
>
> Dave Chinner (1):
>   mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
>
>  mm/page-writeback.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> --
> 2.17.1
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
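
A note on the D-state check in the test case: 'grep D' matches a capital D anywhere in the line, not only in the state column, so it can pick up unrelated threads. A stricter filter could look roughly like the sketch below. It assumes the four-column pid,tid,state,comm layout requested in the original command; filter_d_state is a hypothetical helper name and is not part of the submitted test case.

```shell
#!/bin/sh
# Hypothetical helper (not from the original submission).
# Reads lines in the "pid tid state comm" layout on stdin and prints only
# those whose third column is exactly "D" (uninterruptible sleep) and whose
# command name contains "test" or "kworker".
filter_d_state() {
    awk '$3 == "D" && $4 ~ /test|kworker/ { print }'
}

# Demonstration on made-up sample lines; only the first and third
# lines pass the filter.
printf '%s\n' \
    '1234 1235 D test' \
    '1234 1236 S test' \
    '77 77 D kworker/u8:2' \
    '90 90 D bash' | filter_d_state
```

On a live system the same function would be fed from the original command, i.e. 'ps -eLo pid,tid,state,comm | filter_d_state'.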