Message ID | 20190321234836.11774-1-mfo@canonical.com |
---|---|
Series | Fix for LP#1821259 (pending patches for) Fix for deadlock in cpu_stopper |
On 2019-03-21 20:48:34, Mauricio Faria de Oliveira wrote:
> BugLink: https://bugs.launchpad.net/bugs/1821259
>
> Bionic only needs 2 of the 4 patches submitted for Xenial.
> All patches are applied / not needed on Cosmic and later.
>
> [Impact]
>
> * This problem hard locks up 2 CPUs in a deadlock, and this
>   soft locks up other CPUs as an effect; the system becomes
>   unusable.
>
> * This is relatively rare / difficult to hit because it's a
>   corner case in scheduling/load balancing that needs timing
>   with CPU stopper code. And it needs SMP plus _NUMA_ system.
>   (but it can be hit with synthetic test case attached in LP.)
>
> * Since SMP plus NUMA usually equals _servers_ it looks like
>   a good idea to prevent this bug / hard lockups / rebooting.
>
> * The fix resolves the potential deadlock by removing one of
>   the calls required to deadlock from under the locked code.
>
> [Test Case]
>
> * There's a synthetic test case to reproduce this problem
>   (although without the stack traces - just a system hang)
>   attached to this LP bug.
>
> * It uses kprobes/mdelay/cpu stopper calls to force the code
>   to execute and force the timing/locking condition to occur.
>
> * $ sudo insmod kmod-stopper.ko
>
>   Some dmesg logging occurs, and systems either hangs or not.
>   See examples in comments.
>
> [Regression Potential]
>
> * These are patches to the cpu stop_machine.c code, and they
>   change a bit how it works; however, there are no upstream
>   fixes for these patches anymore and they are still the top
>   of the 'git log --oneline -- kernel/stop_machine.c' output.
>
> * These patches have been verified with the synthetic test case
>   and 'stress-ng --class scheduler --sequential 0' (no regressions)
>   on guest with 2 CPUs and one physical system with 24 CPUs.
>
> [Other Info]
>
> * The patches are required on Xenial and later.
> * There are 4 patches for Xenial, and 2 patches pending for Bionic.
> * All patches are applied from Cosmic onwards.
>
> Isaac J. Manjarres (1):
>   stop_machine: Disable preemption after queueing stopper threads
>
> Prasad Sodagudi (1):
>   stop_machine: Atomically queue and wake stopper threads
>
>  kernel/stop_machine.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>

Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
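
The [Impact] note says the fix works by "removing one of the calls required to deadlock from under the locked code". One way to picture that idea is a deferred wake-up: work is queued while the locks are held, but the wake-ups are only recorded there and are issued after the locks are dropped. The following is a minimal userspace sketch of that general pattern in Python, not the kernel patch itself. The `Stopper` class, `queue_two_works()` and `demo()` are hypothetical names made up for illustration, and the sketch assumes callers always pass the two stoppers in the same order.

```python
import threading
from collections import deque


class Stopper:
    """Toy stand-in for a per-CPU stopper (hypothetical, for illustration)."""

    def __init__(self):
        self.lock = threading.Lock()          # protects self.works
        self.works = deque()                  # FIFO of queued work items
        self.wakeup = threading.Semaphore(0)  # one release per queued item
        self.done = []                        # results, in execution order

    def run(self):
        """Worker loop: sleep until woken, then run one queued item."""
        while True:
            self.wakeup.acquire()
            with self.lock:
                work = self.works.popleft()
            if work is None:                  # None is the exit request
                return
            self.done.append(work())


def queue_two_works(first, work1, second, work2):
    """Queue one item on each stopper; wake both only after unlocking.

    Assumption: every caller passes the stoppers in the same order, so the
    nested lock acquisition below cannot invert between two callers.
    """
    pending_wakeups = []
    with first.lock:
        with second.lock:
            first.works.append(work1)
            second.works.append(work2)
            # Only record who needs waking; do not wake while locked.
            pending_wakeups.extend((first, second))
    # Both locks are released here, so the wake-up call can no longer
    # contend with anything that needs those locks.
    for stopper in pending_wakeups:
        stopper.wakeup.release()


def demo():
    a, b = Stopper(), Stopper()
    threads = [threading.Thread(target=s.run) for s in (a, b)]
    for t in threads:
        t.start()
    queue_two_works(a, lambda: "work on a", b, lambda: "work on b")
    queue_two_works(a, None, b, None)         # ask both workers to exit
    for t in threads:
        t.join()
    return a.done, b.done


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the section holding both locks only touches the work lists; anything that might need another lock or block (here, the wake-up) runs after the locks are released.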