Message ID | 20210519151513.309935-1-andrea.righi@canonical.com |
---|---|
Series | aws: proper fix for c5.18xlarge hibernation issues |
On Wed, May 19, 2021 at 12:15 PM Andrea Righi <andrea.righi@canonical.com> wrote:
>
> BugLink: https://bugs.launchpad.net/bugs/1920944
>
> [Impact]
>
> In LP: #1918694 we applied a fix and a workaround to solve the
> hibernation issues on c5.18xlarge. The workaround was in the form of a
> SAUCE patch:
>
>   "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
>
> It looks like we can replace this workaround with a proper fix, by
> applying this patch:
>
>   http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/
>
> [Test plan]
>
> Create a c5.18xlarge instance, run the memory stress test script (the
> same test script that we are using to stress test hibernation), trigger
> the hibernate event, trigger the resume event. Repeat a couple of times
> and the problem is very likely to happen.
>
> [Fix]
>
> Replace "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
> with:
>
>   http://next.patchew.org/Linux/20210414123544.1060604-1-vkuznets@redhat.com/
>
> The fix has been tested extensively in the AWS infrastructure with
> positive results.
>
> [Where problems could occur]
>
> This new code introduced by the fix can be executed also when a CPU is
> put offline, so we may see potential regressions in the KVM CPU
> hotplugging.
>
> ----------------------------------------------------------------
> Changelog (v2 -> v3):
>  - updated backported / signed-off lines with the right upstream info
>    (thanks Guilherme!)
>
> NOTE: backport activity was minimal, it only required some context
> adjustments to properly apply the changes.
>
> Andrea Righi (1):
>   Revert "UBUNTU: SAUCE: aws: kvm: double the size of hv_clock_boot"
>
> Vitaly Kuznetsov (5):
>   x86/kvm: Fix pr_info() for async PF setup/teardown
>   x86/kvm: Teardown PV features on boot CPU as well
>   x86/kvm: Disable kvmclock on all CPUs on shutdown
>   x86/kvm: Disable all PV features on crash
>   x86/kvm: Unify kvm_pv_guest_cpu_reboot() with kvm_guest_cpu_offline()
>
>  arch/x86/include/asm/kvm_para.h |   9 ++----
>  arch/x86/kernel/kvm.c           | 113 ++++++++++++++++++++++++++++++++++++++++++++----------------------
>  arch/x86/kernel/kvmclock.c      |  28 ++---------------
>  3 files changed, 79 insertions(+), 71 deletions(-)
>

Thanks a bunch Andrea, looks great to me:

Acked-by: Guilherme G. Piccoli <gpiccoli@canonical.com>

Acked-by: Tim Gardner <tim.gardner@canonical.com>

pr_info() exists in focal/linux-aws. I'm curious why you didn't preserve it
in patch 2/6?

On Wed, May 19, 2021 at 12:23:22PM -0600, Tim Gardner wrote:
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> pr_info() exists in focal/linux-aws. I'm curious why you didn't preserve it
> in patch 2/6 ?

Good point, I could have used pr_info(), but the original patch was
changing a pr_info() to another pr_info() and the original code has a
printk(), so I thought it was more consistent to keep the printk() and
change only the text like the original patch does...

-Andrea

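For readers following the printk()/pr_info() exchange above, here is a minimal userspace sketch of how the two logging styles relate. It is an illustration only, not code from the patch series: the `printk()` stand-in, the `"<6>"` level tag, the `"kvm-guest: "` prefix, the message text, and the helper names `log_with_printk()` and `log_with_pr_info()` are all assumptions made so the example builds outside the kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for kernel definitions so the sketch builds in userspace. */
#define KERN_INFO "<6>"                 /* illustrative level tag, not the kernel's real encoding */
#define pr_fmt(fmt) "kvm-guest: " fmt   /* hypothetical per-file prefix */

static char log_buf[256];               /* holds the most recently "logged" line */

/* Userspace stand-in for printk(): format the message into log_buf. */
static int printk(const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(log_buf, sizeof(log_buf), fmt, args);
	va_end(args);
	assert(len >= 0 && (size_t)len < sizeof(log_buf));
	return len;
}

/* Same shape as the kernel macro: level tag plus the pr_fmt() prefix. */
#define pr_info(fmt, ...) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

/* Explicit printk() with a level: no automatic prefix is added. */
static const char *log_with_printk(int cpu)
{
	printk(KERN_INFO "example message for cpu %d\n", cpu);
	return log_buf;
}

/* pr_info(): same level, but the pr_fmt() prefix is prepended. */
static const char *log_with_pr_info(int cpu)
{
	pr_info("example message for cpu %d\n", cpu);
	return log_buf;
}
```

In the kernel, a file that does not define `pr_fmt()` gets the default `pr_fmt(fmt) fmt`, in which case `pr_info(...)` and `printk(KERN_INFO ...)` produce the same text. The two styles differ only where a file sets its own prefix, which is why changing just the message text while keeping the existing printk() call is a small, self-contained backport.
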
applied to F/aws. thank you!

-Kelsey