Message ID | 20201007211640.60573-1-kelsey.skunberg@canonical.com |
---|---|
Series | Fix kdump Over Network |
On 07.10.20 23:16, Kelsey Skunberg wrote:
> BugLink: https://bugs.launchpad.net/bugs/1883261
>
> [Impact]
>
> Microsoft would like to request two kdump related fixes in all releases
> supported on Azure. The two commits are:
>
> c81992e7f4aa1 ("PCI: hv: Retry PCI bus D0 entry on invalid device
> state")
> 83cc3508ffaa6 ("PCI: hv: Fix the PCI HyperV probe failure path
> to release resource properly")
>
> These are in the virtual PCI driver for Hyper-V. The customer visible
> symptom is that the network is not functional in the kdump kernel, so
> the dump file must be stored on the local disk and cannot be written
> over the network.
>
> The problem only occurs when Accelerated Networking is enabled. It’s a
> relatively obscure scenario, which is why the problem has not surfaced
> before now. But we have an important customer who wants the
> “dump-file-over-the-network” functionality to work.
>
> For bionic/linux-azure-4.15, the following additional patch needs to be
> backported first to allow the requested patches to apply cleanly:
>
> a8e37506e79a ("PCI: hv: Reorganize the code in preparation of
> hibernation")
>
> [Test Case]
>
> - Apply requested patches and boot into updated kernel
> - Verify Accelerated Networking is enabled
> - Set up kdump
> - Configure kdump to use SSH
> - Test the crash dump mechanism and verify the kernel crash dump appears
>   on the selected remote server
>
> Further details for setting up kdump through testing can be found here:
> https://ubuntu.com/server/docs/kernel-crash-dump
>
> [Regression Potential]
>
> Patches are only targeted to azure kernels.
>
> Patches are designed to release allocated resources remaining after
> error cases in hv_pci_probe() or PCI devices not being shut down
> properly. If those resources are still not correctly released, then
> entering D0 state in kdump kernel could continue to fail.
>
> Potential for finding regression with freeing resources or still failing to
> enter D0 state in the kdump kernel even after all resources have been
> released.
>
> Build & boot tested. Verified kdump works as intended over SSH after
> patches are applied.
>
> Both 5.4 and 4.15 test kernels were sent to Microsoft. Both kernels
> were signed off on and verified to resolve the problem.
>
>
> Changes for Bionic/linux-azure-4.15:
>
>
> Dexuan Cui (1):
>   PCI: hv: Reorganize the code in preparation of hibernation
>
> Wei Hu (2):
>   PCI: hv: Fix the PCI HyperV probe failure path to release resource
>     properly
>   PCI: hv: Retry PCI bus D0 entry on invalid device state
>
>  drivers/pci/host/pci-hyperv.c | 101 +++++++++++++++++++++++++++-------
>  1 file changed, 81 insertions(+), 20 deletions(-)
>
>
> Changes for Focal/linux-azure:
>
> Wei Hu (2):
>   PCI: hv: Fix the PCI HyperV probe failure path to release resource
>     properly
>   PCI: hv: Retry PCI bus D0 entry on invalid device state
>
>  drivers/pci/controller/pci-hyperv.c | 60 ++++++++++++++++++++++++++---
>  1 file changed, 54 insertions(+), 6 deletions(-)
>
> --
> 2.25.1
>

Acked-by: Stefan Bader <stefan.bader@canonical.com>
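
For readers following the regression discussion, the behaviour described for "PCI: hv: Retry PCI bus D0 entry on invalid device state" can be pictured with a small simulation. This is only an illustrative sketch, not the driver code: `FakeHost`, `probe()`, the use of `EPROTO` as the "invalid device state" marker, and the single-retry policy are all assumptions made for the example. The real logic lives in pci-hyperv.c.

```python
import errno
from dataclasses import dataclass


@dataclass
class FakeHost:
    """Hypothetical stand-in for the Hyper-V host side (not a real API)."""

    # True when a crashed kernel left resources allocated on the host.
    stale_resources: bool

    def enter_d0(self) -> int:
        # Assumption: the host rejects D0 entry while stale resources remain.
        return -errno.EPROTO if self.stale_resources else 0

    def release_resources(self) -> int:
        # Ask the host to drop whatever the previous kernel left behind.
        self.stale_resources = False
        return 0


def probe(host: FakeHost) -> int:
    """Try D0 entry; on an invalid-state error, release and retry once."""
    retry_allowed = True
    while True:
        ret = host.enter_d0()
        if ret == -errno.EPROTO and retry_allowed:
            retry_allowed = False
            if host.release_resources() == 0:
                continue  # resources released, try D0 entry again
        return ret
```

In this model a kdump kernel that inherits stale resources succeeds on the second attempt, while a failure that persists after the release is returned to the caller. That second outcome corresponds to the "still failing to enter D0 state" case listed under Regression Potential.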
On 07/10/2020 22:16, Kelsey Skunberg wrote:
> BugLink: https://bugs.launchpad.net/bugs/1883261
> [...]

Thanks Kelsey; backports look good to me, good test case and results, I think the regression potential vs benefit looks sane, so..

Acked-by: Colin Ian King <colin.king@canonical.com>
Applied to Bionic/azure-4.15-next

Thanks!
Ian

On 2020-10-07 15:16:35, Kelsey Skunberg wrote:
> BugLink: https://bugs.launchpad.net/bugs/1883261
> [...]
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Applied to Focal/azure

Thanks,
Ian

On 2020-10-07 15:16:35, Kelsey Skunberg wrote:
> BugLink: https://bugs.launchpad.net/bugs/1883261
> [...]