[PULL,SRU,Zesty] Fix KVM hang on ThunderX systems

Message ID CALdTtnuAzSjaiyd=DXVr+8_O8pZjfGD-8RppBPPK_h87yi-x5g@mail.gmail.com
State New

Pull-request

git://git.launchpad.net/~dannf/ubuntu/+source/linux/+git/linux lp1673564-zesty

Message

dann frazier Aug. 3, 2017, 7:30 p.m. UTC
On Thu, Aug 3, 2017 at 1:28 PM, dann frazier <dann.frazier@canonical.com> wrote:
> Here's a backport of the KVM hang fix for ThunderX systems for zesty.
>
> BugLink: https://bugs.launchpad.net/bugs/1673564
>
> Here's the description, cut & pasted from my earlier artful PR:
> ---------
> There's a pretty nasty erratum for ThunderX SoCs in which a guest can
> cause interrupts to be disabled on the host kernel. The symptoms vary,
> but it's easy to reproduce by running a bunch of parallel VM start/stop
> loops.
>
> There was quite a bit of backporting required for the patches in this
> series. For such patches, I've described the changes required in a
> comment above my S-o-B. They are mostly mechanical transformations to
> revert macro cleanups that occurred post-4.11.
> ---------
>
> I've previously submitted versions for unstable[*] and artful[**]. I
> mentioned in the artful PR that, while my artful backport applied
> cleanly to zesty, I was seeing crashes there. Using bisection, I found
> this to be resolved by a v4.11 patch that disables stack-protector
> when compiling EL2 code on arm64. That patch is included in this
> series. Note: there's also an arm32 patch for this that I did *not*
> include, because I don't have a way to test it. However, it is tagged
> for stable, so you might want to consider pulling it as well:
>
>   501ad27c arm: KVM: Do not use stack-protector to compile HYP code
>
> I've had this running my reproducer for several hours on 2 systems - a
> ThunderX system, which requires this workaround, and a Qualcomm
> QDF2400 system that does not, with no issues.
>
> [*] https://lists.ubuntu.com/archives/kernel-team/2017-July/085587.html
> [**] https://lists.ubuntu.com/archives/kernel-team/2017-July/085810.html
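
For readers who want to picture the "parallel VM start/stop loops" mentioned above, here is a minimal sketch. It is not the reproducer used for this PR, which is not included in the thread. It assumes libvirt guests driven through `virsh start` and `virsh destroy`; the domain names and the `run` parameter are hypothetical and exist only for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def start_stop_loop(domain, iterations, run=subprocess.run):
    """Start and forcibly stop one libvirt domain `iterations` times.

    `run` defaults to subprocess.run; it is a parameter only so the loop
    can be exercised without real guests (an assumption of this sketch).
    """
    for _ in range(iterations):
        run(["virsh", "start", domain], check=True)
        run(["virsh", "destroy", domain], check=True)


def run_parallel(domains, iterations, run=subprocess.run):
    """Run one start/stop loop per domain, all at the same time."""
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        futures = [
            pool.submit(start_stop_loop, domain, iterations, run)
            for domain in domains
        ]
        for future in futures:
            # result() re-raises any failure from the worker thread,
            # e.g. a non-zero exit status from virsh.
            future.result()
```

Calling `run_parallel(["guest0", "guest1"], 100)` would cycle two pre-defined guests a hundred times each; on an affected host the expectation from the description above is that the host eventually misbehaves, with varying symptoms.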

Sorry, hit send too fast - here's the PR:

The following changes since commit df5efd131d1638b29092b674fee531f69efce29d:

  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling (2017-08-03 15:06:44 -0300)

are available in the git repository at:

  git://git.launchpad.net/~dannf/ubuntu/+source/linux/+git/linux lp1673564-zesty

for you to fetch changes up to aaea3c0a269cc36b71a53369babd4caef0edb761:

  KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access (2017-08-03 13:04:06 -0600)

----------------------------------------------------------------
Christoffer Dall (1):
      KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction

David Daney (2):
      arm64: Add MIDR values for Cavium cn83XX SoCs
      arm64: Add workaround for Cavium Thunder erratum 30115

Marc Zyngier (28):
      arm64: KVM: Do not use stack-protector to compile EL2 code
      KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2 registers
      arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
      KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
      KVM: arm64: Make kvm_condition_valid32() accessible from EL2
      KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
      KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
      KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
      KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
      KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
      KVM: arm64: vgic-v3: Add misc Group-0 handlers
      KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
      KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
      KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
      KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
      KVM: arm64: Enable GICv3 common sysreg trapping via command-line
      KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
      arm64: KVM: Make unexpected reads from WO registers inject an undef
      KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
      KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

Vijaya Kumar K (1):
      irqchip/gic-v3: Add missing system register definitions

dann frazier (1):
      UBUNTU: [Config] CONFIG_CAVIUM_ERRATUM_30115=y

 Documentation/admin-guide/kernel-parameters.txt |  12 +
 Documentation/arm64/silicon-errata.txt          |   1 +
 arch/arm64/Kconfig                              |  11 +
 arch/arm64/include/asm/arch_gicv3.h             |  10 +-
 arch/arm64/include/asm/cpucaps.h                |   3 +-
 arch/arm64/include/asm/cputype.h                |   2 +
 arch/arm64/include/asm/esr.h                    |  24 +
 arch/arm64/include/asm/kvm_hyp.h                |   1 +
 arch/arm64/kernel/cpu_errata.c                  |  21 +
 arch/arm64/kvm/hyp/Makefile                     |   2 +
 arch/arm64/kvm/hyp/switch.c                     |  14 +
 arch/arm64/kvm/sys_regs.c                       |  48 +-
 arch/arm64/kvm/sys_regs.h                       |  18 -
 debian.master/config/config.common.ubuntu       |   1 +
 include/kvm/arm_vgic.h                          |   1 +
 include/linux/irqchip/arm-gic-v3.h              |  49 +-
 virt/kvm/arm/aarch32.c                          |   2 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c                   | 851 +++++++++++++++++++++++-
 virt/kvm/arm/vgic/vgic-v3.c                     |  45 ++
 19 files changed, 1066 insertions(+), 50 deletions(-)
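
Two entries in the shortlog above are pure bit arithmetic: turning an ESR syndrome into a sysreg encoding, and using PREbits to infer the number of ICH_APxRn_EL2 registers (including the nr_pre_bits extraction fix). The Python sketch below illustrates both. It is not taken from the patches: the field positions follow my understanding of the Arm ESR_ELx ISS layout for trapped MSR/MRS accesses and of ICH_VTR_EL2, and the function names are illustrative only.

```python
def sys_reg(op0, op1, crn, crm, op2):
    """Pack the five system-register fields into a single encoding."""
    return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5)


def esr_sys64_to_sysreg(esr):
    """Rebuild the sysreg encoding from the ISS of a trapped MSR/MRS.

    Assumed ISS layout: Op0 [21:20], Op2 [19:17], Op1 [16:14],
    CRn [13:10], Rt [9:5], CRm [4:1], Direction [0].
    """
    op0 = (esr >> 20) & 0x3
    op2 = (esr >> 17) & 0x7
    op1 = (esr >> 14) & 0x7
    crn = (esr >> 10) & 0xF
    crm = (esr >> 1) & 0xF
    return sys_reg(op0, op1, crn, crm, op2)


def esr_sys64_is_read(esr):
    """Direction bit: 1 means a read (MRS), 0 means a write (MSR)."""
    return bool(esr & 1)


def vtr_to_nr_pre_bits(vtr):
    """PREbits sits in ICH_VTR_EL2[28:26] and stores (preemption bits - 1).

    The mask matters: without it, the PRIbits field in [31:29] leaks
    into the result.
    """
    return ((vtr >> 26) & 0x7) + 1


def vtr_to_nr_apr_regs(vtr):
    """Each active-priority register covers 32 priority levels.

    Assumes at least 5 preemption bits, the architectural minimum as
    I understand it; fewer would make the shift count negative.
    """
    return 1 << (vtr_to_nr_pre_bits(vtr) - 5)
```

With this layout, a trap handler can compare the decoded value directly against a constant such as `sys_reg(3, 0, 12, 12, 0)` (the encoding of ICC_IAR1_EL1) instead of matching the five fields one by one.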

Comments

Stefan Bader Aug. 4, 2017, 7:55 a.m. UTC | #1
Acked-by: Stefan Bader <stefan.bader@canonical.com>

Normally I have a rather bad feeling about such huge dumps. But at least this
one is limited to arm, and we already made that into an ugly abomination...

-Stefan
Kleber Sacilotto de Souza Aug. 8, 2017, 10:44 a.m. UTC | #2
Huge pull request, but it is limited to arm and has had good testing.

Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

Applied to the zesty/master-next branch. Thanks.