[Pull,SRU,Zesty,v2] Add support for RAS features on ARM64

Message ID 1500390709.4626.20@smtp.canonical.com
State New

Pull-request

git://git.launchpad.net/~centriq-team/+git/linux-arm64ras

Message

Manoj Iyer July 18, 2017, 3:11 p.m. UTC
The following pull request adds RAS support to ARM64. The patches were
tested on QDF2400 and UEFI-based AMD64 systems using the mce-test test suite.

Patches track bugs:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696570
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698448
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696852

I had previously submitted similar patches based on linux-next, but
the patches here are cherry-picked and backported from Linus's
tree. Please review and consider for SRU.

The following changes since commit 
f4f26263ff6a66c2012e9417a56e1b01a95c45d0:

  UBUNTU: Ubuntu-4.10.0-28.32 (2017-06-29 11:24:09 +0200)

are available in the git repository at:

  git://git.launchpad.net/~centriq-team/+git/linux-arm64ras

for you to fetch changes up to 619d132d16d9dfd3f3afa6c205828c4d495e2b53:

  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling (2017-07-11 09:20:53 -0500)

----------------------------------------------------------------
Arnd Bergmann (1):
      ras: mark stub functions as 'inline'

Jonathan (Zhixiong) Zhang (3):
      acpi: apei: panic OS with fatal error status block
      arm64: kconfig: allow support for memory failure handling
      arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling

Manoj Iyer (1):
      UBUNTU: [Config] CONFIG_ACPI_APEI_SEA=y

Punit Agrawal (2):
      arm64: mm: Update perf accounting to handle poison faults
      arm64: hugetlb: Fix huge_pte_offset to return poisoned page table entries

Tyler Baicar (11):
      acpi: apei: read ack upon ghes record consumption
      ras: acpi/apei: cper: add support for generic data v3 structure
      cper: add timestamp print to CPER status printing
      efi: parse ARM processor error
      arm64: exception: handle Synchronous External Abort
      acpi: apei: handle SEA notification type for ARMv8
      efi: print unrecognized CPER section
      ras: acpi / apei: generate trace event for unrecognized CPER section
      trace, ras: add ARM processor error trace event
      arm/arm64: KVM: add guest SEA support
      acpi: apei: check for pending errors when probing GHES entries

 arch/arm/include/asm/kvm_arm.h | 10 ++
 arch/arm/include/asm/system_misc.h | 5 +
 arch/arm/kvm/mmu.c | 36 ++++-
 arch/arm64/Kconfig | 3 +
 arch/arm64/include/asm/esr.h | 1 +
 arch/arm64/include/asm/kvm_arm.h | 10 ++
 arch/arm64/include/asm/pgtable.h | 2 +-
 arch/arm64/include/asm/system_misc.h | 2 +
 arch/arm64/mm/fault.c | 170 +++++++++++++++++-------
 arch/arm64/mm/hugetlbpage.c | 29 ++---
 debian.master/config/config.common.ubuntu | 1 +
 drivers/acpi/apei/Kconfig | 15 +++
 drivers/acpi/apei/ghes.c | 209 +++++++++++++++++++++++++-----
 drivers/acpi/apei/hest.c | 7 +-
 drivers/firmware/efi/cper.c | 204 ++++++++++++++++++++++++++---
 drivers/ras/ras.c | 25 ++++
 include/acpi/ghes.h | 48 ++++++-
 include/linux/cper.h | 54 ++++++++
 include/linux/ras.h | 19 +++
 include/linux/uuid.h | 9 ++
 include/ras/ras_event.h | 90 +++++++++++++
 21 files changed, 831 insertions(+), 118 deletions(-)
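
As a quick sanity check on the diffstat above, the per-file counts can be summed and compared with the summary line. The following is a minimal sketch, not part of the original pull request: the helper names `parse_diffstat` and `parse_summary` are hypothetical, and it assumes the usual git convention that the number after `|` is the total of inserted plus deleted lines for that file.

```python
import re

# Diffstat lines copied from the pull request above: "<path> | <count> <bar>".
DIFFSTAT = """\
arch/arm/include/asm/kvm_arm.h | 10 ++
arch/arm/include/asm/system_misc.h | 5 +
arch/arm/kvm/mmu.c | 36 ++++-
arch/arm64/Kconfig | 3 +
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 10 ++
arch/arm64/include/asm/pgtable.h | 2 +-
arch/arm64/include/asm/system_misc.h | 2 +
arch/arm64/mm/fault.c | 170 +++++++++++++++++-------
arch/arm64/mm/hugetlbpage.c | 29 ++---
debian.master/config/config.common.ubuntu | 1 +
drivers/acpi/apei/Kconfig | 15 +++
drivers/acpi/apei/ghes.c | 209 +++++++++++++++++++++++++-----
drivers/acpi/apei/hest.c | 7 +-
drivers/firmware/efi/cper.c | 204 ++++++++++++++++++++++++++---
drivers/ras/ras.c | 25 ++++
include/acpi/ghes.h | 48 ++++++-
include/linux/cper.h | 54 ++++++++
include/linux/ras.h | 19 +++
include/linux/uuid.h | 9 ++
include/ras/ras_event.h | 90 +++++++++++++
"""
SUMMARY = "21 files changed, 831 insertions(+), 118 deletions(-)"


def parse_diffstat(text):
    """Return {path: changed_line_count} for each 'path | N ...' line."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\S+)\s*\|\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def parse_summary(line):
    """Return (files, insertions, deletions) from the summary line."""
    files, insertions, deletions = map(int, re.findall(r"\d+", line))
    return files, insertions, deletions


counts = parse_diffstat(DIFFSTAT)
files, insertions, deletions = parse_summary(SUMMARY)

# The per-file totals should add up to insertions + deletions.
consistent = (len(counts) == files
              and sum(counts.values()) == insertions + deletions)
print("diffstat consistent:", consistent)
```

The check only confirms that the listing is internally consistent; it says nothing about whether the branch content matches the listing.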

Comments

Stefan Bader July 21, 2017, 8:09 a.m. UTC | #1
On 18.07.2017 17:11, Manoj Iyer wrote:
> The following pull request adds RAS support to ARM64. The patches were tested on
> QDF2400 and UEFI based AMD64 systems using mce-test testsuite. 
> 
> Patches track bugs:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696570
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698448
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696852
> 
> I had previously submitted similar patches based on linux-next, but these
> patches here are cherry-picked and backported from the linus's tree. Please
> review and consider for SRU.

This is (still) a big change across subsystems to add feature support. I saw
Seth picked things up for Artful. So my preference would be to give that at
least some time to settle before moving everything into Zesty, too.

-Stefan

Manoj Iyer July 26, 2017, 9:13 p.m. UTC | #2
Stefan,

The RAS patch series was previously submitted as an SRU to both Zesty and
Artful. It has been fix-committed to Artful and is currently pending SRU
to Zesty. It is tracked in these bugs:

https://launchpad.net/bugs/1696570
https://launchpad.net/bugs/1698448
https://launchpad.net/bugs/1696852

This patch series was regression tested on AMD64, Power8 and 4 different
ARM64 systems. Details of the tests and the test kernel can be found in
the bug: https://launchpad.net/bugs/1696570 (and in
https://launchpad.net/bugs/1706141). The test kernel, based on the Ubuntu
Zesty tag Ubuntu-4.10.0-29.33 with the RAS patches and the EDAC_GHES
enablement config patch, was tested for functionality by me as well as by
Qualcomm. Test results were posted to the bug report as comments by me
and Tyler Baicar.

Also, we have been testing an early incarnation of the RAS patches
along with the EDAC_GHES support for quite some time in the periodic
Xenial UOSE builds we did for QDF2400 systems. We therefore have a high
degree of confidence that the RAS patches and the EDAC_GHES support
work as expected on QDF2400 systems, and pose a low
regression risk on other architectures.

The RAS patches have had some time to bake in Artful, and we have 
provided additional testing on QDF2400 and other ARM64 systems, and 
regression tested on AMD64 and Power8. Please reconsider this patch 
series for SRU in Zesty.

Thanks
Manoj Iyer

Stefan Bader July 27, 2017, 2:26 p.m. UTC | #3
Acked-by: Stefan Bader <stefan.bader@canonical.com>

All right, if everything has been tested as stated, let's try it and hope for the best.

-Stefan
Seth Forshee July 27, 2017, 7:26 p.m. UTC | #4
On Tue, Jul 18, 2017 at 10:11:49AM -0500, Manoj Iyer wrote:
> The following pull request adds RAS support to ARM64. The patches were
> tested on QDF2400 and UEFI based AMD64 systems using mce-test testsuite.
> 
> Patches track bugs:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696570
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698448
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1696852
> 
> I had previously submitted similar patches based on linux-next, but these
> patches here are cherry-picked and backported from the linus's tree. Please
> review and consider for SRU.
> 
> The following changes since commit f4f26263ff6a66c2012e9417a56e1b01a95c45d0:
> 
>  UBUNTU: Ubuntu-4.10.0-28.32 (2017-06-29 11:24:09 +0200)
> 
> are available in the git repository at:
> 
>  git://git.launchpad.net/~centriq-team/+git/linux-arm64ras
> 
> for you to fetch changes up to 619d132d16d9dfd3f3afa6c205828c4d495e2b53:
> 
>  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling (2017-07-11
> 09:20:53 -0500)

Based on the testing:

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Thadeu Lima de Souza Cascardo Aug. 3, 2017, 6:25 p.m. UTC | #5
Applied to zesty master-next branch.

Thanks.
Cascardo.