diff mbox series

[v18,2/6] docs: APEI GHES generation and CPER record description

Message ID 20190906083152.25716-3-zhengxiang9@huawei.com
State New
Headers show
Series Add ARMv8 RAS virtualization support in QEMU | expand

Commit Message

Xiang Zheng Sept. 6, 2019, 8:31 a.m. UTC
From: Dongjiu Geng <gengdongjiu@huawei.com>

Add APEI/GHES detailed design document

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 docs/specs/acpi_hest_ghes.txt | 88 +++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 docs/specs/acpi_hest_ghes.txt

Comments

Peter Maydell Sept. 19, 2019, 1:25 p.m. UTC | #1
On Fri, 6 Sep 2019 at 09:33, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> Add APEI/GHES detailed design document
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> ---
>  docs/specs/acpi_hest_ghes.txt | 88 +++++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>  create mode 100644 docs/specs/acpi_hest_ghes.txt

Hi; new documentation in docs/specs should be in rst format and
listed in the contents page for the manual at docs/specs/index.rst,
please. Conversion from plain text should hopefully be fairly
straightforward.


I've also provided some minor typo/grammar fixes below.


> diff --git a/docs/specs/acpi_hest_ghes.txt b/docs/specs/acpi_hest_ghes.txt
> new file mode 100644
> index 0000000000..690d4b2bd0
> --- /dev/null
> +++ b/docs/specs/acpi_hest_ghes.txt
> @@ -0,0 +1,88 @@
> +APEI tables generating and CPER record
> +=============================
> +
> +Copyright (C) 2019 Huawei Corporation.
> +
> +Design Details:
> +-------------------
> +
> +       etc/acpi/tables                                 etc/hardware_errors
> +    ====================                      ==========================================
> ++ +--------------------------+            +-----------------------+
> +| | HEST                     |            |    address            |            +--------------+
> +| +--------------------------+            |    registers          |            | Error Status |
> +| | GHES1                    |            | +---------------------+            | Data Block 1 |
> +| +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
> +| | .................        | |          | +---------------------+            | |  CPER      |
> +| | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
> +| | .................        |   |        | +---------------------+        |   | |  ....      |
> +| | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
> +| | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
> +| | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
> ++ +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
> +| | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
> ++ +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
> +| | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
> +| | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
> +| | .................        |     | |    | |  .............      |      |     | |  CPER      |
> +| | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
> +| | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
> +| | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
> ++ +--------------------------|     |   |                                 |     | Error Status |
> +| | ...............          |     |   |                                 |     | Data Block N |
> ++ +--------------------------+     |   |                                 +---->| +------------+
> +| | GHESN                    |     |   |                                       | |  CPER      |
> ++ +--------------------------+     |   |                                       | |  CPER      |
> +| | .................        |     |   |                                       | |  ....      |
> +| | error_status_address-----+-----+   |                                       | |  CPER      |
> +| | .................        |         |                                       +-+------------+
> +| | read_ack_register--------+---------+
> +| | read_ack_preserve        |
> +| | read_ack_write           |
> ++ +--------------------------+
> +
> +(1) QEMU generates the ACPI HEST table. This table goes in the current
> +    "etc/acpi/tables" fw_cfg blob. Each error source has different
> +    notification types.
> +
> +(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
> +    also need to populate this blob. The "etc/hardwre_errors" fw_cfg blob

"needs". "hardware_errors".

> +    contains an address registers table and an Error Status Data Block table.
> +
> +(3) The address registers table contains N Error Block Address entries
> +    and N Read Ack Register entries, the size for each entry is 8-byte.

". The size".

> +    The Error Status Data Block table contains N Error Status Data Block
> +    entries, the size for each entry is 4096(0x1000) bytes. The total size


". The size"

> +    for "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.

"for the"

> +    N is the kinds of hardware error sources.

Not sure what you had in mind here. Possibly either "N is the number of kinds of
hardware error sources" or "N is the number of hardware error sources" ?

> +
> +(4) QEMU generates the ACPI linker/loader script for the firmware, the

". The"

> +    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
> +    and copies blobs content there.

"blob contents"

> +
> +(5) QEMU generates N ADD_POINTER commands, which patch address in the

"addresses"

> +    "error_status_address" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in "etc/hardware_errors" blob.

"in the"

> +
> +(6) QEMU generates N ADD_POINTER commands, which patch address in the

"addresses"

> +    "read_ack_register" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in "etc/hardware_errors" blob.

"in the"

> +
> +(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
> +    address in the " error_block_address" fields with a pointer to the

"addresses". Stray extra space after open-quote.

> +    respective "Error Status Data Block" in "etc/hardware_errors" blob.

"in the"

> +
> +(8) QEMU defines a third and write-only fw_cfg blob which is called
> +    "etc/hardware_errors_addr". Through that blob, the firmware can send back
> +    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
> +    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER commands

"command"

> +    for the firmware, the firmware will write back the start address of

". The"

> +    "etc/hardware_errors" blob to fw_cfg file "etc/hardware_errors_addr".

"to the fw_cfg file"

> +
> +(9) When QEMU gets SIGBUS from the kernel, QEMU formats the CPER right into

"a SIGBUS"

> +    guest memory, and then injects whatever interrupt (or assert whatever GPIO

"or asserts"

> +    line) as a notification which is necessary for notifying the guest.
> +
> +(10) This notification (in virtual hardware) will be handled by guest kernel,

"the guest kernel"

> +    guest APEI driver will read the CPER which is recorded by QEMU and do the

"and the guest APEI driver"

> +    recovery.

thanks
-- PMM
Xiang Zheng Sept. 20, 2019, 1:45 a.m. UTC | #2
Hi Peter, thanks for your review!

On 2019/9/19 21:25, Peter Maydell wrote:
> On Fri, 6 Sep 2019 at 09:33, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add APEI/GHES detailed design document
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> ---
>>  docs/specs/acpi_hest_ghes.txt | 88 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 88 insertions(+)
>>  create mode 100644 docs/specs/acpi_hest_ghes.txt
> 
> Hi; new documentation in docs/specs should be in rst format and
> listed in the contents page for the manual at docs/specs/index.rst,
> please. Conversion from plain text should hopefully be fairly
> straightforward.
> 

Got it!

> 
> I've also provided some minor typo/grammar fixes below.
> 
Thanks, I will fix them soon.

> 
>> diff --git a/docs/specs/acpi_hest_ghes.txt b/docs/specs/acpi_hest_ghes.txt
>> new file mode 100644
>> index 0000000000..690d4b2bd0
>> --- /dev/null
>> +++ b/docs/specs/acpi_hest_ghes.txt
>> @@ -0,0 +1,88 @@
>> +APEI tables generating and CPER record
>> +=============================
>> +
>> +Copyright (C) 2019 Huawei Corporation.
>> +
>> +Design Details:
>> +-------------------
>> +
>> +       etc/acpi/tables                                 etc/hardware_errors
>> +    ====================                      ==========================================
>> ++ +--------------------------+            +-----------------------+
>> +| | HEST                     |            |    address            |            +--------------+
>> +| +--------------------------+            |    registers          |            | Error Status |
>> +| | GHES1                    |            | +---------------------+            | Data Block 1 |
>> +| +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
>> +| | .................        | |          | +---------------------+            | |  CPER      |
>> +| | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
>> +| | .................        |   |        | +---------------------+        |   | |  ....      |
>> +| | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
>> +| | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
>> +| | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
>> ++ +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
>> +| | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
>> ++ +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
>> +| | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
>> +| | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
>> +| | .................        |     | |    | |  .............      |      |     | |  CPER      |
>> +| | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
>> +| | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
>> +| | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
>> ++ +--------------------------|     |   |                                 |     | Error Status |
>> +| | ...............          |     |   |                                 |     | Data Block N |
>> ++ +--------------------------+     |   |                                 +---->| +------------+
>> +| | GHESN                    |     |   |                                       | |  CPER      |
>> ++ +--------------------------+     |   |                                       | |  CPER      |
>> +| | .................        |     |   |                                       | |  ....      |
>> +| | error_status_address-----+-----+   |                                       | |  CPER      |
>> +| | .................        |         |                                       +-+------------+
>> +| | read_ack_register--------+---------+
>> +| | read_ack_preserve        |
>> +| | read_ack_write           |
>> ++ +--------------------------+
>> +
>> +(1) QEMU generates the ACPI HEST table. This table goes in the current
>> +    "etc/acpi/tables" fw_cfg blob. Each error source has different
>> +    notification types.
>> +
>> +(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
>> +    also need to populate this blob. The "etc/hardwre_errors" fw_cfg blob
> 
> "needs". "hardware_errors".
> 
>> +    contains an address registers table and an Error Status Data Block table.
>> +
>> +(3) The address registers table contains N Error Block Address entries
>> +    and N Read Ack Register entries, the size for each entry is 8-byte.
> 
> ". The size".
> 
>> +    The Error Status Data Block table contains N Error Status Data Block
>> +    entries, the size for each entry is 4096(0x1000) bytes. The total size
> 
> 
> ". The size"
> 
>> +    for "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
> 
> "for the"
> 
>> +    N is the kinds of hardware error sources.
> 
> Not sure what you had in mind here. Possibly either "N is the number of kinds of
> hardware error sources" or "N is the number of hardware error sources" ?

Yes, I mean "N is the number of kinds of hardware error sources".

> 
>> +
>> +(4) QEMU generates the ACPI linker/loader script for the firmware, the
> 
> ". The"
> 
>> +    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
>> +    and copies blobs content there.
> 
> "blob contents"
> 
>> +
>> +(5) QEMU generates N ADD_POINTER commands, which patch address in the
> 
> "addresses"
> 
>> +    "error_status_address" fields of the HEST table with a pointer to the
>> +    corresponding "address registers" in "etc/hardware_errors" blob.
> 
> "in the"
> 
>> +
>> +(6) QEMU generates N ADD_POINTER commands, which patch address in the
> 
> "addresses"
> 
>> +    "read_ack_register" fields of the HEST table with a pointer to the
>> +    corresponding "address registers" in "etc/hardware_errors" blob.
> 
> "in the"
> 
>> +
>> +(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
>> +    address in the " error_block_address" fields with a pointer to the
> 
> "addresses". Stray extra space after open-quote.
> 
>> +    respective "Error Status Data Block" in "etc/hardware_errors" blob.
> 
> "in the"
> 
>> +
>> +(8) QEMU defines a third and write-only fw_cfg blob which is called
>> +    "etc/hardware_errors_addr". Through that blob, the firmware can send back
>> +    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
>> +    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER commands
> 
> "command"
> 
>> +    for the firmware, the firmware will write back the start address of
> 
> ". The"
> 
>> +    "etc/hardware_errors" blob to fw_cfg file "etc/hardware_errors_addr".
> 
> "to the fw_cfg file"
> 
>> +
>> +(9) When QEMU gets SIGBUS from the kernel, QEMU formats the CPER right into
> 
> "a SIGBUS"
> 
>> +    guest memory, and then injects whatever interrupt (or assert whatever GPIO
> 
> "or asserts"
> 
>> +    line) as a notification which is necessary for notifying the guest.
>> +
>> +(10) This notification (in virtual hardware) will be handled by guest kernel,
> 
> "the guest kernel"
> 
>> +    guest APEI driver will read the CPER which is recorded by QEMU and do the
> 
> "and the guest APEI driver"
> 
>> +    recovery.
> 
> thanks
> -- PMM
> 
> .
>
Igor Mammedov Oct. 4, 2019, 8:20 a.m. UTC | #3
On Fri, 6 Sep 2019 16:31:48 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
[...]
> +
> +(9) When QEMU gets SIGBUS from the kernel, QEMU formats the CPER right into
> +    guest memory, and then injects whatever interrupt (or assert whatever GPIO
s/whatever .../platform specific/

and add concrete impl info like:
  "in case of arm/virt machine it's ..."

> +    line) as a notification which is necessary for notifying the guest.
[...]
Xiang Zheng Oct. 8, 2019, 1:25 p.m. UTC | #4
Hi Igor,

Thanks for your review!

On 2019/10/4 16:20, Igor Mammedov wrote:
> On Fri, 6 Sep 2019 16:31:48 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
> 
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
> [...]
>> +
>> +(9) When QEMU gets SIGBUS from the kernel, QEMU formats the CPER right into
>> +    guest memory, and then injects whatever interrupt (or assert whatever GPIO
> s/whatever .../platform specific/
> 
> and add concrete impl info like:
>   "in case of arm/virt machine it's ..."

OK, I will add the concrete impl info.

> 
>> +    line) as a notification which is necessary for notifying the guest.
> [...]
> 
> .
>
diff mbox series

Patch

diff --git a/docs/specs/acpi_hest_ghes.txt b/docs/specs/acpi_hest_ghes.txt
new file mode 100644
index 0000000000..690d4b2bd0
--- /dev/null
+++ b/docs/specs/acpi_hest_ghes.txt
@@ -0,0 +1,88 @@ 
+APEI tables generating and CPER record
+=============================
+
+Copyright (C) 2019 Huawei Corporation.
+
+Design Details:
+-------------------
+
+       etc/acpi/tables                                 etc/hardware_errors
+    ====================                      ==========================================
++ +--------------------------+            +-----------------------+
+| | HEST                     |            |    address            |            +--------------+
+| +--------------------------+            |    registers          |            | Error Status |
+| | GHES1                    |            | +---------------------+            | Data Block 1 |
+| +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
+| | .................        | |          | +---------------------+            | |  CPER      |
+| | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
+| | .................        |   |        | +---------------------+        |   | |  ....      |
+| | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
+| | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
+| | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
++ +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
+| | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
++ +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
+| | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
+| | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
+| | .................        |     | |    | |  .............      |      |     | |  CPER      |
+| | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
+| | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
+| | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
++ +--------------------------|     |   |                                 |     | Error Status |
+| | ...............          |     |   |                                 |     | Data Block N |
++ +--------------------------+     |   |                                 +---->| +------------+
+| | GHESN                    |     |   |                                       | |  CPER      |
++ +--------------------------+     |   |                                       | |  CPER      |
+| | .................        |     |   |                                       | |  ....      |
+| | error_status_address-----+-----+   |                                       | |  CPER      |
+| | .................        |         |                                       +-+------------+
+| | read_ack_register--------+---------+
+| | read_ack_preserve        |
+| | read_ack_write           |
++ +--------------------------+
+
+(1) QEMU generates the ACPI HEST table. This table goes in the current
+    "etc/acpi/tables" fw_cfg blob. Each error source has different
+    notification types.
+
+(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
+    also need to populate this blob. The "etc/hardwre_errors" fw_cfg blob
+    contains an address registers table and an Error Status Data Block table.
+
+(3) The address registers table contains N Error Block Address entries
+    and N Read Ack Register entries, the size for each entry is 8-byte.
+    The Error Status Data Block table contains N Error Status Data Block
+    entries, the size for each entry is 4096(0x1000) bytes. The total size
+    for "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+    N is the kinds of hardware error sources.
+
+(4) QEMU generates the ACPI linker/loader script for the firmware, the
+    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
+    and copies blobs content there.
+
+(5) QEMU generates N ADD_POINTER commands, which patch address in the
+    "error_status_address" fields of the HEST table with a pointer to the
+    corresponding "address registers" in "etc/hardware_errors" blob.
+
+(6) QEMU generates N ADD_POINTER commands, which patch address in the
+    "read_ack_register" fields of the HEST table with a pointer to the
+    corresponding "address registers" in "etc/hardware_errors" blob.
+
+(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
+    address in the " error_block_address" fields with a pointer to the
+    respective "Error Status Data Block" in "etc/hardware_errors" blob.
+
+(8) QEMU defines a third and write-only fw_cfg blob which is called
+    "etc/hardware_errors_addr". Through that blob, the firmware can send back
+    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
+    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER commands
+    for the firmware, the firmware will write back the start address of
+    "etc/hardware_errors" blob to fw_cfg file "etc/hardware_errors_addr".
+
+(9) When QEMU gets SIGBUS from the kernel, QEMU formats the CPER right into
+    guest memory, and then injects whatever interrupt (or assert whatever GPIO
+    line) as a notification which is necessary for notifying the guest.
+
+(10) This notification (in virtual hardware) will be handled by guest kernel,
+    guest APEI driver will read the CPER which is recorded by QEMU and do the
+    recovery.