[v7,00/22] MPIPL support

Message ID 20190413091548.14329-1-hegdevasant@linux.vnet.ibm.com

Message

Vasant Hegde April 13, 2019, 9:15 a.m. UTC
Memory Preserving Initial Program Load (MPIPL) is a Power feature where
the contents of memory are preserved while the system reboots after a
failure. This is accomplished by the firmware/OS publishing ranges of
memory to be preserved across boots.

In the OPAL context, OPAL and host Linux communicate the memory ranges
to be preserved via source descriptor tables in the HDAT. This data is
updated at runtime via the SBE. When OPAL or Linux crashes, the SBE is
notified of the event via a special interrupt, which causes it to trigger
the MPIPL.

Hostboot then re-IPLs the machine taking care to copy over contents of
the source descriptor tables to alternate memory locations and
publishes this information in the destination descriptor tables. The
success/failure of the copy is indicated by a results table.
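
As a rough illustration of the copy step above (not code from this
series), the sketch below models memory as a flat byte array and pairs
the i-th source range with the i-th destination range. The struct
layouts, the index-based pairing and the name preserve_ranges() are
assumptions made for the example; the real HDAT table formats differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor entries (not the HDAT layout). */
struct mem_range {
	uint64_t addr;	/* offset into the modelled memory */
	uint64_t size;	/* length in bytes */
};

/* One result entry per source entry; ok == 0 means the copy failed. */
struct copy_result {
	uint64_t src;
	uint64_t dest;
	uint64_t size;	/* bytes actually copied */
	int ok;
};

/* True if [addr, addr + size) lies inside a memory of mem_size bytes. */
static int in_bounds(uint64_t addr, uint64_t size, uint64_t mem_size)
{
	return addr <= mem_size && size <= mem_size - addr;
}

/*
 * Copy each source range to its destination range and record the
 * outcome in res[]. Returns the number of ranges copied successfully.
 */
size_t preserve_ranges(uint8_t *mem, uint64_t mem_size,
		       const struct mem_range *src,
		       const struct mem_range *dst,
		       struct copy_result *res, size_t n)
{
	size_t copied = 0;

	assert(mem && src && dst && res);

	for (size_t i = 0; i < n; i++) {
		res[i].src = src[i].addr;
		res[i].dest = dst[i].addr;
		res[i].size = 0;
		res[i].ok = 0;

		/* Destination must be large enough for the source. */
		if (dst[i].size < src[i].size)
			continue;
		if (!in_bounds(src[i].addr, src[i].size, mem_size) ||
		    !in_bounds(dst[i].addr, src[i].size, mem_size))
			continue;

		/* memmove tolerates overlapping source and destination. */
		memmove(mem + dst[i].addr, mem + src[i].addr, src[i].size);
		res[i].size = src[i].size;
		res[i].ok = 1;
		copied++;
	}
	return copied;
}
```

A consumer of the results table would then trust only the entries
marked ok when assembling the dump.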

Hostboot also does the requisite procedures to gather hardware register
states for all active threads at the time of the crash.

On an MPIPL boot, OPAL exports the results information to the host Linux
kernel, which then uses its existing logic (kdump/fadump) to write out a
core dump of OPAL and the Linux kernel in a format that GDB and crash can
understand.

design:
  - OPAL hardcodes a few entries in the MDST and MDDT tables so that we
    can capture an early OPAL crash
  - OPAL passes the relocated base address to the SBE via the 'MPIPL
    stash chip-op' so that hostboot can use the relocated address to
    find the MDST and MDDT tables
  - OPAL passes dump details to the host via the device tree
    (/ibm,opal/dump node)
  - OPAL provides a new API for MPIPL registration/unregistration
  - Enhance the reboot2 API to support MPIPL
  - Add SBE s0/s1 interrupt support to initiate MPIPL on BMC-based
    systems

Dependency:
  - This patchset depends on SBE and Hostboot changes. A few
    Hostboot/SBE patches are still in review.
  - We need Linux kernel changes to generate the OPAL core.
    Hari will post the Linux-side patches.

Testing:
    The op-build tree below can be used to build a PNOR with MPIPL
    support until all HB and SBE patches are upstream.
    https://github.com/hegdevasant/op-build/tree/mpipl

TODO:
  - Export raw opal core under sysfs

Testing:
  - I've tested this patchset with custom hostboot and SBE changes. It's
    working fine. We are able to get vmcore and opalcore.

Changes in v7:
  - Rebased on top of current master
  - Fixed hiomap test cases
  - A few minor fixes/logging improvements

Changes in v6:
  - Added support to get architected register data
  - Added support for HIOMAP reset
  - Added new patch to export OPAL boot entry address in device tree
   (Needed for OPAL core)
  - Added support to save/export crashing CPU details
  - A few other minor fixes

Changes in v5:
  - As Stewart suggested, moved the "dump" device tree node under
    /ibm,opal
  - Updated OPAL API number to 170
  - Added a check before triggering MPIPL.
    If MPIPL is supported, it triggers MPIPL; otherwise it falls back to
    a normal reboot.
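
The check described in the last item can be pictured roughly as
follows. This is only a sketch: the enum, the struct and the function
name are made up for the example and are not the identifiers used in
the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the real reboot types live in the OPAL API. */
enum reboot_action {
	REBOOT_NORMAL,
	REBOOT_MPIPL,
};

/* Hypothetical summary of what the platform reported at boot. */
struct platform_state {
	bool mpipl_supported;
};

/*
 * Decide what to do on a crash: use MPIPL only when the platform
 * advertises support for it, otherwise fall back to a normal reboot.
 */
enum reboot_action choose_reboot_action(const struct platform_state *ps)
{
	assert(ps != NULL);

	if (ps->mpipl_supported)
		return REBOOT_MPIPL;
	return REBOOT_NORMAL;
}
```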

Changes in v4:
   - Make sure crashing CPU will not go to stop state
   - Send stash MPIPL chip op to all SBEs
   - Minor prlog improvements

Changes in v3:
  - Added documentation for the new OPAL API and device tree binding
  - Fixed MPIPL trigger path
    I have hooked the MPIPL trigger path into the assert path. For now
    it is triggered on witherspoon only. We haven't tested on other BMC
    platforms. Once we test, we will enable it on other BMC systems.
  - Added MBOX reset support before triggering MPIPL
  - Added support to detect `MPIPL support` system params

Changes in v2:
  - Added support to get architected registers
  - The SBE team changed the MPIPL trigger interrupt bit. It is now S0
    on both master and slave chip SBEs
  - Fixed a few other minor issues


Vasant Hegde (22):
  OPAL: Add OPAL boot entry address to device tree
  FSP/MDST: Rename fsp-mdst-table.c -> fsp-sysdump.c
  hdata: Fix MDST structure
  hdata: Define various DUMP related tables
  hdata: Setup memory for MDDT table
  hdata: Setup memory for MDRT table
  fadump: Define destination memory for OPAL boot crash
  hdata: Add support for early OPAL crash
  hdata: Adjust various structure offset after relocation
  dump: Define fadump structure
  hdata: Create /ibm,opal/dump device tree node
  hdata: Add result table property to "dump" node
  fadump: Add fw-source-table to ibm,opal/dump node
  fadump: Add OPAL API to register for fadump
  fadump: Introduce new reboot type
  fadump: Send OPAL relocated base address to SBE
  fadump: Add support to trigger memory preserving IPL on BMC system
  fadump: Add documentation
  mbox: Reset bmc mbox in MPIPL path
  fadump: Hardcode architected register destination memory
  hdata: Add architected register details to device tree
  fadump: Save crashing cpu data

 core/Makefile.inc                          |   2 +-
 core/flash.c                               |  13 +-
 core/init.c                                |   6 +-
 core/opal-dump.c                           | 385 +++++++++++++++++++++++++++++
 core/opal.c                                |   4 +-
 core/platform.c                            |   7 +-
 doc/device-tree/ibm,opal/dump.rst          |  33 +++
 doc/fadump.rst                             |  51 ++++
 doc/index.rst                              |   1 +
 doc/opal-api/opal-cec-reboot-6-116.rst     |   9 +
 doc/opal-api/opal-fadump-manage-173.rst    |  73 ++++++
 hdata/spira.c                              | 274 +++++++++++++++++++-
 hdata/spira.h                              |  10 +-
 hdata/test/hdata_to_dt.c                   |   1 +
 hw/fsp/Makefile.inc                        |   2 +-
 hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} |  18 +-
 hw/ipmi/ipmi-attn.c                        |   9 +-
 hw/sbe-p9.c                                | 121 +++++++++
 include/fsp-mdst-table.h                   |  48 ----
 include/mem-map.h                          |  67 ++++-
 include/opal-api.h                         |  45 +++-
 include/opal-dump.h                        | 138 +++++++++++
 include/sbe-p9.h                           |  18 +-
 include/skiboot.h                          |   3 +-
 libflash/ipmi-hiomap.c                     |  31 ++-
 libflash/ipmi-hiomap.h                     |   2 +-
 libflash/mbox-flash.c                      |  28 ++-
 libflash/mbox-flash.h                      |   2 +-
 libflash/test/mbox-server.c                |   1 +
 libflash/test/test-ipmi-hiomap.c           | 169 +++++++++++++
 skiboot.lds.S                              |  12 +-
 31 files changed, 1497 insertions(+), 86 deletions(-)
 create mode 100644 core/opal-dump.c
 create mode 100644 doc/device-tree/ibm,opal/dump.rst
 create mode 100644 doc/fadump.rst
 create mode 100644 doc/opal-api/opal-fadump-manage-173.rst
 rename hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} (95%)
 delete mode 100644 include/fsp-mdst-table.h
 create mode 100644 include/opal-dump.h

Comments

Michael Neuling May 9, 2019, 12:38 a.m. UTC | #1
On Sat, 2019-04-13 at 14:45 +0530, Vasant Hegde wrote:
> Memory Preserving Initial Program Load (MPIPL) is a Power feature where
> the contents of memory are preserved while the system reboots after a
> failure. This is accomplished by the firmware/OS publishing ranges of
> memory to be preserved across boots.
> 
> In the OPAL context, OPAL and host Linux communicate the memory ranges
> to be preserved via source descriptor tables in the HDAT. This data is
> updated at runtime via the SBE. When OPAL or Linux crashes, SBE gets to
> know of the event via a special interrupt which causes it to trigger
> the MPIPL.
> 
> Hostboot then re-IPLs the machine taking care to copy over contents of
> the source descriptor tables to a alternate memory locations and
> publishes this information in the destination descriptor tables. The
> success/failure of the copy is indicated by a results table.

How much copying are we talking about here?  Most of RAM? Where to?

> Hostboot also does the requisite procedures to gather hardware register
> states for all active threads at the time of the crash.
> 
> On an MPIPL boot, OPAL exports the results information to the host Linux
> kernel, which then uses its existing logic (kdump/fadump) to write out a
> core dump of OPAL and Linux kernel in a format that GDB and crash can
> understand.
> 
> design:
>   - OPAL hardcodes few entries in MDST, MDDT table so that we can
>     capture early opal crash
>   - OPAL passes relocated base address to SBE via 'MPIPL stash chip-op'
>     so that hostboot can use relocated address to get MDST, MDDT table.
>   - OPAL passes dump details to host via device tree (/ibm,dump node)
>   - OPAL provides new API to MPIPL registration/unregistration
>   - Enhance reboot2 API to support MPIPL
>   - Add SBE s0/s1 interrupt support to initiate MPIPL on BMC based
>     system.
> 
> Dependency:
>   - This patchset has dependency on SBE and Hostboot changes. There are
>     few Hostboot/SBE patches still in review state.

How do we determine if the hostboot we are running with supports this?

>   - We need Linux kernel changes to generate opal core.
>     Hari will post Linux side patches.

Similarly here. How does Linux know skiboot/hostboot supports this?  If
skiboot/hostboot doesn't support this, it'll need to go for a kdump alternative?

Vasant Hegde May 13, 2019, 9:36 a.m. UTC | #2
On 05/09/2019 06:08 AM, Michael Neuling wrote:
> On Sat, 2019-04-13 at 14:45 +0530, Vasant Hegde wrote:
>> Memory Preserving Initial Program Load (MPIPL) is a Power feature where
>> the contents of memory are preserved while the system reboots after a
>> failure. This is accomplished by the firmware/OS publishing ranges of
>> memory to be preserved across boots.
>>
>> In the OPAL context, OPAL and host Linux communicate the memory ranges
>> to be preserved via source descriptor tables in the HDAT. This data is
>> updated at runtime via the SBE. When OPAL or Linux crashes, SBE gets to
>> know of the event via a special interrupt which causes it to trigger
>> the MPIPL.
>>
>> Hostboot then re-IPLs the machine taking care to copy over contents of
>> the source descriptor tables to a alternate memory locations and
>> publishes this information in the destination descriptor tables. The
>> success/failure of the copy is indicated by a results table.
> 
> How much copying are we talking about here?

OPAL memory and part of kernel memory.

It is copied to the destination memory that OPAL and the kernel specify
during registration.

> Most of RAM? Where to?

The destination memory is also RAM. During MPIPL, hostboot does not clear
(reinitialize) memory, so the preserved contents are safe.


> 
>> Hostboot also does the requisite procedures to gather hardware register
>> states for all active threads at the time of the crash.
>>
>> On an MPIPL boot, OPAL exports the results information to the host Linux
>> kernel, which then uses its existing logic (kdump/fadump) to write out a
>> core dump of OPAL and Linux kernel in a format that GDB and crash can
>> understand.
>>
>> design:
>>    - OPAL hardcodes few entries in MDST, MDDT table so that we can
>>      capture early opal crash
>>    - OPAL passes relocated base address to SBE via 'MPIPL stash chip-op'
>>      so that hostboot can use relocated address to get MDST, MDDT table.
>>    - OPAL passes dump details to host via device tree (/ibm,dump node)
>>    - OPAL provides new API to MPIPL registration/unregistration
>>    - Enhance reboot2 API to support MPIPL
>>    - Add SBE s0/s1 interrupt support to initiate MPIPL on BMC based
>>      system.
>>
>> Dependency:
>>    - This patchset has dependency on SBE and Hostboot changes. There are
>>      few Hostboot/SBE patches still in review state.
> 
> How do we determine if the hostboot we are running with supports this?

A machine XML flag indicates whether MPIPL is supported. It is passed to
OPAL via HDAT.

> 
>>    - We need Linux kernel changes to generate opal core.
>>      Hari will post Linux side patches.
> 
> Similarly here. How does Linux know skiboot/hostboot supports this?  If
> skiboot/hostboot doesn't support this, it'll need to go for a kdump alternative?

Correct. If MPIPL is supported, OPAL exports a device tree node
(/ibm,opal/dump). The kernel checks for this node and enables the
appropriate dump mechanism.
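
For illustration only, the same presence check can be sketched from
userspace. This is not what the kernel patches do (the kernel would
use its own device tree interfaces); it assumes the device tree is
exposed as a directory hierarchy, typically under /proc/device-tree
on Linux, and the function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if <dt_root>/ibm,opal/dump exists, 0 otherwise.
 * dt_root is the directory where the device tree is exposed
 * (assumed to be /proc/device-tree on a running Linux system).
 */
int mpipl_dump_node_present(const char *dt_root)
{
	char path[4096];
	int n;

	assert(dt_root != NULL);

	n = snprintf(path, sizeof(path), "%s/ibm,opal/dump", dt_root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path did not fit; treat as absent */

	return access(path, F_OK) == 0;
}
```

A caller would pass "/proc/device-tree" and fall back to another dump
mechanism such as kdump when the function returns 0.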

-Vasant