[v9,00/25] MPIPL support

Message ID 20190712111802.23560-1-hegdevasant@linux.vnet.ibm.com

Vasant Hegde July 12, 2019, 11:17 a.m. UTC
Memory Preserving Initial Program Load (MPIPL) is a Power feature where
the contents of memory are preserved while the system reboots after a
failure. This is accomplished by the firmware/OS publishing ranges of
memory to be preserved across boots.

In the OPAL context, OPAL and host Linux communicate the memory ranges
to be preserved via source descriptor tables in the HDAT. OPAL and Linux
can update these tables during runtime. OPAL sends its relocated base
address to the SBE. When OPAL or Linux crashes, the SBE learns of the
event via a special interrupt, which causes it to trigger the MPIPL.

SBE then collects architected register data and loads Hostboot. Hostboot
then re-IPLs the machine, taking care to copy the memory described by the
source descriptor tables to alternate memory locations, and publishes this
information in the destination descriptor tables. The success/failure
of each copy is indicated by a results table. Hostboot also copies
architected register states to memory passed by OPAL.
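
The copy step above can be pictured with a small model. This is only a sketch under stated assumptions: `range_entry`, `result_entry` and `preserve_ranges` are hypothetical names, memory is modelled as a flat byte array, and the real HDAT table layouts used by Hostboot are different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor entry (not the HDAT layout). */
struct range_entry {
	uint64_t addr;	/* offset into the modelled memory */
	uint64_t size;	/* length in bytes */
};

/* Hypothetical results entry: what was copied where, and whether it worked. */
struct result_entry {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
	int ok;		/* 1 if the range was copied, 0 otherwise */
};

/*
 * Copy each source range to its destination range inside 'mem' and
 * record the outcome. Returns the number of ranges copied.
 */
size_t preserve_ranges(uint8_t *mem, size_t mem_size,
		       const struct range_entry *src,
		       const struct range_entry *dst,
		       struct result_entry *res, size_t n)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t len = src[i].size;
		/* Both ranges must fit in memory and the destination must be big enough. */
		int ok = len <= dst[i].size &&
			 src[i].addr <= mem_size && len <= mem_size - src[i].addr &&
			 dst[i].addr <= mem_size && len <= mem_size - dst[i].addr;

		if (ok) {
			memmove(mem + dst[i].addr, mem + src[i].addr, len);
			copied++;
		}
		res[i] = (struct result_entry){ src[i].addr, dst[i].addr,
						ok ? len : 0, ok };
	}
	return copied;
}
```

A consumer of the results would walk `res` and trust only the entries with `ok` set, which mirrors how the results table is meant to be validated before the dump is used.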

On an MPIPL boot, OPAL creates a new device tree property to indicate that
it is an MPIPL boot (/ibm,opal/dump/mpipl-boot). Linux makes an MPIPL API
call to get the metadata pointers. The kernel uses the metadata to create
vmcore and opalcore.
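
For illustration, the presence of that property can also be checked from user space. This is a minimal sketch, assuming the device tree is exposed under `/proc/device-tree` as on typical Linux systems; `dt_property_present` and `is_mpipl_boot` are hypothetical helper names, and this is not how the kernel itself detects the MPIPL boot.

```c
#include <unistd.h>

/* Assumed mount point of the exported device tree on Linux. */
#define DT_BASE "/proc/device-tree"

/* Returns 1 if a file exists at 'path', 0 otherwise. */
int dt_property_present(const char *path)
{
	return access(path, F_OK) == 0;
}

/* Returns 1 if firmware marked this boot as an MPIPL boot. */
int is_mpipl_boot(void)
{
	return dt_property_present(DT_BASE "/ibm,opal/dump/mpipl-boot");
}
```

Only the existence of the property matters here, so the check does not read its contents.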

Flow:
  - Hostboot relies on the MDST, MDDT and MDRT ntuples in HDAT for MPIPL.
  - During boot/runtime, OPAL will update the MDST and MDDT tables.
  - Kernel will create a metadata area which contains source, destination
    address, size etc.
  - Kernel will use MPIPL API for registration
    - It will pass src, dest, size to OPAL
  - Kernel will use MPIPL register tag API to send tags that need to be
    preserved across MPIPL. Post MPIPL, kernel will call query tag API
    to get back preserved tags.
  - Kernel -OR- OPAL will request MPIPL.
    - On FSP system OPAL will trigger attn instruction
    - On BMC system OPAL will trigger SBE S0 interrupt
  - SBE quiesces the system and collects CPU register state of running
    threads.
  - SBE -> hostboot -> memory preserved + CPU data copied to OPAL reserved
    memory -> load OPAL
  - OPAL validates DUMP result table and adds `mpipl-boot` device tree property
  - Kernel detects that it is an MPIPL boot.
  - Kernel will use MPIPL query tag API to retrieve metadata tags.
  - Kernel will create `vmcore` and `opalcore`
  - Use existing crash tool to debug `vmcore` and gdb to debug `opalcore`
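
The register-tag and query-tag steps in the flow above can be sketched as a small store that outlives the crash. Everything here is hypothetical and for illustration only: the `demo_*` names, the tag identifiers and the return codes are assumptions, not the OPAL API defined by this series.

```c
#include <stdint.h>

/* Hypothetical tag identifiers; the real set is defined by the OPAL API. */
enum demo_tag {
	DEMO_TAG_KERNEL_META,	/* pointer to the kernel metadata area */
	DEMO_TAG_BOOT_MEM,	/* memory size for the post-MPIPL boot */
	DEMO_TAG_MAX,
};

/* Stands in for the firmware area that is preserved across MPIPL. */
struct demo_tag_store {
	uint64_t value[DEMO_TAG_MAX];
	uint8_t valid[DEMO_TAG_MAX];
};

/* Record a tag before the crash. Returns 0 on success, -1 on a bad tag. */
int demo_register_tag(struct demo_tag_store *s, enum demo_tag tag,
		      uint64_t val)
{
	if ((unsigned)tag >= DEMO_TAG_MAX)
		return -1;
	s->value[tag] = val;
	s->valid[tag] = 1;
	return 0;
}

/* Fetch a tag after the MPIPL boot. Returns 0 on success, -1 if unset. */
int demo_query_tag(const struct demo_tag_store *s, enum demo_tag tag,
		   uint64_t *val)
{
	if ((unsigned)tag >= DEMO_TAG_MAX || !s->valid[tag])
		return -1;
	*val = s->value[tag];
	return 0;
}
```

In this model the kernel would register its metadata pointer before the crash and query it back after the MPIPL boot. A failed query means there is no preserved state to build a dump from.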

Dependency:
  - We need Linux kernel changes to generate opalcore.
    Hari will post the Linux side patches.

Impact on kernel:
  Upstream kernel has the `fadump` (Firmware Assisted Dump) feature on PowerVM
  LPAR. FADump works on top of kdump and uses the same vmcore format. From the
  kernel point of view, this extends the FADump feature to OPAL based systems.

User space:
  We are reusing existing kernel/user space infrastructure. Hence this
  feature is transparent to end user. User can use existing crash tool
  to debug `vmcore` and gdb to debug `opalcore`.

CPU register data collection:
  Before initiating a crash, the kernel will save the register content of
  the running thread. Control then goes to SBE. SBE will quiesce the system
  and collect CPU register content for all applicable threads. Kernel will
  use this data to create vmcore.

  We had an offline discussion with Nick. One of his suggestions was to use
  kernel SRESET IPI to collect secondary CPU register data. Technically
  it is possible to use SRESET, but that is still not completely
  watertight. We can switch to that down the line when SRESET works
  reliably and we find a way to collect secondary CPU data for OPAL
  dump.

TODO:
  - Capture OPAL crashing CPU information
    Current patchset relies on SBE to capture OPAL crashing CPU
    information. We may miss some of the important register
    information. In future we will enhance OPAL to collect crashing
    CPU details.
  - Capture VSX registers (As suggested by Mikey)


Testing:
  - We have tested this patchset with op-build and Hari's kernel
    patches. It's working fine. We are able to get vmcore and opalcore.

Changes in v9:
  - Addressed review comments (commit message improvements, removed a few
    redundant checks, etc.).
  - Introduced new API for tag registration
  - Added new tag to send post MPIPL boot memory size

Changes in v8:
  - v7 got good review from various folks including Nick, Oliver,
    Mikey, etc. Thanks for all the reviews.
  - As suggested by Nick I have deferred early OPAL crash.
  - Completely reworked OPAL - Kernel interface
  - Removed OPAL_FADUMP_MANAGE API
  - Added new API for MPIPL Update (OPAL_MPIPL_UPDATE)
  - Added new API to retrieve query tags (OPAL_MPIPL_QUERY_TAG)
  - Added support to create OPAL metadata area and send metadata
    pointer to kernel.
  - Added explicit assert in few places - suggested by Mikey
  - Removed few redundant checks - suggested by Oliver

Changes in v7:
  - Rebased on top of current master
  - Fixed hiomap test cases
  - Few minor fixes/logging improvements

Changes in v6:
  - Added support to get architected register data
  - Added support for HIOMAP reset
  - Added new patch to export OPAL boot entry address in device tree
    (Needed for OPAL core)
  - Added support to save/export crashing CPU details
  - Few other minor fixes

Changes in v5:
  - As Stewart suggested moved "dump" device tree node under /ibm,opal
  - Updated OPAL API number to 170
  - Added check before triggering MPIPL.
    If MPIPL supported then it will trigger MPIPL, else it will call
    normal reboot.

Changes in v4:
  - Make sure crashing CPU will not go to stop state
  - Send stash MPIPL chip op to all SBEs
  - Minor prlog improvements

Changes in v3:
  - Added documentation for new OPAL API and device tree binding
  - Fixed MPIPL trigger path
    I have hooked MPIPL trigger path to assert path. Now it will be
    triggered on witherspoon only. We haven't tested on other BMC
    platforms. Once we test we will enable it on other BMC systems.
  - Added MBOX reset support before triggering MPIPL
  - Added support to detect `MPIPL support` system params

Changes in v2:
  - Added support to get architected registers
  - SBE guys changed the MPIPL trigger interrupt bit. Now it's S0 on both
    master and slave chip SBE
  - Fixed few other minor issues


Vasant Hegde (25):
  OPAL: Add OPAL boot entry address to device tree
  FSP/MDST: Rename fsp-mdst-table.c -> fsp-sysdump.c
  hdata: Split MDST 'type' field to accommodate MPIPL
  hdata: Define various DUMP related structures
  mem-map: Setup memory for MDDT table
  mem-map: Setup memory for MDRT table
  hdata: Update spirah structure
  hdata: Adjust various structure offset after relocation
  hdata: Create /ibm,opal/dump device tree node
  MPIPL: Register for OPAL dump
  MPIPL: Define OPAL metadata area
  MPIPL: Add OPAL API to register for dump region
  MPIPL: Add OPAL API to register tags
  SBE: Send OPAL relocated base address to SBE
  MPIPL: Add support to trigger MPIPL on BMC system
  MPIPL: Save crashing PIR
  HIOMAP: Reset bmc mbox in MPIPL path
  platform: Introduce new reboot type
  hdata: Add "mpipl-boot" property to "dump" node
  MPIPL: Prepare OPAL data tag
  MPIPL: Add OPAL API to query saved tags
  MPIPL: Clear tags and metadata
  MPIPL: Reserve memory to capture architected registers data
  MPIPL: Prepare architected registers data tag
  MPIPL: Add documentation

 core/Makefile.inc                          |   2 +-
 core/flash.c                               |  14 +-
 core/init.c                                |   6 +-
 core/opal-dump.c                           | 550 +++++++++++++++++++++++++++++
 core/opal.c                                |   4 +-
 core/platform.c                            |   8 +-
 doc/device-tree/ibm,opal/dump.rst          |  37 ++
 doc/index.rst                              |   1 +
 doc/mpipl.rst                              |  48 +++
 doc/opal-api/index.rst                     |   6 +
 doc/opal-api/opal-cec-reboot-6-116.rst     |   9 +
 doc/opal-api/opal-mpipl-173-174.rst        | 156 ++++++++
 hdata/spira.c                              | 134 ++++++-
 hdata/spira.h                              |  10 +-
 hw/fsp/Makefile.inc                        |   2 +-
 hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} |  16 +-
 hw/ipmi/ipmi-attn.c                        |  16 +-
 hw/sbe-p9.c                                | 121 +++++++
 include/fsp-mdst-table.h                   |  48 ---
 include/mem-map.h                          |  38 +-
 include/opal-api.h                         |  44 ++-
 include/opal-dump.h                        | 141 ++++++++
 include/sbe-p9.h                           |  18 +-
 include/skiboot.h                          |   3 +-
 libflash/blocklevel.h                      |   1 +
 libflash/ipmi-hiomap.c                     |  32 +-
 libflash/ipmi-hiomap.h                     |   2 +-
 libflash/mbox-flash.c                      |  29 +-
 libflash/mbox-flash.h                      |   2 +-
 libflash/test/mbox-server.c                |   1 +
 libflash/test/test-ipmi-hiomap.c           | 169 +++++++++
 skiboot.lds.S                              |  12 +-
 32 files changed, 1591 insertions(+), 89 deletions(-)
 create mode 100644 core/opal-dump.c
 create mode 100644 doc/device-tree/ibm,opal/dump.rst
 create mode 100644 doc/mpipl.rst
 create mode 100644 doc/opal-api/opal-mpipl-173-174.rst
 rename hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} (96%)
 delete mode 100644 include/fsp-mdst-table.h
 create mode 100644 include/opal-dump.h