[linux-next,v3,00/14] Split crash out from kexec and clean up related config items

Message ID 20240124051254.67105-1-bhe@redhat.com (mailing list archive)

Message

Baoquan He Jan. 24, 2024, 5:12 a.m. UTC
Motivation:
=============
Previously, LKP reported a build error. On investigation, it can't be
resolved reasonably with the present messy kdump config items.

 https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/

The kdump (crash dumping) related config items can cause confusion:

Firstly,
---
CRASH_CORE enables code including
 - crashkernel reservation;
 - elfcorehdr updating;
 - vmcoreinfo exporting;
 - crash hotplug handling;

Now fadump of powerpc, kcore dynamic debugging and kdump all select
CRASH_CORE, while:
 - fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
   global variable 'elfcorehdr_addr';
 - kcore only needs vmcoreinfo exporting;
 - kdump needs all of the current kernel/crash_core.c.

So enabling only PROC_KCORE or FA_DUMP will enable CRASH_CORE, which
misleads people into thinking crash dumping is enabled when it's not.

Secondly,
---
It's not reasonable to allow KEXEC_CORE to select CRASH_CORE.

KEXEC_CORE enables code which allocates control pages, copies
kexec/kdump segments, and prepares for switching. This code is
shared by both kexec reboot and kdump. We may want kexec reboot
but not kdump. In that case, CRASH_CORE should not be selected.

 --------------------
 CONFIG_CRASH_CORE=y
 CONFIG_KEXEC_CORE=y
 CONFIG_KEXEC=y
 CONFIG_KEXEC_FILE=y
 ---------------------

Thirdly,
---
It's not reasonable to allow CRASH_DUMP to select KEXEC_CORE.

That allows KEXEC_CORE and CRASH_DUMP to be enabled independently of
KEXEC or KEXEC_FILE. However, w/o KEXEC or KEXEC_FILE, the KEXEC_CORE
code built in doesn't make any sense because no kernel loading or
switching will happen to utilize the KEXEC_CORE code.
 ---------------------
 CONFIG_CRASH_CORE=y
 CONFIG_KEXEC_CORE=y
 CONFIG_CRASH_DUMP=y
 ---------------------

What is worse in this case, on arch sh and arm, KEXEC relies on MMU
while CRASH_DUMP can still be enabled when !MMU, so the compile error
reported by the lkp test robot in the above link is seen.

 ------arch/sh/Kconfig------
 config ARCH_SUPPORTS_KEXEC
         def_bool MMU

 config ARCH_SUPPORTS_CRASH_DUMP
         def_bool BROKEN_ON_SMP
 ---------------------------

Changes:
===========
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_info.c from crash_core.c;
3, move crash related code in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash code in crash related ifdefs on all 8 arches
   which support crash dumping, except ppc;

Achievement:
===========
With the above changes, I can rearrange the config item logic as below (the
right item depends on or is selected by the left item):

    PROC_KCORE -----------> VMCORE_INFO

               |----------> VMCORE_INFO
    FA_DUMP----|
               |----------> CRASH_RESERVE

                                                    ---->VMCORE_INFO
                                                   /
                                                   |---->CRASH_RESERVE
    KEXEC      --|                                /|
                 |--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
    KEXEC_FILE --|                               \ |
                                                   \---->CRASH_HOTPLUG


    KEXEC      --|
                 |--> KEXEC_CORE (for kexec reboot only)
    KEXEC_FILE --|

Test
========
On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I tried the below three config settings and all
builds passed. Take the configs on x86_64 as an example here:

(1) Both CONFIG_KEXEC and KEXEC_FILE are unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features

(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---------------
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---------------

(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
------------------------
# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features
------------------------
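
As a rough illustration of the rearranged logic, the dependency diagram
and the three cases above can be modelled in a few lines of plain C.
This is only a sketch under assumptions: struct kdump_config, resolve()
and check_cover_letter_cases() are hypothetical names that exist nowhere
in the kernel, and the model treats CRASH_DUMP as a user choice that is
honoured only when KEXEC_CORE is enabled. The real resolution is done by
Kconfig, not by code like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reworked config logic; none of this is kernel code. */
struct kdump_config {
	/* Inputs: options set directly by the user or an arch default. */
	bool kexec;
	bool kexec_file;
	bool proc_kcore;
	bool fa_dump;
	bool want_crash_dump;	/* requested value of CRASH_DUMP */
	/* Outputs: options derived from the inputs. */
	bool kexec_core;
	bool crash_dump;
	bool crash_reserve;
	bool vmcore_info;
};

static void resolve(struct kdump_config *c)
{
	/* KEXEC or KEXEC_FILE pull in KEXEC_CORE. */
	c->kexec_core = c->kexec || c->kexec_file;
	/* CRASH_DUMP can only be enabled on top of KEXEC_CORE. */
	c->crash_dump = c->kexec_core && c->want_crash_dump;
	/* CRASH_DUMP and FA_DUMP both need crashkernel reservation. */
	c->crash_reserve = c->crash_dump || c->fa_dump;
	/* CRASH_DUMP, FA_DUMP and PROC_KCORE all need vmcoreinfo. */
	c->vmcore_info = c->crash_dump || c->fa_dump || c->proc_kcore;
}

/* Replays cases (1), (2) and (3) from the Test section against the model. */
static void check_cover_letter_cases(void)
{
	/* (1) neither KEXEC nor KEXEC_FILE: everything stays off. */
	struct kdump_config c1 = { .want_crash_dump = true };
	resolve(&c1);
	assert(!c1.kexec_core && !c1.crash_dump);
	assert(!c1.crash_reserve && !c1.vmcore_info);

	/* (2) KEXEC_FILE set, CRASH_DUMP left enabled. */
	struct kdump_config c2 = { .kexec_file = true, .want_crash_dump = true };
	resolve(&c2);
	assert(c2.kexec_core && c2.crash_dump);
	assert(c2.crash_reserve && c2.vmcore_info);

	/* (3) KEXEC_FILE set, CRASH_DUMP unset: kexec reboot only. */
	struct kdump_config c3 = { .kexec_file = true, .want_crash_dump = false };
	resolve(&c3);
	assert(c3.kexec_core && !c3.crash_dump);
	assert(!c3.crash_reserve && !c3.vmcore_info);
}
```

Calling check_cover_letter_cases() from a main() is enough to exercise
it. The point of the model is that KEXEC_CORE no longer implies any
crash symbol, and no crash symbol implies KEXEC_CORE.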

Note:
For ppc, investigation is needed to make clear how to split out the crash
code in the arch folder. I hope Hari and Pingfan can help take a look to
see if it's doable. For now, ppc either has both kexec and crash enabled,
or has both of them disabled.

Changelog
==========
v2->v3:
- In patch 2, there was a conflict when rebasing to linux-next in
  kernel/crash_core.c because of the below commits from Uladzislau:
  - commit 699d9351822e ("mm: vmalloc: Fix a warning in the crash_save_vmcoreinfo_init()")
  - commit 5f4c0c1e2a51 ("mm/vmalloc: remove vmap_area_list")
- In patch 13, fix the lkp reported issue by using a CONFIG_CRASH_RESERVE
  ifdef, giving up the earlier IS_ENABLED(CONFIG_CRASH_RESERVE) check in v2.
- In patch 14, update code change after below commit merged into
  mainline:
  - commit 78de91b45860 ("LoongArch: Use generic interface to support crashkernel=X,[high,low]")
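
For readers unfamiliar with the difference between the two guards
mentioned for patch 13, here is a minimal standalone sketch. It does not
reproduce the issue lkp reported, which is not shown in this letter. It
only illustrates the general point that an #ifdef removes the guarded
code before the compiler sees it, while an IS_ENABLED() check leaves the
code in place for the compiler to parse. The names demo_crash_size and
demo_reserved_bytes() are made up for the example, and the commented-out
define stands in for what Kconfig would generate.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a Kconfig-generated macro. Leave it commented
 * out to model CONFIG_CRASH_RESERVE being unset.
 */
/* #define CONFIG_CRASH_RESERVE 1 */

#ifdef CONFIG_CRASH_RESERVE
/* Illustrative symbol that only exists when the option is on. */
static unsigned long long demo_crash_size = 256ULL << 20;
#endif

/* Returns the reserved size in bytes, or 0 when the feature is compiled out. */
static unsigned long long demo_reserved_bytes(void)
{
#ifdef CONFIG_CRASH_RESERVE
	return demo_crash_size;
#else
	/*
	 * The preprocessor dropped the branch above, so the compiler never
	 * sees demo_crash_size here. An IS_ENABLED()-style check written as
	 * an ordinary if () would still be parsed, so the symbol would need
	 * a visible declaration, and the reference would only disappear if
	 * the compiler eliminates the dead branch.
	 */
	return 0;
#endif
}
```

With the define left commented out, demo_reserved_bytes() returns 0 and
the file builds without any declaration of demo_crash_size.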

Baoquan He (14):
  kexec: split crashkernel reservation code out from crash_core.c
  crash: split vmcoreinfo exporting code out from crash_core.c
  crash: remove dependency of FA_DUMP on CRASH_DUMP
  crash: split crash dumping code out from kexec_core.c
  crash: clean up kdump related config items
  x86, crash: wrap crash dumping code into crash related ifdefs
  arm64, crash: wrap crash dumping code into crash related ifdefs
  ppc, crash: enforce KEXEC and KEXEC_FILE to select CRASH_DUMP
  s390, crash: wrap crash dumping code into crash related ifdefs
  sh, crash: wrap crash dumping code into crash related ifdefs
  mips, crash: wrap crash dumping code into crash related ifdefs
  riscv, crash: wrap crash dumping code into crash related ifdefs
  arm, crash: wrap crash dumping code into crash related ifdefs
  loongarch, crash: wrap crash dumping code into crash related ifdefs

 arch/arm/kernel/setup.c                       |   4 +-
 arch/arm64/Kconfig                            |   2 +-
 .../asm/{crash_core.h => crash_reserve.h}     |   4 +-
 arch/arm64/include/asm/kexec.h                |   2 +-
 arch/arm64/kernel/Makefile                    |   2 +-
 arch/arm64/kernel/machine_kexec.c             |   2 +-
 arch/arm64/kernel/machine_kexec_file.c        |  10 +-
 .../kernel/{crash_core.c => vmcore_info.c}    |   2 +-
 arch/arm64/mm/init.c                          |   2 +-
 arch/loongarch/kernel/setup.c                 |   2 +-
 arch/mips/kernel/setup.c                      |  17 +-
 arch/powerpc/Kconfig                          |   9 +-
 arch/powerpc/kernel/setup-common.c            |   2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c          |   4 +-
 arch/powerpc/platforms/powernv/opal-core.c    |   2 +-
 arch/riscv/Kconfig                            |   2 +-
 .../asm/{crash_core.h => crash_reserve.h}     |   4 +-
 arch/riscv/kernel/Makefile                    |   2 +-
 arch/riscv/kernel/elf_kexec.c                 |   9 +-
 .../kernel/{crash_core.c => vmcore_info.c}    |   2 +-
 arch/riscv/mm/init.c                          |   2 +-
 arch/s390/kernel/kexec_elf.c                  |   2 +
 arch/s390/kernel/kexec_image.c                |   2 +
 arch/s390/kernel/machine_kexec_file.c         |  10 +
 arch/sh/kernel/machine_kexec.c                |   3 +
 arch/sh/kernel/setup.c                        |   2 +-
 arch/x86/Kconfig                              |   2 +-
 .../asm/{crash_core.h => crash_reserve.h}     |   6 +-
 arch/x86/kernel/Makefile                      |   6 +-
 arch/x86/kernel/cpu/mshyperv.c                |   4 +
 arch/x86/kernel/kexec-bzimage64.c             |   4 +
 arch/x86/kernel/kvm.c                         |   4 +-
 arch/x86/kernel/machine_kexec_64.c            |   3 +
 arch/x86/kernel/reboot.c                      |   2 +-
 arch/x86/kernel/setup.c                       |   2 +-
 arch/x86/kernel/smp.c                         |   2 +-
 .../{crash_core_32.c => vmcore_info_32.c}     |   2 +-
 .../{crash_core_64.c => vmcore_info_64.c}     |   2 +-
 arch/x86/xen/enlighten_hvm.c                  |   4 +
 drivers/base/cpu.c                            |   6 +-
 drivers/firmware/qemu_fw_cfg.c                |  14 +-
 fs/proc/Kconfig                               |   2 +-
 fs/proc/kcore.c                               |   2 +-
 include/linux/buildid.h                       |   2 +-
 include/linux/crash_core.h                    | 152 ++--
 include/linux/crash_reserve.h                 |  48 ++
 include/linux/kexec.h                         |  47 +-
 include/linux/vmcore_info.h                   |  81 ++
 init/initramfs.c                              |   2 +-
 kernel/Kconfig.kexec                          |  12 +-
 kernel/Makefile                               |   5 +-
 kernel/crash_core.c                           | 762 +++++-------------
 kernel/crash_reserve.c                        | 464 +++++++++++
 kernel/{crash_dump.c => elfcorehdr.c}         |   0
 kernel/kexec.c                                |  11 +-
 kernel/kexec_core.c                           | 250 +-----
 kernel/kexec_file.c                           |  13 +-
 kernel/kexec_internal.h                       |   2 +
 kernel/ksysfs.c                               |  10 +-
 kernel/printk/printk.c                        |   4 +-
 kernel/vmcore_info.c                          | 231 ++++++
 lib/buildid.c                                 |   2 +-
 62 files changed, 1228 insertions(+), 1043 deletions(-)
 rename arch/arm64/include/asm/{crash_core.h => crash_reserve.h} (81%)
 rename arch/arm64/kernel/{crash_core.c => vmcore_info.c} (97%)
 rename arch/riscv/include/asm/{crash_core.h => crash_reserve.h} (78%)
 rename arch/riscv/kernel/{crash_core.c => vmcore_info.c} (96%)
 rename arch/x86/include/asm/{crash_core.h => crash_reserve.h} (92%)
 rename arch/x86/kernel/{crash_core_32.c => vmcore_info_32.c} (90%)
 rename arch/x86/kernel/{crash_core_64.c => vmcore_info_64.c} (94%)
 create mode 100644 include/linux/crash_reserve.h
 create mode 100644 include/linux/vmcore_info.h
 create mode 100644 kernel/crash_reserve.c
 rename kernel/{crash_dump.c => elfcorehdr.c} (100%)
 create mode 100644 kernel/vmcore_info.c

Comments

Nathan Chancellor Jan. 26, 2024, 4:55 a.m. UTC | #1
Hi Baoquan,

On Wed, Jan 24, 2024 at 01:12:40PM +0800, Baoquan He wrote:
> [...]

I am seeing a few build failures in my test matrix on next-20240125 that
appear to be caused by this series although I have not bisected. Some
reproduction steps:

$ curl -LSso .config https://git.alpinelinux.org/aports/plain/community/linux-edge/config-edge.armv7
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- olddefconfig all
...
arm-linux-gnueabi-ld: arch/arm/kernel/machine_kexec.o: in function `arch_crash_save_vmcoreinfo':
machine_kexec.c:(.text+0x488): undefined reference to `vmcoreinfo_append_str'
...

$ curl -LSso .config https://github.com/archlinuxarm/PKGBUILDs/raw/master/core/linux-aarch64/config
$ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux- olddefconfig all
...
aarch64-linux-ld: kernel/kexec_file.o: in function `kexec_walk_memblock.constprop.0':
kexec_file.c:(.text+0x314): undefined reference to `crashk_res'
aarch64-linux-ld: kernel/kexec_file.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `crashk_res' which may bind externally can not be used when making a shared object; recompile with -fPIC
kexec_file.c:(.text+0x314): dangerous relocation: unsupported relocation
aarch64-linux-ld: kexec_file.c:(.text+0x318): undefined reference to `crashk_res'
aarch64-linux-ld: drivers/of/kexec.o: in function `of_kexec_alloc_and_setup_fdt':
kexec.c:(.text+0x580): undefined reference to `crashk_res'
aarch64-linux-ld: drivers/of/kexec.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `crashk_res' which may bind externally can not be used when making a shared object; recompile with -fPIC
kexec.c:(.text+0x580): dangerous relocation: unsupported relocation
aarch64-linux-ld: kexec.c:(.text+0x584): undefined reference to `crashk_res'
aarch64-linux-ld: kexec.c:(.text+0x590): undefined reference to `crashk_res'
aarch64-linux-ld: kexec.c:(.text+0x5b0): undefined reference to `crashk_low_res'
aarch64-linux-ld: drivers/of/kexec.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `crashk_low_res' which may bind externally can not be used when making a shared object; recompile with -fPIC
kexec.c:(.text+0x5b0): dangerous relocation: unsupported relocation
aarch64-linux-ld: kexec.c:(.text+0x5b4): undefined reference to `crashk_low_res'
aarch64-linux-ld: kexec.c:(.text+0x5c0): undefined reference to `crashk_low_res'
...

$ curl -LSso .config https://git.alpinelinux.org/aports/plain/community/linux-edge/config-edge.x86_64
$ make -skj"$(nproc)" ARCH=x86_64 CROSS_COMPILE=x86_64-linux- olddefconfig all
...
x86_64-linux-ld: arch/x86/xen/mmu_pv.o: in function `paddr_vmcoreinfo_note':
mmu_pv.c:(.text+0x3af3): undefined reference to `vmcoreinfo_note'
...

Cheers,
Nathan

Baoquan He Jan. 26, 2024, 6:07 a.m. UTC | #2
On 01/25/24 at 09:55pm, Nathan Chancellor wrote:
...... 
> I am seeing a few build failures in my test matrix on next-20240125 that
> appear to be caused by this series although I have not bisected. Some
> reproduction steps:

Thanks for trying this. I have reproduced the linking failure on arm and
will work out a way to fix it.

It's weird, I remember building these and they passed.
