
[v15,5/5] powerpc: add crash memory hotplug support

Message ID 20240111105138.251366-6-sourabhjain@linux.ibm.com (mailing list archive)
State Changes Requested
Series powerpc/crash: Kernel handling of CPU and memory hotplug

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.

Commit Message

Sourabh Jain Jan. 11, 2024, 10:51 a.m. UTC
Extend the arch crash hotplug handler, introduced by the patch titled
"powerpc: add crash CPU hotplug support", to also support memory
add/remove events.

The elfcorehdr describes the memory of the crashed kernel that the kdump
kernel has to capture; hence, it needs to be updated whenever memory
resources change due to memory add/remove events. Therefore,
arch_crash_handle_hotplug_event() is updated to recreate the elfcorehdr
and replace the previous one with it on memory add/remove events.
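The replace step can be illustrated with a minimal userspace sketch. In it, a single reserved buffer stands in for the elfcorehdr kexec segment. A C11 atomic pointer stands in for kexec_crash_image; the patch itself uses xchg() on kexec_crash_image. The names struct crash_image, active_image and replace_elfcorehdr() are hypothetical.

The sketch shows only the order of operations:

- check the size first;
- invalidate the image;
- copy the new header;
- republish the image.

This way, a crash during the copy would find no image at all rather than a half-written header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the loaded kdump image. */
struct crash_image {
	unsigned char *hdr_seg;	/* reserved elfcorehdr segment */
	size_t hdr_memsz;	/* bytes reserved for it (memsz) */
};

/* Hypothetical stand-in for kexec_crash_image; NULL means "not usable". */
static _Atomic(struct crash_image *) active_image;

/*
 * Copy a freshly built header over the old one. Returns 0 on success,
 * or -1 (leaving the old header intact) if the new one does not fit.
 */
static int replace_elfcorehdr(struct crash_image *img,
			      const unsigned char *newhdr, size_t newsz)
{
	assert(img != NULL && newhdr != NULL);
	if (newsz > img->hdr_memsz)
		return -1;

	/* Invalidate the image so a half-written header is never used. */
	atomic_exchange(&active_image, (struct crash_image *)NULL);
	memcpy(img->hdr_seg, newhdr, newsz);
	/* The header is consistent again: republish the image. */
	atomic_exchange(&active_image, img);
	return 0;
}
```

If the new header is larger than the reserved memsz, the sketch keeps the old header and reports failure. This is analogous to the elfsz > memsz check in the patch.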

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot removal, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges
to ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

    Memory remove
          |
          v
    Offline pages
          |
          v
 Initiate memory notify call <----> crash hotplug handler
 chain for MEM_OFFLINE event
          |
          v
 Update memblock list

 	Figure 1
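The exclusion of hot-removed memory can be sketched in userspace. The sketch assumes a fixed-capacity array of inclusive [start, end] ranges. struct range_list, MAX_RANGES and remove_range() are hypothetical; they are not the kernel's struct crash_mem API.

It covers the four cases a removal can hit:

- an exact match;
- a trimmed head;
- a trimmed tail;
- a split in the middle.

In the split case, the new upper range must be built from the entry's original end, saved before that entry is shortened.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANGES 16	/* hypothetical fixed capacity for the sketch */

struct range { uint64_t start, end; };	/* inclusive [start, end] */

struct range_list {
	unsigned int nr;
	struct range r[MAX_RANGES];
};

/*
 * Remove [base, base + size) from the list. Returns 0 on success, or -1
 * if a split needs a free slot and the list is full. A range that does
 * not lie entirely inside one entry is ignored.
 */
static int remove_range(struct range_list *l, uint64_t base, uint64_t size)
{
	uint64_t end;
	unsigned int i, j;

	assert(l->nr <= MAX_RANGES);
	if (!size)
		return 0;
	end = base + size - 1;	/* convert to an inclusive end */

	for (i = 0; i < l->nr; i++) {
		uint64_t mstart = l->r[i].start, mend = l->r[i].end;

		if (base < mstart || end > mend)
			continue;	/* not contained in this entry */

		if (base == mstart && end == mend) {
			/* Exact match: drop the entry, shift the rest down. */
			for (j = i; j + 1 < l->nr; j++)
				l->r[j] = l->r[j + 1];
			l->nr--;
		} else if (base == mstart) {
			l->r[i].start = end + 1;	/* trim the head */
		} else if (end == mend) {
			l->r[i].end = base - 1;		/* trim the tail */
		} else {
			/* Split: [end + 1, mend] becomes a new entry. */
			if (l->nr == MAX_RANGES)
				return -1;
			l->r[i].end = base - 1;
			l->r[l->nr].start = end + 1;
			l->r[l->nr].end = mend;	/* saved original end */
			l->nr++;
		}
		return 0;	/* the removed range lives in one entry only */
	}
	return 0;
}
```

Removing [0x400, 0x600) from a single entry [0x0, 0xfff] leaves two entries: [0x0, 0x3ff] and [0x600, 0xfff].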

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, the kdump image is prepared in the
kernel. To support an increasing number of memory regions, the
elfcorehdr is built with extra buffer space to ensure that it can
accommodate additional memory ranges in the future.
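The worst-case sizing rule used by arch_crash_get_elfcorehdr_size() in this patch can be sketched in userspace. The rule is:

- one ELF header;
- two fixed program headers (CPU notes and vmcoreinfo);
- one program header for every other LMB, clamped to PN_XNUM.

In the sketch, hotplug_max and lmb_size are plain parameters, assumed to stand in for memory_hotplug_max() and drmem_lmb_size(). The helper elfcorehdr_size() is hypothetical. The types come from the C library's <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

#ifndef PN_XNUM
#define PN_XNUM 0xffff	/* largest e_phnum before extended numbering */
#endif

/*
 * Worst-case elfcorehdr size in bytes. hotplug_max and lmb_size are
 * inputs here (assumptions), not values read from the running system.
 */
static uint64_t elfcorehdr_size(uint64_t hotplug_max, uint64_t lmb_size)
{
	uint64_t phdr_cnt = 2;	/* CPU notes + vmcoreinfo */

	assert(lmb_size != 0);
	/* Worst case: every other LMB is a separate crash range. */
	phdr_cnt += hotplug_max / (2 * lmb_size);
	if (phdr_cnt > PN_XNUM)
		phdr_cnt = PN_XNUM;

	return sizeof(Elf64_Ehdr) + phdr_cnt * sizeof(Elf64_Phdr);
}
```

For example, with 4 TiB of hotpluggable memory and 256 MiB LMBs, this gives 2 + 8192 = 8194 program headers. That is 64 + 8194 * 56 = 458928 bytes.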

For the kexec_load syscall, the elfcorehdr is updated only if the
KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
kexec tool. Passing this flag to the kernel indicates that the
elfcorehdr is built to accommodate additional memory ranges and the
elfcorehdr segment is not considered for SHA calculation, making it safe
to update.

The changes related to this feature are kept under the CRASH_HOTPLUG
config option, which is enabled by default.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Akhil Raj <lf32.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Laurent Dufour <laurent.dufour@fr.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: x86@kernel.org
---
 arch/powerpc/include/asm/kexec.h        |   5 +-
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 107 +++++++++++++++++++++++-
 arch/powerpc/kexec/file_load_64.c       |  34 +++++++-
 arch/powerpc/kexec/ranges.c             |  85 +++++++++++++++++++
 5 files changed, 225 insertions(+), 7 deletions(-)

Comments

Hari Bathini Jan. 23, 2024, 10:22 a.m. UTC | #1
On 11/01/24 4:21 pm, Sourabh Jain wrote:
> Extend the arch crash hotplug handler, as introduced by the patch title
> ("powerpc: add crash CPU hotplug support"), to also support memory
> add/remove events.
> 
> Elfcorehdr describes the memory of the crash kernel to capture the
> kernel; hence, it needs to be updated if memory resources change due to
> memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
> is updated to recreate the elfcorehdr and replace it with the previous
> one on memory add/remove events.
> 
> The memblock list is used to prepare the elfcorehdr. In the case of
> memory hot removal, the memblock list is updated after the arch crash
> hotplug handler is triggered, as depicted in Figure 1. Thus, the
> hot-removed memory is explicitly removed from the crash memory ranges
> to ensure that the memory ranges added to elfcorehdr do not include the
> hot-removed memory.
> 
>      Memory remove
>            |
>            v
>      Offline pages
>            |
>            v
>   Initiate memory notify call <----> crash hotplug handler
>   chain for MEM_OFFLINE event
>            |
>            v
>   Update memblock list
> 
>   	Figure 1
> 
> There are two system calls, `kexec_file_load` and `kexec_load`, used to
> load the kdump image. A few changes have been made to ensure that the
> kernel can safely update the elfcorehdr component of the kdump image for
> both system calls.
> 
> For the kexec_file_load syscall, kdump image is prepared in the kernel.
> To support an increasing number of memory regions, the elfcorehdr is
> built with extra buffer space to ensure that it can accommodate
> additional memory ranges in future.
> 
> For the kexec_load syscall, the elfcorehdr is updated only if the
> KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
> kexec tool. Passing this flag to the kernel indicates that the
> elfcorehdr is built to accommodate additional memory ranges and the
> elfcorehdr segment is not considered for SHA calculation, making it safe
> to update.
> 
> The changes related to this feature are kept under the CRASH_HOTPLUG
> config, and it is enabled by default.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Akhil Raj <lf32.dev@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Borislav Petkov (AMD) <bp@alien8.de>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Laurent Dufour <laurent.dufour@fr.ibm.com>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Mimi Zohar <zohar@linux.ibm.com>
> Cc: Naveen N Rao <naveen@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: kexec@lists.infradead.org
> Cc: x86@kernel.org
> ---
>   arch/powerpc/include/asm/kexec.h        |   5 +-
>   arch/powerpc/include/asm/kexec_ranges.h |   1 +
>   arch/powerpc/kexec/core_64.c            | 107 +++++++++++++++++++++++-
>   arch/powerpc/kexec/file_load_64.c       |  34 +++++++-
>   arch/powerpc/kexec/ranges.c             |  85 +++++++++++++++++++
>   5 files changed, 225 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 943e58eb9bff..25ff5b7f1a28 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -116,8 +116,11 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>   #ifdef CONFIG_CRASH_HOTPLUG
>   void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
> -#endif /*CONFIG_CRASH_HOTPLUG */
>   
> +unsigned int arch_crash_get_elfcorehdr_size(void);
> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
> +
> +#endif /*CONFIG_CRASH_HOTPLUG */
>   #endif /* CONFIG_PPC64 */
>   
>   #ifdef CONFIG_KEXEC_FILE
> diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
> index f83866a19e87..802abf580cf0 100644
> --- a/arch/powerpc/include/asm/kexec_ranges.h
> +++ b/arch/powerpc/include/asm/kexec_ranges.h
> @@ -7,6 +7,7 @@
>   void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
>   struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
>   int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
> +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
>   int add_tce_mem_ranges(struct crash_mem **mem_ranges);
>   int add_initrd_mem_range(struct crash_mem **mem_ranges);
>   #ifdef CONFIG_PPC_64S_HASH_MMU
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 43fcd78c2102..4673f150f973 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -19,8 +19,11 @@
>   #include <linux/of.h>
>   #include <linux/libfdt.h>
>   #include <linux/memblock.h>
> +#include <linux/memory.h>
>   
>   #include <asm/page.h>
> +#include <asm/drmem.h>
> +#include <asm/mmzone.h>
>   #include <asm/current.h>
>   #include <asm/machdep.h>
>   #include <asm/cacheflush.h>
> @@ -546,6 +549,101 @@ int update_cpus_node(void *fdt)
>   #undef pr_fmt
>   #define pr_fmt(fmt) "crash hp: " fmt
>   
> +/*
> + * Advertise preferred elfcorehdr size to userspace via
> + * /sys/kernel/crash_elfcorehdr_size sysfs interface.
> + */
> +unsigned int arch_crash_get_elfcorehdr_size(void)
> +{
> +	unsigned int sz;
> +	unsigned long elf_phdr_cnt;
> +
> +	/* Program header for CPU notes and vmcoreinfo */
> +	elf_phdr_cnt = 2;
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		/* In the worst case, a Phdr is needed for every other LMB to be
> +		 * represented as an individual crash range.
> +		 */
> +		elf_phdr_cnt += memory_hotplug_max() / (2 * drmem_lmb_size());
> +
> +	/* Do not cross the max limit */
> +	if (elf_phdr_cnt > PN_XNUM)
> +		elf_phdr_cnt = PN_XNUM;
> +
> +	sz = sizeof(struct elfhdr) + (elf_phdr_cnt * sizeof(Elf64_Phdr));
> +	return sz;
> +}
> +
> +/**
> + * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
> + *			       elfcorehdr in the kexec segment array.
> + * @image: the active struct kimage
> + * @mn: struct memory_notify data handler
> + */
> +static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
> +{
> +	int ret;
> +	struct crash_mem *cmem = NULL;
> +	struct kexec_segment *ksegment;
> +	void *ptr, *mem, *elfbuf = NULL;
> +	unsigned long elfsz, memsz, base_addr, size;
> +
> +	ksegment = &image->segment[image->elfcorehdr_index];
> +	mem = (void *) ksegment->mem;
> +	memsz = ksegment->memsz;
> +
> +	ret = get_crash_memory_ranges(&cmem);
> +	if (ret) {
> +		pr_err("Failed to get crash mem range\n");
> +		return;
> +	}
> +
> +	/*
> +	 * The hot unplugged memory is part of crash memory ranges,
> +	 * remove it here.
> +	 */
> +	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
> +		base_addr = PFN_PHYS(mn->start_pfn);
> +		size = mn->nr_pages * PAGE_SIZE;
> +		ret = remove_mem_range(&cmem, base_addr, size);
> +		if (ret) {
> +			pr_err("Failed to remove hot-unplugged from crash memory ranges.\n");
> +			return;
> +		}
> +	}
> +
> +	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
> +	if (ret) {
> +		pr_err("Failed to prepare elf header\n");
> +		return;
> +	}
> +



> +	/*
> +	 * It is unlikely that kernel hit this because elfcorehdr kexec
> +	 * segment (memsz) is built with addition space to accommodate growing
> +	 * number of crash memory ranges while loading the kdump kernel. It is
> +	 * Just to avoid any unforeseen case.

nitpick..

s/Just/just/

> +	 */
> +	if (elfsz > memsz) {
> +		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
> +		goto out;
> +	}
> +
> +	ptr = __va(mem);
> +	if (ptr) {
> +		/* Temporarily invalidate the crash image while it is replaced */
> +		xchg(&kexec_crash_image, NULL);
> +
> +		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
> +		memcpy((void *)ptr, elfbuf, elfsz);
> +
> +		/* The crash image is now valid once again */
> +		xchg(&kexec_crash_image, image);
> +	}
> +out:
> +	vfree(elfbuf);
> +}
> +
>   /**
>    * arch_crash_handle_hotplug_event - Handle crash CPU/Memory hotplug events to update the
>    *				     necessary kexec segments based on the hotplug event.
> @@ -556,7 +654,7 @@ int update_cpus_node(void *fdt)
>    * CPU addition: Update the FDT segment to include the newly added CPU.
>    * CPU removal: No action is needed, with the assumption that it's okay to have offline CPUs
>    *		as part of the FDT.
> - * Memory addition/removal: No action is taken as this is not yet supported.
> + * Memory addition/removal: Recreate the elfcorehdr segment
>    */
>   void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>   {
> @@ -570,7 +668,6 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>   		return;
>   

>   	} else if (hp_action == KEXEC_CRASH_HP_ADD_CPU) {
> -

>   		void *fdt, *ptr;
>   		unsigned long mem;
>   		int i, fdt_index = -1;
> @@ -605,8 +702,10 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>   

>   	} else if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY ||
>   		   hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {

> -		pr_info_once("Crash update is not supported for memory hotplug\n");
> -		return;
> +		struct memory_notify *mn;
> +
> +		mn = (struct memory_notify *)arg;
> +		update_crash_elfcorehdr(image, mn);
>   	}

Use switch case statement for hotplug actions for better readability..

Thanks
Hari
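Below is a minimal sketch of the switch-based dispatch suggested above, written as standalone C. The enum values are hypothetical stand-ins for the KEXEC_CRASH_HP_* actions. The action-to-segment mapping follows the kernel-doc of arch_crash_handle_hotplug_event() in this patch.

The switch has no default label. With -Wall, -Wswitch then flags any action value that gains no case.

```c
#include <assert.h>

/* Hypothetical stand-ins for the KEXEC_CRASH_HP_* actions. */
enum hp_action {
	HP_ADD_CPU,
	HP_REMOVE_CPU,
	HP_ADD_MEMORY,
	HP_REMOVE_MEMORY,
};

/* Which kexec segment the handler would rewrite (illustrative only). */
enum hp_segment { SEG_NONE, SEG_FDT, SEG_ELFCOREHDR };

static enum hp_segment crash_hotplug_dispatch(enum hp_action action)
{
	switch (action) {
	case HP_REMOVE_CPU:
		/* Offline CPUs may stay in the FDT: nothing to update. */
		return SEG_NONE;
	case HP_ADD_CPU:
		/* Add the new CPU to the FDT segment. */
		return SEG_FDT;
	case HP_ADD_MEMORY:
	case HP_REMOVE_MEMORY:
		/* Recreate the elfcorehdr segment. */
		return SEG_ELFCOREHDR;
	}
	/* Only reached for out-of-range values. */
	assert(0 && "unhandled hotplug action");
	return SEG_NONE;
}
```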
Sourabh Jain Jan. 29, 2024, 6:46 a.m. UTC | #2
On 23/01/24 15:52, Hari Bathini wrote:
>
>
> On 11/01/24 4:21 pm, Sourabh Jain wrote:
>> Extend the arch crash hotplug handler, as introduced by the patch title
>> ("powerpc: add crash CPU hotplug support"), to also support memory
>> add/remove events.
>>
>> Elfcorehdr describes the memory of the crash kernel to capture the
>> kernel; hence, it needs to be updated if memory resources change due to
>> memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
>> is updated to recreate the elfcorehdr and replace it with the previous
>> one on memory add/remove events.
>>
>> The memblock list is used to prepare the elfcorehdr. In the case of
>> memory hot removal, the memblock list is updated after the arch crash
>> hotplug handler is triggered, as depicted in Figure 1. Thus, the
>> hot-removed memory is explicitly removed from the crash memory ranges
>> to ensure that the memory ranges added to elfcorehdr do not include the
>> hot-removed memory.
>>
>>      Memory remove
>>            |
>>            v
>>      Offline pages
>>            |
>>            v
>>   Initiate memory notify call <----> crash hotplug handler
>>   chain for MEM_OFFLINE event
>>            |
>>            v
>>   Update memblock list
>>
>>       Figure 1
>>
>> There are two system calls, `kexec_file_load` and `kexec_load`, used to
>> load the kdump image. A few changes have been made to ensure that the
>> kernel can safely update the elfcorehdr component of the kdump image for
>> both system calls.
>>
>> For the kexec_file_load syscall, kdump image is prepared in the kernel.
>> To support an increasing number of memory regions, the elfcorehdr is
>> built with extra buffer space to ensure that it can accommodate
>> additional memory ranges in future.
>>
>> For the kexec_load syscall, the elfcorehdr is updated only if the
>> KEXEC_CRASH_HOTPLUG_SUPPORT kexec flag is passed to the kernel by the
>> kexec tool. Passing this flag to the kernel indicates that the
>> elfcorehdr is built to accommodate additional memory ranges and the
>> elfcorehdr segment is not considered for SHA calculation, making it safe
>> to update.
>>
>> The changes related to this feature are kept under the CRASH_HOTPLUG
>> config, and it is enabled by default.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Cc: Akhil Raj <lf32.dev@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
>> Cc: Baoquan He <bhe@redhat.com>
>> Cc: Borislav Petkov (AMD) <bp@alien8.de>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Dave Young <dyoung@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Hari Bathini <hbathini@linux.ibm.com>
>> Cc: Laurent Dufour <laurent.dufour@fr.ibm.com>
>> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Mimi Zohar <zohar@linux.ibm.com>
>> Cc: Naveen N Rao <naveen@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Valentin Schneider <vschneid@redhat.com>
>> Cc: Vivek Goyal <vgoyal@redhat.com>
>> Cc: kexec@lists.infradead.org
>> Cc: x86@kernel.org
>> ---
>>   arch/powerpc/include/asm/kexec.h        |   5 +-
>>   arch/powerpc/include/asm/kexec_ranges.h |   1 +
>>   arch/powerpc/kexec/core_64.c            | 107 +++++++++++++++++++++++-
>>   arch/powerpc/kexec/file_load_64.c       |  34 +++++++-
>>   arch/powerpc/kexec/ranges.c             |  85 +++++++++++++++++++
>>   5 files changed, 225 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
>> index 943e58eb9bff..25ff5b7f1a28 100644
>> --- a/arch/powerpc/include/asm/kexec.h
>> +++ b/arch/powerpc/include/asm/kexec.h
>> @@ -116,8 +116,11 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>> -#endif /*CONFIG_CRASH_HOTPLUG */
>>
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>> +
>> +#endif /*CONFIG_CRASH_HOTPLUG */
>>   #endif /* CONFIG_PPC64 */
>>
>>   #ifdef CONFIG_KEXEC_FILE
>> diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
>> index f83866a19e87..802abf580cf0 100644
>> --- a/arch/powerpc/include/asm/kexec_ranges.h
>> +++ b/arch/powerpc/include/asm/kexec_ranges.h
>> @@ -7,6 +7,7 @@
>>   void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
>>   struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
>>   int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
>> +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
>>   int add_tce_mem_ranges(struct crash_mem **mem_ranges);
>>   int add_initrd_mem_range(struct crash_mem **mem_ranges);
>>   #ifdef CONFIG_PPC_64S_HASH_MMU
>> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
>> index 43fcd78c2102..4673f150f973 100644
>> --- a/arch/powerpc/kexec/core_64.c
>> +++ b/arch/powerpc/kexec/core_64.c
>> @@ -19,8 +19,11 @@
>>   #include <linux/of.h>
>>   #include <linux/libfdt.h>
>>   #include <linux/memblock.h>
>> +#include <linux/memory.h>
>>
>>   #include <asm/page.h>
>> +#include <asm/drmem.h>
>> +#include <asm/mmzone.h>
>>   #include <asm/current.h>
>>   #include <asm/machdep.h>
>>   #include <asm/cacheflush.h>
>> @@ -546,6 +549,101 @@ int update_cpus_node(void *fdt)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>>
>> +/*
>> + * Advertise preferred elfcorehdr size to userspace via
>> + * /sys/kernel/crash_elfcorehdr_size sysfs interface.
>> + */
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +    unsigned long elf_phdr_cnt;
>> +
>> +    /* Program header for CPU notes and vmcoreinfo */
>> +    elf_phdr_cnt = 2;
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        /* In the worst case, a Phdr is needed for every other LMB to be
>> +         * represented as an individual crash range.
>> +         */
>> +        elf_phdr_cnt += memory_hotplug_max() / (2 * drmem_lmb_size());
>> +
>> +    /* Do not cross the max limit */
>> +    if (elf_phdr_cnt > PN_XNUM)
>> +        elf_phdr_cnt = PN_XNUM;
>> +
>> +    sz = sizeof(struct elfhdr) + (elf_phdr_cnt * sizeof(Elf64_Phdr));
>> +    return sz;
>> +}
>> +
>> +/**
>> + * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
>> + *                   elfcorehdr in the kexec segment array.
>> + * @image: the active struct kimage
>> + * @mn: struct memory_notify data handler
>> + */
>> +static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
>> +{
>> +    int ret;
>> +    struct crash_mem *cmem = NULL;
>> +    struct kexec_segment *ksegment;
>> +    void *ptr, *mem, *elfbuf = NULL;
>> +    unsigned long elfsz, memsz, base_addr, size;
>> +
>> +    ksegment = &image->segment[image->elfcorehdr_index];
>> +    mem = (void *) ksegment->mem;
>> +    memsz = ksegment->memsz;
>> +
>> +    ret = get_crash_memory_ranges(&cmem);
>> +    if (ret) {
>> +        pr_err("Failed to get crash mem range\n");
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * The hot unplugged memory is part of crash memory ranges,
>> +     * remove it here.
>> +     */
>> +    if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
>> +        base_addr = PFN_PHYS(mn->start_pfn);
>> +        size = mn->nr_pages * PAGE_SIZE;
>> +        ret = remove_mem_range(&cmem, base_addr, size);
>> +        if (ret) {
>> +            pr_err("Failed to remove hot-unplugged from crash memory ranges.\n");
>> +            return;
>> +        }
>> +    }
>> +
>> +    ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
>> +    if (ret) {
>> +        pr_err("Failed to prepare elf header\n");
>> +        return;
>> +    }
>> +
>
>
>
>> +    /*
>> +     * It is unlikely that kernel hit this because elfcorehdr kexec
>> +     * segment (memsz) is built with addition space to accommodate growing
>> +     * number of crash memory ranges while loading the kdump kernel. It is
>> +     * Just to avoid any unforeseen case.
>
> nitpick..
>
> s/Just/just/

Thanks I will fix it.

>
>> +     */
>> +    if (elfsz > memsz) {
>> +        pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
>> +        goto out;
>> +    }
>> +
>> +    ptr = __va(mem);
>> +    if (ptr) {
>> +        /* Temporarily invalidate the crash image while it is replaced */
>> +        xchg(&kexec_crash_image, NULL);
>> +
>> +        /* Replace the old elfcorehdr with newly prepared elfcorehdr */
>> +        memcpy((void *)ptr, elfbuf, elfsz);
>> +
>> +        /* The crash image is now valid once again */
>> +        xchg(&kexec_crash_image, image);
>> +    }
>> +out:
>> +    vfree(elfbuf);
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event - Handle crash CPU/Memory hotplug events to update the
>>    *                     necessary kexec segments based on the hotplug event.
>> @@ -556,7 +654,7 @@ int update_cpus_node(void *fdt)
>>    * CPU addition: Update the FDT segment to include the newly added CPU.
>>    * CPU removal: No action is needed, with the assumption that it's okay to have offline CPUs
>>    *        as part of the FDT.
>> - * Memory addition/removal: No action is taken as this is not yet supported.
>> + * Memory addition/removal: Recreate the elfcorehdr segment
>>    */
>>   void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>>   {
>> @@ -570,7 +668,6 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>>           return;
>
>>       } else if (hp_action == KEXEC_CRASH_HP_ADD_CPU) {
>> -
>
>>           void *fdt, *ptr;
>>           unsigned long mem;
>>           int i, fdt_index = -1;
>> @@ -605,8 +702,10 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>
>>       } else if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY ||
>>              hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
>
>> -        pr_info_once("Crash update is not supported for memory hotplug\n");
>> -        return;
>> +        struct memory_notify *mn;
>> +
>> +        mn = (struct memory_notify *)arg;
>> +        update_crash_elfcorehdr(image, mn);
>>       }
>
> Use switch case statement for hotplug actions for better readability..

Sure.

Thanks,
Sourabh

Patch

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 943e58eb9bff..25ff5b7f1a28 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -116,8 +116,11 @@  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
 #ifdef CONFIG_CRASH_HOTPLUG
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
-#endif /*CONFIG_CRASH_HOTPLUG */
 
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
+
+#endif /*CONFIG_CRASH_HOTPLUG */
 #endif /* CONFIG_PPC64 */
 
 #ifdef CONFIG_KEXEC_FILE
diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..802abf580cf0 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@ 
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int add_tce_mem_ranges(struct crash_mem **mem_ranges);
 int add_initrd_mem_range(struct crash_mem **mem_ranges);
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 43fcd78c2102..4673f150f973 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -19,8 +19,11 @@ 
 #include <linux/of.h>
 #include <linux/libfdt.h>
 #include <linux/memblock.h>
+#include <linux/memory.h>
 
 #include <asm/page.h>
+#include <asm/drmem.h>
+#include <asm/mmzone.h>
 #include <asm/current.h>
 #include <asm/machdep.h>
 #include <asm/cacheflush.h>
@@ -546,6 +549,101 @@  int update_cpus_node(void *fdt)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/*
+ * Advertise preferred elfcorehdr size to userspace via
+ * /sys/kernel/crash_elfcorehdr_size sysfs interface.
+ */
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+	unsigned int sz;
+	unsigned long elf_phdr_cnt;
+
+	/* Program header for CPU notes and vmcoreinfo */
+	elf_phdr_cnt = 2;
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		/* In the worst case, a Phdr is needed for every other LMB to be
+		 * represented as an individual crash range.
+		 */
+		elf_phdr_cnt += memory_hotplug_max() / (2 * drmem_lmb_size());
+
+	/* Do not cross the max limit */
+	if (elf_phdr_cnt > PN_XNUM)
+		elf_phdr_cnt = PN_XNUM;
+
+	sz = sizeof(struct elfhdr) + (elf_phdr_cnt * sizeof(Elf64_Phdr));
+	return sz;
+}
+
+/**
+ * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
+ *			       elfcorehdr in the kexec segment array.
+ * @image: the active struct kimage
+ * @mn: struct memory_notify data handler
+ */
+static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
+{
+	int ret;
+	struct crash_mem *cmem = NULL;
+	struct kexec_segment *ksegment;
+	void *ptr, *mem, *elfbuf = NULL;
+	unsigned long elfsz, memsz, base_addr, size;
+
+	ksegment = &image->segment[image->elfcorehdr_index];
+	mem = (void *) ksegment->mem;
+	memsz = ksegment->memsz;
+
+	ret = get_crash_memory_ranges(&cmem);
+	if (ret) {
+		pr_err("Failed to get crash mem range\n");
+		goto out;
+	}
+
+	/*
+	 * The hot unplugged memory is part of crash memory ranges,
+	 * remove it here.
+	 */
+	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
+		base_addr = PFN_PHYS(mn->start_pfn);
+		size = mn->nr_pages * PAGE_SIZE;
+		ret = remove_mem_range(&cmem, base_addr, size);
+		if (ret) {
+			pr_err("Failed to remove hot-unplugged memory from crash memory ranges\n");
+			goto out;
+		}
+	}
+
+	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
+	if (ret) {
+		pr_err("Failed to prepare elf header\n");
+		goto out;
+	}
+
+	/*
+	 * It is unlikely that the kernel hits this: the elfcorehdr segment
+	 * (memsz) is built with extra space for a growing number of crash
+	 * memory ranges at kdump load time. This just guards unforeseen cases.
+	 */
+	if (elfsz > memsz) {
+		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu\n", elfsz, memsz);
+		goto out;
+	}
+
+	ptr = __va(mem);
+	if (ptr) {
+		/* Temporarily invalidate the crash image while it is replaced */
+		xchg(&kexec_crash_image, NULL);
+
+		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
+		memcpy((void *)ptr, elfbuf, elfsz);
+
+		/* The crash image is now valid once again */
+		xchg(&kexec_crash_image, image);
+	}
+out:
+	kfree(cmem);
+	vfree(elfbuf);
+}
+
 /**
  * arch_crash_handle_hotplug_event - Handle crash CPU/Memory hotplug events to update the
  *				     necessary kexec segments based on the hotplug event.
@@ -556,7 +654,7 @@  int update_cpus_node(void *fdt)
  * CPU addition: Update the FDT segment to include the newly added CPU.
  * CPU removal: No action is needed, with the assumption that it's okay to have offline CPUs
  *		as part of the FDT.
- * Memory addition/removal: No action is taken as this is not yet supported.
+ * Memory addition/removal: Recreate the elfcorehdr segment
  */
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
@@ -570,7 +668,6 @@  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 		return;
 
 	} else if (hp_action == KEXEC_CRASH_HP_ADD_CPU) {
-
 		void *fdt, *ptr;
 		unsigned long mem;
 		int i, fdt_index = -1;
@@ -605,8 +702,10 @@  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 
 	} else if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY ||
 		   hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
-		pr_info_once("Crash update is not supported for memory hotplug\n");
-		return;
+		struct memory_notify *mn;
+
+		mn = (struct memory_notify *)arg;
+		update_crash_elfcorehdr(image, mn);
 	}
 }
 #endif
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 3fb0500fee16..672a901edaa1 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -21,6 +21,8 @@ 
 #include <linux/memblock.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/elf.h>
+
 #include <asm/setup.h>
 #include <asm/cputhreads.h>
 #include <asm/drmem.h>
@@ -740,7 +742,35 @@  static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 
 	kbuf->buffer = headers;
 	kbuf->mem = KEXEC_BUF_MEM_UNKNOWN;
-	kbuf->bufsz = kbuf->memsz = headers_sz;
+	kbuf->bufsz = headers_sz;
+#if defined(CONFIG_CRASH_HOTPLUG) && defined(CONFIG_MEMORY_HOTPLUG)
+	/* Adjust the elfcorehdr segment size to accommodate
+	 * future crash memory ranges.
+	 */
+	int max_lmb;
+	unsigned long pnum;
+
+	/* In the worst case, a Phdr is needed for every other LMB to be
+	 * represented as an individual crash range.
+	 */
+	max_lmb = memory_hotplug_max() / (2 * drmem_lmb_size());
+
+	/* Do not cross the Phdr max limit of the elf header.
+	 * Avoid counting Phdr for crash ranges (cmem->nr_ranges)
+	 * which are already part of elfcorehdr.
+	 */
+	if (max_lmb > PN_XNUM)
+		max_lmb = PN_XNUM;
+	/* Avoid unsigned underflow if nr_ranges already exceeds the limit */
+	pnum = max_lmb > cmem->nr_ranges ? max_lmb - cmem->nr_ranges : 0;
+
+	/* Additional buffer space for elfcorehdr to accommodate
+	 * future memory ranges.
+	 */
+	kbuf->memsz = headers_sz + pnum * sizeof(Elf64_Phdr);
+#else
+	kbuf->memsz = headers_sz;
+#endif
 	kbuf->top_down = false;
 
 	ret = kexec_add_buffer(kbuf);
@@ -750,7 +780,7 @@  static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 	}
 
 	image->elf_load_addr = kbuf->mem;
-	image->elf_headers_sz = headers_sz;
+	image->elf_headers_sz = kbuf->memsz;
 	image->elf_headers = headers;
 out:
 	kfree(cmem);
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index fb3e12f15214..4fd0c5d5607b 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -234,6 +234,91 @@  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
 	return __add_mem_range(mem_ranges, base, size);
 }
 
+/**
+ * remove_mem_range - Removes the given memory range from the range list.
+ * @mem_ranges:    Range list to remove the memory range from.
+ * @base:          Base address of the range to remove.
+ * @size:          Size of the memory range to remove.
+ *
+ * (Re)allocates memory, if needed.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
+{
+	u64 end;
+	int ret = 0;
+	unsigned int i;
+	u64 mstart, mend;
+	struct crash_mem *mem_rngs = *mem_ranges;
+
+	if (!size)
+		return 0;
+
+	/*
+	 * Memory ranges are stored as start and end addresses, use
+	 * the same format to do remove operation.
+	 */
+	end = base + size - 1;
+
+	for (i = 0; i < mem_rngs->nr_ranges; i++) {
+		mstart = mem_rngs->ranges[i].start;
+		mend = mem_rngs->ranges[i].end;
+
+		/*
+		 * Memory range to remove is not part of this range entry
+		 * in the memory range list
+		 */
+		if (!(base >= mstart && end <= mend))
+			continue;
+
+		/*
+		 * Memory range to remove is equivalent to this entry in the
+		 * memory range list. Remove the range entry from the list.
+		 */
+		if (base == mstart && end == mend) {
+			for (; i < mem_rngs->nr_ranges - 1; i++) {
+				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
+				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
+			}
+			mem_rngs->nr_ranges--;
+			goto out;
+		}
+		/*
+		 * Start address of the memory range to remove and the
+		 * current memory range entry in the list are the same. Just
+		 * move the start address of the current memory range
+		 * entry in the list to end + 1.
+		 */
+		else if (base == mstart) {
+			mem_rngs->ranges[i].start = end + 1;
+			goto out;
+		}
+		/*
+		 * End address of the memory range to remove and the
+		 * current memory range entry in the list are the same.
+		 * Just move the end address of the current memory
+		 * range entry in the list to base - 1.
+		 */
+		else if (end == mend)  {
+			mem_rngs->ranges[i].end = base - 1;
+			goto out;
+		}
+		/*
+		 * Memory range to remove is inside the current entry: split
+		 * it in two. Stop after that, since *mem_ranges may move.
+		 */
+		else {
+			size = mend - end;
+			mem_rngs->ranges[i].end = base - 1;
+			ret = add_mem_range(mem_ranges, end + 1, size);
+			goto out;
+		}
+	}
+out:
+	return ret;
+}
+
 /**
  * add_tce_mem_ranges - Adds tce-table range to the given memory ranges list.
  * @mem_ranges:         Range list to add the memory range(s) to.