powerpc/crashkernel: take mem option into account

Message ID 1568256617-14030-1-git-send-email-kernelfans@gmail.com
State Superseded
Series powerpc/crashkernel: take mem option into account

Checks

Context                         Check    Description
snowpatch_ozlabs/apply_patch    success  Successfully applied on branch next (c317052c95bef1f977b023158e5aa929215f443d)
snowpatch_ozlabs/build-ppc64le  success  Build succeeded
snowpatch_ozlabs/build-ppc64be  success  Build succeeded
snowpatch_ozlabs/build-ppc64e   success  Build succeeded
snowpatch_ozlabs/build-pmac32   success  Build succeeded
snowpatch_ozlabs/checkpatch     warning  total: 0 errors, 0 warnings, 1 checks, 22 lines checked

Commit Message

Pingfan Liu Sept. 12, 2019, 2:50 a.m. UTC
The "mem=" option is an easy way to put high pressure on memory during
some tests. Hence, instead of the total memory, the effective usable memory
size should be considered when reserving memory for crashkernel. Otherwise
the boot may run into OOM issues.

E.g., passing
crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
mem=5G on a 256G machine.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
To: linuxppc-dev@lists.ozlabs.org
---
v1 -> v2: fix the printk info about the total mem
 arch/powerpc/kernel/machine_kexec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Pingfan Liu Sept. 17, 2019, 5:29 a.m. UTC | #1
Cc'ing the kexec list, and keeping the original content below.

On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> 'mem=" option is an easy way to put high pressure on memory during some
> test. Hence in stead of total mem, the effective usable memory size should
> be considered when reserving mem for crashkernel. Otherwise the boot up may
> experience oom issue.
>
> E.g passing
> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
> mem=5G on a 256G machine.
>
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> To: linuxppc-dev@lists.ozlabs.org
> ---
> v1 -> v2: fix the printk info about the total mem
>  arch/powerpc/kernel/machine_kexec.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
> index c4ed328..eec96dc 100644
> --- a/arch/powerpc/kernel/machine_kexec.c
> +++ b/arch/powerpc/kernel/machine_kexec.c
> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
>
>  void __init reserve_crashkernel(void)
>  {
> -       unsigned long long crash_size, crash_base;
> +       unsigned long long crash_size, crash_base, total_mem_sz;
>         int ret;
>
> +       total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
>         /* use common parsing */
> -       ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> +       ret = parse_crashkernel(boot_command_line, total_mem_sz,
>                         &crash_size, &crash_base);
>         if (ret == 0 && crash_size > 0) {
>                 crashk_res.start = crash_base;
> @@ -185,7 +186,7 @@ void __init reserve_crashkernel(void)
>                         "for crashkernel (System RAM: %ldMB)\n",
>                         (unsigned long)(crash_size >> 20),
>                         (unsigned long)(crashk_res.start >> 20),
> -                       (unsigned long)(memblock_phys_mem_size() >> 20));
> +                       (unsigned long)(total_mem_sz >> 20));
>
>         if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
>             memblock_reserve(crashk_res.start, crash_size)) {
> --
> 2.7.5
>
Michael Ellerman Sept. 18, 2019, 11:22 a.m. UTC | #2
Pingfan Liu <kernelfans@gmail.com> writes:
> Cc Kexec list. And keep the original content.
>
> On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote:
>>
>> 'mem=" option is an easy way to put high pressure on memory during some
>> test. Hence in stead of total mem, the effective usable memory size
               ^                          ^
               instead                    "actual" would be clearer

I think adding: "after applying the memory limit" 

would help here.

>> should be considered when reserving mem for crashkernel. Otherwise
>> the boot up may experience oom issue.
                              ^
                              OOM
>>
>> E.g passing
>> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
>> mem=5G on a 256G machine.

Spelling out the behaviour before and after would help here, eg:

.. "would reserve 4G prior to the change and 512M afterward."


>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>> Cc: Hari Bathini <hbathini@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> To: linuxppc-dev@lists.ozlabs.org
>> ---
>> v1 -> v2: fix the printk info about the total mem
>>  arch/powerpc/kernel/machine_kexec.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
>> index c4ed328..eec96dc 100644
>> --- a/arch/powerpc/kernel/machine_kexec.c
>> +++ b/arch/powerpc/kernel/machine_kexec.c
>> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
>>
>>  void __init reserve_crashkernel(void)
>>  {
>> -       unsigned long long crash_size, crash_base;
>> +       unsigned long long crash_size, crash_base, total_mem_sz;
>>         int ret;
>>
>> +       total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
>>         /* use common parsing */
>> -       ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>> +       ret = parse_crashkernel(boot_command_line, total_mem_sz,
>>                         &crash_size, &crash_base);

I think this change makes sense. But we have multiple arches that
implement similar logic, and I wonder if we should keep them all the
same.

eg:

  arch/arm/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem,
  arch/arm64/mm/init.c:                   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
  arch/ia64/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total,
  arch/mips/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total_mem,
  arch/powerpc/kernel/fadump.c:           ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
  arch/powerpc/kernel/machine_kexec.c:    ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
  arch/s390/kernel/setup.c:               rc = parse_crashkernel(boot_command_line, memory_end, &crash_size,
  arch/sh/kernel/machine_kexec.c:         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
  arch/x86/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);


From a quick glance most of them don't seem to take the memory limit
into account.

So I guess the question is do we want all arches to implement the same
behaviour or do we think it doesn't matter if they differ in details
like this?

cheers
Pingfan Liu Sept. 23, 2019, 4:14 a.m. UTC | #3
On Wed, Sep 18, 2019 at 7:23 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Pingfan Liu <kernelfans@gmail.com> writes:
> > Cc Kexec list. And keep the original content.
> >
> > On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote:
> >>
> >> 'mem=" option is an easy way to put high pressure on memory during some
> >> test. Hence in stead of total mem, the effective usable memory size
>                ^                          ^
>                instead                    "actual" would be clearer
>
> I think adding: "after applying the memory limit"
>
> would help here.
>
> >> should be considered when reserving mem for crashkernel. Otherwise
> >> the boot up may experience oom issue.
>                               ^
>                               OOM
> >>
> >> E.g passing
> >> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
> >> mem=5G on a 256G machine.
>
> Spelling out the behaviour before and after would help here, eg:
>
> .. "would reserve 4G prior to the change and 512M afterward."
>
Thanks for the kind review. I will update the commit message based on your suggestions.
>
> >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> >> Cc: Hari Bathini <hbathini@linux.ibm.com>
> >> Cc: Michael Ellerman <mpe@ellerman.id.au>
> >> To: linuxppc-dev@lists.ozlabs.org
> >> ---
> >> v1 -> v2: fix the printk info about the total mem
> >>  arch/powerpc/kernel/machine_kexec.c | 7 ++++---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
> >> index c4ed328..eec96dc 100644
> >> --- a/arch/powerpc/kernel/machine_kexec.c
> >> +++ b/arch/powerpc/kernel/machine_kexec.c
> >> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
> >>
> >>  void __init reserve_crashkernel(void)
> >>  {
> >> -       unsigned long long crash_size, crash_base;
> >> +       unsigned long long crash_size, crash_base, total_mem_sz;
> >>         int ret;
> >>
> >> +       total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
> >>         /* use common parsing */
> >> -       ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >> +       ret = parse_crashkernel(boot_command_line, total_mem_sz,
> >>                         &crash_size, &crash_base);
>
> I think this change makes sense. But we have multiple arches that
> implement similar logic, and I wonder if we should keep them all the
> same.
>
> eg:
>
>   arch/arm/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem,
>   arch/arm64/mm/init.c:                   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/ia64/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total,
>   arch/mips/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total_mem,
>   arch/powerpc/kernel/fadump.c:           ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/powerpc/kernel/machine_kexec.c:    ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/s390/kernel/setup.c:               rc = parse_crashkernel(boot_command_line, memory_end, &crash_size,
>   arch/sh/kernel/machine_kexec.c:         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/x86/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
>
>
> From a quick glance most of them don't seem to take the memory limit
> into account.
>
> So I guess the question is do we want all arches to implement the same
> behaviour or do we think it doesn't matter if they differ in details
> like this?

On powerpc, the current code gives fadump/kdump a higher priority than
the "mem=" option, as the comment in fadump_reserve_mem() says:
"
        /*
         * Calculate the memory boundary.
         * If memory_limit is less than actual memory boundary then reserve
         * the memory for fadump beyond the memory_limit and adjust the
         * memory_limit accordingly, so that the running kernel can run with
         * specified memory_limit.
         */
"

On other arches, by contrast, the "mem=" limit is applied to memblock
before memblock_phys_mem_size() is called. So by the time the result of
memblock_phys_mem_size() is passed to parse_crashkernel(), "mem=" has
already taken effect.

E.g., for x86, in arch/x86/kernel/e820.c:
static int __init parse_memopt(char *p)
{
...
	e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1);
	// This packs the "mem=" info into e820, which is eventually fed to memblock.
}

Thanks,
Pingfan

Patch

diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index c4ed328..eec96dc 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -114,11 +114,12 @@  void machine_kexec(struct kimage *image)
 
 void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_size, crash_base;
+	unsigned long long crash_size, crash_base, total_mem_sz;
 	int ret;
 
+	total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
 	/* use common parsing */
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	ret = parse_crashkernel(boot_command_line, total_mem_sz,
 			&crash_size, &crash_base);
 	if (ret == 0 && crash_size > 0) {
 		crashk_res.start = crash_base;
@@ -185,7 +186,7 @@  void __init reserve_crashkernel(void)
 			"for crashkernel (System RAM: %ldMB)\n",
 			(unsigned long)(crash_size >> 20),
 			(unsigned long)(crashk_res.start >> 20),
-			(unsigned long)(memblock_phys_mem_size() >> 20));
+			(unsigned long)(total_mem_sz >> 20));
 
 	if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
 	    memblock_reserve(crashk_res.start, crash_size)) {