Message ID | 1568256617-14030-1-git-send-email-kernelfans@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | powerpc/crashkernel: take mem option into account | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (c317052c95bef1f977b023158e5aa929215f443d) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 0 warnings, 1 checks, 22 lines checked |
Cc Kexec list. And keep the original content. On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote: > > 'mem=" option is an easy way to put high pressure on memory during some > test. Hence in stead of total mem, the effective usable memory size should > be considered when reserving mem for crashkernel. Otherwise the boot up may > experience oom issue. > > E.g passing > crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and > mem=5G on a 256G machine. > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > Cc: Hari Bathini <hbathini@linux.ibm.com> > Cc: Michael Ellerman <mpe@ellerman.id.au> > To: linuxppc-dev@lists.ozlabs.org > --- > v1 -> v2: fix the printk info about the total mem > arch/powerpc/kernel/machine_kexec.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c > index c4ed328..eec96dc 100644 > --- a/arch/powerpc/kernel/machine_kexec.c > +++ b/arch/powerpc/kernel/machine_kexec.c > @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image) > > void __init reserve_crashkernel(void) > { > - unsigned long long crash_size, crash_base; > + unsigned long long crash_size, crash_base, total_mem_sz; > int ret; > > + total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); > /* use common parsing */ > - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > + ret = parse_crashkernel(boot_command_line, total_mem_sz, > &crash_size, &crash_base); > if (ret == 0 && crash_size > 0) { > crashk_res.start = crash_base; > @@ -185,7 +186,7 @@ void __init reserve_crashkernel(void) > "for crashkernel (System RAM: %ldMB)\n", > (unsigned long)(crash_size >> 20), > (unsigned long)(crashk_res.start >> 20), > - (unsigned long)(memblock_phys_mem_size() >> 20)); > + (unsigned long)(total_mem_sz >> 20)); > > if (!memblock_is_region_memory(crashk_res.start, crash_size) || > memblock_reserve(crashk_res.start, crash_size)) { > -- > 2.7.5 >
Pingfan Liu <kernelfans@gmail.com> writes: > Cc Kexec list. And keep the original content. > > On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote: >> >> 'mem=" option is an easy way to put high pressure on memory during some >> test. Hence in stead of total mem, the effective usable memory size ^ ^ instead "actual" would be clearer I think adding: "after applying the memory limit" would help here. >> should be considered when reserving mem for crashkernel. Otherwise >> the boot up may experience oom issue. ^ OOM >> >> E.g passing >> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and >> mem=5G on a 256G machine. Spelling out the behaviour before and after would help here, eg: .. "would reserve 4G prior to the change and 512M afterward." >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> >> Cc: Hari Bathini <hbathini@linux.ibm.com> >> Cc: Michael Ellerman <mpe@ellerman.id.au> >> To: linuxppc-dev@lists.ozlabs.org >> --- >> v1 -> v2: fix the printk info about the total mem >> arch/powerpc/kernel/machine_kexec.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c >> index c4ed328..eec96dc 100644 >> --- a/arch/powerpc/kernel/machine_kexec.c >> +++ b/arch/powerpc/kernel/machine_kexec.c >> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image) >> >> void __init reserve_crashkernel(void) >> { >> - unsigned long long crash_size, crash_base; >> + unsigned long long crash_size, crash_base, total_mem_sz; >> int ret; >> >> + total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); >> /* use common parsing */ >> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), >> + ret = parse_crashkernel(boot_command_line, total_mem_sz, >> &crash_size, &crash_base); I think this change makes sense. But we have multiple arches that implement similar logic, and I wonder if we should keep them all the same. eg: arch/arm/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, arch/arm64/mm/init.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), arch/ia64/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total, arch/mips/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, arch/powerpc/kernel/fadump.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), arch/powerpc/kernel/machine_kexec.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), arch/s390/kernel/setup.c: rc = parse_crashkernel(boot_command_line, memory_end, &crash_size, arch/sh/kernel/machine_kexec.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), arch/x86/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base); From a quick glance most of them don't seem to take the memory limit into account. So I guess the question is do we want all arches to implement the same behaviour or do we think it doesn't matter if they differ in details like this? cheers
On Wed, Sep 18, 2019 at 7:23 PM Michael Ellerman <mpe@ellerman.id.au> wrote: > > Pingfan Liu <kernelfans@gmail.com> writes: > > Cc Kexec list. And keep the original content. > > > > On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans@gmail.com> wrote: > >> > >> 'mem=" option is an easy way to put high pressure on memory during some > >> test. Hence in stead of total mem, the effective usable memory size > ^ ^ > instead "actual" would be clearer > > I think adding: "after applying the memory limit" > > would help here. > > >> should be considered when reserving mem for crashkernel. Otherwise > >> the boot up may experience oom issue. > ^ > OOM > >> > >> E.g passing > >> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and > >> mem=5G on a 256G machine. > > Spelling out the behaviour before and after would help here, eg: > > .. "would reserve 4G prior to the change and 512M afterward." > Thanks for kindly review. I will update the commit based on your suggestion. > > >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > >> Cc: Hari Bathini <hbathini@linux.ibm.com> > >> Cc: Michael Ellerman <mpe@ellerman.id.au> > >> To: linuxppc-dev@lists.ozlabs.org > >> --- > >> v1 -> v2: fix the printk info about the total mem > >> arch/powerpc/kernel/machine_kexec.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c > >> index c4ed328..eec96dc 100644 > >> --- a/arch/powerpc/kernel/machine_kexec.c > >> +++ b/arch/powerpc/kernel/machine_kexec.c > >> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image) > >> > >> void __init reserve_crashkernel(void) > >> { > >> - unsigned long long crash_size, crash_base; > >> + unsigned long long crash_size, crash_base, total_mem_sz; > >> int ret; > >> > >> + total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); > >> /* use common parsing */ > >> - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > >> + ret = parse_crashkernel(boot_command_line, total_mem_sz, > >> &crash_size, &crash_base); > > I think this change makes sense. But we have multiple arches that > implement similar logic, and I wonder if we should keep them all the > same. > > eg: > > arch/arm/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, > arch/arm64/mm/init.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > arch/ia64/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total, > arch/mips/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, > arch/powerpc/kernel/fadump.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > arch/powerpc/kernel/machine_kexec.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > arch/s390/kernel/setup.c: rc = parse_crashkernel(boot_command_line, memory_end, &crash_size, > arch/sh/kernel/machine_kexec.c: ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), > arch/x86/kernel/setup.c: ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base); > > > From a quick glance most of them don't seem to take the memory limit > into account. > > So I guess the question is do we want all arches to implement the same > behaviour or do we think it doesn't matter if they differ in details > like this? On powerpc, the current code make fadump/kdump a higher priority than "mem=" option, as the notes in fadump_reserve_mem() says " /* * Calculate the memory boundary. * If memory_limit is less than actual memory boundary then reserve * the memory for fadump beyond the memory_limit and adjust the * memory_limit accordingly, so that the running kernel can run with * specified memory_limit. */ " While on other archs, they pack "mem=" info into memblock before calling memblock_phys_mem_size(). So when parse_crashkernel() calls memblock_phys_mem_size(), the "mem=" takes effect. E.g for x86 in arch/x86/kernel/e820.c static int __init parse_memopt(char *p) { ... e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1); // this pack the "mem=" info into e820, and is finally feed to memblock } Thanks, Pingfan
diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c index c4ed328..eec96dc 100644 --- a/arch/powerpc/kernel/machine_kexec.c +++ b/arch/powerpc/kernel/machine_kexec.c @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image) void __init reserve_crashkernel(void) { - unsigned long long crash_size, crash_base; + unsigned long long crash_size, crash_base, total_mem_sz; int ret; + total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); /* use common parsing */ - ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, &crash_base); if (ret == 0 && crash_size > 0) { crashk_res.start = crash_base; @@ -185,7 +186,7 @@ void __init reserve_crashkernel(void) "for crashkernel (System RAM: %ldMB)\n", (unsigned long)(crash_size >> 20), (unsigned long)(crashk_res.start >> 20), - (unsigned long)(memblock_phys_mem_size() >> 20)); + (unsigned long)(total_mem_sz >> 20)); if (!memblock_is_region_memory(crashk_res.start, crash_size) || memblock_reserve(crashk_res.start, crash_size)) {
'mem=" option is an easy way to put high pressure on memory during some test. Hence in stead of total mem, the effective usable memory size should be considered when reserving mem for crashkernel. Otherwise the boot up may experience oom issue. E.g passing crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and mem=5G on a 256G machine. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> To: linuxppc-dev@lists.ozlabs.org --- v1 -> v2: fix the printk info about the total mem arch/powerpc/kernel/machine_kexec.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)