diff mbox

[v2,1/2] fadump: reduce memory consumption for capture kernel

Message ID e1377bf8-e574-6ffa-6b63-3748a01dcdfa@linux.vnet.ibm.com (mailing list archive)
State Superseded
Headers show

Commit Message

Hari Bathini April 12, 2017, 8:29 p.m. UTC
On Friday 07 April 2017 07:16 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>> On Friday 07 April 2017 07:24 AM, Michael Ellerman wrote:
>>> My preference would be that the fadump kernel "just works". If it's
>>> using too much memory then the fadump kernel should do whatever it needs
>>> to use less memory, eg. shrinking nr_cpu_ids etc.
>>> Do we actually know *why* the fadump kernel is running out of memory?
>>> Obviously large numbers of CPUs is one of the main drivers (lots of
>>> stacks required). But other than that what is causing the memory
>>> pressure? I would like some data on that before we proceed.
>> Almost the same amount of memory in comparison with the memory
>> required to boot the production kernel but that is unwarranted for fadump
>> (dump capture) kernel.
> That's not data! :)
>
> The dump kernel is booted with *much* less memory than the production
> kernel (that's the whole issue!) and so it doesn't need to create struct
> pages for all that memory, which means it should need less memory.
>
> The vfs caches are also sized based on the available memory, so they
> should also shrink in the dump kernel.
>
> I want some actual numbers on what's driving the memory usage.
>
> I tried some of these parameters to see how much memory they would save:

Hi Michael,

Tried to get data to show parameters like numa=off & cgroup_disable=memory
matter too but parameter nr_cpus=1 is making parameters like numa=off,
cgroup_disable=memory insignificant. Also, these parameters not using much
of early memory reservations is making quantification of memory saved for
each of them that much more difficult. But I would still like to argue that
passing additional parameters to fadump is better than enforcing nr_cpus=1
in the kernel for:

   a) With makedumpfile tool supporting multi-threading it would make sense
      to leave the choice of how many CPUs to have, to the user.

   b) Parameters like udev.children-max=2 can help to reduce the number of
      parallel executed events bringing down the memory pressure on fadump
      kernel (when it is booted with more than one CPU).

   c) Ease of maintainability is better (considering any new kernel features
      with some memory to save or stability to gain on disabling, possible
      platform supports) with append approach over enforcing these 
parameters
      in the kernel.

   d) It would give user the flexibility to disable unwanted kernel features
      in fadump kernel (numa=off, cgroup_disable=memory). For every feature
      enabled in the production kernel, fadump kernel will have the 
choice to
      opt out of it, provided there is such cmdline option.

>> So, if parameters like
>> cgroup_disable=memory,
> 0 bytes saved.
>
>> transparent_hugepages=never,
> 0 bytes saved.
>
>> numa=off,
> 64KB saved.
>
>> nr_cpus=1,
> 3MB saved (vs 16 CPUs)
>

Hmmm... On a system with single core and 8GB memory, fadump kernel captures
dump successfully with 272MB passing nr_cpus=1 while it needed 320MB (+48MB)
to do the same without nr_cpus=1. So, while the early reservations saved 
is only a
couple of megabytes, it rubs off further in the boot process to reduce 
memory
consumption by nearly 50MB :)

> Now maybe on your system those do save memory for some reason, but
> please prove it to me. Otherwise I'm inclined to merge:
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 8ff0dd4e77a7..03f1f253c372 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -79,8 +79,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
>   	 * dump data waiting for us.
>   	 */
>   	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
> -	if (fdm_active)
> +	if (fdm_active) {
>   		fw_dump.dump_active = 1;
> +		nr_cpu_ids = 1;
> +	}
>
>   	/* Get the sizes required to store dump data for the firmware provided
>   	 * dump sections.
>
>

Based on your suggestion, I am thinking of something like the below:

--
powerpc/fadump: reduce memory consumption for capture kernel

With fadump (dump capture) kernel booting like a regular kernel, it almost
needs the same amount of memory to boot as the production kernel, which is
unwarranted for a dump capture kernel. But with no option to disable some
of the unnecessary subsystems in fadump kernel, that much memory is wasted
on fadump, depriving the production kernel of that memory.

Introduce kernel parameter 'fadump_append=' that would take regular kernel
parameters as a comma separated list, to be enforced when fadump is active.
This 'fadump_append=' parameter can be leveraged to pass parameters like
nr_cpus=1, cgroup_disable=memory and numa=off, to disable unwarranted
resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed early
in the boot process to put the subsequent fadump messages in context.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
---
  arch/powerpc/kernel/fadump.c |   41 
++++++++++++++++++++++++++++++++++++++---
  1 file changed, 38 insertions(+), 3 deletions(-)



Thanks
Hari

Comments

Michal Suchánek April 13, 2017, 7:58 p.m. UTC | #1
On Thu, 13 Apr 2017 01:59:13 +0530
Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:

> On Friday 07 April 2017 07:16 PM, Michael Ellerman wrote:
> > Hari Bathini <hbathini@linux.vnet.ibm.com> writes:  
> >> On Friday 07 April 2017 07:24 AM, Michael Ellerman wrote:  
> >>> My preference would be that the fadump kernel "just works". If
> >>> it's using too much memory then the fadump kernel should do
> >>> whatever it needs to use less memory, eg. shrinking nr_cpu_ids
> >>> etc. Do we actually know *why* the fadump kernel is running out
> >>> of memory? Obviously large numbers of CPUs is one of the main
> >>> drivers (lots of stacks required). But other than that what is
> >>> causing the memory pressure? I would like some data on that
> >>> before we proceed.  
> >> Almost the same amount of memory in comparison with the memory
> >> required to boot the production kernel but that is unwarranted for
> >> fadump (dump capture) kernel.  
> > That's not data! :)
> >
> > The dump kernel is booted with *much* less memory than the
> > production kernel (that's the whole issue!) and so it doesn't need
> > to create struct pages for all that memory, which means it should
> > need less memory.
> >
> > The vfs caches are also sized based on the available memory, so they
> > should also shrink in the dump kernel.
> >
> > I want some actual numbers on what's driving the memory usage.
> >
> > I tried some of these parameters to see how much memory they would
> > save:  
> 
> Hi Michael,
> 
> Tried to get data to show parameters like numa=off &
> cgroup_disable=memory matter too but parameter nr_cpus=1 is making
> parameters like numa=off, cgroup_disable=memory insignificant. Also,
> these parameters not using much of early memory reservations is
> making quantification of memory saved for each of them that much more
> difficult. But I would still like to argue that passing additional
> parameters to fadump is better than enforcing nr_cpus=1 in the kernel
> for:
> 
>    a) With makedumpfile tool supporting multi-threading it would make
> sense to leave the choice of how many CPUs to have, to the user.
> 
>    b) Parameters like udev.children-max=2 can help to reduce the
> number of parallel executed events bringing down the memory pressure
> on fadump kernel (when it is booted with more than one CPU).
> 
>    c) Ease of maintainability is better (considering any new kernel
> features with some memory to save or stability to gain on disabling,
> possible platform supports) with append approach over enforcing these 
> parameters
>       in the kernel.
> 
>    d) It would give user the flexibility to disable unwanted kernel
> features in fadump kernel (numa=off, cgroup_disable=memory). For
> every feature enabled in the production kernel, fadump kernel will
> have the choice to
>       opt out of it, provided there is such cmdline option.

Hello,

can't the extra parameters be passed in the devicetree?

The docs say that the kernel can tell it's a fadump crash kernel by
checking the devicetree ibm,dump-kernel property. Is there any reason
more (optional) properties cannot be added?

Thanks

Michal
Hari Bathini April 17, 2017, 3:13 p.m. UTC | #2
On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:
> On Thu, 13 Apr 2017 01:59:13 +0530
> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>
>> On Friday 07 April 2017 07:16 PM, Michael Ellerman wrote:
>>> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:
>>>> On Friday 07 April 2017 07:24 AM, Michael Ellerman wrote:
>>>>> My preference would be that the fadump kernel "just works". If
>>>>> it's using too much memory then the fadump kernel should do
>>>>> whatever it needs to use less memory, eg. shrinking nr_cpu_ids
>>>>> etc. Do we actually know *why* the fadump kernel is running out
>>>>> of memory? Obviously large numbers of CPUs is one of the main
>>>>> drivers (lots of stacks required). But other than that what is
>>>>> causing the memory pressure? I would like some data on that
>>>>> before we proceed.
>>>> Almost the same amount of memory in comparison with the memory
>>>> required to boot the production kernel but that is unwarranted for
>>>> fadump (dump capture) kernel.
>>> That's not data! :)
>>>
>>> The dump kernel is booted with *much* less memory than the
>>> production kernel (that's the whole issue!) and so it doesn't need
>>> to create struct pages for all that memory, which means it should
>>> need less memory.
>>>
>>> The vfs caches are also sized based on the available memory, so they
>>> should also shrink in the dump kernel.
>>>
>>> I want some actual numbers on what's driving the memory usage.
>>>
>>> I tried some of these parameters to see how much memory they would
>>> save:
>> Hi Michael,
>>
>> Tried to get data to show parameters like numa=off &
>> cgroup_disable=memory matter too but parameter nr_cpus=1 is making
>> parameters like numa=off, cgroup_disable=memory insignificant. Also,
>> these parameters not using much of early memory reservations is
>> making quantification of memory saved for each of them that much more
>> difficult. But I would still like to argue that passing additional
>> parameters to fadump is better than enforcing nr_cpus=1 in the kernel
>> for:
>>
>>     a) With makedumpfile tool supporting multi-threading it would make
>> sense to leave the choice of how many CPUs to have, to the user.
>>
>>     b) Parameters like udev.children-max=2 can help to reduce the
>> number of parallel executed events bringing down the memory pressure
>> on fadump kernel (when it is booted with more than one CPU).
>>
>>     c) Ease of maintainability is better (considering any new kernel
>> features with some memory to save or stability to gain on disabling,
>> possible platform supports) with append approach over enforcing these
>> parameters
>>        in the kernel.
>>
>>     d) It would give user the flexibility to disable unwanted kernel
>> features in fadump kernel (numa=off, cgroup_disable=memory). For
>> every feature enabled in the production kernel, fadump kernel will
>> have the choice to
>>        opt out of it, provided there is such cmdline option.
> Hello,

Hi Michal,

> can't the extra parameters be passed in the devicetree?

Hmmm.. possible. Without change in f/w, this may not be guaranteed though.

> The docs say that the kernel can tell it's a fadump crash kernel by
> checking the devicetree ibm,dump-kernel property. Is there any reason

This node is exported by firmware

> more (optional) properties cannot be added?

Kernel change seems simple over f/w enhancement..

Thanks
Hari
Michal Suchánek April 18, 2017, 2:18 p.m. UTC | #3
On Mon, 17 Apr 2017 20:43:02 +0530
Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:

> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:
> > On Thu, 13 Apr 2017 01:59:13 +0530
> > Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
> >  
> >> On Friday 07 April 2017 07:16 PM, Michael Ellerman wrote:  
> >>> Hari Bathini <hbathini@linux.vnet.ibm.com> writes:  
> >>>> On Friday 07 April 2017 07:24 AM, Michael Ellerman wrote:  
> >>>>> My preference would be that the fadump kernel "just works". If
> >>>>> it's using too much memory then the fadump kernel should do
> >>>>> whatever it needs to use less memory, eg. shrinking nr_cpu_ids
> >>>>> etc. Do we actually know *why* the fadump kernel is running out
> >>>>> of memory? Obviously large numbers of CPUs is one of the main
> >>>>> drivers (lots of stacks required). But other than that what is
> >>>>> causing the memory pressure? I would like some data on that
> >>>>> before we proceed.  
> >>>> Almost the same amount of memory in comparison with the memory
> >>>> required to boot the production kernel but that is unwarranted
> >>>> for fadump (dump capture) kernel.  
> >>> That's not data! :)
> >>>
> >>> The dump kernel is booted with *much* less memory than the
> >>> production kernel (that's the whole issue!) and so it doesn't need
> >>> to create struct pages for all that memory, which means it should
> >>> need less memory.
> >>>
> >>> The vfs caches are also sized based on the available memory, so
> >>> they should also shrink in the dump kernel.
> >>>
> >>> I want some actual numbers on what's driving the memory usage.
> >>>
> >>> I tried some of these parameters to see how much memory they would
> >>> save:  
> >> Hi Michael,
> >>
> >> Tried to get data to show parameters like numa=off &
> >> cgroup_disable=memory matter too but parameter nr_cpus=1 is making
> >> parameters like numa=off, cgroup_disable=memory insignificant.
> >> Also, these parameters not using much of early memory reservations
> >> is making quantification of memory saved for each of them that
> >> much more difficult. But I would still like to argue that passing
> >> additional parameters to fadump is better than enforcing nr_cpus=1
> >> in the kernel for:
> >>
> >>     a) With makedumpfile tool supporting multi-threading it would
> >> make sense to leave the choice of how many CPUs to have, to the
> >> user.
> >>
> >>     b) Parameters like udev.children-max=2 can help to reduce the
> >> number of parallel executed events bringing down the memory
> >> pressure on fadump kernel (when it is booted with more than one
> >> CPU).
> >>
> >>     c) Ease of maintainability is better (considering any new
> >> kernel features with some memory to save or stability to gain on
> >> disabling, possible platform supports) with append approach over
> >> enforcing these parameters
> >>        in the kernel.
> >>
> >>     d) It would give user the flexibility to disable unwanted
> >> kernel features in fadump kernel (numa=off,
> >> cgroup_disable=memory). For every feature enabled in the
> >> production kernel, fadump kernel will have the choice to
> >>        opt out of it, provided there is such cmdline option.  
> > Hello,  
> 
> Hi Michal,
> 
> > can't the extra parameters be passed in the devicetree?  
> 
> Hmmm.. possible. Without change in f/w, this may not be guaranteed
> though.
> 
> > The docs say that the kernel can tell it's a fadump crash kernel by
> > checking the devicetree ibm,dump-kernel property. Is there any
> > reason  
> 
> This node is exported by firmware
> 
> > more (optional) properties cannot be added?  
> 
> Kernel change seems simple over f/w enhancement..

That certainly looks so when you are a kernel developer and can
implement the change yourself compared to convincing some firmware
developer that this feature makes sense.

On the other hand, the proposed kernel-only solution introduces
requirement that the maintainer does not like.

For the platform as a whole does it make more sense to add a hack to
the kernel or does it make sense to enhance the firmware to provide
more options for firmware-assisted dump?

Thanks

Michal
Michael Ellerman April 19, 2017, 4:19 a.m. UTC | #4
Michal Suchánek <msuchanek@suse.de> writes:
> On Mon, 17 Apr 2017 20:43:02 +0530
> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:
>> > more (optional) properties cannot be added?  
>> 
>> Kernel change seems simple over f/w enhancement..
>
> That certainly looks so when you are a kernel developer and can
> implement the change yourself compared to convincing some firmware
> developer that this feature makes sense.
>
> On the other hand, the proposed kernel-only solution introduces
> requirement that the maintainer does not like.
>
> For the platform as a whole does it make more sense to add a hack to
> the kernel or does it make sense to enhance the firmware to provide
> more options for firmware-assisted dump?

Unfortunately it doesn't really matter, because there is firmware out
there that implements the current behaviour and will never be updated.
So we have to work with what's there.

I don't think fadump_append is a hack, though it's not pretty I'll
admit.

cheers
Michal Suchánek April 19, 2017, 2:08 p.m. UTC | #5
On Wed, 19 Apr 2017 14:19:47 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Michal Suchánek <msuchanek@suse.de> writes:
> > On Mon, 17 Apr 2017 20:43:02 +0530
> > Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:  
> >> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:  
> >> > more (optional) properties cannot be added?    
> >> 
> >> Kernel change seems simple over f/w enhancement..  
> >
> > That certainly looks so when you are a kernel developer and can
> > implement the change yourself compared to convincing some firmware
> > developer that this feature makes sense.
> >
> > On the other hand, the proposed kernel-only solution introduces
> > requirement that the maintainer does not like.
> >
> > For the platform as a whole does it make more sense to add a hack to
> > the kernel or does it make sense to enhance the firmware to provide
> > more options for firmware-assisted dump?  
> 
> Unfortunately it doesn't really matter, because there is firmware out
> there that implements the current behaviour and will never be updated.
> So we have to work with what's there.
> 

It's not that with the existing firmware fadump does not work. It just
uses more memory than needed. So if new firmware revision allows more
flexibility it will work for people with updated firmware and people
with outdated firmware will get the current behavior.

The memory saving is only significant for big systems so only people
running those will get significant improvement from the updated
firmware.

Thanks

Michal
Hari Bathini April 20, 2017, 6:49 p.m. UTC | #6
On Wednesday 19 April 2017 07:38 PM, Michal Suchánek wrote:
> On Wed, 19 Apr 2017 14:19:47 +1000
> Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> Michal Suchánek <msuchanek@suse.de> writes:
>>> On Mon, 17 Apr 2017 20:43:02 +0530
>>> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>>>> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:
>>>>> more (optional) properties cannot be added?
>>>> Kernel change seems simple over f/w enhancement..
>>> That certainly looks so when you are a kernel developer and can
>>> implement the change yourself compared to convincing some firmware
>>> developer that this feature makes sense.
>>>
>>> On the other hand, the proposed kernel-only solution introduces
>>> requirement that the maintainer does not like.
>>>
>>> For the platform as a whole does it make more sense to add a hack to
>>> the kernel or does it make sense to enhance the firmware to provide
>>> more options for firmware-assisted dump?
>> Unfortunately it doesn't really matter, because there is firmware out
>> there that implements the current behaviour and will never be updated.
>> So we have to work with what's there.
>>
> It's not that with the existing firmware fadump does not work. It just
> uses more memory than needed. So if new firmware revision allows more
> flexibility it will work for people with updated firmware and people
> with outdated firmware will get the current behavior.
>
> The memory saving is only significant for big systems so only people
> running those will get significant improvement from the updated
> firmware.
>


Hi Michal,

With the fadump_append= approach I posted in this response thread,
additional parameters are enforced when fadump is active. If f/w supports
appending additional parameters, it has to update chosen/bootargs
whenever fadump is active. Almost the same thing except the dirty job
is now done by f/w? Hence I thought fadump_append= kernel parameter
approach is simple and lesser of the two evils? Am i missing something 
here..

Thanks
Hari
Michal Suchánek April 24, 2017, 10:24 a.m. UTC | #7
On Fri, 21 Apr 2017 00:19:55 +0530
Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:

> On Wednesday 19 April 2017 07:38 PM, Michal Suchánek wrote:
> > On Wed, 19 Apr 2017 14:19:47 +1000
> > Michael Ellerman <mpe@ellerman.id.au> wrote:
> >  
> >> Michal Suchánek <msuchanek@suse.de> writes:  
> >>> On Mon, 17 Apr 2017 20:43:02 +0530
> >>> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:  
> >>>> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:  
> >>>>> more (optional) properties cannot be added?  
> >>>> Kernel change seems simple over f/w enhancement..  
> >>> That certainly looks so when you are a kernel developer and can
> >>> implement the change yourself compared to convincing some firmware
> >>> developer that this feature makes sense.
> >>>
> >>> On the other hand, the proposed kernel-only solution introduces
> >>> requirement that the maintainer does not like.
> >>>
> >>> For the platform as a whole does it make more sense to add a hack
> >>> to the kernel or does it make sense to enhance the firmware to
> >>> provide more options for firmware-assisted dump?  
> >> Unfortunately it doesn't really matter, because there is firmware
> >> out there that implements the current behaviour and will never be
> >> updated. So we have to work with what's there.
> >>  
> > It's not that with the existing firmware fadump does not work. It
> > just uses more memory than needed. So if new firmware revision
> > allows more flexibility it will work for people with updated
> > firmware and people with outdated firmware will get the current
> > behavior.
> >
> > The memory saving is only significant for big systems so only people
> > running those will get significant improvement from the updated
> > firmware.
> >  
> 
> 
> Hi Michal,
> 
> With the fadump_append= approach I posted in this response thread,
> additional parameters are enforced when fadump is active. If f/w
> supports appending additional parameters, it has to update
> chosen/bootargs whenever fadump is active. Almost the same thing
> except the dirty job is now done by f/w? Hence I thought
> fadump_append= kernel parameter approach is simple and lesser of the
> two evils? Am i missing something here..
> 

The firmware already has to set the parameter by which the kernel can
tell it is a fadump kernel. Hence it already modifies the devicetree for
fadump and you have to rely on it to do so.

The handover area which stores the additional parameters is reserved by
the running kernel so now you also have to rely on the running kernel,
the running kernel and fadum kernel having the same idea about the
location of handover area, the crashing kernel not randomly overwriting
the handover area, etc.

In short it is additional potential for trouble which can be avoided.

I don't know if the firmware does protect the fadump reserved area and
devicetree from being randomly overwritten by the crashing kernel but
it is in the position to do so if desired. The handover area is
controlled by the crashing kernel and cannot have this guarantee.

Thanks

Michal
Hari Bathini April 24, 2017, 12:56 p.m. UTC | #8
Hi Michal.

On Monday 24 April 2017 03:54 PM, Michal Suchánek wrote:
> On Fri, 21 Apr 2017 00:19:55 +0530
> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>
>> On Wednesday 19 April 2017 07:38 PM, Michal Suchánek wrote:
>>> On Wed, 19 Apr 2017 14:19:47 +1000
>>> Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>   
>>>> Michal Suchánek <msuchanek@suse.de> writes:
>>>>> On Mon, 17 Apr 2017 20:43:02 +0530
>>>>> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>>>>>> On Friday 14 April 2017 01:28 AM, Michal Suchánek wrote:
>>>>>>> more (optional) properties cannot be added?
>>>>>> Kernel change seems simple over f/w enhancement..
>>>>> That certainly looks so when you are a kernel developer and can
>>>>> implement the change yourself compared to convincing some firmware
>>>>> developer that this feature makes sense.
>>>>>
>>>>> On the other hand, the proposed kernel-only solution introduces
>>>>> requirement that the maintainer does not like.
>>>>>
>>>>> For the platform as a whole does it make more sense to add a hack
>>>>> to the kernel or does it make sense to enhance the firmware to
>>>>> provide more options for firmware-assisted dump?
>>>> Unfortunately it doesn't really matter, because there is firmware
>>>> out there that implements the current behaviour and will never be
>>>> updated. So we have to work with what's there.
>>>>   
>>> It's not that with the existing firmware fadump does not work. It
>>> just uses more memory than needed. So if new firmware revision
>>> allows more flexibility it will work for people with updated
>>> firmware and people with outdated firmware will get the current
>>> behavior.
>>>
>>> The memory saving is only significant for big systems so only people
>>> running those will get significant improvement from the updated
>>> firmware.
>>>   
>>
>> Hi Michal,
>>
>> With the fadump_append= approach I posted in this response thread,
>> additional parameters are enforced when fadump is active. If f/w
>> supports appending additional parameters, it has to update
>> chosen/bootargs whenever fadump is active. Almost the same thing
>> except the dirty job is now done by f/w? Hence I thought
>> fadump_append= kernel parameter approach is simple and lesser of the
>> two evils? Am i missing something here..
>>
> The firmware already has to set the parameter by which the kernel can
> tell it is a fadump kernel. Hence it already modifies the devicetree for
> fadump and you have to rely on it to do so.

Right, devicetree is modified to include new 'ibm,kernel-dump' rtas node
informing that fadump is active.

> The handover area which stores the additional parameters is reserved by
> the running kernel so now you also have to rely on the running kernel,
> the running kernel and fadum kernel having the same idea about the
> location of handover area, the crashing kernel not randomly overwriting
> the handover area, etc.
>
> In short it is additional potential for trouble which can be avoided.
>
> I don't know if the firmware does protect the fadump reserved area and
> devicetree from being randomly overwritten by the crashing kernel but
> it is in the position to do so if desired. The handover area is
> controlled by the crashing kernel and cannot have this guarantee.
>
>

I thinks there is a mixup here..
I am no longer batting for handover area approach but
"fadump_append=" approach (suggested by Michael Ellerman in this thread)
where there is no need for a handover area but an additional kernel
parameter fadump_append=numa=off,nr_cpus=1.. which is a comma separated
list of parameters the user wants to enforce when fadump is active.

I was trying to say that passing additional parameters with 'fadump_append='
kernel parameter is better over f/w changing the chosen/bootargs node.


Thanks
Hari
Michal Suchánek April 24, 2017, 2 p.m. UTC | #9
On Mon, 24 Apr 2017 18:26:37 +0530
Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:

> Hi Michal.
> 
> On Monday 24 April 2017 03:54 PM, Michal Suchánek wrote:
> > On Fri, 21 Apr 2017 00:19:55 +0530
> > Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
> >  
> >> On Wednesday 19 April 2017 07:38 PM, Michal Suchánek wrote:  
> >>> On Wed, 19 Apr 2017 14:19:47 +1000
> >>> Michael Ellerman <mpe@ellerman.id.au> wrote:
> >>>     

> >> With the fadump_append= approach I posted in this response thread,
> >> additional parameters are enforced when fadump is active. If f/w
> >> supports appending additional parameters, it has to update
> >> chosen/bootargs whenever fadump is active. Almost the same thing
> >> except the dirty job is now done by f/w? Hence I thought
> >> fadump_append= kernel parameter approach is simple and lesser of
> >> the two evils? Am i missing something here..
> >>  
> > The firmware already has to set the parameter by which the kernel
> > can tell it is a fadump kernel. Hence it already modifies the
> > devicetree for fadump and you have to rely on it to do so.  
> 
> Right, devicetree is modified to include new 'ibm,kernel-dump' rtas
> node informing that fadump is active.
> 
> > The handover area which stores the additional parameters is
> > reserved by the running kernel so now you also have to rely on the
> > running kernel, the running kernel and fadum kernel having the same
> > idea about the location of handover area, the crashing kernel not
> > randomly overwriting the handover area, etc.
> >
> > In short it is additional potential for trouble which can be
> > avoided.
> >
> > I don't know if the firmware does protect the fadump reserved area
> > and devicetree from being randomly overwritten by the crashing
> > kernel but it is in the position to do so if desired. The handover
> > area is controlled by the crashing kernel and cannot have this
> > guarantee.
> >
> >  
> 
> I thinks there is a mixup here..
> I am no longer batting for handover area approach but
> "fadump_append=" approach (suggested by Michael Ellerman in this
> thread) where there is no need for a handover area but an additional
> kernel parameter fadump_append=numa=off,nr_cpus=1.. which is a comma
> separated list of parameters the user wants to enforce when fadump is
> active.
> 
> I was trying to say that passing additional parameters with
> 'fadump_append=' kernel parameter is better over f/w changing the
> chosen/bootargs node.


Ok, so a fadump specific parameter was just removed in
25031865553e5a2bd9b6e0a5d044952cfa2925d8 and with this we get one back.
If this is going to be added as an extra kernel parameter can't this be
passed to kdump kernel as well? Does kdump not have the same
restrictions?

Thanks

Michal
Hari Bathini April 25, 2017, 1:13 p.m. UTC | #10
On Monday 24 April 2017 07:30 PM, Michal Suchánek wrote:
> On Mon, 24 Apr 2017 18:26:37 +0530
> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>
>> Hi Michal.
>>
>> On Monday 24 April 2017 03:54 PM, Michal Suchánek wrote:
>>> On Fri, 21 Apr 2017 00:19:55 +0530
>>> Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
>>>   
>>>> On Wednesday 19 April 2017 07:38 PM, Michal Suchánek wrote:
>>>>> On Wed, 19 Apr 2017 14:19:47 +1000
>>>>> Michael Ellerman <mpe@ellerman.id.au> wrote:
>>>>>      
>>>> With the fadump_append= approach I posted in this response thread,
>>>> additional parameters are enforced when fadump is active. If f/w
>>>> supports appending additional parameters, it has to update
>>>> chosen/bootargs whenever fadump is active. Almost the same thing
>>>> except the dirty job is now done by f/w? Hence I thought
>>>> fadump_append= kernel parameter approach is simple and lesser of
>>>> the two evils? Am i missing something here..
>>>>   
>>> The firmware already has to set the parameter by which the kernel
>>> can tell it is a fadump kernel. Hence it already modifies the
>>> devicetree for fadump and you have to rely on it to do so.
>> Right, devicetree is modified to include new 'ibm,kernel-dump' rtas
>> node informing that fadump is active.
>>
>>> The handover area which stores the additional parameters is
>>> reserved by the running kernel so now you also have to rely on the
>>> running kernel, the running kernel and fadum kernel having the same
>>> idea about the location of handover area, the crashing kernel not
>>> randomly overwriting the handover area, etc.
>>>
>>> In short it is additional potential for trouble which can be
>>> avoided.
>>>
>>> I don't know if the firmware does protect the fadump reserved area
>>> and devicetree from being randomly overwritten by the crashing
>>> kernel but it is in the position to do so if desired. The handover
>>> area is controlled by the crashing kernel and cannot have this
>>> guarantee.
>>>
>>>   
>> I thinks there is a mixup here..
>> I am no longer batting for handover area approach but
>> "fadump_append=" approach (suggested by Michael Ellerman in this
>> thread) where there is no need for a handover area but an additional
>> kernel parameter fadump_append=numa=off,nr_cpus=1.. which is a comma
>> separated list of parameters the user wants to enforce when fadump is
>> active.
>>
>> I was trying to say that passing additional parameters with
>> 'fadump_append=' kernel parameter is better over f/w changing the
>> chosen/bootargs node.
>
> Ok, so a fadump specific parameter was just removed in
> 25031865553e5a2bd9b6e0a5d044952cfa2925d8 and with this we get one back.

I think you mean commit c2afb7ce9d088696427018ba2135b898058507b8
from linux-next. If so, yes.

> If this is going to be added as an extra kernel parameter can't this be
> passed to kdump kernel as well? Does kdump not have the same
> restrictions?
>

kdump kernel is loaded with kexec and additional parameters can be 
passed to it at the
time of loading. But fadump kernel boots like a regular kernel (through 
f/w & bootloader,
resetting all the h/w) except that f/w guarantees to keep the memory 
intact. So, there is
a need for a different approach to pass additional parameters in case of 
fadump..

Thanks
Hari
Michal Suchánek April 25, 2017, 1:29 p.m. UTC | #11
On Tue, 25 Apr 2017 18:43:11 +0530
Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:

> On Monday 24 April 2017 07:30 PM, Michal Suchánek wrote:
> > On Mon, 24 Apr 2017 18:26:37 +0530
> > Hari Bathini <hbathini@linux.vnet.ibm.com> wrote:
> >  
> >> Hi Michal.
> >>

> >>>     
> >> I thinks there is a mixup here..
> >> I am no longer batting for handover area approach but
> >> "fadump_append=" approach (suggested by Michael Ellerman in this
> >> thread) where there is no need for a handover area but an
> >> additional kernel parameter fadump_append=numa=off,nr_cpus=1..
> >> which is a comma separated list of parameters the user wants to
> >> enforce when fadump is active.
> >>
> >> I was trying to say that passing additional parameters with
> >> 'fadump_append=' kernel parameter is better over f/w changing the
> >> chosen/bootargs node.  
> >
> > Ok, so a fadump specific parameter was just removed in
> > 25031865553e5a2bd9b6e0a5d044952cfa2925d8 and with this we get one
> > back.  
> 
> I think you mean commit c2afb7ce9d088696427018ba2135b898058507b8
> from linux-next. If so, yes.
> 
> > If this is going to be added as an extra kernel parameter can't
> > this be passed to kdump kernel as well? Does kdump not have the same
> > restrictions?
> >  
> 
> kdump kernel is loaded with kexec and additional parameters can be 
> passed to it at the
> time of loading. But fadump kernel boots like a regular kernel
> (through f/w & bootloader,
> resetting all the h/w) except that f/w guarantees to keep the memory 
> intact. So, there is
> a need for a different approach to pass additional parameters in case
> of fadump..
> 

I see, you can pass different commandline to the kdump kernel while
loading it with kexec but fadump kernel is booted same as normal kernel
until it looks at the devicetree so the extra arguments have to be
passed always and only appended to the commandline in the fadump case.

This sounds reasonable.

Thanks

Michal
diff mbox

Patch

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index e013f8f..c4d4663 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -79,8 +79,10 @@  int __init early_init_dt_scan_fw_dump(unsigned long node,
       * dump data waiting for us.
       */
      fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-    if (fdm_active)
+    if (fdm_active) {
+        pr_info("Firmware-assisted dump is active.\n");
          fw_dump.dump_active = 1;
+    }

      /* Get the sizes required to store dump data for the firmware provided
       * dump sections.
@@ -263,8 +265,12 @@  int __init fadump_reserve_mem(void)
  {
      unsigned long base, size, memory_boundary;

-    if (!fw_dump.fadump_enabled)
+    if (!fw_dump.fadump_enabled) {
+        if (fw_dump.dump_active)
+            pr_warn("Firmware-assisted dump was active but kernel"
+                " booted with fadump disabled!\n");
          return 0;
+    }

      if (!fw_dump.fadump_supported) {
          printk(KERN_INFO "Firmware-assisted dump is not supported on"
@@ -304,7 +310,6 @@  int __init fadump_reserve_mem(void)
          memory_boundary = memblock_end_of_DRAM();

      if (fw_dump.dump_active) {
-        printk(KERN_INFO "Firmware-assisted dump is active.\n");
          /*
           * If last boot has crashed then reserve all the memory
           * above boot_memory_size so that we don't touch it until
@@ -359,6 +364,36 @@  static int __init early_fadump_param(char *p)
  }
  early_param("fadump", early_fadump_param);

+static void __init parse_fadump_append_params(const char *p)
+{
+    static char fadump_cmdline[COMMAND_LINE_SIZE] __initdata;
+    char *token;
+
+    strlcpy(fadump_cmdline, p, COMMAND_LINE_SIZE);
+    token = strchr(fadump_cmdline, ',');
+    while (token) {
+        *token = ' ';
+        token = strchr(token, ',');
+    }
+
+    pr_info("enforcing additional parameters (%s) passed through "
+        "'fadump_append=' parameter\n", fadump_cmdline);
+    parse_early_options(fadump_cmdline);
+}
+
+/* Look for fadump_append= cmdline option. */
+static int __init early_fadump_append_param(char *p)
+{
+    if (!p)
+        return 1;
+
+    if (fw_dump.dump_active)
+        parse_fadump_append_params(p);
+
+    return 0;
+}
+early_param("fadump_append", early_fadump_append_param);
+
  static void register_fw_dump(struct fadump_mem_struct *fdm)
  {
      int rc;