diff mbox

Only reset e820 once, even with multiple memmap=exactmap params

Message ID 201301081747.17853.trenn@suse.de
State Not Applicable
Headers show

Commit Message

Thomas Renninger Jan. 8, 2013, 4:47 p.m. UTC
On Tuesday, January 08, 2013 04:04:56 AM Yinghai Lu wrote:
> On Mon, Jan 7, 2013 at 4:42 PM, Thomas Renninger <trenn@suse.de> wrote:
> > memmap=256M$3584M
> 
> may need to change to:
> 
> memmap=256M\$\$3584M
The problem is (beside the special char $) that
memmap=exactmap boot param resets all e820 maps every time the
parameter is processed.
And:
/sbin/kexec -p xy --append="..." --initrd yx
seem to magically add (append):
memmap=exactmap memmap=640K@0K memmap=392556K@115328K elfcorehdr=507884K memmap=252K#3099760K

therefore all memmap= I try to pass are voided out by:
memmap=exactmap
which is always added by kexec after my params.

I could come around with attached patch and passing:
/sbin/kexec -p xy --append='... memmap=exactmap memmap=256M$3584M' --initrd yx

Now mmconfig is working in kdump kernel.
This would mean mmconfig is broken by design in kexec?

Only way to fix this I can think of is to export
mmconfig area through /sys  (../kernel/debug/mmconfig?, possibly
already in X$Y format?) in the productive kernel and make kexec add it
like the other memmap= params automatically.

I'll attach the output in a separate mail.

   Thomas

---

x86 e820: Do not reset e820 map twice, even if memmap=exactmap is passed as boot param several times

This is needed to be able to explicitly pass (debug) e820
modifications through kexec via memmap=.
Otherwise those get voided by kexec appending memmap=exactmap always
after the user defined boot parameters.

Signed-off-by: Thomas Renninger <trenn@suse.de>

 linux-2.6_t/arch/x86/kernel/e820.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yinghai Lu Jan. 8, 2013, 5:19 p.m. UTC | #1
On Tue, Jan 8, 2013 at 8:47 AM, Thomas Renninger <trenn@suse.de> wrote:
> On Tuesday, January 08, 2013 04:04:56 AM Yinghai Lu wrote:
>> On Mon, Jan 7, 2013 at 4:42 PM, Thomas Renninger <trenn@suse.de> wrote:
>> > memmap=256M$3584M
>>
>> may need to change to:
>>
>> memmap=256M\$\$3584M
> The problem is (beside the special char $) that
> memmap=exactmap boot param resets all e820 maps every time the
> parameter is processed.
> And:
> /sbin/kexec -p xy --append="..." --initrd yx
> seem to magically add (append):
> memmap=exactmap memmap=640K@0K memmap=392556K@115328K elfcorehdr=507884K memmap=252K#3099760K
>
> therefore all memmap= I try to pass are voided out by:
> memmap=exactmap
> which is always added by kexec after my params.
>
> I could come around with attached patch and passing:
> /sbin/kexec -p xy --append='... memmap=exactmap memmap=256M$3584M' --initrd yx
>
> Now mmconfig is working in kdump kernel.
> This would mean mmconfig is broken by design in kexec?
>
> Only way to fix this I can think of is to export
> mmconfig area through /sys  (../kernel/debug/mmconfig?, possibly
> already in X$Y format?) in the productive kernel and make kexec add it
> like the other memmap= params automatically.
>
> I'll attach the output in a separate mail.
>
>    Thomas
>
> ---
>
> x86 e820: Do not reset e820 map twice, even if memmap=exactmap is passed as boot param several times
>
> This is needed to be able to explicitly pass (debug) e820
> modifications through kexec via memmap=.
> Otherwise those get voided by kexec appending memmap=exactmap always
> after the user defined boot parameters.
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
>
>  linux-2.6_t/arch/x86/kernel/e820.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: git/linux-2.6_t/arch/x86/kernel/e820.c
> ===================================================================
> --- git.orig/linux-2.6_t/arch/x86/kernel/e820.c
> +++ git/linux-2.6_t/arch/x86/kernel/e820.c
> @@ -845,7 +845,9 @@ static int __init parse_memmap_opt(char
>
>         if (!strncmp(p, "exactmap", 8)) {
>  #ifdef CONFIG_CRASH_DUMP
> -               /*
> +               /* memmap=exactmap passed twice, do not reset tables again */
> +               if (saved_max_pfn)
> +                       return 0;               /*
>                  * If we are doing a crash dump, we still need to know
>                  * the real mem size before original memory map is
>                  * reset.

that exactmap logic still have problem:
We need to check exactmap at first, aka need to scan the whole comand line to
see if exactmap is there at first and reset e820 tables then handle
other memmap opt.

Also please update your patch after

tip/x86/mm2

I have one patch that process memmap= with "," there.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f581bb4c35589ac046b0cfc0deb7f369fc85

We could put exactmap scanning in new parse_memmap_opt.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Renninger Jan. 10, 2013, 3:21 a.m. UTC | #2
On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote:
...
> 
> that exactmap logic still have problem:
> We need to check exactmap at first, aka need to scan the whole comand line
> to see if exactmap is there at first and reset e820 tables then handle
> other memmap opt.
> 
> Also please update your patch after
> 
> tip/x86/mm2
> 
> I have one patch that process memmap= with "," there.
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f58
> 1bb4c35589ac046b0cfc0deb7f369fc85
> 
> We could put exactmap scanning in new parse_memmap_opt.

I still do not understand why:

Kexec (kexec/firmware_memmap.c) is setting up the e820 map from:
/sys/firmware/memmap/*
and pass it via bootloader structures.
And this e820 table gets immediately voided by memmap=exactmap
and a new one passed via boot parameters is set up.
If I read this correctly, this is what happens?

Can kexec simply pass the memory to use via memmap=X@Y
Then take the original e820 table, but not the usable entries (those
are coming from above memmap=X@Y).
That would mean that the kexec kernel takes all the
original ACPI, ACPI NVS, reserved, unusable (everthing but usable) entries
from the original e820 table and identifies the usable memory from
memmap boot param?

This would be much smarter than trying to pass the mmconf reserved
area and I could imagine other issues will show up if the reserved areas
do not match the original ones in the kexec kernel.

If this really can be done and memmap=exactmap was only used by kexec,
it's logic could be redefined from "drop all e820 entries" to
"drop all usable e820 entries" and no further adjustings in kexec/kernel are
needed to get mmconf working (and other issues may be avoided before
they happen). Beside that ACPI reserved aread is not needed anymore to get
passed via memmap=X#Y by kexec.

   Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal Jan. 10, 2013, 2:26 p.m. UTC | #3
On Thu, Jan 10, 2013 at 04:21:49AM +0100, Thomas Renninger wrote:
> On Tuesday, January 08, 2013 09:19:18 AM Yinghai Lu wrote:
> ...
> > 
> > that exactmap logic still have problem:
> > We need to check exactmap at first, aka need to scan the whole comand line
> > to see if exactmap is there at first and reset e820 tables then handle
> > other memmap opt.
> > 
> > Also please update your patch after
> > 
> > tip/x86/mm2
> > 
> > I have one patch that process memmap= with "," there.
> > 
> > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=9710f58
> > 1bb4c35589ac046b0cfc0deb7f369fc85
> > 
> > We could put exactmap scanning in new parse_memmap_opt.
> 
> I still do not understand why:
> 
> Kexec (kexec/firmware_memmap.c) is setting up the e820 map from:
> /sys/firmware/memmap/*

Kexe reads it from /sys/firmware/memmap/* because that's supposed to
be e820 map as passed by BIOS. And if user has specified some
command line options like mem=X, /sys/firmware/memmap/* is not truncated
but one exported using /proc/iomem is truncated accordingly.

So if we pass mem=1G in first kernel but not in second kernel, we want
second kernel to see whole of the system memory and not just 1G.

> and pass it via bootloader structures.

That's how bootloader is supposed to pass e820 memory map to kernel and
kexec is a bootloader.

> And this e820 table gets immediately voided by memmap=exactmap
> and a new one passed via boot parameters is set up.
> If I read this correctly, this is what happens?

This happens only in case of kdump and not kexec. In case of kdump
we want second kernel to use only selected memory areas.

In fact this is one improvement area. Instead of using memmap= entries
in kdump case, we should probably modify the e820 map passed in 
zero page and get rid of memmap= entries.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Jan. 10, 2013, 4:53 p.m. UTC | #4
On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>
> This happens only in case of kdump and not kexec. In case of kdump
> we want second kernel to use only selected memory areas.
>
> In fact this is one improvement area. Instead of using memmap= entries
> in kdump case, we should probably modify the e820 map passed in
> zero page and get rid of memmap= entries.

then how the kdump kernel get saved_max_pfn?

may have problem for read_oldmem with crashed kernel.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vivek Goyal Jan. 10, 2013, 5:01 p.m. UTC | #5
On Thu, Jan 10, 2013 at 08:53:18AM -0800, Yinghai Lu wrote:
> On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > This happens only in case of kdump and not kexec. In case of kdump
> > we want second kernel to use only selected memory areas.
> >
> > In fact this is one improvement area. Instead of using memmap= entries
> > in kdump case, we should probably modify the e820 map passed in
> > zero page and get rid of memmap= entries.
> 
> then how the kdump kernel get saved_max_pfn?
> 

Oh, I forgot about that. May be we can pass saved_max_pfn on command line
instead of passing all memmap= entries.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Jan. 10, 2013, 5:11 p.m. UTC | #6
On Thu, Jan 10, 2013 at 9:01 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Jan 10, 2013 at 08:53:18AM -0800, Yinghai Lu wrote:
>> On Thu, Jan 10, 2013 at 6:26 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >
>> > This happens only in case of kdump and not kexec. In case of kdump
>> > we want second kernel to use only selected memory areas.
>> >
>> > In fact this is one improvement area. Instead of using memmap= entries
>> > in kdump case, we should probably modify the e820 map passed in
>> > zero page and get rid of memmap= entries.
>>
>> then how the kdump kernel get saved_max_pfn?
>>
>
> Oh, I forgot about that. May be we can pass saved_max_pfn on command line
> instead of passing all memmap= entries.

then we create another kernel version/kexec tools dependency.

kexec tools need to check kernel version, to see if saved_max_pfn= is supported.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Jan. 10, 2013, 11:34 p.m. UTC | #7
On Wed, Jan 9, 2013 at 7:21 PM, Thomas Renninger <trenn@suse.de> wrote:
> I still do not understand why:
>
> Kexec (kexec/firmware_memmap.c) is setting up the e820 map from:
> /sys/firmware/memmap/*
> and pass it via bootloader structures.
> And this e820 table gets immediately voided by memmap=exactmap
> and a new one passed via boot parameters is set up.
> If I read this correctly, this is what happens?

yes, kdump scripts append those memmap.

>
> Can kexec simply pass the memory to use via memmap=X@Y
> Then take the original e820 table, but not the usable entries (those
> are coming from above memmap=X@Y).
> That would mean that the kexec kernel takes all the
> original ACPI, ACPI NVS, reserved, unusable (everthing but usable) entries
> from the original e820 table and identifies the usable memory from
> memmap boot param?

kdump scripts already do that for acpi regions, need to update it
to append that for mmconf.

>
> This would be much smarter than trying to pass the mmconf reserved
> area and I could imagine other issues will show up if the reserved areas
> do not match the original ones in the kexec kernel.
>
> If this really can be done and memmap=exactmap was only used by kexec,
> it's logic could be redefined from "drop all e820 entries" to
> "drop all usable e820 entries" and no further adjustings in kexec/kernel are
> needed to get mmconf working (and other issues may be avoided before
> they happen). Beside that ACPI reserved aread is not needed anymore to get
> passed via memmap=X#Y by kexec.

yes, we have other user for debug  like simulating user memmap for some bugs.

current problem for exactmap is that we don't scan that at first.

attached patch could help that.

Thanks

Yinghai
diff mbox

Patch

Index: git/linux-2.6_t/arch/x86/kernel/e820.c
===================================================================
--- git.orig/linux-2.6_t/arch/x86/kernel/e820.c
+++ git/linux-2.6_t/arch/x86/kernel/e820.c
@@ -845,7 +845,9 @@  static int __init parse_memmap_opt(char
 
 	if (!strncmp(p, "exactmap", 8)) {
 #ifdef CONFIG_CRASH_DUMP
-		/*
+		/* memmap=exactmap passed twice, do not reset tables again */
+		if (saved_max_pfn)
+			return 0;		/*
 		 * If we are doing a crash dump, we still need to know
 		 * the real mem size before original memory map is
 		 * reset.