Patchwork x86 e820: only void usable memory areas in memmap=exactmap case

Submitter Thomas Renninger
Date Jan. 12, 2013, 11:31 a.m.
Message ID <4068140.Uym9gOXymC@hammer82.arch.suse.de>
Permalink /patch/211488/
State Not Applicable

Comments

Thomas Renninger - Jan. 12, 2013, 11:31 a.m.
On Friday, January 11, 2013 02:16:35 PM H. Peter Anvin wrote:
> On 01/11/2013 01:09 PM, Yinghai Lu wrote:
> > On Fri, Jan 11, 2013 at 12:06 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> >> On 01/11/2013 11:59 AM, Yinghai Lu wrote:
> >>> On Fri, Jan 11, 2013 at 10:24 AM, Thomas Renninger <trenn@suse.de> wrote:
> >>>>> We may need to keep exactmap intact.
> >>>> 
> >>>> Why?
> >>>> Kexec/kdump should have been the only user?
> >>>> If older/current kexec calls still add ACPI maps via memmap=X#Y,
> >>>> they should already exist in the original e820 map and fall off or
> >>>> get glued to one region if (wrongly) overlapping via sanitize_map.
> >>> 
> >>> No, kexec/kdump is not the only user for memmap=exactmap.
> >> 
> >> Who is using it then, since you seem to know?
> > 
> > http://forums.gentoo.org/viewtopic-t-487476-highlight-proliant.html
> > 
> > http://forums.fedoraforum.org/archive/index.php/t-225347.html
> 
> Hm... both of those seem to be someone trying memmap=exactmap to hack
> around a problem which really was elsewhere, with a different solution.

OK, boot params can be counted as a "public interface" to the kernel that should/must
not change.
Generally there are several flavors of these:
  1) Purely (low-level) debug options
  2) Workarounds for broken HW
  3) Real interfaces that may make sense on production systems and which
      applications may pick up

In this case we have a mixture of all three.
Kexec/kdump would be the only one for 3, I expect.

IMO nobody should run memmap=exactmap on a production machine.
Such machines have severe BIOS (or wherever) bugs and will jump from one
trap to the next when trying to update the kernel and similar.
This applies to the two links above.

What would confuse developers is if they already used memmap=exactmap
to try a self-made e820 table and then realized after some time that its
behavior had changed silently; they might get angry.

Still, introducing another memmap= param would be very unfortunate because
of the kexec version (and work) dependency needed to get this fixed properly.

Hm, what about this:
Tell the user that memmap=exactmap usage has changed in the !kdump case.
This would cover the very rare cases you mentioned.

In the kdump case, passing an extra exactmap option was broken anyway.
No need to bore the user with an additional message there.

is_kdump_kernel() cannot be used because it's too early.
But the check is exactly the same (whether elfcorehdr_addr gets set via
the corresponding boot param).
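Here is a minimal sketch of that idea. It assumes the early check simply scans the raw command line for the elfcorehdr= parameter that kexec passes to a kdump kernel. The helper cmdline_has_param() is hypothetical and is not a kernel API. In the kernel, this would go through its own early parameter handling.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): return true if the command
 * line contains `name` (e.g. "elfcorehdr=") at the start of a
 * space-separated word.
 */
static bool cmdline_has_param(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		/* Accept only matches at the start of a word. */
		if (p == cmdline || p[-1] == ' ')
			return true;
		p += len;       /* skip this partial match and keep looking */
	}
	return false;
}
```

The kdump command lines shown later in this thread end in elfcorehdr=900536K, so this check would report "kdump" for them.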

Compile tested only:


-----------------------------
x86 e820: only void usable memory areas in memmap=exactmap case

All unusable (reserved, ACPI, ACPI NVS,...) areas have to be
honored in the kdump case.
Otherwise ACPI parts will quickly run into trouble when trying,
for example, to early_ioremap reserved areas which are not
declared reserved in the kdump kernel.
The mmconf area must also be a reserved memory region.

Passing unusable memory via memmap= is a design flaw, as
this information is already (exactly for this purpose) passed
via the bootloader structure.
In the kdump case (when memmap=exactmap is passed), only void
(do not use) the usable memory regions from the passed e820 table
and use the memory areas defined via the memmap=X@Y boot parameter instead.
But do still use the "unusable" memory regions from the original e820
table.
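The semantics described above can be modelled in userspace. The sketch keeps every non-RAM firmware entry, drops the firmware RAM entries, and appends the memmap=X@Y ranges as RAM. struct entry and build_kdump_map() are hypothetical stand-ins, not the kernel's e820 code. The type values follow the kernel's numbering of that era (E820_RAM = 1). The kernel would additionally run its sanitize step over the result.

```c
#include <assert.h>
#include <stdint.h>

enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

/* Hypothetical, simplified stand-in for an e820 table entry. */
struct entry { uint64_t addr, size; int type; };

/*
 * Keep every non-usable firmware entry, drop firmware RAM, then append
 * the user-supplied memmap=X@Y ranges as RAM. Returns the entry count.
 */
static int build_kdump_map(const struct entry *fw, int nr_fw,
			   const struct entry *user_ram, int nr_user,
			   struct entry *out, int cap)
{
	int n = 0;

	for (int i = 0; i < nr_fw; i++) {
		if (fw[i].type == E820_RAM)
			continue;               /* "void" usable firmware RAM */
		assert(n < cap);
		out[n++] = fw[i];               /* keep reserved/ACPI/NVS as-is */
	}
	for (int i = 0; i < nr_user; i++) {
		assert(n < cap);
		out[n] = user_ram[i];
		out[n++].type = E820_RAM;       /* memmap=X@Y is always usable */
	}
	return n;                               /* a sanitize pass would follow */
}
```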

Rename exactmap_parsed to memmap_checked, as voidmap needs the same checking.

Signed-off-by: Thomas Renninger <trenn@suse.de>

---
 Documentation/kernel-parameters.txt |   18 ++++++++++---
 arch/x86/kernel/e820.c              |   46 ++++++++++++++++++++++++++++++++--
 2 files changed, 57 insertions(+), 7 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu - Jan. 12, 2013, 5:07 p.m.
On Sat, Jan 12, 2013 at 3:31 AM, Thomas Renninger <trenn@suse.de> wrote:
>         memmap=exactmap [KNL,X86] Enable setting of an exact
> -                       E820 memory map, as specified by the user.
> -                       Such memmap=exactmap lines can be constructed based on
> -                       BIOS output or other requirements. See the memmap=nn@ss
> -                       option description.
> +                       E820 usable memory map, as specified by the user.
> +                       All unusable (reserved, ACPI, NVS,...) ranges from the
> +                       original e820 table are preserved.
> +                       But the usable memory regions from the original e820
> +                       table are removed.
> +                       This parameter is explicitly for kdump usage:
> +                       The memory the kdump kernel is allowed to use must
> +                       be passed via below memmap=nn[KMG]@ss[KMG] param.
> +                       All reserved regions the kernel may use for ioremapping
> +                       and similar are still considered.
> +
> +       memmap=voidmap  [KNL,X86] Do not use any e820 ranges from BIOS or
> +                       bootloader. Instead you have to pass regions via
> +                       below memmap= options.

I would suggest keeping the meaning of memmap=exactmap unchanged and adding
memmap=exactusablemap instead.

kexec-tools could be updated to support exactusablemap, with a kernel
version check for kdump.

Also, we need to double-check that:
1. exactmap overrides exactusablemap, even if they are passed out of order.
2. when exactusablemap is used, we not only remove the old usable type ranges
but also remove ranges overlapping the new usable ranges.
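Point 1 asks for precedence that does not depend on order. One way to get that is to look at every memmap= value before acting on any of them, so the strongest keyword wins wherever it appears. The sketch below does this. The names effective_mode() and MODE_* are hypothetical.

```c
#include <string.h>

/* Hypothetical modes, ordered from weakest to strongest. */
enum memmap_mode { MODE_DEFAULT, MODE_EXACTUSABLE, MODE_EXACT };

/*
 * Decide the effective mode after seeing *all* memmap= values, so the
 * result does not depend on the order in which they were passed.
 */
static enum memmap_mode effective_mode(const char *const *vals, int n)
{
	enum memmap_mode mode = MODE_DEFAULT;

	for (int i = 0; i < n; i++) {
		if (strcmp(vals[i], "exactmap") == 0)
			return MODE_EXACT;      /* strongest: nothing overrides it */
		if (strcmp(vals[i], "exactusablemap") == 0)
			mode = MODE_EXACTUSABLE;
	}
	return mode;
}
```

The kernel handles memmap= values as early parameters, one at a time. Getting this behavior there would therefore need either a pre-scan of the command line or a deferred decision. The subject of the later patch "Check for exactmap appearance when parsing first memmap option" suggests a scan-first approach.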

Thanks

Yinghai
Thomas Renninger - Jan. 14, 2013, 2:08 a.m.
On Saturday, January 12, 2013 09:07:12 AM Yinghai Lu wrote:
> On Sat, Jan 12, 2013 at 3:31 AM, Thomas Renninger <trenn@suse.de> wrote:
> >         memmap=exactmap [KNL,X86] Enable setting of an exact
> > -                       E820 memory map, as specified by the user.
> > -                       Such memmap=exactmap lines can be constructed based on
> > -                       BIOS output or other requirements. See the memmap=nn@ss
> > -                       option description.
> > +                       E820 usable memory map, as specified by the user.
> > +                       All unusable (reserved, ACPI, NVS,...) ranges from the
> > +                       original e820 table are preserved.
> > +                       But the usable memory regions from the original e820
> > +                       table are removed.
> > +                       This parameter is explicitly for kdump usage:
> > +                       The memory the kdump kernel is allowed to use must
> > +                       be passed via below memmap=nn[KMG]@ss[KMG] param.
> > +                       All reserved regions the kernel may use for ioremapping
> > +                       and similar are still considered.
> > +
> > +       memmap=voidmap  [KNL,X86] Do not use any e820 ranges from BIOS or
> > +                       bootloader. Instead you have to pass regions via
> > +                       below memmap= options.
> 
> I would suggest keeping the meaning of memmap=exactmap unchanged and adding
> memmap=exactusablemap instead.
I disagree.
I would like to change the memmap=exactmap behavior.

Why:
1)
This is a severe bug (for kdump/kexec). I could imagine quite a lot of
kdump-related bugs getting silently solved by this fix.
I expect we agree that the change, whatever it looks like in the end, should
still go in this kernel round?
With my approach, I would also suggest propagating this to stable kernels.
-> No need to update/patch kexec-tools; things will just work as they should.

2)
I would introduce 2 new memmap= options, whatever they end up looking like,
for example:
=void_usable_map
=void_orig_map
and deprecate exactmap= via
printk(KERN_INFO "exactmap usage changed and is deprecated\n");
but still fix it.
The latest kexec-tools would just use memmap=void_usable_map; old ones
are still fixed via stable kernel updates.


> kexec-tools could be updated to support exactusablemap, with a kernel
> version check for kdump.
No, please don't. It would be a maintenance/compatibility issue that remains
for years, or forever, in kdump and/or the kernel, and it's not necessary.

> Also, we need to double-check that:
> 1. exactmap overrides exactusablemap, even if they are passed out of order.
So that one can override the whole map in the kdump case (which has been
broken until today)?
I agree. But in my case it would be:
=void_orig_map overrides =void_usable_map

> 2. when exactusablemap is used, we not only remove the old usable type ranges
> but also remove ranges overlapping the new usable ranges.
Why should this be necessary?
The e820 map passed by the BIOS/bootloader should already be sanity-checked
-> no overlaps.

Then the usable ranges are removed, the memmap= defined ones are added, and
the sanity check is called again.
-> no overlaps.

Sanity checking prefers reserved/unusable memory ranges (in this case always
the original ones from the BIOS) over usable ones, and this is a good thing
to do...
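This argument depends on how the sanity check resolves overlaps. As far as I know, sanitize_e820_map() of this era gives an overlapped region the numerically highest type among the entries covering it. Because E820_RAM is 1, any reserved, ACPI or NVS type wins over usable RAM. The model below covers a single address and is deliberately simplified. type_at() is a hypothetical helper, not the kernel's implementation.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an e820 table entry. */
struct entry { uint64_t addr, size; int type; };

/*
 * Effective type at address a: the highest type among all entries
 * covering a, or 0 if no entry covers it.
 */
static int type_at(const struct entry *map, int nr, uint64_t a)
{
	int type = 0;

	for (int i = 0; i < nr; i++) {
		const struct entry *e = &map[i];

		if (a >= e->addr && a - e->addr < e->size && e->type > type)
			type = e->type;
	}
	return type;
}
```

For example, a RAM entry (type 1) placed over the BIOS reserved window 0xbe000000-0xcfffffff (type 2) from the logs below would still resolve to reserved.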

I guess it's up to further votes/comments/ideas and a final maintainer
(kexec/x86?) decision.


    Thomas
Yinghai Lu - Jan. 14, 2013, 2:43 a.m.
On Sun, Jan 13, 2013 at 6:08 PM, Thomas Renninger <trenn@suse.de> wrote:
> On Saturday, January 12, 2013 09:07:12 AM Yinghai Lu wrote:
>> On Sat, Jan 12, 2013 at 3:31 AM, Thomas Renninger <trenn@suse.de> wrote:
>> >         memmap=exactmap [KNL,X86] Enable setting of an exact
>> > -                       E820 memory map, as specified by the user.
>> > -                       Such memmap=exactmap lines can be constructed based on
>> > -                       BIOS output or other requirements. See the memmap=nn@ss
>> > -                       option description.
>> > +                       E820 usable memory map, as specified by the user.
>> > +                       All unusable (reserved, ACPI, NVS,...) ranges from the
>> > +                       original e820 table are preserved.
>> > +                       But the usable memory regions from the original e820
>> > +                       table are removed.
>> > +                       This parameter is explicitly for kdump usage:
>> > +                       The memory the kdump kernel is allowed to use must
>> > +                       be passed via below memmap=nn[KMG]@ss[KMG] param.
>> > +                       All reserved regions the kernel may use for ioremapping
>> > +                       and similar are still considered.
>> > +
>> > +       memmap=voidmap  [KNL,X86] Do not use any e820 ranges from BIOS or
>> > +                       bootloader. Instead you have to pass regions via
>> > +                       below memmap= options.
>>
>> I would suggest keeping the meaning of memmap=exactmap unchanged and adding
>> memmap=exactusablemap instead.
> I disagree.
> I would like to change the memmap=exactmap behavior.
>
> Why:
> 1)
> This is a severe bug (for kdump/kexec). I could imagine quite a lot of
> kdump-related bugs getting silently solved by this fix.
> I expect we agree that the change, whatever it looks like in the end, should
> still go in this kernel round?
> With my approach, I would also suggest propagating this to stable kernels.
> -> No need to update/patch kexec-tools; things will just work as they should.
>
> 2)
> I would introduce 2 new memmap= options, whatever they end up looking like,
> for example:
> =void_usable_map
> =void_orig_map
> and deprecate exactmap= via
> printk(KERN_INFO "exactmap usage changed and is deprecated\n");
> but still fix it.
> The latest kexec-tools would just use memmap=void_usable_map; old ones
> are still fixed via stable kernel updates.

Everyone can understand it straightforwardly:
exactmap: the memmap will be specified, and it should be honored without
considering any old memmap.
exactusablemap: makes sure only old RAM ranges get removed, and the
specified usable ranges become the final usable
ranges in the final memmap. So we need to remove overlaps with old
reserved ranges.

aka: exact means EXACT ...

>
>
>> kexec-tools could be updated to support exactusablemap, with a kernel
>> version check for kdump.
> No, please don't. It would be a maintenance/compatibility issue that remains
> for years, or forever, in kdump and/or the kernel, and it's not necessary.

I don't see any problem with it.

Those are just some kind of improvement without considering kdump,
because kdump/scripts already does a good job of providing the right exactmap
with usable and ACPI reserved areas.
For mmconf, some systems have that range reserved in e820, and some have it
in ACPI, and those systems will have mmconf enabled in the kdump kernel.

Attached is what I'd like to have with exactusablemap, but maybe it's not
needed, and we can just stay with exactmap...

ps: we don't need to add e820_remove_type...

Yinghai
Thomas Renninger - Jan. 14, 2013, 3:05 p.m.
On Monday, January 14, 2013 03:43:46 AM Yinghai Lu wrote:
> On Sun, Jan 13, 2013 at 6:08 PM, Thomas Renninger <trenn@suse.de> wrote:
> > On Saturday, January 12, 2013 09:07:12 AM Yinghai Lu wrote:
...
> Everyone can understand it straightforwardly:
> exactmap: the memmap will be specified, and it should be honored without
> considering any old memmap.
> exactusablemap: makes sure only old RAM ranges get removed, and the
> specified usable ranges become the final usable
> ranges in the final memmap. So we need to remove overlaps with old
> reserved ranges.
> 
> aka: exact means EXACT ...
The naming is not my point...
Anyway, after a quick talk with Alexander Graf, I guess I won't have
much of a chance: a rule is a rule, and a boot param is a public interface
that must not change just like that (deprecation phase, etc.).

...
 
> Those are just some kind of improvement without considering kdump,
> because kdump/scripts already does a good job of providing the right exactmap
> with usable and ACPI reserved areas.
No, it's conceptually wrong to provide the reserved areas via memmap
while the original ones are already declared via boot loader structures.

There are workarounds that save ACPI NVS memory across suspend/resume
(suspend does not matter much for kdump, but there may be more users than
this and mmconf).
The reserved areas in the kdump kernel should be the same as with the
original e820 table to avoid any unforeseen issues.
And kdump already got them passed and should use this info.

> For mmconf, some systems have that range reserved in e820, and some have it
> in ACPI, and those systems will have mmconf enabled in the kdump kernel.

> Attached is what I'd like to have with exactusablemap, but maybe it's not
> needed, and we can just stay with exactmap...
> 
> ps: we don't need to add e820_remove_type...
I thought this tiny loop:
+       for (i = 0; i < e820.nr_map; i++) {
+               struct e820entry *ei = &e820.map[i];
+               if (ei->type == type) {
+                       memset(ei, 0, sizeof(struct e820entry));
+                       continue;
+               }
+       }
would make it easier to see what it's doing (compared to
e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);).
But I do not have a strong opinion on that.
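The quoted loop zeroes matching entries in place and leaves zero-sized holes in the table. As far as I know, those holes are harmless only because the later sanitize pass skips empty regions. Note also that the `continue` in the loop is redundant, since it is the last statement in the loop body. The sketch below models the two steps with hypothetical helpers, zero_type() and drop_empty(). These are not kernel functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an e820 table entry. */
struct entry { uint64_t addr, size; int type; };

/* Step 1, as in the quoted loop: zero out every entry of `type` in place. */
static void zero_type(struct entry *map, int nr, int type)
{
	for (int i = 0; i < nr; i++)
		if (map[i].type == type)
			memset(&map[i], 0, sizeof(map[i]));
}

/* Step 2: compact away zero-sized entries; returns the new entry count. */
static int drop_empty(struct entry *map, int nr)
{
	int j = 0;

	for (int i = 0; i < nr; i++)
		if (map[i].size != 0)
			map[j++] = map[i];
	return j;
}
```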

What is this for?
@@ -871,6 +879,11 @@ static int __init parse_memmap_one(char
 	userdef = 1;
 	if (*p == '@') {
 		start_at = memparse(p+1, &p);
+		if (exactusablemap_parsed) {
+			/* remove all range with other types */
+			e820_remove_range(start_at, mem_size,
+						 E820_RAM, 0);
+		}
 		e820_add_region(start_at, mem_size, E820_RAM);
 	} else if (*p == '#') {
 		start_at = memparse(p+1, &p);


    Thomas
Yinghai Lu - Jan. 14, 2013, 7:04 p.m.
On Mon, Jan 14, 2013 at 7:05 AM, Thomas Renninger <trenn@suse.de> wrote:
> What is this for?
> @@ -871,6 +879,11 @@ static int __init parse_memmap_one(char
>         userdef = 1;
>         if (*p == '@') {
>                 start_at = memparse(p+1, &p);
> +               if (exactusablemap_parsed) {
> +                       /* remove all range with other types */
> +                       e820_remove_range(start_at, mem_size,
> +                                                E820_RAM, 0);
> +               }
>                 e820_add_region(start_at, mem_size, E820_RAM);
>         } else if (*p == '#') {
>                 start_at = memparse(p+1, &p);

Remove all old ranges before adding E820_RAM; otherwise the newly added E820
ranges could be ignored.
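This is why the hunk calls e820_remove_range(start_at, mem_size, E820_RAM, 0) before adding the RAM range. Overlaps resolve in favor of the higher type, so an older reserved entry would otherwise override the new RAM range. According to its comment, the call clears every range in the window regardless of type. The sketch below is a simplified model of cutting such a hole, splitting entries that straddle it. remove_hole() is a hypothetical helper, not the kernel's function.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an e820 table entry. */
struct entry { uint64_t addr, size; int type; };

/*
 * Cut [start, start + size) out of every entry, whatever its type,
 * splitting entries that straddle the hole. Writes the result to
 * out (capacity cap); returns the entry count, or -1 if out is full.
 */
static int remove_hole(const struct entry *in, int nr, uint64_t start,
		       uint64_t size, struct entry *out, int cap)
{
	uint64_t end = start + size;
	int n = 0;

	for (int i = 0; i < nr; i++) {
		uint64_t e_start = in[i].addr, e_end = in[i].addr + in[i].size;

		if (e_end <= start || e_start >= end) {
			if (n >= cap)
				return -1;
			out[n++] = in[i];       /* no overlap: keep as-is */
			continue;
		}
		if (e_start < start) {          /* part below the hole survives */
			if (n >= cap)
				return -1;
			out[n++] = (struct entry){ e_start, start - e_start, in[i].type };
		}
		if (e_end > end) {              /* part above the hole survives */
			if (n >= cap)
				return -1;
			out[n++] = (struct entry){ end, e_end - end, in[i].type };
		}
	}
	return n;
}
```

Applied to the BIOS reserved window 0xbe000000-0xcfffffff from the logs below, this punches a hole into that window. That side effect is exactly what Thomas objects to in the next reply.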
Thomas Renninger - Jan. 15, 2013, 12:54 a.m.
On Monday, January 14, 2013 11:04:36 AM Yinghai Lu wrote:
> On Mon, Jan 14, 2013 at 7:05 AM, Thomas Renninger <trenn@suse.de> wrote:
> > What is this for?
> > @@ -871,6 +879,11 @@ static int __init parse_memmap_one(char
> >         userdef = 1;
> >         if (*p == '@') {
> >                 start_at = memparse(p+1, &p);
> > +               if (exactusablemap_parsed) {
> > +                       /* remove all range with other types */
> > +                       e820_remove_range(start_at, mem_size,
> > +                                                E820_RAM, 0);
> > +               }
> >                 e820_add_region(start_at, mem_size, E820_RAM);
> >         } else if (*p == '#') {
> >                 start_at = memparse(p+1, &p);
> 
> Remove all old ranges before adding E820_RAM; otherwise the newly added E820
> ranges could be ignored.
But isn't this intended?
kexec must never request reserved memory to be used as ordinary E820_RAM
by the kdump kernel.
This also reverts what exactusablemap is all about:
keeping all reserved memory ranges of the original BIOS map.

The above would again wrongly remove the mmconf and other reserved regions
if kexec passes memmap=exactusablemap,x@y

From what I can see the patch looks fine, but the above part should
simply be left out.

   Thomas
Yinghai Lu - Jan. 15, 2013, 4:45 a.m.
On Mon, Jan 14, 2013 at 4:54 PM, Thomas Renninger <trenn@suse.de> wrote:
> On Monday, January 14, 2013 11:04:36 AM Yinghai Lu wrote:
>> On Mon, Jan 14, 2013 at 7:05 AM, Thomas Renninger <trenn@suse.de> wrote:
>> > What is this for?
>> > @@ -871,6 +879,11 @@ static int __init parse_memmap_one(char
>> >         userdef = 1;
>> >         if (*p == '@') {
>> >                 start_at = memparse(p+1, &p);
>> > +               if (exactusablemap_parsed) {
>> > +                       /* remove all range with other types */
>> > +                       e820_remove_range(start_at, mem_size,
>> > +                                                E820_RAM, 0);
>> > +               }
>> >                 e820_add_region(start_at, mem_size, E820_RAM);
>> >         } else if (*p == '#') {
>> >                 start_at = memparse(p+1, &p);
>>
>> Remove all old ranges before adding E820_RAM; otherwise the newly added E820
>> ranges could be ignored.
> But isn't this intended?
> kexec must never request reserved memory to be used as ordinary E820_RAM
> by the kdump kernel.
> This also reverts what exactusablemap is all about:
> keeping all reserved memory ranges of the original BIOS map.
>
> The above would again wrongly remove the mmconf and other reserved regions
> if kexec passes memmap=exactusablemap,x@y
>
> From what I can see the patch looks fine, but the above part should
> simply be left out.

Then I would like to rename it to resetusablemap instead,

as in the attached patch.

Thanks

Yinghai
Thomas Renninger - Jan. 22, 2013, 3:21 p.m.
On Tuesday, January 15, 2013 05:45:43 AM Yinghai Lu wrote:
> On Mon, Jan 14, 2013 at 4:54 PM, Thomas Renninger <trenn@suse.de> wrote:
...
> > From what I can see the patch looks fine, but the above part should
> > simply be left out.
> 
> Then I would like to rename it to resetusablemap instead,
> 
> as in the attached patch.

I tried this one out on the linux-x86-tip tree, mm2 branch.
This branch already had your patch:
x86, mm: Let "memmap=" take more entries one time
and I additionally added:
x86 e820: Check for exactmap appearance when parsing first memmap option
and the one you posted one mail earlier:
x86 e820: Introduce memmap=resetusablemap for kdump usage

I just re-posted your 2 patches, also adding some kernel-parameter
documentation, subject:
[PATCH 0/2] Only parse exactmap once, introduce memmap=resetusablemap

I adjusted kexec-tools to identify the version of the kernel
it loads.
I posted these patches separately, subject (Google should find it
pretty soon):
[PATCH 0/3] Make use of new memmap= kernel parameter syntax

for those who are interested in these, to:
  - make use of the comma-separated memmap= option
    (Let "memmap=" take more entries one time)
  - not pass unusable (ACPI or other reserved) memory via memmap=x#y
  - use memmap=resetusablemap instead of memmap=exactmap
in case the kernel to load is version 3.9 or newer.
Otherwise there is no change.
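The kexec debug output below prints "Kernel release: 3.7.0-rc6-d in long format: 0x30700". That value is consistent with packing major/minor/patch the way the kernel's KERNEL_VERSION() macro does. The sketch below shows a 3.9 cut-off check based on that encoding. release_to_long() and wants_resetusablemap() are hypothetical names, not kexec-tools' actual code.

```c
#include <stdio.h>

/* Pack like the kernel's KERNEL_VERSION(a, b, c) macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Parse the leading "major.minor[.patch]" of a release string; -1 on failure. */
static long release_to_long(const char *release)
{
	int major, minor, patch = 0;
	int n = sscanf(release, "%d.%d.%d", &major, &minor, &patch);

	if (n < 2)
		return -1;
	return KVER((long)major, (long)minor, (long)patch);
}

/* Use the new memmap=resetusablemap syntax only for 3.9 or newer. */
static int wants_resetusablemap(const char *release)
{
	return release_to_long(release) >= KVER(3L, 9L, 0L);
}
```

With this check, "3.7.0-rc6-default+" maps to 0x30700 and keeps the old exactmap syntax.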

Unfortunately I cannot see a proper way to backport this.
The only way may be to:
  - In the backported kernel, invert the condition to prefer resetusablemap
    over exactmap
  - In kexec, pass both to older kernels: memmap=exactmap memmap=resetusablemap
    and all the ACPI reservation memmap= params


I tried this on a somewhat faster-booting machine with quite a few more
memmap= params passed by kexec. I hard-reset the machine once kdump had
reached:
Copying data                       : [ 20 %]
as I expect that if it gets that far, things work. In both cases the
kernel was dumped:
kexec using new resetusablemap syntax (with kexec debug enabled):
===================================
./build/sbin/kexec -d -p /boot/vmlinuz-3.7.0-rc6-default+ --append="root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 
console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug 
vga=normal elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1  " --initrd=/boot/initrd-3.7.0-rc6-default+-kdump...

Kernel release: 3.7.0-rc6-d in long format: 0x30700
...
root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel 
resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug vga=normal elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1   
memmap=resetusablemap,559K@64K,261560K@638976K elfcorehdr=900536K
...
===================================
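The ranges in "memmap=resetusablemap,559K@64K,261560K@638976K" can be checked against the user-defined map in the log below. The parser sketch here follows the quoted parse_memmap_one() hunk: '@' adds E820_RAM and '#' has its own branch (E820_ACPI). The '$' (reserved) case is an assumption based on the kernel-parameters documentation. parse_size() only imitates memparse() for the K/M/G suffixes, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* e820 type codes assumed for the memmap= separators. */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3 };

struct entry { uint64_t addr, size; int type; };

/* memparse()-like: a number with an optional K, M or G suffix. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse "size@start" (RAM), "size#start" (ACPI) or "size$start" (reserved). */
static int parse_memmap_entry(const char *s, struct entry *e)
{
	char *p;

	e->size = parse_size(s, &p);
	switch (*p) {
	case '@': e->type = E820_RAM; break;
	case '#': e->type = E820_ACPI; break;
	case '$': e->type = E820_RESERVED; break;
	default: return -1;     /* e.g. a keyword such as "resetusablemap" */
	}
	e->addr = parse_size(p + 1, &p);
	return (*p == '\0' || *p == ',') ? 0 : -1;
}

/* Parse a comma-separated memmap= value; unparseable tokens are skipped. */
static int parse_memmap_list(const char *s, struct entry *out, int cap)
{
	int n = 0;

	for (;;) {
		if (n < cap && parse_memmap_entry(s, &out[n]) == 0)
			n++;
		s = strchr(s, ',');
		if (!s)
			return n;
		s++;            /* step past the comma */
	}
}
```

559K@64K gives 0x10000-0x9bbff and 261560K@638976K gives 0x27000000-0x36f6dfff. These match the two "usable" lines of the user-defined map.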

Important parts of the crash kernel log with the new syntax:
===================================
[48023.711213] RIP  [<ffffffff812f368d>] sysrq_handle_crash+0xd/0x20
[48023.724765]  RSP <ffff88042411fe90>
[48023.732539] CR2: 0000000000000000
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.7.0-rc6-default+ (trenn@ett) (gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux) ) #2 SMP Tue Jan 22 01:43:26 
CET 2013
[    0.000000] Command line: root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 
ignore_loglevel resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug vga=normal elevator=deadline sysrq=yes reset_devices 
irqpoll maxcpus=1   memmap=resetusablemap,559K@64K,261560K@638976K elfcorehdr=900536K
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009bbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009bc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000b93dafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000b93db000-0x00000000b9454fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000b9455000-0x00000000bb155fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bb156000-0x00000000bb166fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bb167000-0x00000000bb3d7fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bb3d8000-0x00000000bb6d8fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bb6d9000-0x00000000bd9fcfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bd9fd000-0x00000000bdbfcfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdbfd000-0x00000000bdcdcfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bdcdd000-0x00000000bdde6fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bdde7000-0x00000000bde8ffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bde90000-0x00000000bde90fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bde91000-0x00000000bdf07fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf08000-0x00000000bdf08fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf09000-0x00000000bdf0afff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf0b000-0x00000000bdf0bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf0c000-0x00000000bdf0cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf0d000-0x00000000bdf23fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf24000-0x00000000bdfb0fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdfb1000-0x00000000bdffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be000000-0x00000000cfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed19000-0x00000000fed19fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffa20000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] e820: last_pfn = 0x840000 max_arch_pfn = 0x400000000
[    0.000000] e820: remove [mem 0x00000000-0xfffffffffffffffe] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000010000-0x000000000009bbff] usable
[    0.000000] user: [mem 0x000000000009bc00-0x000000000009ffff] reserved
[    0.000000] user: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] user: [mem 0x0000000027000000-0x0000000036f6dfff] usable
[    0.000000] user: [mem 0x00000000b93db000-0x00000000b9454fff] ACPI data
[    0.000000] user: [mem 0x00000000bb156000-0x00000000bb166fff] reserved
[    0.000000] user: [mem 0x00000000bb3d8000-0x00000000bb6d8fff] ACPI NVS
[    0.000000] user: [mem 0x00000000bd9fd000-0x00000000bdbfcfff] ACPI NVS
[    0.000000] user: [mem 0x00000000bdcdd000-0x00000000bdde6fff] reserved
[    0.000000] user: [mem 0x00000000bdde7000-0x00000000bde8ffff] ACPI NVS
[    0.000000] user: [mem 0x00000000bde90000-0x00000000bde90fff] ACPI data
[    0.000000] user: [mem 0x00000000bde91000-0x00000000bdf07fff] ACPI NVS
[    0.000000] user: [mem 0x00000000bdf08000-0x00000000bdf08fff] ACPI data
[    0.000000] user: [mem 0x00000000bdf09000-0x00000000bdf0afff] ACPI NVS
[    0.000000] user: [mem 0x00000000bdf0b000-0x00000000bdf0bfff] ACPI data
[    0.000000] user: [mem 0x00000000bdf0c000-0x00000000bdf0cfff] ACPI NVS
[    0.000000] user: [mem 0x00000000bdf0d000-0x00000000bdf23fff] ACPI data
[    0.000000] user: [mem 0x00000000bdf24000-0x00000000bdfb0fff] ACPI NVS
[    0.000000] user: [mem 0x00000000be000000-0x00000000cfffffff] reserved
[    0.000000] user: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] user: [mem 0x00000000fed19000-0x00000000fed19fff] reserved
[    0.000000] user: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] user: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] user: [mem 0x00000000ffa20000-0x00000000ffffffff] reserved
[    0.000000] DMI 2.6 present.
...
[    2.370015] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
[    2.390823] PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in E820

===================================
===================================
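The last log lines of the new-syntax run show the kernel accepting the MMCONFIG window 0xc0000000-0xcfffffff as "reserved in E820". That works because resetusablemap kept the BIOS reserved range 0xbe000000-0xcfffffff. The coverage check below is written in the spirit of the kernel's e820_all_mapped() helper but is simplified. all_mapped() is hypothetical and assumes the map is sorted and non-overlapping, as it is after sanitizing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an e820 table entry. */
struct entry { uint64_t addr, size; int type; };

/*
 * Return true if [start, end) is completely covered by entries of `type`.
 * Assumes map is sorted by address and non-overlapping.
 */
static bool all_mapped(const struct entry *map, int nr, uint64_t start,
		       uint64_t end, int type)
{
	for (int i = 0; i < nr && start < end; i++) {
		const struct entry *e = &map[i];

		if (e->type != type)
			continue;
		if (e->addr + e->size <= start)
			continue;                   /* entirely below the window */
		if (e->addr > start)
			return false;               /* uncovered gap */
		start = e->addr + e->size;          /* consume the covered part */
	}
	return start >= end;
}
```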

kexec using old exactmap syntax (with kexec debug enabled):
===================================
./build/sbin/kexec -d -p /boot/vmlinuz-3.7.0-rc6-default+ --append="root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 
console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug 
vga=normal elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1  " --initrd=/boot/initrd-3.7.0-rc6-default+-kdump

...
root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel
resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug vga=normal elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1
memmap=exactmap memmap=559K@64K memmap=261560K@638976K elfcorehdr=900536K memmap=488K#3034988K memmap=3076K#3067744K memmap=2048K#3106804K memmap=676K#3110812K
memmap=4K#3111488K memmap=476K#3111492K memmap=4K#3111968K memmap=8K#3111972K memmap=4K#3111980K memmap=4K#3111984K memmap=92K#3111988K memmap=564K#3112080K
===================================

Important parts of the crash kernel log with the old syntax
(compare the unnecessary memmap=x#y additions and the
resulting broken e820 user-defined map):
===================================
[  856.564790] RIP  [<ffffffff812f368d>] sysrq_handle_crash+0xd/0x20
[  856.578345]  RSP <ffff880424197e90>
[  856.586124] CR2: 0000000000000000
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.7.0-rc6-default+ (trenn@ett) (gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux) ) #2 SMP Tue Jan 22 01:43:26 
CET 2013
[    0.000000] Command line: root=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part6 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 
ignore_loglevel resume=/dev/disk/by-id/ata-Hitachi_HDS721016CLA382_JPAB40HM2KUK6B-part2 apic=verbose debug vga=normal elevator=deadline sysrq=yes reset_devices 
irqpoll maxcpus=1   memmap=exactmap memmap=559K@64K memmap=261560K@638976K elfcorehdr=900536K memmap=488K#3034988K memmap=3076K#3067744K memmap=2048K#3106804K 
memmap=676K#3110812K memmap=4K#3111488K memmap=476K#3111492K memmap=4K#3111968K memmap=8K#3111972K memmap=4K#3111980K memmap=4K#3111984K memmap=92K#3111988K 
memmap=564K#3112080K
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000100-0x000000000009bbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009bc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000b93dafff] usable
[    0.000000] BIOS-e820: [mem 0x00000000b93db000-0x00000000b9454fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000b9455000-0x00000000bb155fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bb156000-0x00000000bb166fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bb167000-0x00000000bb3d7fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bb3d8000-0x00000000bb6d8fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bb6d9000-0x00000000bd9fcfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bd9fd000-0x00000000bdbfcfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdbfd000-0x00000000bdcdcfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bdcdd000-0x00000000bdde6fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bdde7000-0x00000000bde8ffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bde90000-0x00000000bde90fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bde91000-0x00000000bdf07fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf08000-0x00000000bdf08fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf09000-0x00000000bdf0afff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf0b000-0x00000000bdf0bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf0c000-0x00000000bdf0cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdf0d000-0x00000000bdf23fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bdf24000-0x00000000bdfb0fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bdfb1000-0x00000000bdffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be000000-0x00000000cfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed19000-0x00000000fed19fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffa20000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000083fffffff] usable
[    0.000000] debug: ignoring loglevel setting.
[    0.000000] e820: last_pfn = 0x840000 max_arch_pfn = 0x400000000
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000010000-0x000000000009bbff] usable
[    0.000000] user: [mem 0x0000000027000000-0x0000000036f6dfff] usable
[    0.000000] user: [mem 0x00000000b93db000-0x00000000b9454fff] ACPI data
[    0.000000] user: [mem 0x00000000bb3d8000-0x00000000bb6d8fff] ACPI data
[    0.000000] user: [mem 0x00000000bd9fd000-0x00000000bdbfcfff] ACPI data
[    0.000000] user: [mem 0x00000000bdde7000-0x00000000bdfb0fff] ACPI data
[    0.000000] DMI 2.6 present.
...
[    2.264005] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
[    2.284811] PCI: not using MMCONFIG
===================================
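The final two log lines most likely follow from the broken user-defined map: the old exactmap discards the whole BIOS map, so the reserved range 0xbe000000-0xcfffffff that contains the MMCONFIG window is no longer present. The sketch below models this under simplifying assumptions: it uses only a handful of entries from the BIOS-e820 map above, compares the old behavior (discard everything) with the patched one (drop only usable RAM), and treats "MMCONFIG window fully covered by a reserved e820 entry" as the condition for using MMCONFIG. The kernel's real check is more involved, and all struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* e820 type codes used by x86 Linux: 1 = usable RAM, 2 = reserved, 3 = ACPI data. */
enum { T_RAM = 1, T_RESERVED = 2, T_ACPI = 3 };

struct entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Hypothetical fixed-size stand-in for the kernel's e820 table. */
struct map {
	struct entry e[16];
	size_t nr;
};

static void add_entry(struct map *m, uint64_t addr, uint64_t size, uint32_t type)
{
	assert(m->nr < sizeof(m->e) / sizeof(m->e[0]));
	m->e[m->nr++] = (struct entry){ addr, size, type };
}

/* A subset of the BIOS-e820 map from the log above. */
static void bios_map(struct map *m)
{
	m->nr = 0;
	add_entry(m, 0x100, 0x9bb00, T_RAM);                 /* 0x100-0x9bbff */
	add_entry(m, 0x100000, 0xb92db000, T_RAM);           /* 0x100000-0xb93dafff */
	add_entry(m, 0xb93db000, 0x7a000, T_ACPI);           /* 0xb93db000-0xb9454fff */
	add_entry(m, 0xbe000000, 0x12000000, T_RESERVED);    /* 0xbe000000-0xcfffffff */
	add_entry(m, 0x100000000ULL, 0x740000000ULL, T_RAM); /* 0x100000000-0x83fffffff */
}

/* Old semantics: memmap=exactmap discards the whole firmware map. */
static void exactmap_old(struct map *m)
{
	m->nr = 0;
}

/* Patched semantics: drop only usable RAM entries and keep everything else. */
static void exactmap_new(struct map *m)
{
	size_t out = 0;

	for (size_t i = 0; i < m->nr; i++)
		if (m->e[i].type != T_RAM)
			m->e[out++] = m->e[i];
	m->nr = out;
}

/* Memory given to the kdump kernel by memmap=559K@64K memmap=261560K@638976K. */
static void add_kdump_ranges(struct map *m)
{
	add_entry(m, 64ULL << 10, 559ULL << 10, T_RAM);
	add_entry(m, 638976ULL << 10, 261560ULL << 10, T_RAM);
}

/* True if [start, end) is completely covered by entries of the given type. */
static bool range_covered(const struct map *m, uint64_t start, uint64_t end,
			  uint32_t type)
{
	bool progress = true;

	while (start < end && progress) {
		progress = false;
		for (size_t i = 0; i < m->nr; i++) {
			const struct entry *ei = &m->e[i];

			if (ei->type == type && ei->addr <= start &&
			    start < ei->addr + ei->size) {
				start = ei->addr + ei->size; /* skip past this entry */
				progress = true;
			}
		}
	}
	return start >= end;
}

/* MMCONFIG window from the log: [mem 0xc0000000-0xcfffffff]. */
static bool mmcfg_window_reserved(const struct map *m)
{
	return range_covered(m, 0xc0000000ULL, 0xd0000000ULL, T_RESERVED);
}
```

With the patched behavior the reserved entry survives and the window check succeeds; with the old behavior it fails. Compacting the array in exactmap_new() is a choice made for this sketch; the patch's e820_remove_range_type() instead zeroes matching entries and appears to rely on the later sanitize pass to drop them.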

     Thomas

Patch

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 363e348..739b665 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1510,10 +1510,20 @@  bytes respectively. Such letter suffixes can also be entirely omitted.
 			per-device physically contiguous DMA buffers.
 
 	memmap=exactmap	[KNL,X86] Enable setting of an exact
-			E820 memory map, as specified by the user.
-			Such memmap=exactmap lines can be constructed based on
-			BIOS output or other requirements. See the memmap=nn@ss
-			option description.
+			E820 usable memory map, as specified by the user.
+			All unusable ranges (reserved, ACPI, NVS, ...) from
+			the original e820 table are preserved.
+			The usable memory regions from the original e820
+			table, however, are removed.
+			This parameter is intended explicitly for kdump: the
+			memory the kdump kernel is allowed to use must be
+			passed via the memmap=nn[KMG]@ss[KMG] parameter below.
+			All reserved regions the kernel may need for
+			ioremapping and similar purposes remain in the map.
+
+	memmap=voidmap  [KNL,X86] Do not use any e820 ranges from the
+			BIOS or bootloader. Instead, all regions must be
+			passed via the memmap= options described below.
 
 	memmap=nn[KMG]@ss[KMG]
 			[KNL] Force usage of a specific region of memory
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index dc0b9f0..4a3803a 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -559,6 +559,19 @@  u64 __init e820_remove_range(u64 start, u64 size, unsigned old_type,
 	return real_removed_size;
 }
 
+static void __init e820_remove_range_type(u32 type)
+{
+	int i;
+
+	for (i = 0; i < e820.nr_map; i++) {
+		struct e820entry *ei = &e820.map[i];
+		if (ei->type == type) {
+			memset(ei, 0, sizeof(struct e820entry));
+			/* zero-sized entries are dropped by sanitize_e820_map() */
+		}
+	}
+}
+
 void __init update_e820(void)
 {
 	u32 nr_map;
@@ -835,21 +848,32 @@  static int __init parse_memopt(char *p)
 }
 early_param("mem", parse_memopt);
 
-static bool __initdata exactmap_parsed;
+static bool __initdata memmap_checked;
 
 static int __init parse_memmap_one(char *p)
 {
 	char *oldp;
 	u64 start_at, mem_size;
+	char *cmdline = boot_command_line;
 
 	if (!p)
 		return -EINVAL;
 
 	if (!strncmp(p, "exactmap", 8)) {
-		if (exactmap_parsed)
+		if (memmap_checked)
 			return 0;
 
-		exactmap_parsed = true;
+		memmap_checked = true;
+		cmdline = strstr(cmdline, "elfcorehdr");
+		if (!cmdline)
+			/*
+			 * No kdump kernel, but exactmap used.
+			 * Tell user about exactmap changes.
+			 * Remove this after some kernel revisions.
+			 */
+			pr_info(
+		"memmap=exactmap changed, use memmap=voidmap for old behavior\n");
+
 #ifdef CONFIG_CRASH_DUMP
 		/*
 		 * If we are doing a crash dump, we still need to know
@@ -858,6 +882,22 @@  static int __init parse_memmap_one(char *p)
 		 */
 		saved_max_pfn = e820_end_of_ram_pfn();
 #endif
+		/*
+		 * Remove all usable memory (this is for kdump); the usable
+		 * memory is passed via memmap=X@Y parameters.
+		 */
+		e820_remove_range_type(E820_RAM);
+		userdef = 1;
+		return 0;
+	} else if (!strncmp(p, "voidmap", 7)) {
+		if (memmap_checked)
+			return 0;
+
+		memmap_checked = true;
+
+#ifdef CONFIG_CRASH_DUMP
+		saved_max_pfn = e820_end_of_ram_pfn();
+#endif
 		e820.nr_map = 0;
 		userdef = 1;
 		return 0;