Patchwork QEMU e820 reservation patch

login
register
mail settings
Submitter Jes Sorensen
Date Feb. 15, 2010, 5:33 p.m.
Message ID <4B79857A.1030808@redhat.com>
Download mbox | patch
Permalink /patch/45412/
State New
Headers show

Comments

Jes Sorensen - Feb. 15, 2010, 5:33 p.m.
Hi,

Kevin and I have agreed on the approach for this one now. So here is
the latest version of the patch for QEMU, submitting e820 reservation
entries via fw_cfg.

Cheers,
Jes
Use qemu-cfg to provide the BIOS with an optional table of e820 entries.

Notify the BIOS of the location of the TSS+EPT range to by reserving
it via the e820 table.

This matches a corresponding patch for Seabios, however older versions
of Seabios will default to the hardcoded address range and stay
compatible with current QEMU.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>

---
 hw/pc.c           |   35 +++++++++++++++++++++++++++++++++++
 hw/pc.h           |   10 ++++++++++
 target-i386/kvm.c |    8 ++++++++
 3 files changed, 53 insertions(+)
Stefano Stabellini - Feb. 15, 2010, 6:54 p.m.
On Mon, 15 Feb 2010, Jes Sorensen wrote:
> Hi,
> 
> Kevin and I have agreed on the approach for this one now. So here is
> the latest version of the patch for QEMU, submitting e820 reservation
> entries via fw_cfg.
> 

I think the interface is fine and it is perfectly usable by Xen as well.

Cheers,

Stefano
Jes Sorensen - Feb. 16, 2010, 8:49 a.m.
On 02/15/10 19:54, Stefano Stabellini wrote:
> On Mon, 15 Feb 2010, Jes Sorensen wrote:
>> Kevin and I have agreed on the approach for this one now. So here is
>> the latest version of the patch for QEMU, submitting e820 reservation
>> entries via fw_cfg.
>
> I think the interface is fine and it is perfectly usable by Xen as well.

Hi Stefano,

Thanks for the heads up! Glad it is flexible enough for Xen as well.

Cheers,
Jes
Anthony Liguori - Feb. 19, 2010, 9:02 p.m.
On 02/15/2010 11:33 AM, Jes Sorensen wrote:
> Hi,
>
> Kevin and I have agreed on the approach for this one now. So here is
> the latest version of the patch for QEMU, submitting e820 reservation
> entries via fw_cfg.
>
> Cheers,
> Jes

Acked-by: Anthony Liguori <aliguori@us.ibm.com>

Would be nice to use git-send-email in the future as your attachments 
are often rendered poorly by mail clients.

I noticed that you use this for the TSS page with EPT but you don't use 
this interface for the rest of memory.  I'm curious what you think the 
right long term split is?  If QEMU is not managing the full e820 table, 
can we reasonable add entries on our own?

Regards,

Anthony Liguori
Anthony Liguori - Feb. 19, 2010, 10 p.m.
On 02/15/2010 11:33 AM, Jes Sorensen wrote:
> Hi,
>
> Kevin and I have agreed on the approach for this one now. So here is
> the latest version of the patch for QEMU, submitting e820 reservation
> entries via fw_cfg.
>
> Cheers,
> Jes
>

Applied.  Thanks.

Regards,

Anthony Liguori
Jes Sorensen - Feb. 21, 2010, 5:44 p.m.
On 02/19/10 22:02, Anthony Liguori wrote:
> Acked-by: Anthony Liguori <aliguori@us.ibm.com>
>
> Would be nice to use git-send-email in the future as your attachments
> are often rendered poorly by mail clients.

Just switching over from quilt to git, and still having issues with
git-send-email. It does some things nicely, other things are a pain in
the b**t with it, especially for single patch submissions.

> I noticed that you use this for the TSS page with EPT but you don't use
> this interface for the rest of memory. I'm curious what you think the
> right long term split is? If QEMU is not managing the full e820 table,
> can we reasonable add entries on our own?

I'd like to have QEMU handle more, I picked the TSS page because we
changed the location of that in the past and it was the one that
triggered my patch in the first place. Now we have the infrastructure,
it will be easier to add more.

Cheers,
Jes
Kevin O'Connor - Feb. 21, 2010, 7:13 p.m.
On Sun, Feb 21, 2010 at 06:44:55PM +0100, Jes Sorensen wrote:
> On 02/19/10 22:02, Anthony Liguori wrote:
> >I noticed that you use this for the TSS page with EPT but you don't use
> >this interface for the rest of memory. I'm curious what you think the
> >right long term split is? If QEMU is not managing the full e820 table,
> >can we reasonable add entries on our own?
> 
> I'd like to have QEMU handle more, I picked the TSS page because we
> changed the location of that in the past and it was the one that
> triggered my patch in the first place. Now we have the infrastructure,
> it will be easier to add more.

What parts of the memory map do you envision qemu managing?

On qemu, SeaBIOS manages the map for ram under 1M, creates the map for
high-memory based on the max memory sizes located in cmos, and it
manages reserved entries for the acpi/smbios tables.  (It also adds an
entry for the rom (the last 256K of the 4gig space), which, BTW, would
be nice to include in your patch.)

Interestingly, on coreboot, SeaBIOS reads the memory map from coreboot
and calculates the max memory sizes (the opposite of what it does on
qemu).  Also, it's coreboot that generates the bios tables - SeaBIOS
uses the passed in memory map to locate the tables and copy the
required parts into the f-segment.

Are you thinking of moving qemu more torwards what coreboot does, or
did you have a different idea in mind?

-Kevin
Gleb Natapov - Feb. 22, 2010, 8:33 a.m.
On Sun, Feb 21, 2010 at 02:13:51PM -0500, Kevin O'Connor wrote:
> On Sun, Feb 21, 2010 at 06:44:55PM +0100, Jes Sorensen wrote:
> > On 02/19/10 22:02, Anthony Liguori wrote:
> > >I noticed that you use this for the TSS page with EPT but you don't use
> > >this interface for the rest of memory. I'm curious what you think the
> > >right long term split is? If QEMU is not managing the full e820 table,
> > >can we reasonable add entries on our own?
> > 
> > I'd like to have QEMU handle more, I picked the TSS page because we
> > changed the location of that in the past and it was the one that
> > triggered my patch in the first place. Now we have the infrastructure,
> > it will be easier to add more.
> 
> What parts of the memory map do you envision qemu managing?
> 
> On qemu, SeaBIOS manages the map for ram under 1M, creates the map for
> high-memory based on the max memory sizes located in cmos, and it
> manages reserved entries for the acpi/smbios tables.  (It also adds an
> entry for the rom (the last 256K of the 4gig space), which, BTW, would
> be nice to include in your patch.)
> 
> Interestingly, on coreboot, SeaBIOS reads the memory map from coreboot
> and calculates the max memory sizes (the opposite of what it does on
> qemu).  Also, it's coreboot that generates the bios tables - SeaBIOS
> uses the passed in memory map to locate the tables and copy the
> required parts into the f-segment.
> 
> Are you thinking of moving qemu more torwards what coreboot does, or
> did you have a different idea in mind?
> 
We shouldn't compare coreboot with qemu. Qemu is a hardware. Coreboot
is part of a firmware.

--
			Gleb.
Avi Kivity - Feb. 22, 2010, 9:23 a.m.
On 02/21/2010 09:13 PM, Kevin O'Connor wrote:
>
>> I'd like to have QEMU handle more, I picked the TSS page because we
>> changed the location of that in the past and it was the one that
>> triggered my patch in the first place. Now we have the infrastructure,
>> it will be easier to add more.
>>      
> What parts of the memory map do you envision qemu managing?
>    

Not a lot.  SeaBIOS should manage hardware devices memory requirements, 
as it does with PCI and HPET.  That leaves non-hardware memory 
requirements.  The kvm reserved areas fall into this category, but I 
don't think there are others.
Kevin O'Connor - Feb. 23, 2010, 1:31 a.m.
On Mon, Feb 22, 2010 at 10:33:12AM +0200, Gleb Natapov wrote:
> On Sun, Feb 21, 2010 at 02:13:51PM -0500, Kevin O'Connor wrote:
> > Are you thinking of moving qemu more torwards what coreboot does, or
> > did you have a different idea in mind?
> > 
> We shouldn't compare coreboot with qemu. Qemu is a hardware. Coreboot
> is part of a firmware.

Coreboot and qemu often face the same problems when trying to pass
information into the BIOS.  I think it helps to look at how others
have solved similar problems.

-Kevin
Gleb Natapov - Feb. 23, 2010, 8:22 a.m.
On Mon, Feb 22, 2010 at 08:31:00PM -0500, Kevin O'Connor wrote:
> On Mon, Feb 22, 2010 at 10:33:12AM +0200, Gleb Natapov wrote:
> > On Sun, Feb 21, 2010 at 02:13:51PM -0500, Kevin O'Connor wrote:
> > > Are you thinking of moving qemu more torwards what coreboot does, or
> > > did you have a different idea in mind?
> > > 
> > We shouldn't compare coreboot with qemu. Qemu is a hardware. Coreboot
> > is part of a firmware.
> 
> Coreboot and qemu often face the same problems when trying to pass
> information into the BIOS.  I think it helps to look at how others
> have solved similar problems.
>
Since qemu is a HW and coreboot is one part of firmware stack the
information they are passing to Seabios is often fundamentally different.
It is OK for coreboot to create ACPI/SMBIOS/E820 tables and pass them to
Seabios, but it is not OK if qemu does that. Information that QEMU pass
to Seabios can be divided into two types. First one can be classified
as board description. It is needed so Seabios would be able to support
more then one qemu configuration without recompile. Second is "bios
configuration" (boot priority, show bunner, etc). I don't know who
manages this information on coreboot + Seabios combo, but I think it
should be Seabios, so this kind of info should not be passed between
coreboot an Seabios at all.

--
			Gleb.
Anthony Liguori - Feb. 23, 2010, 1:50 p.m.
On 02/23/2010 02:22 AM, Gleb Natapov wrote:
> On Mon, Feb 22, 2010 at 08:31:00PM -0500, Kevin O'Connor wrote:
>    
>> On Mon, Feb 22, 2010 at 10:33:12AM +0200, Gleb Natapov wrote:
>>      
>>> On Sun, Feb 21, 2010 at 02:13:51PM -0500, Kevin O'Connor wrote:
>>>        
>>>> Are you thinking of moving qemu more torwards what coreboot does, or
>>>> did you have a different idea in mind?
>>>>
>>>>          
>>> We shouldn't compare coreboot with qemu. Qemu is a hardware. Coreboot
>>> is part of a firmware.
>>>        
>> Coreboot and qemu often face the same problems when trying to pass
>> information into the BIOS.  I think it helps to look at how others
>> have solved similar problems.
>>
>>      
> Since qemu is a HW and coreboot is one part of firmware stack the
> information they are passing to Seabios is often fundamentally different.
> It is OK for coreboot to create ACPI/SMBIOS/E820 tables and pass them to
> Seabios, but it is not OK if qemu does that.

Actually, we do passthrough ACPI tables (you wrote that ;-)) and we 
build SMBIOS tables and pass them through to Seabios.

Regards,

Anthony Liguori

>   Information that QEMU pass
> to Seabios can be divided into two types. First one can be classified
> as board description. It is needed so Seabios would be able to support
> more then one qemu configuration without recompile. Second is "bios
> configuration" (boot priority, show bunner, etc). I don't know who
> manages this information on coreboot + Seabios combo, but I think it
> should be Seabios, so this kind of info should not be passed between
> coreboot an Seabios at all.
>
> --
> 			Gleb.
>
>
>
Gleb Natapov - Feb. 23, 2010, 2:01 p.m.
On Tue, Feb 23, 2010 at 07:50:16AM -0600, Anthony Liguori wrote:
> On 02/23/2010 02:22 AM, Gleb Natapov wrote:
> >On Mon, Feb 22, 2010 at 08:31:00PM -0500, Kevin O'Connor wrote:
> >>On Mon, Feb 22, 2010 at 10:33:12AM +0200, Gleb Natapov wrote:
> >>>On Sun, Feb 21, 2010 at 02:13:51PM -0500, Kevin O'Connor wrote:
> >>>>Are you thinking of moving qemu more torwards what coreboot does, or
> >>>>did you have a different idea in mind?
> >>>>
> >>>We shouldn't compare coreboot with qemu. Qemu is a hardware. Coreboot
> >>>is part of a firmware.
> >>Coreboot and qemu often face the same problems when trying to pass
> >>information into the BIOS.  I think it helps to look at how others
> >>have solved similar problems.
> >>
> >Since qemu is a HW and coreboot is one part of firmware stack the
> >information they are passing to Seabios is often fundamentally different.
> >It is OK for coreboot to create ACPI/SMBIOS/E820 tables and pass them to
> >Seabios, but it is not OK if qemu does that.
> 
> Actually, we do passthrough ACPI tables (you wrote that ;-)) and we
I remember :) It allows to pass additional tables, not overwrite
tables that Seabios creates.

> build SMBIOS tables and pass them through to Seabios.
> 
I freely admit those are hacks :( And smbios one allows to much
control IMHO.


> Regards,
> 
> Anthony Liguori
> 
> >  Information that QEMU pass
> >to Seabios can be divided into two types. First one can be classified
> >as board description. It is needed so Seabios would be able to support
> >more then one qemu configuration without recompile. Second is "bios
> >configuration" (boot priority, show bunner, etc). I don't know who
> >manages this information on coreboot + Seabios combo, but I think it
> >should be Seabios, so this kind of info should not be passed between
> >coreboot an Seabios at all.
> >
> >--
> >			Gleb.
> >
> >

--
			Gleb.

Patch

Index: qemu/hw/pc.c
===================================================================
--- qemu.orig/hw/pc.c
+++ qemu/hw/pc.c
@@ -59,6 +59,7 @@ 
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
 #define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
+#define FW_CFG_E820_TABLE (FW_CFG_ARCH_LOCAL + 3)
 
 #define MAX_IDE_BUS 2
 
@@ -67,6 +68,21 @@  static RTCState *rtc_state;
 static PITState *pit;
 static PCII440FXState *i440fx_state;
 
+#define E820_NR_ENTRIES		16
+
+struct e820_entry {
+    uint64_t address;
+    uint64_t length;
+    uint32_t type;
+};
+
+struct e820_table {
+    uint32_t count;
+    struct e820_entry entry[E820_NR_ENTRIES];
+};
+
+static struct e820_table e820_table;
+
 typedef struct isa_irq_state {
     qemu_irq *i8259;
     qemu_irq *ioapic;
@@ -435,6 +451,23 @@  static void bochs_bios_write(void *opaqu
     }
 }
 
+int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+    int index = e820_table.count;
+    struct e820_entry *entry;
+
+    if (index >= E820_NR_ENTRIES)
+        return -EBUSY;
+    entry = &e820_table.entry[index];
+
+    entry->address = address;
+    entry->length = length;
+    entry->type = type;
+
+    e820_table.count++;
+    return e820_table.count;
+}
+
 static void *bochs_bios_init(void)
 {
     void *fw_cfg;
@@ -466,6 +499,8 @@  static void *bochs_bios_init(void)
     if (smbios_table)
         fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
                          smbios_table, smbios_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE, (uint8_t *)&e820_table,
+                     sizeof(struct e820_table));
 
     /* allocate memory for the NUMA channel: one (64bit) word for the number
      * of nodes, one word for each VCPU->node and one word for each node to
Index: qemu/hw/pc.h
===================================================================
--- qemu.orig/hw/pc.h
+++ qemu/hw/pc.h
@@ -150,4 +150,14 @@  void isa_cirrus_vga_init(void);
 void isa_ne2000_init(int base, int irq, NICInfo *nd);
 
 int cpu_is_bsp(CPUState *env);
+
+/* e820 types */
+#define E820_RAM        1
+#define E820_RESERVED   2
+#define E820_ACPI       3
+#define E820_NVS        4
+#define E820_UNUSABLE   5
+
+int e820_add_entry(uint64_t, uint64_t, uint32_t);
+
 #endif
Index: qemu/target-i386/kvm.c
===================================================================
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -24,6 +24,7 @@ 
 #include "cpu.h"
 #include "gdbstub.h"
 #include "host-utils.h"
+#include "hw/pc.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -362,6 +363,13 @@  int kvm_arch_init(KVMState *s, int smp_c
      * as unavaible memory.  FIXME, need to ensure the e820 map deals with
      * this?
      */
+    /*
+     * Tell fw_cfg to notify the BIOS to reserve the range.
+     */
+    if (e820_add_entry(0xfffbc000, 0x4000, E820_RESERVED) < 0) {
+        perror("e820_add_entry() table is full");
+        exit(1);
+    }
     return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
 }