Patchwork QEMU-KVM - provide e820 reserve through qemu_cfg

login
register
mail settings
Submitter Jes Sorensen
Date Jan. 25, 2010, 4:49 p.m.
Message ID <4B5DCB93.7050007@redhat.com>
Download mbox | patch
Permalink /patch/43678/
State New
Headers show

Comments

Jes Sorensen - Jan. 25, 2010, 4:49 p.m.
Hi,

This is the QEMU-KVM bits for providing the e820-reserve space through
qemu-cfg.

Cheers,
Jes
Use qemu-cfg to notify the BIOS of the location of the TSS range to
reserve in the e820 table, to avoid relying on hard coded values.

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>

---
 hw/fw_cfg.h       |    5 +++++
 hw/pc.c           |    4 ++++
 kvm.h             |    2 ++
 qemu-kvm-x86.c    |    6 ++++++
 target-i386/kvm.c |    7 +++++++
 5 files changed, 24 insertions(+)
Jes Sorensen - Jan. 25, 2010, 5:13 p.m.
On 01/25/10 17:58, Alexander Graf wrote:
> Howdy. Congratulations to the new mail address - looks neat ;-).

:-)

> Two comments:
>
> 1) I don't see how passing a single region is any help. I'd rather like to see a device tree like table structure
> You'd get one variable for len of the table, one with the contents. So for a universal reserved region specifier you'd get:
>
> <u64 base><u64 len>
>
> Then have len=2 and put data in the table:
>
> <u64 base1><u64 len1><u64 base2><u64 len2>
>
> That way we'd get 2 entries and the chance to enhance them later on. In fact, it might even make sense to pass the whole table in such a form. That way qemu generates all of the e820 tables and we can declare whatever we want. Just add a type field in the table.

I am fine with having QEMU build the e820 tables completely if there is
a consensus to take that path.

> 2) Please inline patches. They showed up as attachments here, making them really hard to comment on.

Sorry Thunderbug doesn't do that well, but they should be attached as
txt?

Cheers,
Jes
Jes Sorensen - Jan. 25, 2010, 5:46 p.m.
On 01/25/10 18:28, Alexander Graf wrote:
>>> That way we'd get 2 entries and the chance to enhance them later on.
>>> In fact, it might even make sense to pass the whole table in such a
>>> form. That way qemu generates all of the e820 tables and we can
>>> declare whatever we want. Just add a type field in the table.
>>
>> I am fine with having QEMU build the e820 tables completely if there is
>> a consensus to take that path.
>
> I agree. We better get this right :-). I don't want to maintain 5
> versions of an 380 fw_cfg interface.

Looking at the internals, some of the e820 entries are based on compile
time constants for the BIOS, so it will be hard to pass those from
QEMU, but we could do it in a way so we pass a number of additional
e820 entries. Ie. address, length, and type.

What do you think?

Cheers,
Jes
Jes Sorensen - Jan. 25, 2010, 9:05 p.m.
On 01/25/10 21:14, Anthony Liguori wrote:
> On 01/25/2010 02:04 PM, Alexander Graf wrote:
>> Yes, sounds good. Should be fairly extensible then. What about memory
>> holes? Do we need to take care of them?
>
> It would be nice for QEMU to be able to add additional e820 regions that
> don't necessarily fit the standard layout model.
>
> For instance, I've thought a number of times about using a large
> reserved region as a shared memory mechanism.
>
> But we certainly need to allow the BIOS to define the regions it needs
> to know about.

I think it should be easy to accommodate using the scheme I am
suggesting. It would require some basic testing for conflicts in the
BIOS, but otherwise it should pretty much allow you to specify any
region you want as a reserved block.

Only problem is that we don't really have a way to pass back info
saying 'you messed up trying to pinch an area that the BIOS wants
for itself'.

I'll take a look at it.

Cheers,
Jes
Jes Sorensen - Jan. 25, 2010, 9:24 p.m.
On 01/25/10 22:08, Alexander Graf wrote:
>
> On 25.01.2010, at 22:05, Jes Sorensen wrote:
>> Only problem is that we don't really have a way to pass back info
>> saying 'you messed up trying to pinch an area that the BIOS wants
>> for itself'.
>
> Eh - the BIOS shouldn't even try to use regions that are declared as reserved using this interface.
> I guess we're mostly talking about DMI and ACPI tables. They can be anywhere in RAM.

What I had in mind with the above was the situation where a user tries
to reserve a region that is hardcoded into the BIOS, such as the address
of the BIOS text/data etc.

I don't think it would be a real problem anyway, if some user wants to
play with it, they have to take the risk of shooting themself in the
foot :)

Cheers,
Jes
Gleb Natapov - Jan. 26, 2010, 6:46 a.m.
On Mon, Jan 25, 2010 at 06:13:35PM +0100, Jes Sorensen wrote:
> On 01/25/10 17:58, Alexander Graf wrote:
> >Howdy. Congratulations to the new mail address - looks neat ;-).
> 
> :-)
> 
> >Two comments:
> >
> >1) I don't see how passing a single region is any help. I'd rather like to see a device tree like table structure
> >You'd get one variable for len of the table, one with the contents. So for a universal reserved region specifier you'd get:
> >
> ><u64 base><u64 len>
> >
> >Then have len=2 and put data in the table:
> >
> ><u64 base1><u64 len1><u64 base2><u64 len2>
> >
> >That way we'd get 2 entries and the chance to enhance them later on. In fact, it might even make sense to pass the whole table in such a form. That way qemu generates all of the e820 tables and we can declare whatever we want. Just add a type field in the table.
> 
> I am fine with having QEMU build the e820 tables completely if there is
> a consensus to take that path.
> 
QEMU can't build the e820 map completely. There are things it doesn't
know. Like how much memory ACPI tables take and where they are located.

> >2) Please inline patches. They showed up as attachments here, making them really hard to comment on.
> 
> Sorry Thunderbug doesn't do that well, but they should be attached as
> txt?
> 
> Cheers,
> Jes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.
Jes Sorensen - Jan. 26, 2010, 8:36 a.m.
On 01/26/10 07:46, Gleb Natapov wrote:
> On Mon, Jan 25, 2010 at 06:13:35PM +0100, Jes Sorensen wrote:
>> I am fine with having QEMU build the e820 tables completely if there is
>> a consensus to take that path.
>>
> QEMU can't build the e820 map completely. There are things it doesn't
> know. Like how much memory ACPI tables take and where they are located.

Good point!

I think the conclusion is to do a load-extra-tables kinda interface 
allowing QEMU to pass in a bunch of them, but leaving things like the
ACPI space for the BIOS to reserve.

Cheers,
Jes

Patch

Index: qemu-kvm/hw/fw_cfg.h
===================================================================
--- qemu-kvm.orig/hw/fw_cfg.h
+++ qemu-kvm/hw/fw_cfg.h
@@ -67,4 +67,9 @@  FWCfgState *fw_cfg_init(uint32_t ctl_por
 
 #endif /* NO_QEMU_PROTOS */
 
+struct fw_cfg_e820_reserve {
+    uint32_t addr;
+    uint32_t length;
+};
+
 #endif
Index: qemu-kvm/hw/pc.c
===================================================================
--- qemu-kvm.orig/hw/pc.c
+++ qemu-kvm/hw/pc.c
@@ -66,6 +66,7 @@ 
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
 #define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
+#define FW_CFG_E820_RESERVE (FW_CFG_ARCH_LOCAL + 3)
 
 #define MAX_IDE_BUS 2
 
@@ -73,6 +74,7 @@  static fdctrl_t *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
 static PCII440FXState *i440fx_state;
+struct fw_cfg_e820_reserve e820_reserve;
 
 qemu_irq *ioapic_irq_hack;
 
@@ -475,6 +477,8 @@  static void *bochs_bios_init(void)
     if (smbios_table)
         fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
                          smbios_table, smbios_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_RESERVE, (uint8_t *)&e820_reserve,
+                     sizeof(struct fw_cfg_e820_reserve));
 
     /* allocate memory for the NUMA channel: one (64bit) word for the number
      * of nodes, one word for each VCPU->node and one word for each node to
Index: qemu-kvm/kvm.h
===================================================================
--- qemu-kvm.orig/kvm.h
+++ qemu-kvm/kvm.h
@@ -101,6 +101,8 @@  void kvm_arch_reset_vcpu(CPUState *env);
 struct kvm_guest_debug;
 struct kvm_debug_exit_arch;
 
+extern struct fw_cfg_e820_reserve e820_reserve;
+
 struct kvm_sw_breakpoint {
     target_ulong pc;
     target_ulong saved_insn;
Index: qemu-kvm/qemu-kvm-x86.c
===================================================================
--- qemu-kvm.orig/qemu-kvm-x86.c
+++ qemu-kvm/qemu-kvm-x86.c
@@ -23,6 +23,7 @@ 
 
 #include "kvm.h"
 #include "hw/pc.h"
+#include "hw/fw_cfg.h"
 
 #define MSR_IA32_TSC		0x10
 
@@ -37,6 +38,11 @@  int kvm_set_tss_addr(kvm_context_t kvm, 
 {
 #ifdef KVM_CAP_SET_TSS_ADDR
 	int r;
+        /*
+         * Tell fw_cfg to notify the BIOS to reserve the range.
+         */
+        e820_reserve.addr = addr;
+        e820_reserve.length = 0x4000;
 
 	r = kvm_ioctl(kvm_state, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR);
 	if (r > 0) {
Index: qemu-kvm/target-i386/kvm.c
===================================================================
--- qemu-kvm.orig/target-i386/kvm.c
+++ qemu-kvm/target-i386/kvm.c
@@ -25,6 +25,8 @@ 
 #include "gdbstub.h"
 #include "host-utils.h"
 
+extern struct fw_cfg_e820_reserve e820_reserve;
+
 #ifdef KVM_UPSTREAM
 //#define DEBUG_KVM
 
@@ -298,6 +300,11 @@  int kvm_arch_init(KVMState *s, int smp_c
      * as unavaible memory.  FIXME, need to ensure the e820 map deals with
      * this?
      */
+    /*
+     * Tell fw_cfg to notify the BIOS to reserve the range.
+     */
+    e820_reserve.addr = 0xfffbc000;
+    e820_reserve.length = 0x4000;
     return kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
 }