Message ID: 5485AF57.6050902@redhat.com
State: New
On 8 December 2014 at 14:01, Laszlo Ersek <lersek@redhat.com> wrote:
> Peter, can we introduce a second, 64-bit wide, data register, for
> fw_cfg? (Or even two -- Drew suggested the LDP instruction for the guest.)

I don't have an in-principle objection. I do require that whatever
mechanism we provide is usable by 32 bit guests as well as 64 bit
guests.

You'll also want the access instruction to be one where the hardware
provides instruction syndrome information to KVM about the faulting
load/store, which means you can only use AArch64 loads and stores of a
single general-purpose register, and AArch32 instructions LDR, LDA,
LDRT, LDRSH, LDRSHT, LDRH, LDAH, LDRHT, LDRSB, LDRSBT, LDRB, LDAB,
LDRBT, STR, STL, STRT, STRH, STLH, STRHT, STRB, STLB, or STRBT.

Note that this rules out LDP.

We also need to make sure it works with TCG QEMU. (64-bit access to
devices is something we've needed previously in ARM QEMU, so though in
theory it should work it would need testing.)

The other thing you could do is put image and initrd into guest memory
as we do currently and have the fw_cfg just pass their addresses. That
would require that UEFI not have already stomped on that part of RAM by
the time you get round to looking at it, though, and I can imagine that
might not be feasible.

-- PMM
On 8 December 2014 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
> We also need to make sure it works with TCG QEMU. (64-bit access
> to devices is something we've needed previously in ARM QEMU, so
> though in theory it should work it would need testing.)

"is not something we've needed previously", obviously.

-- PMM
On 12/08/14 15:41, Peter Maydell wrote:
> On 8 December 2014 at 14:01, Laszlo Ersek <lersek@redhat.com> wrote:
>> Peter, can we introduce a second, 64-bit wide, data register, for
>> fw_cfg? (Or even two -- Drew suggested the LDP instruction for the guest.)
>
> I don't have an in-principle objection. I do require that
> whatever mechanism we provide is usable by 32 bit guests
> as well as 64 bit guests.

The idea is that in the following call chain:

  fw_cfg_data_mem_read()
    fw_cfg_read()

fw_cfg_data_mem_read() doesn't care about either "addr" or "size", which
happens to be the best for the current problem.

According to the documentation on "MemoryRegionOps" in
"include/exec/memory.h" (and also to my testing in practice), if you set
.valid.max_access_size to something large (definitely up to 8, but maybe
even up to 16?), but keep .impl.max_access_size at something smaller,
then "memory.c" will split up the access for you automatically.
(Ultimately turning the bigger access into several sequential calls to
fw_cfg_read(), which happens to be correct.)

So, if a 32-bit guest never "submits" an access wider than 32-bit, then
that's the largest that "memory.c" will slice and dice.

> You'll also want the access instruction
> to be one where the hardware provides instruction syndrome information
> to KVM about the faulting load/store,

Yes, I definitely remembered this; I just didn't know how to name it
precisely.

> which means you can only use
> AArch64 loads and stores of a single general-purpose register, and
> AArch32 instructions LDR, LDA, LDRT, LDRSH, LDRSHT, LDRH, LDAH, LDRHT,
> LDRSB, LDRSBT, LDRB, LDAB, LDRBT, STR, STL, STRT, STRH, STLH, STRHT,
> STRB, STLB, or STRBT.
>
> Note that this rules out LDP.

Yes, I was concerned about this (Drew can confirm :)) Thank you very
much for making this clear; this will definitely help with the guest
code. Also we don't have to figure out if .valid.max_access_size=16
would work at all.

> We also need to make sure it works with TCG QEMU. (64-bit access
> to devices is something we've needed previously in ARM QEMU, so
> though in theory it should work it would need testing.)

(Factoring in the typo correction from your next mail:) Good point. I
just tested TCG. While the speedup is similar, the data that is
transferred looks corrupt.

> The other thing you could do is put image and initrd in to guest
> memory as we do currently and have the fw_cfg just pass their
> addresses. That would require that UEFI not have already stomped
> on that part of RAM by the time you get round to looking at it,
> though, and I can imagine that might not be feasible.

Yes, I thought of this. It's not very different from how the initial DTB
is handled right now, theoretically.

The difference is that the DTB is small. It is placed at 0x4000_0000
(1GB, base of DRAM). The earliest phase of the firmware (SEC and first
half of PEI) uses memory at the fixed address 0x4007_c000 (1GB + 496 KB).
I would really not want to change that in the firmware.

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/8969/focus=9029

We could place the kernel image and the initrd elsewhere, but (a)
placing stuff in the DRAM in advance puts minimum DRAM size requirements
on the guest (*), which is hard to verify / enforce in the firmware with
blobs of wildly varying size, and more importantly, (b) auditing the
safety of the initial DTB was already the hardest part of the firmware
patchset review by far. (No doubt it was one of the hardest things to
implement for Ard too.)

(*) See later parts in the same message, wrt. "second world". Later
phases need a good chunk of memory counting downwards from the top. See
also the SIZE_128MB assert near
<https://github.com/tianocore/edk2/blob/master/ArmPlatformPkg/ArmVirtualizationPkg/Library/ArmVirtualizationPlatformLib/Virt.c#L174>.

The nice thing about fw_cfg is that it lives in the "128MB..256MB [...]
miscellaneous device I/O" space. It's basically impossible to collide
with that in UEFI, yet it can give you various blobs (and qemu can even
compute those dynamically, see ACPI rebuild after PCI enumeration on
x86). I'd *really* like to stick with that.

I'll try to figure out what's wrong with TCG and come up with a patch
for the second data register.

Thanks!
Laszlo
Hi Laszlo,

On 12/08/2014 09:01 AM, Laszlo Ersek wrote:
> On 11/30/14 17:59, Laszlo Ersek wrote:
>> fw_cfg already supports exposure over MMIO (used in ppc/mac_newworld.c,
>> ppc/mac_oldworld.c, sparc/sun4m.c); we can easily add it to the "virt"
>> board.
>>
>> The mmio register block of fw_cfg is advertized in the device tree. As
>> base address we pick 0x09020000, which conforms to the comment preceding
>> "a15memmap": it falls in the miscellaneous device I/O range 128MB..256MB,
>> and it is aligned at 64KB. The DTB properties follow the documentation in
>> the Linux source file "Documentation/devicetree/bindings/arm/fw-cfg.txt".
>>
>> fw_cfg automatically exports a number of files to the guest; for example,
>> "bootorder" (see fw_cfg_machine_reset()).
>>
>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
>> ---
>>
>> Notes:
>>     v2:
>>     - use a single mmio region of size 0x1000
>>     - set "compatible" property to "qemu,fw-cfg-mmio"
>>
>>  hw/arm/virt.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 314e55b..af794ea 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -68,6 +68,7 @@ enum {
>>      VIRT_UART,
>>      VIRT_MMIO,
>>      VIRT_RTC,
>> +    VIRT_FW_CFG,
>>  };
>>
>>  typedef struct MemMapEntry {
>> @@ -107,6 +108,7 @@ static const MemMapEntry a15memmap[] = {
>>      [VIRT_GIC_CPU] = { 0x08010000, 0x00010000 },
>>      [VIRT_UART] = { 0x09000000, 0x00001000 },
>>      [VIRT_RTC] = { 0x09010000, 0x00001000 },
>> +    [VIRT_FW_CFG] = { 0x09020000, 0x00001000 },
>>      [VIRT_MMIO] = { 0x0a000000, 0x00000200 },
>>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>>      /* 0x10000000 .. 0x40000000 reserved for PCI */
>> @@ -519,6 +521,23 @@ static void create_flash(const VirtBoardInfo *vbi)
>>      g_free(nodename);
>>  }
>>
>> +static void create_fw_cfg(const VirtBoardInfo *vbi)
>> +{
>> +    hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>> +    hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>> +    char *nodename;
>> +
>> +    fw_cfg_init(0, 0, base, base + 2);
>> +
>> +    nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>> +    qemu_fdt_add_subnode(vbi->fdt, nodename);
>> +    qemu_fdt_setprop_string(vbi->fdt, nodename,
>> +                            "compatible", "qemu,fw-cfg-mmio");
>> +    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
>> +                                 2, base, 2, size);
>> +    g_free(nodename);
>> +}
>> +
>>  static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
>>  {
>>      const VirtBoardInfo *board = (const VirtBoardInfo *)binfo;
>> @@ -604,6 +623,8 @@ static void machvirt_init(MachineState *machine)
>>       */
>>      create_virtio_devices(vbi, pic);
>>
>> +    create_fw_cfg(vbi);
>> +
>>      vbi->bootinfo.ram_size = machine->ram_size;
>>      vbi->bootinfo.kernel_filename = machine->kernel_filename;
>>      vbi->bootinfo.kernel_cmdline = machine->kernel_cmdline;
>>
>
> So... after playing with this thing for some time, it's become clear
> that "MMIO traps" are painfully slow on the aarch64 platform we've been
> working on (using KVM).
>
> The original approach in my guest UEFI patch was a simple loop that
> exerted byte-wise access to the fw_cfg device's data register (the only
> kind of access that fw_cfg allows ATM). Downloading a kernel image plus
> an initrd image byte for byte, which together can total between 30MB and
> 50MB, takes simply forever.

Just a thought--would it be possible to add a
DMA-the-whole-thing-to-this-address register to the simulated device?

Chris
On 8 December 2014 at 23:18, Christopher Covington <cov@codeaurora.org> wrote:
> Just a thought--would it be possible to add a
> DMA-the-whole-thing-to-this-address register to the simulated device?

That was an idea raised in the 2010 thread that Laszlo mentioned:

http://thread.gmane.org/gmane.comp.emulators.qemu/77582

It's a big old thread so I only skimmed it, but there seemed to be some
fairly heavy pushback on the idea -- has anything much changed in the
intervening time to make it a better plan this time around?

-- PMM
On 12/09/14 00:18, Christopher Covington wrote:
> Hi Laszlo,
>
> On 12/08/2014 09:01 AM, Laszlo Ersek wrote:
>> On 11/30/14 17:59, Laszlo Ersek wrote:
>>> fw_cfg already supports exposure over MMIO (used in ppc/mac_newworld.c,
>>> ppc/mac_oldworld.c, sparc/sun4m.c); we can easily add it to the "virt"
>>> board.
>>>
>>> [quoted patch trimmed]
>>
>> So... after playing with this thing for some time, it's become clear
>> that "MMIO traps" are painfully slow on the aarch64 platform we've been
>> working on (using KVM).
>>
>> The original approach in my guest UEFI patch was a simple loop that
>> exerted byte-wise access to the fw_cfg device's data register (the only
>> kind of access that fw_cfg allows ATM). Downloading a kernel image plus
>> an initrd image byte for byte, which together can total between 30MB and
>> 50MB, takes simply forever.
>
> Just a thought--would it be possible to add a
> DMA-the-whole-thing-to-this-address register to the simulated device?

That's exactly what Rich proposed in the 2010 thread that I referenced a
few hours ago.

http://thread.gmane.org/gmane.comp.emulators.qemu/77582/focus=77589
(search for '"DMA"-like')

The original patchset is archived at

http://thread.gmane.org/gmane.comp.emulators.qemu/76610/focus=76673

which was shot down; the rebased one got an (unrelated) nack three
months later.

http://thread.gmane.org/gmane.comp.emulators.qemu/77625

Thanks
Laszlo
Hi,

> So... after playing with this thing for some time, it's become clear
> that "MMIO traps" are painfully slow on the aarch64 platform we've been
> working on (using KVM).

So, as we don't have compatibility requirements, and we also can't play
tricks like using x86 string instructions: How about a completely
different, dma-style interface for fw_cfg access?

One register for the (physical) target address.
One register for the transfer size.
One register for the fw_cfg entry.
Possibly one register for the fw_cfg offset (not really needed, but
avoids the need for side effects such as writing the fw_cfg entry
register clearing the offset).
One register to kick the transfer.

cheers,
  Gerd
On 12/09/14 10:31, Gerd Hoffmann wrote:
> Hi,
>
>> So... after playing with this thing for some time, it's become clear
>> that "MMIO traps" are painfully slow on the aarch64 platform we've been
>> working on (using KVM).
>
> So, as we don't have compatibility requirements, and we also can't play
> tricks like using x86 string instructions: How about a completely
> different, dma-style interface for fw_cfg access?
>
> One register for the (physical) target address.
> One register for the transfer size.
> One register for the fw_cfg entry.
> Possibly one register for the fw_cfg offset (not really needed, but
> avoids the need for side effects such as writing the fw_cfg entry
> register clearing the offset).
> One register to kick the transfer.

Again, this was the idea that Rich had in 2010 (see the links in the
discussion thus far). It was rejected back then (which is why I didn't
even try to resurrect it now), and Peter has now asked if anything has
changed that would make that approach more acceptable now.

"Avi and Gleb not hanging around so they can't block the approach now"
might or might not be such a change; I don't know. But, I tend to trust
their opinion, even if it dates back to 2010 in this case.

Anyway I'm certainly not opposed to performance, so if someone can
thoroughly refute everything they said in that thread, go ahead.

Thanks!
Laszlo
On 9 December 2014 at 15:41, Laszlo Ersek <lersek@redhat.com> wrote:
> Again, this was the idea that Rich had in 2010 (see the links in the
> discussion thus far). It was rejected back then (which is why I didn't
> even try to resurrect it now), and Peter has now asked if anything has
> changed that would make that approach more acceptable now.
>
> "Avi and Gleb not hanging around so they can't block the approach now"
> might or might not be such a change; I don't know. But, I tend to trust
> their opinion, even if it dates back to 2010 in this case.

Yeah, this is about my point of view. A direct-write-to-memory fw_cfg
doesn't seem too unreasonable to me, but it would be nice to see a
solidly argued case from its proponents to counterbalance the previous
rejection.

> Anyway I'm certainly not opposed to performance, so if someone can
> thoroughly refute everything they said in that thread, go ahead.

As far as I can tell the main line of argument was "if you're using
fw_cfg to transfer huge amounts of data you're doing something wrong".

Is ARM much higher overhead than x86 for these accesses?

(Added Alex on cc as one of the few people from the previous round of
argument who's still active...)

-- PMM
On Tue, Dec 09, 2014 at 03:47:44PM +0000, Peter Maydell wrote:
> On 9 December 2014 at 15:41, Laszlo Ersek <lersek@redhat.com> wrote:
> > Again, this was the idea that Rich had in 2010 (see the links in the
> > discussion thus far). It was rejected back then (which is why I didn't
> > even try to resurrect it now), and Peter has now asked if anything has
> > changed that would make that approach more acceptable now.
> >
> > "Avi and Gleb not hanging around so they can't block the approach now"
> > might or might not be such a change; I don't know. But, I tend to trust
> > their opinion, even if it dates back to 2010 in this case.
>
> Yeah, this is about my point of view. A direct-write-to-memory
> fw_cfg doesn't seem too unreasonable to me, but it would be nice
> to see a solidly argued case from its proponents to counterbalance
> the previous rejection.

The argument for is that it's essentially instantaneous. This is a big
deal for libguestfs where even our new initrd (1.5 MB) takes a
noticeable fraction of a second to load, and we want to get our total
start-up time down to 3 seconds to match x86.

> > Anyway I'm certainly not opposed to performance, so if someone can
> > thoroughly refute everything they said in that thread, go ahead.
>
> As far as I can tell the main line of argument was "if you're
> using fw_cfg to transfer huge amounts of data you're doing
> something wrong".
>
> Is ARM much higher overhead than x86 for these accesses?

On a slightly related topic, virtio-mmio traps are slow:

https://bugzilla.redhat.com/show_bug.cgi?id=1123555

This also kills libguestfs performance -- throughput this time, rather
than start-up time.

Rich.
On 09/12/2014 16:54, Richard W.M. Jones wrote:
> On a slightly related topic, virtio-mmio traps are slow:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1123555

I suspect it's not virtio-mmio, but just KVM on ARM.

It would be nice to have a timing test for vmexits in kvm-unit-tests,
similar to what is already there for x86.

Paolo
On Wed, Dec 10, 2014 at 12:13:34PM +0100, Paolo Bonzini wrote:
> On 09/12/2014 16:54, Richard W.M. Jones wrote:
> > On a slightly related topic, virtio-mmio traps are slow:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1123555
>
> I suspect it's not virtio-mmio, but just KVM on ARM.
>
> It would be nice to have a timing test for vmexits in kvm-unit-tests,
> similar to what is already there for x86.

Christoffer had some for his initial kernel selftests, and at one point
he was porting them to an early version of kvm-unit-tests/arm. We can
resurrect that shortly, as I'll be posting kvm-unit-tests/arm64 today.

drew
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 584c40d..739e19c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -527,7 +527,7 @@ static void create_fw_cfg(const VirtBoardInfo *vbi)
     hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
     char *nodename;
 
-    fw_cfg_init(0, 0, base, base + 2);
+    fw_cfg_init(0, 0, base, base + 8);
 
     nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
     qemu_fdt_add_subnode(vbi->fdt, nodename);
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a7122ee..7147fea 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -322,8 +322,9 @@ static const MemoryRegionOps fw_cfg_data_mem_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
     .valid = {
         .min_access_size = 1,
-        .max_access_size = 1,
+        .max_access_size = 8,
     },
+    .impl.max_access_size = 1,
 };
 
 static const MemoryRegionOps fw_cfg_comb_mem_ops = {