From patchwork Thu Oct 3 15:39:08 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 280376 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 431962C00C1 for ; Fri, 4 Oct 2013 01:55:57 +1000 (EST) Received: from localhost ([::1]:43842 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VRl0O-00049x-RY for incoming@patchwork.ozlabs.org; Thu, 03 Oct 2013 11:39:56 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53769) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VRkzo-0003wu-M3 for qemu-devel@nongnu.org; Thu, 03 Oct 2013 11:39:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VRkzi-0002Hz-Lt for qemu-devel@nongnu.org; Thu, 03 Oct 2013 11:39:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55674) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VRkzi-0002HS-5i for qemu-devel@nongnu.org; Thu, 03 Oct 2013 11:39:14 -0400 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r93Fd8Jt007353 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 3 Oct 2013 11:39:09 -0400 Received: from bling.home (ovpn-113-59.phx2.redhat.com [10.3.113.59]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id r93Fd8j5001128; Thu, 3 Oct 2013 11:39:08 -0400 To: anthony@codemonkey.ws From: Alex Williamson Date: Thu, 03 Oct 2013 09:39:08 -0600 Message-ID: <20131003153902.26487.60908.stgit@bling.home> In-Reply-To: <20131003153601.26487.70022.stgit@bling.home> References: <20131003153601.26487.70022.stgit@bling.home> User-Agent: StGit/0.16 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, kvm@vger.kernel.org Subject: [Qemu-devel] [PULL 3/6] vfio-pci: Lazy PCI option ROM loading X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org During vfio-pci initfn, the device is not always in a state where the option ROM can be read. In the case of graphics cards, there's often no per function reset, which means we have host driver state affecting whether the option ROM is usable. Ideally we want to move reading the option ROM past any co-assigned device resets to the point where the guest first tries to read the ROM itself. To accomplish this, we switch the memory region for the option rom to an I/O region rather than a memory mapped region. This has the side benefit that we don't waste KVM memory slots for a BAR where we don't care about performance. This also allows us to delay loading the ROM from the device until the first read by the guest. We then use the PCI config space size of the ROM BAR when setting up the BAR through QEMU PCI. Another benefit of this approach is that previously when a user set the ROM to a file using the romfile= option, we still probed VFIO for the parameters of the ROM, which can result in dmesg errors about an invalid ROM. We now only probe VFIO to get the ROM contents if the guest actually tries to read the ROM. Signed-off-by: Alex Williamson --- hw/misc/vfio.c | 184 +++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 122 insertions(+), 62 deletions(-) diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c index ede026d..730dec5 100644 --- a/hw/misc/vfio.c +++ b/hw/misc/vfio.c @@ -166,6 +166,7 @@ typedef struct VFIODevice { off_t config_offset; /* Offset of config space region within device fd */ unsigned int rom_size; off_t rom_offset; /* Offset of ROM region within device fd */ + void *rom; int msi_cap_size; VFIOMSIVector *msi_vectors; VFIOMSIXInfo *msix; @@ -1058,6 +1059,125 @@ static const MemoryRegionOps vfio_bar_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; +static void vfio_pci_load_rom(VFIODevice *vdev) +{ + struct vfio_region_info reg_info = { + .argsz = sizeof(reg_info), + .index = VFIO_PCI_ROM_REGION_INDEX + }; + uint64_t size; + off_t off = 0; + size_t bytes; + + if (ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, ®_info)) { + error_report("vfio: Error getting ROM info: %m"); + return; + } + + DPRINTF("Device %04x:%02x:%02x.%x ROM:\n", vdev->host.domain, + vdev->host.bus, vdev->host.slot, vdev->host.function); + DPRINTF(" size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n", + (unsigned long)reg_info.size, (unsigned long)reg_info.offset, + (unsigned long)reg_info.flags); + + vdev->rom_size = size = reg_info.size; + vdev->rom_offset = reg_info.offset; + + if (!vdev->rom_size) { + return; + } + + vdev->rom = g_malloc(size); + memset(vdev->rom, 0xff, size); + + while (size) { + bytes = pread(vdev->fd, vdev->rom + off, size, vdev->rom_offset + off); + if (bytes == 0) { + break; + } else if (bytes > 0) { + off += bytes; + size -= bytes; + } else { + if (errno == EINTR || errno == EAGAIN) { + continue; + } + error_report("vfio: Error reading device ROM: %m"); + break; + } + } +} + +static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size) +{ + VFIODevice *vdev = opaque; + uint64_t val = ((uint64_t)1 << (size * 8)) - 1; + + /* Load the ROM lazily when the guest tries to read it */ + if (unlikely(!vdev->rom)) { + vfio_pci_load_rom(vdev); + } + + memcpy(&val, vdev->rom + addr, + (addr < vdev->rom_size) ? MIN(size, vdev->rom_size - addr) : 0); + + DPRINTF("%s(%04x:%02x:%02x.%x, 0x%"HWADDR_PRIx", 0x%x) = 0x%"PRIx64"\n", + __func__, vdev->host.domain, vdev->host.bus, vdev->host.slot, + vdev->host.function, addr, size, val); + + return val; +} + +static const MemoryRegionOps vfio_rom_ops = { + .read = vfio_rom_read, + .endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void vfio_pci_size_rom(VFIODevice *vdev) +{ + uint32_t orig, size = (uint32_t)PCI_ROM_ADDRESS_MASK; + off_t offset = vdev->config_offset + PCI_ROM_ADDRESS; + char name[32]; + + if (vdev->pdev.romfile || !vdev->pdev.rom_bar) { + return; + } + + /* + * Use the same size ROM BAR as the physical device. The contents + * will get filled in later when the guest tries to read it. + */ + if (pread(vdev->fd, &orig, 4, offset) != 4 || + pwrite(vdev->fd, &size, 4, offset) != 4 || + pread(vdev->fd, &size, 4, offset) != 4 || + pwrite(vdev->fd, &orig, 4, offset) != 4) { + error_report("%s(%04x:%02x:%02x.%x) failed: %m", + __func__, vdev->host.domain, vdev->host.bus, + vdev->host.slot, vdev->host.function); + return; + } + + size = ~(size & PCI_ROM_ADDRESS_MASK) + 1; + + if (!size) { + return; + } + + DPRINTF("%04x:%02x:%02x.%x ROM size 0x%x\n", vdev->host.domain, + vdev->host.bus, vdev->host.slot, vdev->host.function, size); + + snprintf(name, sizeof(name), "vfio[%04x:%02x:%02x.%x].rom", + vdev->host.domain, vdev->host.bus, vdev->host.slot, + vdev->host.function); + + memory_region_init_io(&vdev->pdev.rom, OBJECT(vdev), + &vfio_rom_ops, vdev, name, size); + + pci_register_bar(&vdev->pdev, PCI_ROM_SLOT, + PCI_BASE_ADDRESS_SPACE_MEMORY, &vdev->pdev.rom); + + vdev->pdev.has_rom = true; +} + static void vfio_vga_write(void *opaque, hwaddr addr, uint64_t data, unsigned size) { @@ -2638,51 +2758,6 @@ static int vfio_add_capabilities(VFIODevice *vdev) return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]); } -static int vfio_load_rom(VFIODevice *vdev) -{ - uint64_t size = vdev->rom_size; - char name[32]; - off_t off = 0, voff = vdev->rom_offset; - ssize_t bytes; - void *ptr; - - /* If loading ROM from file, pci handles it */ - if (vdev->pdev.romfile || !vdev->pdev.rom_bar || !size) { - return 0; - } - - DPRINTF("%s(%04x:%02x:%02x.%x)\n", __func__, vdev->host.domain, - vdev->host.bus, vdev->host.slot, vdev->host.function); - - snprintf(name, sizeof(name), "vfio[%04x:%02x:%02x.%x].rom", - vdev->host.domain, vdev->host.bus, vdev->host.slot, - vdev->host.function); - memory_region_init_ram(&vdev->pdev.rom, OBJECT(vdev), name, size); - ptr = memory_region_get_ram_ptr(&vdev->pdev.rom); - memset(ptr, 0xff, size); - - while (size) { - bytes = pread(vdev->fd, ptr + off, size, voff + off); - if (bytes == 0) { - break; /* expect that we could get back less than the ROM BAR */ - } else if (bytes > 0) { - off += bytes; - size -= bytes; - } else { - if (errno == EINTR || errno == EAGAIN) { - continue; - } - error_report("vfio: Error reading device ROM: %m"); - memory_region_destroy(&vdev->pdev.rom); - return -errno; - } - } - - pci_register_bar(&vdev->pdev, PCI_ROM_SLOT, 0, &vdev->pdev.rom); - vdev->pdev.has_rom = true; - return 0; -} - static int vfio_connect_container(VFIOGroup *group) { VFIOContainer *container; @@ -2916,22 +2991,6 @@ static int vfio_get_device(VFIOGroup *group, const char *name, VFIODevice *vdev) QLIST_INIT(&vdev->bars[i].quirks); } - reg_info.index = VFIO_PCI_ROM_REGION_INDEX; - - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, ®_info); - if (ret) { - error_report("vfio: Error getting ROM info: %m"); - goto error; - } - - DPRINTF("Device %s ROM:\n", name); - DPRINTF(" size: 0x%lx, offset: 0x%lx, flags: 0x%lx\n", - (unsigned long)reg_info.size, (unsigned long)reg_info.offset, - (unsigned long)reg_info.flags); - - vdev->rom_size = reg_info.size; - vdev->rom_offset = reg_info.offset; - reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX; ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, ®_info); @@ -3229,7 +3288,7 @@ static int vfio_initfn(PCIDevice *pdev) memset(&vdev->pdev.config[PCI_BASE_ADDRESS_0], 0, 24); memset(&vdev->pdev.config[PCI_ROM_ADDRESS], 0, 4); - vfio_load_rom(vdev); + vfio_pci_size_rom(vdev); ret = vfio_early_setup_msix(vdev); if (ret) { @@ -3294,6 +3353,7 @@ static void vfio_exitfn(PCIDevice *pdev) vfio_teardown_msi(vdev); vfio_unmap_bars(vdev); g_free(vdev->emulated_config_bits); + g_free(vdev->rom); vfio_put_device(vdev); vfio_put_group(group); }