From patchwork Tue Nov 29 02:11:05 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Kardashevskiy X-Patchwork-Id: 128198 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [140.186.70.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 379551007D2 for ; Tue, 29 Nov 2011 13:11:38 +1100 (EST) Received: from localhost ([::1]:53141 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RVDAS-000613-Eo for incoming@patchwork.ozlabs.org; Mon, 28 Nov 2011 21:11:32 -0500 Received: from eggs.gnu.org ([140.186.70.92]:57369) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RVDAL-00060k-IP for qemu-devel@nongnu.org; Mon, 28 Nov 2011 21:11:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RVDAJ-0003rf-KN for qemu-devel@nongnu.org; Mon, 28 Nov 2011 21:11:25 -0500 Received: from e23smtp02.au.ibm.com ([202.81.31.144]:56061) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RVDAI-0003qs-Lg for qemu-devel@nongnu.org; Mon, 28 Nov 2011 21:11:23 -0500 Received: from /spool/local by e23smtp02.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 29 Nov 2011 01:57:26 +1000 Received: from d23relay04.au.ibm.com ([202.81.31.246]) by e23smtp02.au.ibm.com ([202.81.31.208]) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 29 Nov 2011 01:57:24 +1000 Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id pAT27gBa3252350 for ; Tue, 29 Nov 2011 13:07:45 +1100 Received: from d23av01.au.ibm.com (loopback [127.0.0.1]) by d23av01.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id pAT2B5a6004338 for ; Tue, 29 Nov 2011 13:11:06 +1100 Received: from ozlabs.au.ibm.com (ozlabs.au.ibm.com [9.190.163.12]) by d23av01.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id pAT2B5Hh004325; Tue, 29 Nov 2011 13:11:05 +1100 Received: from [10.61.2.175] (haven.au.ibm.com [9.190.164.82]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.au.ibm.com (Postfix) with ESMTPSA id 78DF17347E; Tue, 29 Nov 2011 13:11:05 +1100 (EST) Message-ID: <4ED43F39.8050606@au1.ibm.com> Date: Tue, 29 Nov 2011 13:11:05 +1100 From: Alexey Kardashevskiy User-Agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20111105 Thunderbird/8.0 MIME-Version: 1.0 To: Alex Williamson References: <20111103195452.21259.93021.stgit@bling.home> <4ED43AD9.5090509@au1.ibm.com> <4ED43CFE.8040009@au1.ibm.com> In-Reply-To: <4ED43CFE.8040009@au1.ibm.com> x-cbid: 11112815-5490-0000-0000-00000037A52A X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 202.81.31.144 Cc: aafabbri@cisco.com, kvm@vger.kernel.org, pmac@au1.ibm.com, qemu-devel@nongnu.org, joerg.roedel@amd.com, konrad.wilk@oracle.com, agraf@suse.de, dwg@au1.ibm.com, chrisw@sous-sol.org, B08248@freescale.com, iommu@lists.linux-foundation.org, avi@redhat.com, linux-pci@vger.kernel.org, B07421@freescale.com, benve@cisco.com Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Hi all again, It was actually the very first problem - endianess :-) I am still not sure what format is better for cached config space or whether we should cache it all. Also, as Benh already mentioned, vfio_virt_init reads a config space to a cache by pci_read_config_dword for the whole space while some devices may not like it as they might distinguish length of PCI transactions. KERNEL patch: === end === On 29/11/11 13:01, Alexey Kardashevskiy wrote: > Hi all, > > Another problem I hit on POWER - MSI interrupts allocation. The existing VFIO does not expect a PBH > to support less interrupts that a device might request. In my case, PHB's limit is 8 interrupts > while my test card (10Gb ethernet CXGB3) wants 9. Below are the patches to demonstrate the idea. > > > > > > On 29/11/11 12:52, Alexey Kardashevskiy wrote: >> Hi! >> >> I tried (successfully) to run it on POWER and while doing that I found some issues. I'll try to >> explain them in separate mails. >> >> >> >> On 04/11/11 07:12, Alex Williamson wrote: >>> VFIO provides a secure, IOMMU based interface for user space >>> drivers, including device assignment to virtual machines. >>> This provides the base management of IOMMU groups, devices, >>> and IOMMU objects. See Documentation/vfio.txt included in >>> this patch for user and kernel API description. >>> >>> Note, this implements the new API discussed at KVM Forum >>> 2011, as represented by the drvier version 0.2. It's hoped >>> that this provides a modular enough interface to support PCI >>> and non-PCI userspace drivers across various architectures >>> and IOMMU implementations. >>> >>> Signed-off-by: Alex Williamson >>> --- >>> >>> Fingers crossed, this is the last RFC for VFIO, but we need >>> the iommu group support before this can go upstream >>> (http://lkml.indiana.edu/hypermail/linux/kernel/1110.2/02303.html), >>> hoping this helps push that along. >>> >>> Since the last posting, this version completely modularizes >>> the device backends and better defines the APIs between the >>> core VFIO code and the device backends. I expect that we >>> might also adopt a modular IOMMU interface as iommu_ops learns >>> about different types of hardware. Also many, many cleanups. >>> Check the complete git history for details: >>> >>> git://github.com/awilliam/linux-vfio.git vfio-ng >>> >>> (matching qemu tree: git://github.com/awilliam/qemu-vfio.git) >>> >>> This version, along with the supporting VFIO PCI backend can >>> be found here: >>> >>> git://github.com/awilliam/linux-vfio.git vfio-next-20111103 >>> >>> I've held off on implementing a kernel->user signaling >>> mechanism for now since the previous netlink version produced >>> too many gag reflexes. It's easy enough to set a bit in the >>> group flags too indicate such support in the future, so I >>> think we can move ahead without it. >>> >>> Appreciate any feedback or suggestions. Thanks, >>> >>> Alex >>> >> >> > > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index b3bab99..9d563b4 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -757,6 +757,16 @@ static int vfio_virt_init(struct vfio_pci_device *vdev) vdev->rbar[5] = *(u32 *)&vdev->vconfig[PCI_BASE_ADDRESS_5]; vdev->rbar[6] = *(u32 *)&vdev->vconfig[PCI_ROM_ADDRESS]; + /* + * As pci_read_config_XXXX returns data in native format, + * and the cached copy is used in assumption that it is + * native PCI format, fix endianness in the cached copy. + */ + lp = (u32 *)vdev->vconfig; + for (i = 0; i < pdev->cfg_size/sizeof(u32); i++, lp++) { + *lp = cpu_to_le32(*lp); + } + /* for sr-iov devices */ vdev->vconfig[PCI_VENDOR_ID] = pdev->vendor & 0xFF; vdev->vconfig[PCI_VENDOR_ID+1] = pdev->vendor >> 8; @@ -807,18 +817,18 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev) else mask = 0; lp = (u32 *)(vdev->vconfig + PCI_BASE_ADDRESS_0 + 4*bar); - *lp &= (u32)mask; + *lp &= cpu_to_le32((u32)mask); if (pci_resource_flags(pdev, bar) & IORESOURCE_IO) - *lp |= PCI_BASE_ADDRESS_SPACE_IO; + *lp |= cpu_to_le32(PCI_BASE_ADDRESS_SPACE_IO); else if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) { - *lp |= PCI_BASE_ADDRESS_SPACE_MEMORY; + *lp |= cpu_to_le32(PCI_BASE_ADDRESS_SPACE_MEMORY); if (pci_resource_flags(pdev, bar) & IORESOURCE_PREFETCH) - *lp |= PCI_BASE_ADDRESS_MEM_PREFETCH; + *lp |= cpu_to_le32(PCI_BASE_ADDRESS_MEM_PREFETCH); if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM_64) { - *lp |= PCI_BASE_ADDRESS_MEM_TYPE_64; + *lp |= cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64); lp++; - *lp &= (u32)(mask >> 32); + *lp &= cpu_to_le32((u32)(mask >> 32)); bar++; } } @@ -830,7 +840,7 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev) } else mask = 0; lp = (u32 *)(vdev->vconfig + PCI_ROM_ADDRESS); - *lp &= (u32)mask; + *lp &= cpu_to_le32((u32)mask); vdev->bardirty = 0; } === end === QEMU patch: diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 980eec7..1c97c35 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -405,6 +405,8 @@ static void vfio_resource_write(void *opaque, target_phys_addr_t addr, { PCIResource *res = opaque; + fprintf(stderr, "change endianness????\n"); + if (pwrite(res->fd, &data, size, res->offset + addr) != size) { fprintf(stderr, "%s(,0x%"PRIx64", 0x%"PRIx64", %d) failed: %s\n", __FUNCTION__, addr, data, size, strerror(errno)); @@ -429,6 +431,9 @@ static uint64_t vfio_resource_read(void *opaque, DPRINTF("%s(BAR%d+0x%"PRIx64", %d) = 0x%"PRIx64"\n", __FUNCTION__, res->bar, addr, size, data); + data = le32_to_cpu(data); + DPRINTF("%s(BAR%d+0x%"PRIx64", %d) = 0x%"PRIx64" --- CPU\n", + __FUNCTION__, res->bar, addr, size, data); return data; } @@ -454,13 +459,25 @@ static uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len) val = pci_default_read_config(pdev, addr, len); } else { - if (pread(vdev->fd, &val, len, vdev->config_offset + addr) != len) { + u8 buf[4] = {0}; + if (pread(vdev->fd, buf, len, vdev->config_offset + addr) != len) { fprintf(stderr, "%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %s\n", __FUNCTION__, vdev->host.seg, vdev->host.bus, vdev->host.dev, vdev->host.func, addr, len, strerror(errno)); return -1; } + switch (len) { + case 1: val = buf[0]; break; + case 2: val = le16_to_cpupu((uint16_t*)buf); break; + case 4: val = le32_to_cpupu((uint32_t*)buf); break; + default: + fprintf(stderr, "%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) failed: %s\n", + __FUNCTION__, vdev->host.seg, vdev->host.bus, + vdev->host.dev, vdev->host.func, addr, len, + strerror(errno)); + break; + } } DPRINTF("%s(%04x:%02x:%02x.%x, 0x%x, 0x%x) %x\n", __FUNCTION__, vdev->host.seg, vdev->host.bus, vdev->host.dev, @@ -477,8 +494,20 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr, vdev->host.seg, vdev->host.bus, vdev->host.dev, vdev->host.func, addr, val, len); + u8 buf[4] = {0}; + switch (len) { + case 1: buf[0] = val & 0xFF; break; + case 2: cpu_to_le16wu((uint16_t*)buf, val); break; + case 4: cpu_to_le32wu((uint32_t*)buf, val); break; + default: + fprintf(stderr, "%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %s\n", + __FUNCTION__, vdev->host.seg, vdev->host.bus, vdev->host.dev, + vdev->host.func, addr, val, len, strerror(errno)); + return; + } + /* Write everything to VFIO, let it filter out what we can't write */ - if (pwrite(vdev->fd, &val, len, vdev->config_offset + addr) != len) { + if (pwrite(vdev->fd, buf, len, vdev->config_offset + addr) != len) { fprintf(stderr, "%s(%04x:%02x:%02x.%x, 0x%x, 0x%x, 0x%x) failed: %s\n", __FUNCTION__, vdev->host.seg, vdev->host.bus, vdev->host.dev, vdev->host.func, addr, val, len, strerror(errno)); @@ -675,6 +704,7 @@ static int vfio_setup_msi(VFIODevice *vdev) vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) { return -1; } + ctrl = le16_to_cpu(ctrl); msi_64bit = !!(ctrl & PCI_MSI_FLAGS_64BIT); msi_maskbit = !!(ctrl & PCI_MSI_FLAGS_MASKBIT);