From patchwork Thu Feb 23 04:20:25 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexey Kardashevskiy X-Patchwork-Id: 731389 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vTLdw2Jlrz9s7C for ; Thu, 23 Feb 2017 15:21:24 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ozlabs-ru.20150623.gappssmtp.com header.i=@ozlabs-ru.20150623.gappssmtp.com header.b="qqqhpzCl"; dkim-atps=neutral Received: from localhost ([::1]:56469 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cgku9-0004EM-R5 for incoming@patchwork.ozlabs.org; Wed, 22 Feb 2017 23:21:21 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54252) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cgktR-0003vT-7w for qemu-devel@nongnu.org; Wed, 22 Feb 2017 23:20:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cgktO-00085X-3W for qemu-devel@nongnu.org; Wed, 22 Feb 2017 23:20:37 -0500 Received: from mail-pg0-x244.google.com ([2607:f8b0:400e:c05::244]:34105) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1cgktN-000856-Ql for qemu-devel@nongnu.org; Wed, 22 Feb 2017 23:20:34 -0500 Received: by mail-pg0-x244.google.com with SMTP id s67so3099449pgb.1 for ; Wed, 22 Feb 2017 20:20:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ozlabs-ru.20150623.gappssmtp.com; s=20150623; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=FhIVXZ5NS1lAdKsBJMTNP5d+QpFE3Hba34HwC9z0aqU=; b=qqqhpzClQccOhPBKAbdESw5ikNNPa9JocnjKljkoflQl5c6xpDxktARk02C/fyt1r+ xcjKHki1vRD4pTdFJbB+EMZfwFLWHP9plFtlSGVXaP4i607zW5QHRufbB5GYzQpiPGc8 DKQih7OyGuE2NFmsjhNyLa3aWYrcueYo1oTwxl8T7l6Z3m7g1kPfKc9pqqfJFqcJTQB2 n80vG2gFGvkvsPiKlfgdssucLL2dB14qdDFemNJvye12Jf18NFkzKwIeEpA7S/7fbL6b mvYbrW+unC8M66h+MJ/jbeWwzJLQkmlaqP8q0YnXRA1tbwjTLqZ8e8gMtcsoztyK8KgN xreA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=FhIVXZ5NS1lAdKsBJMTNP5d+QpFE3Hba34HwC9z0aqU=; b=ajV3+Zc9kFcYwn7W++38OsxYijVGjXS8TRDJpF3LPZCpG2+x6+GFzDByNb12Bema6A v7Nt6XsS2kz4MLkUaOWpVyYSWbhIk3sSwnRuxbMq+iD1KHgYYVZK8EhQ/S3kD5E+OGTx ftoy3a3HWPtbXvls7mut9ybmG5XNS9/gNgwflaxtDyrg6IX8OlAl97hd+y0XFhDi2qZL zV915RX1ouiVU/nxcqsUCR1waPui7YKLlmmMcHpqWlVZ0Jeyp6wBvK9H1RVDqDuqFRgs WO8Vu0hpfDre5jCShxM+RGhVn3rfeBvWmS45ZX+JOZ7LWLQjh1cR75ATCD6KdvY0kjbX UEMg== X-Gm-Message-State: AMke39m4QKpawei47Rj0+U9Xna3Jfs+YWbEnDlafijM2GzyAGzzFJR+/buOG/Ks56QK0Ww== X-Received: by 10.84.128.33 with SMTP id 30mr52770474pla.128.1487823630857; Wed, 22 Feb 2017 20:20:30 -0800 (PST) Received: from [10.61.2.175] ([122.99.82.10]) by smtp.googlemail.com with ESMTPSA id m74sm6300095pfk.41.2017.02.22.20.20.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 22 Feb 2017 20:20:30 -0800 (PST) To: Yongji Xie , pbonzini@redhat.com References: <1487659615-15820-1-git-send-email-xyjxie@linux.vnet.ibm.com> From: Alexey Kardashevskiy Message-ID: <5edff645-12e8-d3e0-1849-302b6986c232@ozlabs.ru> Date: Thu, 23 Feb 2017 15:20:25 +1100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 In-Reply-To: <1487659615-15820-1-git-send-email-xyjxie@linux.vnet.ibm.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2607:f8b0:400e:c05::244 Subject: Re: [Qemu-devel] [PATCH] memory: make ram device read/write endian sensitive X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Peter Maydell , zhong@linux.vnet.ibm.com, qemu-devel@nongnu.org, alex.williamson@redhat.com, Paul Mackerras , David Gibson Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" On 21/02/17 17:46, Yongji Xie wrote: > At the moment ram device's memory regions are NATIVE_ENDIAN. This does > not work on PPC64 because VFIO PCI device is little endian but PPC64 > always defines static macro TARGET_WORDS_BIGENDIAN. > > This fixes endianness for ram device the same way as it is done > for VFIO region in commit 6758008e2c4e79fb6bd04fe8e7a41665fa583965. > > Signed-off-by: Yongji Xie > --- > memory.c | 14 +++++++------- > 1 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/memory.c b/memory.c > index 6c58373..1ccb99f 100644 > --- a/memory.c > +++ b/memory.c > @@ -1139,13 +1139,13 @@ static uint64_t memory_region_ram_device_read(void *opaque, > data = *(uint8_t *)(mr->ram_block->host + addr); > break; > case 2: > - data = *(uint16_t *)(mr->ram_block->host + addr); > + data = le16_to_cpu(*(uint16_t *)(mr->ram_block->host + addr)); > break; > case 4: > - data = *(uint32_t *)(mr->ram_block->host + addr); > + data = le32_to_cpu(*(uint32_t *)(mr->ram_block->host + addr)); > break; > case 8: > - data = *(uint64_t *)(mr->ram_block->host + addr); > + data = le64_to_cpu(*(uint64_t *)(mr->ram_block->host + addr)); > break; > } > > @@ -1166,13 +1166,13 @@ static void memory_region_ram_device_write(void *opaque, hwaddr addr, > *(uint8_t *)(mr->ram_block->host + addr) = (uint8_t)data; > break; > case 2: > - *(uint16_t *)(mr->ram_block->host + addr) = (uint16_t)data; > + *(uint16_t *)(mr->ram_block->host + addr) = cpu_to_le16((uint16_t)data); > break; > case 4: > - *(uint32_t *)(mr->ram_block->host + addr) = (uint32_t)data; > + *(uint32_t *)(mr->ram_block->host + addr) = cpu_to_le32((uint32_t)data); > break; > case 8: > - *(uint64_t *)(mr->ram_block->host + addr) = data; > + *(uint64_t *)(mr->ram_block->host + addr) = cpu_to_le64(data); > break; > } > } > @@ -1180,7 +1180,7 @@ static void memory_region_ram_device_write(void *opaque, hwaddr addr, > static const MemoryRegionOps ram_device_mem_ops = { > .read = memory_region_ram_device_read, > .write = memory_region_ram_device_write, > - .endianness = DEVICE_NATIVE_ENDIAN, > + .endianness = DEVICE_LITTLE_ENDIAN, > .valid = { > .min_access_size = 1, > .max_access_size = 8, > I did some debugging today. First, Paolo is right and ram_device_mem_ops::endianness should be host-endian which happens to be little in our test case (ppc64le) so changes to .read/.write are actually no-op (I believe so, have not checked). But I was wondering why this gets executed at all. The test case is: qemu-system-ppc64 ... -device "vfio-pci,id=vfio0001_03_00_0,host=0001:03:00.0 -drive id=DRIVE0,if=none,file=./test.qcow2,format=qcow2 -device virtio-blk-pci,id=vblk0,drive=DRIVE0 The host kernel is v4.10, ppc64le (little endian), 64K system page size, QEMU is v2.8.0. When this boots, lspci shows: 00:00.0 Ethernet controller: Chelsio Communications Inc T310 10GbE Single Port Adapter Subsystem: IBM Device 038c Flags: bus master, fast devsel, latency 0, IRQ 18 Memory at 210000004000 (64-bit, non-prefetchable) [size=4K] Memory at 210000800000 (64-bit, non-prefetchable) [size=8M] Memory at 210001000000 (64-bit, non-prefetchable) [size=4K] Expansion ROM at 2000c0080000 [disabled] [size=512K] Capabilities: [40] Power Management version 3 Capabilities: [48] MSI: Enable- Count=1/32 Maskable- 64bit+ Capabilities: [58] Express Endpoint, MSI 00 Capabilities: [94] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Count=32 Masked- Kernel driver in use: cxgb3 00:01.0 SCSI storage controller: Red Hat, Inc Virtio block device Subsystem: Red Hat, Inc Device 0002 Physical Slot: C16 Flags: bus master, fast devsel, latency 0, IRQ 17 I/O ports at 0040 [size=64] Memory at 2000c0000000 (32-bit, non-prefetchable) [size=4K] Memory at 210000000000 (64-bit, prefetchable) [size=16K] Capabilities: [98] MSI-X: Enable+ Count=2 Masked- Capabilities: [84] Vendor Specific Information: Len=14 Capabilities: [70] Vendor Specific Information: Len=14 Capabilities: [60] Vendor Specific Information: Len=10 Capabilities: [50] Vendor Specific Information: Len=10 Capabilities: [40] Vendor Specific Information: Len=10 Kernel driver in use: virtio-pci As we can see, BAR0 of Chelsio is 210000004000 - not aligned (it should have been aligned but it is not - this is another bug, in QEMU). Normally such a BAR would be emulated by VFIO. However, since 95251725e335 "vfio: Add support for mmapping sub-page MMIO BARs" we mmap such BARs to QEMU in a hope they will be registered later via KVM_SET_USER_MEMORY_REGION - and this fails as the guest address is not host page size aligned. So we end up having the following memory tree: memory-region: pci@800000020000000.mmio 0000000000000000-ffffffffffffffff (prio 0, RW): pci@800000020000000.mmio 00000000c0000000-00000000c0000fff (prio 1, RW): virtio-blk-pci-msix 00000000c0000000-00000000c000001f (prio 0, RW): msix-table 00000000c0000800-00000000c0000807 (prio 0, RW): msix-pba 0000210000000000-0000210000003fff (prio 1, RW): virtio-pci 0000210000000000-0000210000000fff (prio 0, RW): virtio-pci-common 0000210000001000-0000210000001fff (prio 0, RW): virtio-pci-isr 0000210000002000-0000210000002fff (prio 0, RW): virtio-pci-device 0000210000003000-0000210000003fff (prio 0, RW): virtio-pci-notify 0000210000004000-0000210000004fff (prio 1, RW): 0001:03:00.0 BAR 0 0000210000004000-0000210000004fff (prio 0, RW): 0001:03:00.0 BAR 0 mmaps[0] 0000210000800000-0000210000ffffff (prio 1, RW): 0001:03:00.0 BAR 2 0000210000800000-0000210000ffffff (prio 0, RW): 0001:03:00.0 BAR 2 mmaps[0] 0000210001000000-0000210001000fff (prio 1, RW): 0001:03:00.0 BAR 4 0000210001000000-00002100010001ff (prio 0, RW): msix-table 0000210001000800-0000210001000807 (prio 0, RW): msix-pba [disabled] The problem region which this patch is fixing is "0001:03:00.0 BAR 0 mmaps[0]". It is mmaped to QEMU and not registered in KVM. So when the guest accesses this BAR, we trap into KVM, KVM passes it to QEMU and QEMU read/writes this memory region. A simple hack like below fixes it - it basically removes mmap'd memory region from the tree and MMIO starts being handled by the parent MR - "0001:03:00.0 BAR 0" (which is "little endian" - vfio_region_ops). I am wondering now what would be a correct approach here? Add/remove mmap'd MRs once we detect aligned/unaligned BARs? Keep things where they are in the VFIO department and just fix ram_device_mem_ops::endianness? Thanks. memory_region_add_subregion_overlap(r->address_space, diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d7dbe0e3e0..0657a27623 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1109,7 +1109,10 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar) memory_region_transaction_begin(); memory_region_set_size(mr, size); - memory_region_set_size(mmap_mr, size); + if (bar_addr & qemu_real_host_page_mask) + memory_region_del_subregion(mr, mmap_mr); + else + memory_region_set_size(mmap_mr, size); if (size != region->size && memory_region_is_mapped(mr)) { memory_region_del_subregion(r->address_space, mr);