
[qemu,1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask

Message ID 1436525028-23963-2-git-send-email-aik@ozlabs.ru
State New

Commit Message

Alexey Kardashevskiy July 10, 2015, 10:43 a.m. UTC
The following commits started switching from TARGET_PAGE_MASK (hardcoded
as 4K) to the real host page size:
4e51361d7 "cpu-all: complete "real" host page size API" and
f7ceed190 "vfio: cpu: Use "real" page size API"

This patch finishes the transition by:
- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
- removing the bitfield lengths for offsets in VFIOQuirk::data, as
qemu_real_host_page_mask is a runtime variable, not a macro, and so
cannot size a bitfield

This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
the minimum page size which IOMMU regions may use, and at the moment
memory regions do not carry the actual page size.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

In reality DMA windows are always much bigger than a single 4K page and
aligned to 32/64MB; maybe we should only use qemu_real_host_page_mask here?

---
 hw/vfio/common.c | 26 ++++++++++++++++----------
 hw/vfio/pci.c    | 30 +++++++++++++++---------------
 2 files changed, 31 insertions(+), 25 deletions(-)
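
For readers unfamiliar with the two APIs: TARGET_PAGE_MASK and
TARGET_PAGE_ALIGN are compile-time and target-page-sized, while
qemu_real_host_page_mask and REAL_HOST_PAGE_ALIGN are derived at startup
from the host's actual page size. A minimal standalone sketch of the mask
arithmetic, with illustrative values only (a 4K target page vs. a 64K host
page, as on many ppc64 hosts):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Illustrative values only: a 4K target page vs. a 64K host page. */
    uint64_t target_page_size = 0x1000;     /* TARGET_PAGE_SIZE */
    uint64_t host_page_size   = 0x10000;    /* qemu_real_host_page_size */
    uint64_t target_page_mask = ~(target_page_size - 1); /* TARGET_PAGE_MASK */
    uint64_t host_page_mask   = ~(host_page_size - 1);   /* qemu_real_host_page_mask */
    uint64_t addr = 0x12345678;

    /* Rounding down is a plain AND with the mask ... */
    printf("down: %#" PRIx64 " vs %#" PRIx64 "\n",
           addr & target_page_mask, addr & host_page_mask);
    /* ... rounding up adds (size - 1) first, which is what
     * TARGET_PAGE_ALIGN and REAL_HOST_PAGE_ALIGN do. */
    printf("up:   %#" PRIx64 " vs %#" PRIx64 "\n",
           (addr + target_page_size - 1) & target_page_mask,
           (addr + host_page_size - 1) & host_page_mask);
    return 0;
}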

Comments

David Gibson July 13, 2015, 6:15 a.m. UTC | #1
On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> The following commits started switching from TARGET_PAGE_MASK (hardcoded
> as 4K) to the real host page size:
> 4e51361d7 "cpu-all: complete "real" host page size API" and
> f7ceed190 "vfio: cpu: Use "real" page size API"
> 
> This patch finishes the transition by:
> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> - removing the bitfield lengths for offsets in VFIOQuirk::data, as
> qemu_real_host_page_mask is a runtime variable, not a macro, and so
> cannot size a bitfield

This does not make much sense to me.  f7ceed190 moved to
REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
only on the host page size.

Here you're applying a blanket change to vfio code, in particular to
the DMA handling code, and for DMA both the host and target page size
can be relevant, depending on the details of the IOMMU implementation.

> This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
> the minimum page size which IOMMU regions may use, and at the moment
> memory regions do not carry the actual page size.

And this exception also doesn't make much sense to me.  Partly it's
confusing because the listener is doing different things depending on
whether we have a guest visible IOMMU or not.

In short, there doesn't seem to be a coherent explanation here of
where the page size / alignment restriction is coming from, and
therefore whether it needs to be a host page alignment, a guest page
alignment, or both.

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> In reality DMA windows are always much bigger than a single 4K page
> and aligned to 32/64MB; maybe we should only use
> qemu_real_host_page_mask here?

I don't understand this question either.

Alexey Kardashevskiy July 13, 2015, 7:24 a.m. UTC | #2
On 07/13/2015 04:15 PM, David Gibson wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> The following commits started switching from TARGET_PAGE_MASK (hardcoded
>> as 4K) to the real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This patch finishes the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing the bitfield lengths for offsets in VFIOQuirk::data, as
>> qemu_real_host_page_mask is a runtime variable, not a macro, and so
>> cannot size a bitfield
>
> This does not make much sense to me.  f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.


Patch 5/5 uses this listener for memory preregistration, which depends
entirely on the host page size. It was suggested to make this listener do
memory preregistration too, not just DMA.


>> This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
>> the minimum page size which IOMMU regions may use, and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me.  Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.

Yes...

> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always much bigger than a single 4K page
>> and aligned to 32/64MB; maybe we should only use
>> qemu_real_host_page_mask here?
>
> I don't understand this question either.

The listener is called on RAM regions and DMA windows. If it is a RAM
region, the host page size applies; if it is a DMA window, 4K does. So the
same listener has to use different page sizes in different situations to
check alignment. In reality, though, everything will be aligned to
megabytes or so, so we could, for example, enforce 64K alignment for DMA
windows and make the code simpler.

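In essence that is what the patch below does: it picks the alignment mask
per region type and converts the mask back to a page size when rounding
up. The relevant lines from the patch's vfio_listener_region_add(), with
the mask-to-size step annotated:

const bool is_iommu = memory_region_is_iommu(section->mr);
const hwaddr page_mask =
    is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;

/* For a power-of-two page size, ~page_mask + 1 recovers the size from
 * the mask, e.g. ~0xfffffffffffff000 + 1 == 0x1000. */
iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
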
Peter Crosthwaite July 13, 2015, 8:32 p.m. UTC | #3
On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
<david@gibson.dropbear.id.au> wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> The following commits started switching from TARGET_PAGE_MASK (hardcoded
>> as 4K) to the real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This patch finishes the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing the bitfield lengths for offsets in VFIOQuirk::data, as
>> qemu_real_host_page_mask is a runtime variable, not a macro, and so
>> cannot size a bitfield
>
> This does not make much sense to me.  f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.
>

So the multi-arch work (for which f7ceed190 preps) does have a problem
that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
do need to go away from common code, or this has to be promoted to
cpu-specific code. Consequently, the page size is fixed to 4K for
multi-arch and this is not a good long-term limitation. Is the IOMMU
page size really tied to the CPU implementation? In practice this is
going to be the case, but IOMMU and CPU should be decoupleable.

If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
alignment then can we virtualise this as data rather than a macroified
constant?

uint64_t  page_align = 0;

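/* Note: cc->page_size is proposed here; CPUClass has no such field today. */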
CPU_FOREACH(cpu, ...) {
    CPUClass *cc = CPU_GET_CLASS(cpu);

    page_align = MAX(page_align, cc->page_size);
}

/* This is a little more made up ... */
IOMMU_FOREACH(iommu, ...) {
    page_align = MAX(page_align, iommu->page_size);
}

Regards,
Peter

>> This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
>> the minimum page size which IOMMU regions may use, and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me.  Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.
>
> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always much bigger than a single 4K page
>> and aligned to 32/64MB; maybe we should only use
>> qemu_real_host_page_mask here?
>
> I don't understand this question either.
>
> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

David Gibson July 14, 2015, 3:46 a.m. UTC | #4
On Mon, Jul 13, 2015 at 05:24:17PM +1000, Alexey Kardashevskiy wrote:
> On 07/13/2015 04:15 PM, David Gibson wrote:
> >On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> >>The following commits started switching from TARGET_PAGE_MASK (hardcoded
> >>as 4K) to the real host page size:
> >>4e51361d7 "cpu-all: complete "real" host page size API" and
> >>f7ceed190 "vfio: cpu: Use "real" page size API"
> >>
> >>This patch finishes the transition by:
> >>- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> >>- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> >>- removing the bitfield lengths for offsets in VFIOQuirk::data, as
> >>qemu_real_host_page_mask is a runtime variable, not a macro, and so
> >>cannot size a bitfield
> >
> >This does not make much sense to me.  f7ceed190 moved to
> >REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> >only on the host page size.
> >
> >Here you're applying a blanket change to vfio code, in particular to
> >the DMA handling code, and for DMA both the host and target page size
> >can be relevant, depending on the details of the IOMMU implementation.
> 
> 
> Patch 5/5 uses this listener for memory preregistration, which depends
> entirely on the host page size. It was suggested to make this listener
> do memory preregistration too, not just DMA.
> 
> 
> >>This keeps using TARGET_PAGE_MASK for IOMMU regions, though, as it is
> >>the minimum page size which IOMMU regions may use, and at the moment
> >>memory regions do not carry the actual page size.
> >
> >And this exception also doesn't make much sense to me.  Partly it's
> >confusing because the listener is doing different things depending on
> >whether we have a guest visible IOMMU or not.
> 
> Yes...
> 
> >In short, there doesn't seem to be a coherent explanation here of
> >where the page size / alignment restriction is coming from, and
> >therefore whether it needs to be a host page alignment, a guest page
> >alignment, or both.
> >
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>---
> >>
> >>In reality DMA windows are always much bigger than a single 4K page
> >>and aligned to 32/64MB; maybe we should only use
> >>qemu_real_host_page_mask here?
> >
> >I don't understand this question either.
> 
> The listener is called on RAM regions and DMA windows. If it is a RAM
> region, the host page size applies; if it is a DMA window, 4K does.

That might be true in the particular cases you're thinking about, but
you've got to think more generally if you're going to have coherent
semantics in the core code here.

For preregistration, host page size applies.

For auto-mapping of RAM regions (as on x86), host IOMMU page size
applies (which is probably the same as host page size, but it doesn't
theoretically have to be).  Guest page size kind of implicitly
applies, since added RAM regions generally have to be target page
aligned anyway.

For guest-controlled mapping, you're constrained by both the host IOMMU
page size and the guest IOMMU page size.

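Assuming both constraints are power-of-two page sizes, they collapse into
a single alignment requirement, the larger of the two. A minimal sketch
under that assumption, with hypothetical names (this is not existing QEMU
API):

#include <stdint.h>

/* Hypothetical helper: a guest-controlled mapping must satisfy both the
 * host IOMMU and the guest IOMMU page size, i.e. the larger (stricter)
 * of the two power-of-two alignments. */
static inline uint64_t effective_iommu_page_mask(uint64_t host_iommu_pgsize,
                                                 uint64_t guest_iommu_pgsize)
{
    uint64_t size = host_iommu_pgsize > guest_iommu_pgsize
                  ? host_iommu_pgsize : guest_iommu_pgsize;
    return ~(size - 1);
}
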
Alexey Kardashevskiy July 14, 2015, 7:08 a.m. UTC | #5
On 07/14/2015 06:32 AM, Peter Crosthwaite wrote:
> On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
>> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>>> The following commits started switching from TARGET_PAGE_MASK (hardcoded
>>> as 4K) to the real host page size:
>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>
>>> This patch finishes the transition by:
>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>> - removing the bitfield lengths for offsets in VFIOQuirk::data, as
>>> qemu_real_host_page_mask is a runtime variable, not a macro, and so
>>> cannot size a bitfield
>>
>> This does not make much sense to me.  f7ceed190 moved to
>> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
>> only on the host page size.
>>
>> Here you're applying a blanket change to vfio code, in particular to
>> the DMA handling code, and for DMA both the host and target page size
>> can be relevant, depending on the details of the IOMMU implementation.
>>
>
> So the multi-arch work (for which f7ceed190 preps) does have a problem
> that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
> do need to go away from common code, or this has to be promoted to
> cpu-specific code. Consequently, the page size is fixed to 4K for
> multi-arch and this is not a good long-term limitation. Is the IOMMU
> page size really tied to the CPU implementation? In practice this is
> going to be the case, but IOMMU and CPU should be decoupleable.
>
> If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
> alignment then can we virtualise this as data rather than a macroified
> constant?
>
> uint64_t  page_align = 0;
>
> CPU_FOREACH(cpu, ...) {
>      CPUClass *cc = CPU_GET_CLASS(cpu);
>
>      page_align = MAX(page_align, cc->page_size);
> }
>
> /* This is a little more made up ... */
> IOMMU_FOREACH(iommu, ...) {
>      page_align = MAX(page_align, iommu->page_size);
> }

This assumes that the IOMMU has a constant page size; is this always true?
Could it not be a (contiguous?) set of chunks of different sizes, for
example, one per memory DIMM?


Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..d115ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -321,6 +321,9 @@  static void vfio_listener_region_add(MemoryListener *listener,
     Int128 llend;
     void *vaddr;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
@@ -330,16 +333,16 @@  static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
     llend = int128_make64(section->offset_within_address_space);
     llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+    llend = int128_and(llend, int128_exts64(page_mask));
 
     if (int128_ge(int128_make64(iova), llend)) {
         return;
@@ -347,7 +350,7 @@  static void vfio_listener_region_add(MemoryListener *listener,
 
     memory_region_ref(section->mr);
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         trace_vfio_listener_region_add_iommu(iova,
@@ -423,6 +426,9 @@  static void vfio_listener_region_del(MemoryListener *listener,
                                             iommu_data.type1.listener);
     hwaddr iova, end;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_del_skip(
@@ -432,13 +438,13 @@  static void vfio_listener_region_del(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
@@ -459,9 +465,9 @@  static void vfio_listener_region_del(MemoryListener *listener,
          */
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
     end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
+          page_mask;
 
     if (iova >= end) {
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2ed877f..7694afe 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -50,16 +50,16 @@  typedef struct VFIOQuirk {
     struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
-        uint32_t base_offset:TARGET_PAGE_BITS;
-        uint32_t address_offset:TARGET_PAGE_BITS;
+        uint32_t base_offset;
+        uint32_t address_offset;
         uint32_t address_size:3;
         uint32_t bar:3;
 
         uint32_t address_match;
         uint32_t address_mask;
 
-        uint32_t address_val:TARGET_PAGE_BITS;
-        uint32_t data_offset:TARGET_PAGE_BITS;
+        uint32_t address_val;
+        uint32_t data_offset;
         uint32_t data_size:3;
 
         uint8_t flags;
@@ -1319,8 +1319,8 @@  static uint64_t vfio_generic_quirk_read(void *opaque,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1349,8 +1349,8 @@  static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.write_flags) &&
         ranges_overlap(addr, size, offset, quirk->data.address_mask + 1)) {
@@ -1650,9 +1650,9 @@  static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1888,7 +1888,7 @@  static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
 
     vfio_generic_quirk_write(opaque, addr, data, size);
 
@@ -1943,9 +1943,9 @@  static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1980,9 +1980,9 @@  static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                          quirk->data.address_match & TARGET_PAGE_MASK,
+                          quirk->data.address_match & qemu_real_host_page_mask,
                           &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);