Message ID | 1436525028-23963-2-git-send-email-aik@ozlabs.ru |
---|---|
State | New |
On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
> a real host page size:
> 4e51361d7 "cpu-all: complete "real" host page size API" and
> f7ceed190 "vfio: cpu: Use "real" page size API"
>
> This finished the transition by:
> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> - removing bitfield length for offsets in VFIOQuirk::data as
>   qemu_real_host_page_mask is not a macro

This does not make much sense to me. f7ceed190 moved to
REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
only on the host page size.

Here you're applying a blanket change to vfio code, in particular to
the DMA handling code, and for DMA both the host and target page size
can be relevant, depending on the details of the IOMMU implementation.

> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
> the minimum page size which IOMMU regions may be using and at the moment
> memory regions do not carry the actual page size.

And this exception also doesn't make much sense to me. Partly it's
confusing because the listener is doing different things depending on
whether we have a guest visible IOMMU or not.

In short, there doesn't seem to be a coherent explanation here of
where the page size / alignment restriction is coming from, and
therefore whether it needs to be a host page alignment, a guest page
alignment, or both.

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> In reality DMA windows are always a lot bigger than a single 4K page
> and aligned to 32/64MB, may be only use here
> qemu_real_host_page_mask?

I don't understand this question either.
On 07/13/2015 04:15 PM, David Gibson wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>> a real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This finished the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing bitfield length for offsets in VFIOQuirk::data as
>>   qemu_real_host_page_mask is not a macro
>
> This does not make much sense to me. f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.

5/5 uses this listener for memory preregistration - this totally depends
on the host page size. It was suggested to make this listener do memory
preregistration too, not just DMA.

>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>> the minimum page size which IOMMU regions may be using and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me. Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.

Yes...

> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always a lot bigger than a single 4K page
>> and aligned to 32/64MB, may be only use here
>> qemu_real_host_page_mask?
>
> I don't understand this question either.

The listener is called on RAM regions and DMA windows. If it is a RAM
region, then host page size applies. If it is a DMA window - then 4K.
So the same listener has to use a different page size in different
situations to check alignment. But in reality everything will be
aligned to megabytes or so, so we could enforce 64K alignment for DMA
windows, for example, and make the code simpler.
On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
<david@gibson.dropbear.id.au> wrote:
> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>> a real host page size:
>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>
>> This finished the transition by:
>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>> - removing bitfield length for offsets in VFIOQuirk::data as
>>   qemu_real_host_page_mask is not a macro
>
> This does not make much sense to me. f7ceed190 moved to
> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> only on the host page size.
>
> Here you're applying a blanket change to vfio code, in particular to
> the DMA handling code, and for DMA both the host and target page size
> can be relevant, depending on the details of the IOMMU implementation.
>

So the multi-arch work (for which f7ceed190 preps) does have a problem
that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
do need to go away from common code, or this has to be promoted to
cpu-specific code. Consequently, the page size is fixed to 4K for
multi-arch and this is not a good long-term limitation. Is the IOMMU
page size really tied to the CPU implementation? In practice this is
going to be the case, but IOMMU and CPU should be decoupleable.

If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
alignment then can we virtualise this as data rather than a macroified
constant?

uint64_t page_align = 0;

CPU_FOREACH(cpu, ...) {
    CPUClass *cc = CPU_GET_CLASS(cpu);

    page_align = MAX(page_align, cc->page_size);
}

/* This is a little more made up ... */
IOMMU_FOREACH(iommu, ...) {
    page_align = MAX(page_align, iommu->page_size);
}

Regards,
Peter

>> This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
>> the minimum page size which IOMMU regions may be using and at the moment
>> memory regions do not carry the actual page size.
>
> And this exception also doesn't make much sense to me. Partly it's
> confusing because the listener is doing different things depending on
> whether we have a guest visible IOMMU or not.
>
> In short, there doesn't seem to be a coherent explanation here of
> where the page size / alignment restriction is coming from, and
> therefore whether it needs to be a host page alignment, a guest page
> alignment, or both.
>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality DMA windows are always a lot bigger than a single 4K page
>> and aligned to 32/64MB, may be only use here
>> qemu_real_host_page_mask?
>
> I don't understand this question either.
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson
On Mon, Jul 13, 2015 at 05:24:17PM +1000, Alexey Kardashevskiy wrote:
> On 07/13/2015 04:15 PM, David Gibson wrote:
> >On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
> >>These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
> >>a real host page size:
> >>4e51361d7 "cpu-all: complete "real" host page size API" and
> >>f7ceed190 "vfio: cpu: Use "real" page size API"
> >>
> >>This finished the transition by:
> >>- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
> >>- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
> >>- removing bitfield length for offsets in VFIOQuirk::data as
> >>qemu_real_host_page_mask is not a macro
> >
> >This does not make much sense to me. f7ceed190 moved to
> >REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
> >only on the host page size.
> >
> >Here you're applying a blanket change to vfio code, in particular to
> >the DMA handling code, and for DMA both the host and target page size
> >can be relevant, depending on the details of the IOMMU implementation.
>
> 5/5 uses this listener for memory preregistration - this totally depends on
> the host page size. It was suggested to make this listener do memory
> preregistration too, not just DMA.
>
> >>This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
> >>the minimum page size which IOMMU regions may be using and at the moment
> >>memory regions do not carry the actual page size.
> >
> >And this exception also doesn't make much sense to me. Partly it's
> >confusing because the listener is doing different things depending on
> >whether we have a guest visible IOMMU or not.
>
> Yes...
>
> >In short, there doesn't seem to be a coherent explanation here of
> >where the page size / alignment restriction is coming from, and
> >therefore whether it needs to be a host page alignment, a guest page
> >alignment, or both.
> >
> >>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>---
> >>
> >>In reality DMA windows are always a lot bigger than a single 4K page
> >>and aligned to 32/64MB, may be only use here
> >>qemu_real_host_page_mask?
> >
> >I don't understand this question either.
>
> The listener is called on RAM regions and DMA windows. If it is a RAM
> region, then host page size applies. If it is a DMA window - then
> 4K.

That might be true in the particular cases you're thinking about, but
you've got to think more generally if you're going to have coherent
semantics in the core code here.

For preregistration, host page size applies.

For auto-mapping of RAM regions (as on x86), host IOMMU page size
applies (which is probably the same as host page size, but it doesn't
theoretically have to be). Guest page size kind of implicitly applies,
since added RAM regions generally have to be target page aligned
anyway.

For guest controlled mapping, you're constrained by both the host
iommu page size and the guest iommu page size.
On 07/14/2015 06:32 AM, Peter Crosthwaite wrote:
> On Sun, Jul 12, 2015 at 11:15 PM, David Gibson
> <david@gibson.dropbear.id.au> wrote:
>> On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:
>>> These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
>>> a real host page size:
>>> 4e51361d7 "cpu-all: complete "real" host page size API" and
>>> f7ceed190 "vfio: cpu: Use "real" page size API"
>>>
>>> This finished the transition by:
>>> - %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
>>> - %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
>>> - removing bitfield length for offsets in VFIOQuirk::data as
>>> qemu_real_host_page_mask is not a macro
>>
>> This does not make much sense to me. f7ceed190 moved to
>> REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
>> only on the host page size.
>>
>> Here you're applying a blanket change to vfio code, in particular to
>> the DMA handling code, and for DMA both the host and target page size
>> can be relevant, depending on the details of the IOMMU implementation.
>>
>
> So the multi-arch work (for which f7ceed190 preps) does have a problem
> that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
> do need to go away from common code, or this has to be promoted to
> cpu-specific code. Consequently, the page size is fixed to 4K for
> multi-arch and this is not a good long-term limitation. Is the IOMMU
> page size really tied to the CPU implementation? In practice this is
> going to be the case, but IOMMU and CPU should be decoupleable.
>
> If vfio needs to respect a particular (or all?) CPUs/IOMMUs page
> alignment then can we virtualise this as data rather than a macroified
> constant?
>
> uint64_t page_align = 0;
>
> CPU_FOREACH(cpu, ...) {
>     CPUClass *cc = CPU_GET_CLASS(cpu);
>
>     page_align = MAX(page_align, cc->page_size);
> }
>
> /* This is a little more made up ... */
> IOMMU_FOREACH(iommu, ...) {
>     page_align = MAX(page_align, iommu->page_size);
> }

This assumes that the IOMMU has a constant page size - is this always
true? Could it not be a (contiguous?) set of different-size chunks, for
example, one per memory DIMM?
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..d115ec9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -321,6 +321,9 @@ static void vfio_listener_region_add(MemoryListener *listener,
     Int128 llend;
     void *vaddr;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_add_skip(
@@ -330,16 +333,16 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
     llend = int128_make64(section->offset_within_address_space);
     llend = int128_add(llend, section->size);
-    llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+    llend = int128_and(llend, int128_exts64(page_mask));
 
     if (int128_ge(int128_make64(iova), llend)) {
         return;
@@ -347,7 +350,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
 
     memory_region_ref(section->mr);
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         trace_vfio_listener_region_add_iommu(iova,
@@ -423,6 +426,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
                                             iommu_data.type1.listener);
     hwaddr iova, end;
     int ret;
+    const bool is_iommu = memory_region_is_iommu(section->mr);
+    const hwaddr page_mask =
+        is_iommu ? TARGET_PAGE_MASK : qemu_real_host_page_mask;
 
     if (vfio_listener_skipped_section(section)) {
         trace_vfio_listener_region_del_skip(
@@ -432,13 +438,13 @@ static void vfio_listener_region_del(MemoryListener *listener,
         return;
     }
 
-    if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
-                 (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+    if (unlikely((section->offset_within_address_space & ~page_mask) !=
+                 (section->offset_within_region & ~page_mask))) {
         error_report("%s received unaligned region", __func__);
         return;
     }
 
-    if (memory_region_is_iommu(section->mr)) {
+    if (is_iommu) {
         VFIOGuestIOMMU *giommu;
 
         QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
@@ -459,9 +465,9 @@ static void vfio_listener_region_del(MemoryListener *listener,
          */
     }
 
-    iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+    iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space);
     end = (section->offset_within_address_space + int128_get64(section->size)) &
-          TARGET_PAGE_MASK;
+          page_mask;
 
     if (iova >= end) {
         return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2ed877f..7694afe 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -50,16 +50,16 @@ typedef struct VFIOQuirk {
     struct VFIOPCIDevice *vdev;
     QLIST_ENTRY(VFIOQuirk) next;
     struct {
-        uint32_t base_offset:TARGET_PAGE_BITS;
-        uint32_t address_offset:TARGET_PAGE_BITS;
+        uint32_t base_offset;
+        uint32_t address_offset;
         uint32_t address_size:3;
         uint32_t bar:3;
         uint32_t address_match;
         uint32_t address_mask;
-        uint32_t address_val:TARGET_PAGE_BITS;
-        uint32_t data_offset:TARGET_PAGE_BITS;
+        uint32_t address_val;
+        uint32_t data_offset;
         uint32_t data_size:3;
         uint8_t flags;
@@ -1319,8 +1319,8 @@ static uint64_t vfio_generic_quirk_read(void *opaque,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
     uint64_t data;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.read_flags) &&
@@ -1349,8 +1349,8 @@ static void vfio_generic_quirk_write(void *opaque, hwaddr addr,
 {
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
-    hwaddr offset = quirk->data.address_match & ~TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
+    hwaddr offset = quirk->data.address_match & ~qemu_real_host_page_mask;
 
     if (vfio_flags_enabled(quirk->data.flags, quirk->data.write_flags) &&
         ranges_overlap(addr, size, offset, quirk->data.address_mask + 1)) {
@@ -1650,9 +1650,9 @@ static void vfio_probe_ati_bar2_4000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-ati-bar2-4000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                                        quirk->data.address_match & TARGET_PAGE_MASK,
+                                        quirk->data.address_match & qemu_real_host_page_mask,
                                         &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1888,7 +1888,7 @@ static void vfio_nvidia_88000_quirk_write(void *opaque, hwaddr addr,
     VFIOQuirk *quirk = opaque;
     VFIOPCIDevice *vdev = quirk->vdev;
     PCIDevice *pdev = &vdev->pdev;
-    hwaddr base = quirk->data.address_match & TARGET_PAGE_MASK;
+    hwaddr base = quirk->data.address_match & qemu_real_host_page_mask;
 
     vfio_generic_quirk_write(opaque, addr, data, size);
@@ -1943,9 +1943,9 @@ static void vfio_probe_nvidia_bar0_88000_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_nvidia_88000_quirk,
                           quirk, "vfio-nvidia-bar0-88000-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                                        quirk->data.address_match & TARGET_PAGE_MASK,
+                                        quirk->data.address_match & qemu_real_host_page_mask,
                                         &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
@@ -1980,9 +1980,9 @@ static void vfio_probe_nvidia_bar0_1800_quirk(VFIOPCIDevice *vdev, int nr)
 
     memory_region_init_io(&quirk->mem, OBJECT(vdev), &vfio_generic_quirk, quirk,
                           "vfio-nvidia-bar0-1800-quirk",
-                          TARGET_PAGE_ALIGN(quirk->data.address_mask + 1));
+                          REAL_HOST_PAGE_ALIGN(quirk->data.address_mask + 1));
     memory_region_add_subregion_overlap(&vdev->bars[nr].region.mem,
-                                        quirk->data.address_match & TARGET_PAGE_MASK,
+                                        quirk->data.address_match & qemu_real_host_page_mask,
                                         &quirk->mem, 1);
 
     QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
a real host page size:
4e51361d7 "cpu-all: complete "real" host page size API" and
f7ceed190 "vfio: cpu: Use "real" page size API"

This finished the transition by:
- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
- removing bitfield length for offsets in VFIOQuirk::data as
  qemu_real_host_page_mask is not a macro

This keeps using TARGET_PAGE_MASK for IOMMU regions though as it is
the minimum page size which IOMMU regions may be using and at the moment
memory regions do not carry the actual page size.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

In reality DMA windows are always a lot bigger than a single 4K page
and aligned to 32/64MB, may be only use here
qemu_real_host_page_mask?
---
 hw/vfio/common.c | 26 ++++++++++++++++----------
 hw/vfio/pci.c    | 30 +++++++++++++++---------------
 2 files changed, 31 insertions(+), 25 deletions(-)