diff mbox series

[qemu] vfio: Print address space address when cannot map MMIO for DMA

Message ID 20180322081837.21460-1-aik@ozlabs.ru
State New
Headers show
Series [qemu] vfio: Print address space address when cannot map MMIO for DMA | expand

Commit Message

Alexey Kardashevskiy March 22, 2018, 8:18 a.m. UTC
The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
an error message if a passed memory section address or size is not aligned
to the minimal IOMMU page size. However although it checks
offset_within_address_space for the alignment, offset_within_region is
printed instead which makes it harder to find out what device caused
the message so this replaces offset_within_region with
offset_within_address_space.

While we are here, this replaces '..' with 'size=' (as the second number
is a size, not an end offset) and adds a memory region name.

Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

Message on slightly hacked QEMU (iommu pagesize=8K) looks now like this:

qemu-system-x86_64: Region "0000:00:1a.0 BAR 0 mmaps[0]" 0xfebc0000 size=0x1000 is not aligned to 0x2000 and cannot be mapped for DMA
---
 hw/vfio/common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Eric Auger March 28, 2018, 9:03 p.m. UTC | #1
Hi Alexey, Alex,
On 22/03/18 09:18, Alexey Kardashevskiy wrote:
> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
> an error message if a passed memory section address or size is not aligned
> to the minimal IOMMU page size. However although it checks
> offset_within_address_space for the alignment, offset_within_region is
> printed instead which makes it harder to find out what device caused
> the message so this replaces offset_within_region with
> offset_within_address_space.
> 
> While we are here, this replaces '..' with 'size=' (as the second number
> is a size, not an end offset) and adds a memory region name.
> 
> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
The patch indeed fixes the reported format issues.

However I have some other concerns with the info that is reported to the
end-user. See below.

Assigning an e1000e device with a 64kB host, here are the traces I get:

Region XXX is not aligned to 0x10000 and cannot be mapped for DMA

"0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
"0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
"0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8

It took me some time to understand what happens but here is now my
understanding:

1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:

UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
UNMAPPED -> 0x100e0000

vfio_sub_page_bar_update_mapping() create mrs with base bar at
0x100a0000 and 0x100e0000 successively, hence the
vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
region induces some memory section at 0x100a0050 and 0x100e50 successively.

However this is confusing for the end-user who only has access to the
final mapping (0x100e0000) through lspi [1].

2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping

3) Also it happens that I have a virtio-scsi-pci device that is put just
after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
64kB (with prio 0), we have those MMIO regions which result in new
memory sections, which cause vfio_listener_region_add calls. This
typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
virtio-scsi-pci msic-table and pba.

So at the end of the day, my fear is all those info become really
frightening and confusing for the end-user and even not relevant
(0x100a0000 stuff). So I would rather simply remove the trace in 2.12
until we find a place where we could generate a clear hint for the
end-user, suggesting to relocate the msix bar.

Thoughts?

Thanks

Eric


[1] lspci

Region 3: Memory at 104e0000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at 10480000 [disabled] [size=256K]
../..
Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
Vector table: BAR=3 offset=00000000
              PBA: BAR=3 offset=00002000

[2] info mtree (final)

100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 base BAR 3
 100e0000-00000000100e004f (prio 0, i/o): msix-table
 100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 BAR 3
  100e0000-00000000100effff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0]
      100e2000-00000000100e2007 (prio 0, i/o): msix-pba [disabled]
100e4000-00000000100e4fff (prio 1, i/o): virtio-scsi-pci-msix
 00000000100e4000-00000000100e403f (prio 0, i/o): msix-table
 00000000100e4800-00000000100e4807 (prio 0, i/o): msix-pba







> ---
> 
> Message on slightly hacked QEMU (iommu pagesize=8K) looks now like this:
> 
> qemu-system-x86_64: Region "0000:00:1a.0 BAR 0 mmaps[0]" 0xfebc0000 size=0x1000 is not aligned to 0x2000 and cannot be mapped for DMA
> ---
>  hw/vfio/common.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 5e84716..e2db596 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -548,10 +548,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>          hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>  
>          if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
> -            error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
> +            error_report("Region \"%s\" 0x%"HWADDR_PRIx" size=0x%"HWADDR_PRIx
>                           " is not aligned to 0x%"HWADDR_PRIx
>                           " and cannot be mapped for DMA",
> -                         section->offset_within_region,
> +                         memory_region_name(section->mr),
> +                         section->offset_within_address_space,
>                           int128_getlo(section->size),
>                           pgmask + 1);
>              return;
>
Alex Williamson March 28, 2018, 10:13 p.m. UTC | #2
On Wed, 28 Mar 2018 23:03:23 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Alexey, Alex,
> On 22/03/18 09:18, Alexey Kardashevskiy wrote:
> > The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
> > an error message if a passed memory section address or size is not aligned
> > to the minimal IOMMU page size. However although it checks
> > offset_within_address_space for the alignment, offset_within_region is
> > printed instead which makes it harder to find out what device caused
> > the message so this replaces offset_within_region with
> > offset_within_address_space.
> > 
> > While we are here, this replaces '..' with 'size=' (as the second number
> > is a size, not an end offset) and adds a memory region name.
> > 
> > Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
> The patch indeed fixes the reported format issues.
> 
> However I have some other concerns with the info that is reported to the
> end-user. See below.
> 
> Assigning an e1000e device with a 64kB host, here are the traces I get:
> 
> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
> 
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
> 
> It took me some time to understand what happens but here is now my
> understanding:
> 
> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
> 
> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
> UNMAPPED -> 0x100e0000
> 
> vfio_sub_page_bar_update_mapping() create mrs with base bar at
> 0x100a0000 and 0x100e0000 successively, hence the
> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
> region induces some memory section at 0x100a0050 and 0x100e50 successively.
> 
> However this is confusing for the end-user who only has access to the
> final mapping (0x100e0000) through lspi [1].
> 
> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
> 
> 3) Also it happens that I have a virtio-scsi-pci device that is put just
> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
> 64kB (with prio 0), we have those MMIO regions which result in new
> memory sections, which cause vfio_listener_region_add calls. This
> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
> virtio-scsi-pci msic-table and pba.
> 
> So at the end of the day, my fear is all those info become really
> frightening and confusing for the end-user and even not relevant
> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
> until we find a place where we could generate a clear hint for the
> end-user, suggesting to relocate the msix bar.
> 
> Thoughts?

Yep, I think that's probably the right approach.  Everything works as
it should and nothing has worse access in 2.12 than it did in 2.11,
there's only the opportunity to make things better with msi-x
relocation and I don't think we want to error on the side of reporting
too many errors that users cannot understand in an attempt to advise
them of an unsupported option that might be better.  Let's remove the
error report for 2.12 and think about how we could make a concise
suggestion, once, while initializing the device.  Who's posting the
patch?  Thanks,

Alex

PS - Why isn't the firmware/kernel on aarch64 making an attempt to
align PCI resources on page boundaries?  Does
pci=realloc,resource_alignment=pci:ffffffff:ffffffff change it? (I'm
not sure if PCI_ANY_ID works for that option)
Alexey Kardashevskiy March 29, 2018, 1:55 a.m. UTC | #3
On 29/3/18 8:03 am, Auger Eric wrote:
> Hi Alexey, Alex,
> On 22/03/18 09:18, Alexey Kardashevskiy wrote:
>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
>> an error message if a passed memory section address or size is not aligned
>> to the minimal IOMMU page size. However although it checks
>> offset_within_address_space for the alignment, offset_within_region is
>> printed instead which makes it harder to find out what device caused
>> the message so this replaces offset_within_region with
>> offset_within_address_space.
>>
>> While we are here, this replaces '..' with 'size=' (as the second number
>> is a size, not an end offset) and adds a memory region name.
>>
>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> The patch indeed fixes the reported format issues.
> 
> However I have some other concerns with the info that is reported to the
> end-user. See below.
> 
> Assigning an e1000e device with a 64kB host, here are the traces I get:
> 
> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
> 
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
> 
> It took me some time to understand what happens but here is now my
> understanding:
> 
> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
> 
> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
> UNMAPPED -> 0x100e0000
> 
> vfio_sub_page_bar_update_mapping() create mrs with base bar at
> 0x100a0000 and 0x100e0000 successively, hence the
> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
> region induces some memory section at 0x100a0050 and 0x100e50 successively.
> 
> However this is confusing for the end-user who only has access to the
> final mapping (0x100e0000) through lspi [1].


The trace shows that at least at some point the BAR actually was
0x100a0000, I find this info rather useful than confusing as it might
expose a bug of some sort, for example.

The user also has access to the MR name which is the host PCI address + BAR
index, how is that confusing?


> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
>> 3) Also it happens that I have a virtio-scsi-pci device that is put just
> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has

e1000e gets aligned to 64k but this one avoids the alignment for some reason?


> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
> 64kB (with prio 0), we have those MMIO regions which result in new
> memory sections, which cause vfio_listener_region_add calls. This
> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
> virtio-scsi-pci msic-table and pba.


"info mtree -f" might give a hint how MRs got resolved, could it end up
being emulated (==skipped by the vfio listener)?


> So at the end of the day, my fear is all those info become really
> frightening and confusing for the end-user and even not relevant
> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
> until we find a place where we could generate a clear hint for the
> end-user, suggesting to relocate the msix bar.
> 
> Thoughts?

Please post complete "lspci -v" output for both pci devices and "info mtree
-f" (in addition to "info mtree", not instead).

In general, the error_report() could be removed as we did not have any
indication of not mapping before so we do not have to start now, I am just
missing the point here - the message exposes potentially not-working P2P
which is useful for people who care about that and do not know if actually
might work. Rather than silencing it, I'd convert it into the trace point.

Thanks,


> Thanks
> 
> Eric
> 
> 
> [1] lspci
> 
> Region 3: Memory at 104e0000 (32-bit, non-prefetchable) [size=16K]
> Expansion ROM at 10480000 [disabled] [size=256K]
> ../..
> Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
> Vector table: BAR=3 offset=00000000
>               PBA: BAR=3 offset=00002000
> 
> [2] info mtree (final)
> 
> 100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 base BAR 3
>  100e0000-00000000100e004f (prio 0, i/o): msix-table
>  100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 BAR 3
>   100e0000-00000000100effff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0]
>       100e2000-00000000100e2007 (prio 0, i/o): msix-pba [disabled]
> 100e4000-00000000100e4fff (prio 1, i/o): virtio-scsi-pci-msix
>  00000000100e4000-00000000100e403f (prio 0, i/o): msix-table
>  00000000100e4800-00000000100e4807 (prio 0, i/o): msix-pba
> 
> 
> 
> 
> 
> 
> 
>> ---
>>
>> Message on slightly hacked QEMU (iommu pagesize=8K) looks now like this:
>>
>> qemu-system-x86_64: Region "0000:00:1a.0 BAR 0 mmaps[0]" 0xfebc0000 size=0x1000 is not aligned to 0x2000 and cannot be mapped for DMA
>> ---
>>  hw/vfio/common.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 5e84716..e2db596 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -548,10 +548,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>          hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>>  
>>          if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
>> -            error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
>> +            error_report("Region \"%s\" 0x%"HWADDR_PRIx" size=0x%"HWADDR_PRIx
>>                           " is not aligned to 0x%"HWADDR_PRIx
>>                           " and cannot be mapped for DMA",
>> -                         section->offset_within_region,
>> +                         memory_region_name(section->mr),
>> +                         section->offset_within_address_space,
>>                           int128_getlo(section->size),
>>                           pgmask + 1);
>>              return;
>>
Eric Auger March 29, 2018, 10:14 a.m. UTC | #4
Hi Alexey,
On 29/03/18 03:55, Alexey Kardashevskiy wrote:
> On 29/3/18 8:03 am, Auger Eric wrote:
>> Hi Alexey, Alex,
>> On 22/03/18 09:18, Alexey Kardashevskiy wrote:
>>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
>>> an error message if a passed memory section address or size is not aligned
>>> to the minimal IOMMU page size. However although it checks
>>> offset_within_address_space for the alignment, offset_within_region is
>>> printed instead which makes it harder to find out what device caused
>>> the message so this replaces offset_within_region with
>>> offset_within_address_space.
>>>
>>> While we are here, this replaces '..' with 'size=' (as the second number
>>> is a size, not an end offset) and adds a memory region name.
>>>
>>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> The patch indeed fixes the reported format issues.
>>
>> However I have some other concerns with the info that is reported to the
>> end-user. See below.
>>
>> Assigning an e1000e device with a 64kB host, here are the traces I get:
>>
>> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
>>
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
>>
>> It took me some time to understand what happens but here is now my
>> understanding:
>>
>> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
>> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
>>
>> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
>> UNMAPPED -> 0x100e0000
>>
>> vfio_sub_page_bar_update_mapping() create mrs with base bar at
>> 0x100a0000 and 0x100e0000 successively, hence the
>> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
>> region induces some memory section at 0x100a0050 and 0x100e50 successively.
>>
>> However this is confusing for the end-user who only has access to the
>> final mapping (0x100e0000) through lspi [1].
> 
> 
> The trace shows that at least at some point the BAR actually was
> 0x100a0000, I find this info rather useful than confusing as it might
> expose a bug of some sort, for example.
> 
> The user also has access to the MR name which is the host PCI address + BAR
> index, how is that confusing?

To me it is confusing since it does not match the final location of bar3
as output by lspci. I couldn't understanding how 0x100axxxx related to bar3.
> 
> 
>> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
>> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
>>> 3) Also it happens that I have a virtio-scsi-pci device that is put just
>> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
> 
> e1000e gets aligned to 64k but this one avoids the alignment for some reason?

Yes I will enquire about the allocation policy
> 
> 
>> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
>> 64kB (with prio 0), we have those MMIO regions which result in new
>> memory sections, which cause vfio_listener_region_add calls. This
>> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
>> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
>> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
>> virtio-scsi-pci msic-table and pba.
> 
> 
> "info mtree -f" might give a hint how MRs got resolved, could it end up
> being emulated (==skipped by the vfio listener)?

Actually that's what is strange as I can see it in info mtree -f output. See at the end of the mail.
> 
> 
>> So at the end of the day, my fear is all those info become really
>> frightening and confusing for the end-user and even not relevant
>> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
>> until we find a place where we could generate a clear hint for the
>> end-user, suggesting to relocate the msix bar.
>>
>> Thoughts?
> 
> Please post complete "lspci -v" output for both pci devices and "info mtree
> -f" (in addition to "info mtree", not instead).

see at the end
> 
> In general, the error_report() could be removed as we did not have any
> indication of not mapping before so we do not have to start now, I am just
> missing the point here - the message exposes potentially not-working P2P
> which is useful for people who care about that and do not know if actually
> might work. Rather than silencing it, I'd convert it into the trace point.

But typically output that
Region "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 cannot be DMA mapped
is not strictly correct (by chance it was not re-allocated to something else, right?)

Thanks

Eric
> 
> Thanks,
> 
> 
>> Thanks
>>
>> Eric
>>
>>
>> [1] lspci
>>
>> Region 3: Memory at 104e0000 (32-bit, non-prefetchable) [size=16K]
>> Expansion ROM at 10480000 [disabled] [size=256K]
>> ../..
>> Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
>> Vector table: BAR=3 offset=00000000
>>               PBA: BAR=3 offset=00002000
>>
>> [2] info mtree (final)
>>
>> 100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 base BAR 3
>>  100e0000-00000000100e004f (prio 0, i/o): msix-table
>>  100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 BAR 3
>>   100e0000-00000000100effff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0]
>>       100e2000-00000000100e2007 (prio 0, i/o): msix-pba [disabled]
>> 100e4000-00000000100e4fff (prio 1, i/o): virtio-scsi-pci-msix
>>  00000000100e4000-00000000100e403f (prio 0, i/o): msix-table
>>  00000000100e4800-00000000100e4807 (prio 0, i/o): msix-pba
>>
>>
>>
>>
>>
>>
>>
>>> ---
>>>
>>> Message on slightly hacked QEMU (iommu pagesize=8K) looks now like this:
>>>
>>> qemu-system-x86_64: Region "0000:00:1a.0 BAR 0 mmaps[0]" 0xfebc0000 size=0x1000 is not aligned to 0x2000 and cannot be mapped for DMA
>>> ---
>>>  hw/vfio/common.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 5e84716..e2db596 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -548,10 +548,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>          hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
>>>  
>>>          if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
>>> -            error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
>>> +            error_report("Region \"%s\" 0x%"HWADDR_PRIx" size=0x%"HWADDR_PRIx
>>>                           " is not aligned to 0x%"HWADDR_PRIx
>>>                           " and cannot be mapped for DMA",
>>> -                         section->offset_within_region,
>>> +                         memory_region_name(section->mr),
>>> +                         section->offset_within_address_space,
>>>                           int128_getlo(section->size),
>>>                           pgmask + 1);
>>>              return;
>>>
> 
> 

(qemu) info mtree -f
info mtree -f
FlatView #0
 AS "memory", root: system
 AS "cpu-memory-0", root: system
 AS "cpu-memory-1", root: system
 AS "cpu-memory-2", root: system
 AS "cpu-memory-3", root: system
 AS "vfio-pci", root: bus master container
 AS "virtio-scsi-pci", root: bus master container
 Root memory region: system
  0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
  0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
  0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
  0000000008080000-000000000808ffff (prio 0, i/o): control
  0000000008090000-000000000809ffff (prio 0, i/o): translation
  00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
  0000000009000000-0000000009000fff (prio 0, i/o): pl011
  0000000009010000-0000000009010fff (prio 0, i/o): pl031
  0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
  0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
  0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
  0000000009030000-0000000009030fff (prio 0, i/o): pl061
  000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
  000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
  000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
  000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
  000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
  000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
  000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
  000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
  000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
  000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
  000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
  000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
  000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
  000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
  000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
  000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
  000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
  000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
  000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
  000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
  000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
  000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
  000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
  000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
  000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
  000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
  000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
  000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
  000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
  000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
  000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
  000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
  0000000010000000-000000001007ffff (prio 0, ramd): 0004:01:00.0 BAR 1 mmaps[0]
  00000000100c0000-00000000100dffff (prio 0, ramd): 0004:01:00.0 BAR 0 mmaps[0]
  00000000100e0000-00000000100e004f (prio 0, i/o): msix-table
  00000000100e0050-00000000100e3fff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0] @0000000000000050
  00000000100e4000-00000000100e403f (prio 0, i/o): msix-table
  00000000100e4040-00000000100e47ff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0] @0000000000004040
  00000000100e4800-00000000100e4807 (prio 0, i/o): msix-pba
  00000000100e4808-00000000100effff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0] @0000000000004808
  000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
  000000003f000000-000000003fffffff (prio 0, i/o): pcie-mmcfg-mmio
  0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
  0000008000000000-0000008000000fff (prio 0, i/o): virtio-pci-common
  0000008000001000-0000008000001fff (prio 0, i/o): virtio-pci-isr
  0000008000002000-0000008000002fff (prio 0, i/o): virtio-pci-device
  0000008000003000-0000008000003fff (prio 0, i/o): virtio-pci-notify

FlatView #1
 AS "I/O", root: io
 Root memory region: io
  0000000000000000-000000000000ffff (prio 0, i/o): io

FlatView #2
 AS "gpex-root", root: bus master container
 Root memory region: (none)
  No rendered FlatView

(qemu)
(qemu)

(qemu) info mtree
info mtree
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io

address-space: cpu-memory-0
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff

address-space: cpu-memory-1
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff

address-space: cpu-memory-2
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff

address-space: cpu-memory-3
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff

address-space: gpex-root
  0000000000000000-ffffffffffffffff (prio 0, i/o): bus master container
    0000000000000000-ffffffffffffffff (prio 0, i/o): alias bus master @system 0000000000000000-ffffffffffffffff [disabled]

address-space: vfio-pci
  0000000000000000-ffffffffffffffff (prio 0, i/o): bus master container
    0000000000000000-ffffffffffffffff (prio 0, i/o): alias bus master @system 0000000000000000-ffffffffffffffff

address-space: virtio-scsi-pci
  0000000000000000-ffffffffffffffff (prio 0, i/o): bus master container
    0000000000000000-ffffffffffffffff (prio 0, i/o): alias bus master @system 0000000000000000-ffffffffffffffff

memory-region: gpex_mmio
  0000000000000000-ffffffffffffffff (prio 0, i/o): gpex_mmio
    0000000010000000-000000001007ffff (prio 1, i/o): 0004:01:00.0 base BAR 1
      0000000010000000-000000001007ffff (prio 0, i/o): 0004:01:00.0 BAR 1
        0000000010000000-000000001007ffff (prio 0, ramd): 0004:01:00.0 BAR 1 mmaps[0]
    00000000100c0000-00000000100dffff (prio 1, i/o): 0004:01:00.0 base BAR 0
      00000000100c0000-00000000100dffff (prio 0, i/o): 0004:01:00.0 BAR 0
        00000000100c0000-00000000100dffff (prio 0, ramd): 0004:01:00.0 BAR 0 mmaps[0]
    00000000100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 base BAR 3
      00000000100e0000-00000000100e004f (prio 0, i/o): msix-table
      00000000100e0000-00000000100effff (prio 0, i/o): 0004:01:00.0 BAR 3
        00000000100e0000-00000000100effff (prio 0, ramd): 0004:01:00.0 BAR 3 mmaps[0]
      00000000100e2000-00000000100e2007 (prio 0, i/o): msix-pba [disabled]
    00000000100e4000-00000000100e4fff (prio 1, i/o): virtio-scsi-pci-msix
      00000000100e4000-00000000100e403f (prio 0, i/o): msix-table
      00000000100e4800-00000000100e4807 (prio 0, i/o): msix-pba
    0000008000000000-0000008000003fff (prio 1, i/o): virtio-pci
      0000008000000000-0000008000000fff (prio 0, i/o): virtio-pci-common
      0000008000001000-0000008000001fff (prio 0, i/o): virtio-pci-isr
      0000008000002000-0000008000002fff (prio 0, i/o): virtio-pci-device
      0000008000003000-0000008000003fff (prio 0, i/o): virtio-pci-notify

memory-region: pcie-mmcfg-mmio
  0000000000000000-000000001fffffff (prio 0, i/o): pcie-mmcfg-mmio

memory-region: system
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-0000000003ffffff (prio 0, romd): virt.flash0
    0000000004000000-0000000007ffffff (prio 0, romd): virt.flash1
    0000000008000000-000000000800ffff (prio 0, i/o): gicv3_dist
    0000000008080000-000000000809ffff (prio 0, i/o): gicv3_its
      0000000008080000-000000000808ffff (prio 0, i/o): control
      0000000008090000-000000000809ffff (prio 0, i/o): translation
    00000000080a0000-000000000811ffff (prio 0, i/o): gicv3_redist
    0000000009000000-0000000009000fff (prio 0, i/o): pl011
    0000000009010000-0000000009010fff (prio 0, i/o): pl031
    0000000009020000-0000000009020007 (prio 0, i/o): fwcfg.data
    0000000009020008-0000000009020009 (prio 0, i/o): fwcfg.ctl
    0000000009020010-0000000009020017 (prio 0, i/o): fwcfg.dma
    0000000009030000-0000000009030fff (prio 0, i/o): pl061
    000000000a000000-000000000a0001ff (prio 0, i/o): virtio-mmio
    000000000a000200-000000000a0003ff (prio 0, i/o): virtio-mmio
    000000000a000400-000000000a0005ff (prio 0, i/o): virtio-mmio
    000000000a000600-000000000a0007ff (prio 0, i/o): virtio-mmio
    000000000a000800-000000000a0009ff (prio 0, i/o): virtio-mmio
    000000000a000a00-000000000a000bff (prio 0, i/o): virtio-mmio
    000000000a000c00-000000000a000dff (prio 0, i/o): virtio-mmio
    000000000a000e00-000000000a000fff (prio 0, i/o): virtio-mmio
    000000000a001000-000000000a0011ff (prio 0, i/o): virtio-mmio
    000000000a001200-000000000a0013ff (prio 0, i/o): virtio-mmio
    000000000a001400-000000000a0015ff (prio 0, i/o): virtio-mmio
    000000000a001600-000000000a0017ff (prio 0, i/o): virtio-mmio
    000000000a001800-000000000a0019ff (prio 0, i/o): virtio-mmio
    000000000a001a00-000000000a001bff (prio 0, i/o): virtio-mmio
    000000000a001c00-000000000a001dff (prio 0, i/o): virtio-mmio
    000000000a001e00-000000000a001fff (prio 0, i/o): virtio-mmio
    000000000a002000-000000000a0021ff (prio 0, i/o): virtio-mmio
    000000000a002200-000000000a0023ff (prio 0, i/o): virtio-mmio
    000000000a002400-000000000a0025ff (prio 0, i/o): virtio-mmio
    000000000a002600-000000000a0027ff (prio 0, i/o): virtio-mmio
    000000000a002800-000000000a0029ff (prio 0, i/o): virtio-mmio
    000000000a002a00-000000000a002bff (prio 0, i/o): virtio-mmio
    000000000a002c00-000000000a002dff (prio 0, i/o): virtio-mmio
    000000000a002e00-000000000a002fff (prio 0, i/o): virtio-mmio
    000000000a003000-000000000a0031ff (prio 0, i/o): virtio-mmio
    000000000a003200-000000000a0033ff (prio 0, i/o): virtio-mmio
    000000000a003400-000000000a0035ff (prio 0, i/o): virtio-mmio
    000000000a003600-000000000a0037ff (prio 0, i/o): virtio-mmio
    000000000a003800-000000000a0039ff (prio 0, i/o): virtio-mmio
    000000000a003a00-000000000a003bff (prio 0, i/o): virtio-mmio
    000000000a003c00-000000000a003dff (prio 0, i/o): virtio-mmio
    000000000a003e00-000000000a003fff (prio 0, i/o): virtio-mmio
    000000000c000000-000000000dffffff (prio 0, i/o): platform bus
    0000000010000000-000000003efeffff (prio 0, i/o): alias pcie-mmio @gpex_mmio 0000000010000000-000000003efeffff
    000000003eff0000-000000003effffff (prio 0, i/o): gpex_ioport
      000000003eff1000-000000003eff103f (prio 1, i/o): virtio-pci
    000000003f000000-000000003fffffff (prio 0, i/o): alias pcie-ecam @pcie-mmcfg-mmio 0000000000000000-0000000000ffffff
    0000000040000000-000000023fffffff (prio 0, ram): mach-virt.ram
    0000008000000000-000000ffffffffff (prio 0, i/o): alias pcie-mmio-high @gpex_mmio 0000008000000000-000000ffffffffff
Eric Auger March 29, 2018, 2:42 p.m. UTC | #5
Hi Alex,

On 29/03/18 00:13, Alex Williamson wrote:
> On Wed, 28 Mar 2018 23:03:23 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Alexey, Alex,
>> On 22/03/18 09:18, Alexey Kardashevskiy wrote:
>>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
>>> an error message if a passed memory section address or size is not aligned
>>> to the minimal IOMMU page size. However although it checks
>>> offset_within_address_space for the alignment, offset_within_region is
>>> printed instead which makes it harder to find out what device caused
>>> the message so this replaces offset_within_region with
>>> offset_within_address_space.
>>>
>>> While we are here, this replaces '..' with 'size=' (as the second number
>>> is a size, not an end offset) and adds a memory region name.
>>>
>>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
>> The patch indeed fixes the reported format issues.
>>
>> However I have some other concerns with the info that is reported to the
>> end-user. See below.
>>
>> Assigning an e1000e device with a 64kB host, here are the traces I get:
>>
>> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
>>
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
>>
>> It took me some time to understand what happens but here is now my
>> understanding:
>>
>> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
>> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
>>
>> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
>> UNMAPPED -> 0x100e0000
>>
>> vfio_sub_page_bar_update_mapping() create mrs with base bar at
>> 0x100a0000 and 0x100e0000 successively, hence the
>> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
>> region induces some memory section at 0x100a0050 and 0x100e50 successively.
>>
>> However this is confusing for the end-user who only has access to the
>> final mapping (0x100e0000) through lspi [1].
>>
>> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
>> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
>>
>> 3) Also it happens that I have a virtio-scsi-pci device that is put just
>> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
>> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
>> 64kB (with prio 0), we have those MMIO regions which result in new
>> memory sections, which cause vfio_listener_region_add calls. This
>> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
>> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
>> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
>> virtio-scsi-pci msic-table and pba.
>>
>> So at the end of the day, my fear is all those info become really
>> frightening and confusing for the end-user and even not relevant
>> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
>> until we find a place where we could generate a clear hint for the
>> end-user, suggesting to relocate the msix bar.
>>
>> Thoughts?
> 
> Yep, I think that's probably the right approach.  Everything works as
> it should and nothing has worse access in 2.12 than it did in 2.11,
> there's only the opportunity to make things better with msi-x
> relocation and I don't think we want to error on the side of reporting
> too many errors that users cannot understand in an attempt to advise
> them of an unsupported option that might be better.  Let's remove the
> error report for 2.12 and think about how we could make a concise
> suggestion, once, while initializing the device.  Who's posting the
> patch?  Thanks,
> 
> Alex
> 
> PS - Why isn't the firmware/kernel on aarch64 making an attempt to
> align PCI resources on page boundaries?

so according to Laszlo's input I understand virtio-scsi-pci region 1 is
4K and the spec only mandates a natural alignment.

  Does
> pci=realloc,resource_alignment=pci:ffffffff:ffffffff change it? (I'm
> not sure if PCI_ANY_ID works for that option)
no, it does not.

Thanks

Eric
>
Alex Williamson March 29, 2018, 4:03 p.m. UTC | #6
On Thu, 29 Mar 2018 16:42:12 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Alex,
> 
> On 29/03/18 00:13, Alex Williamson wrote:
> > On Wed, 28 Mar 2018 23:03:23 +0200
> > Auger Eric <eric.auger@redhat.com> wrote:
> >   
> >> Hi Alexey, Alex,
> >> On 22/03/18 09:18, Alexey Kardashevskiy wrote:  
> >>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
> >>> an error message if a passed memory section address or size is not aligned
> >>> to the minimal IOMMU page size. However although it checks
> >>> offset_within_address_space for the alignment, offset_within_region is
> >>> printed instead which makes it harder to find out what device caused
> >>> the message so this replaces offset_within_region with
> >>> offset_within_address_space.
> >>>
> >>> While we are here, this replaces '..' with 'size=' (as the second number
> >>> is a size, not an end offset) and adds a memory region name.
> >>>
> >>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>    
> >> The patch indeed fixes the reported format issues.
> >>
> >> However I have some other concerns with the info that is reported to the
> >> end-user. See below.
> >>
> >> Assigning an e1000e device with a 64kB host, here are the traces I get:
> >>
> >> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
> >>
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
> >>
> >> It took me some time to understand what happens but here is now my
> >> understanding:
> >>
> >> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
> >> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
> >>
> >> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
> >> UNMAPPED -> 0x100e0000
> >>
> >> vfio_sub_page_bar_update_mapping() create mrs with base bar at
> >> 0x100a0000 and 0x100e0000 successively, hence the
> >> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
> >> region induces some memory section at 0x100a0050 and 0x100e50 successively.
> >>
> >> However this is confusing for the end-user who only has access to the
> >> final mapping (0x100e0000) through lspi [1].
> >>
> >> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
> >> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
> >>
> >> 3) Also it happens that I have a virtio-scsi-pci device that is put just
> >> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
> >> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
> >> 64kB (with prio 0), we have those MMIO regions which result in new
> >> memory sections, which cause vfio_listener_region_add calls. This
> >> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
> >> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
> >> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
> >> virtio-scsi-pci msic-table and pba.
> >>
> >> So at the end of the day, my fear is all those info become really
> >> frightening and confusing for the end-user and even not relevant
> >> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
> >> until we find a place where we could generate a clear hint for the
> >> end-user, suggesting to relocate the msix bar.
> >>
> >> Thoughts?  
> > 
> > Yep, I think that's probably the right approach.  Everything works as
> > it should and nothing has worse access in 2.12 than it did in 2.11,
> > there's only the opportunity to make things better with msi-x
> > relocation and I don't think we want to error on the side of reporting
> > too many errors that users cannot understand in an attempt to advise
> > them of an unsupported option that might be better.  Let's remove the
> > error report for 2.12 and think about how we could make a concise
> > suggestion, once, while initializing the device.  Who's posting the
> > patch?  Thanks,
> > 
> > Alex
> > 
> > PS - Why isn't the firmware/kernel on aarch64 making an attempt to
> > align PCI resources on page boundaries?  
> 
> so according to Laszlo's input I understand virtio-scsi-pci region 1 is
> 4K and the spec only mandates a natural alignment.

Right, as I noted in the other thread, it's not a spec requirement,
more of a best practices feature.

>   Does
> > pci=realloc,resource_alignment=pci:ffffffff:ffffffff change it? (I'm
> > not sure if PCI_ANY_ID works for that option)  
> no, it does not.

Not entirely unexpected, and since you're dealing with a virtio-scsi
device it could certainly be your boot device, so firmware leaving it
unprogrammed (in order to allow the OS to program it with 64K
alignment) probably isn't an option either.

For this particular example, the firmware aligning BARs is a secondary
optimization.  Ignoring for a moment that this BAR is interrupted by an
MSI-X vector table emulation section, if BAR resources were 64K
aligned, then we could mmap the BAR through to a userspace driver in
the guest, just as the host firmware has allowed us to do for this L1
qemu instance.  However, since we do have the MSI-X vector table
splitting the BAR already, it wouldn't help us if the firmware had done
this optimization since we still need to relocate the MSI-X vector
table, and if we do that by extending this BAR, we "kill two birds".
However, trying to figure out how to do that automatically has many
perils, as we've seen, so the problem becomes how we can concisely
advise users when this might be an option to consider.  Perhaps we
should resurrect some of the code that attempted to do the "auto" part
of MSI-X relocation to simply advise when it might be useful.  Thanks,

Alex
Alex Williamson March 29, 2018, 4:09 p.m. UTC | #7
On Thu, 29 Mar 2018 12:55:51 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 29/3/18 8:03 am, Auger Eric wrote:
> > Hi Alexey, Alex,
> > On 22/03/18 09:18, Alexey Kardashevskiy wrote:  
> >> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
> >> an error message if a passed memory section address or size is not aligned
> >> to the minimal IOMMU page size. However although it checks
> >> offset_within_address_space for the alignment, offset_within_region is
> >> printed instead which makes it harder to find out what device caused
> >> the message so this replaces offset_within_region with
> >> offset_within_address_space.
> >>
> >> While we are here, this replaces '..' with 'size=' (as the second number
> >> is a size, not an end offset) and adds a memory region name.
> >>
> >> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>  
> > The patch indeed fixes the reported format issues.
> > 
> > However I have some other concerns with the info that is reported to the
> > end-user. See below.
> > 
> > Assigning an e1000e device with a 64kB host, here are the traces I get:
> > 
> > Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
> > 
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
> > "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
> > 
> > It took me some time to understand what happens but here is now my
> > understanding:
> > 
> > 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
> > bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
> > 
> > UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
> > UNMAPPED -> 0x100e0000
> > 
> > vfio_sub_page_bar_update_mapping() create mrs with base bar at
> > 0x100a0000 and 0x100e0000 successively, hence the
> > vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
> > region induces some memory section at 0x100a0050 and 0x100e50 successively.
> > 
> > However this is confusing for the end-user who only has access to the
> > final mapping (0x100e0000) through lspi [1].  
> 
> 
> The trace shows that at least at some point the BAR actually was
> 0x100a0000, I find this info rather useful than confusing as it might
> expose a bug of some sort, for example.
> 
> The user also has access to the MR name which is the host PCI address + BAR
> index, how is that confusing?

Seriously?  You're expecting an awfully lot from users to be able to
interpret this in a meaningful way and not just freak out and file bugs.

> > 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
> > of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping  
> >> 3) Also it happens that I have a virtio-scsi-pci device that is put just  
> > after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has  
> 
> e1000e gets aligned to 64k but this one avoids the alignment for some reason?
> 
> 
> > its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
> > 64kB (with prio 0), we have those MMIO regions which result in new
> > memory sections, which cause vfio_listener_region_add calls. This
> > typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
> > way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
> > mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
> > virtio-scsi-pci msic-table and pba.  
> 
> 
> "info mtree -f" might give a hint how MRs got resolved, could it end up
> being emulated (==skipped by the vfio listener)?
> 
> 
> > So at the end of the day, my fear is all those info become really
> > frightening and confusing for the end-user and even not relevant
> > (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
> > until we find a place where we could generate a clear hint for the
> > end-user, suggesting to relocate the msix bar.
> > 
> > Thoughts?  
> 
> Please post complete "lspci -v" output for both pci devices and "info mtree
> -f" (in addition to "info mtree", not instead).
> 
> In general, the error_report() could be removed as we did not have any
> indication of not mapping before so we do not have to start now, I am just
> missing the point here - the message exposes potentially not-working P2P
> which is useful for people who care about that and do not know if actually
> might work. Rather than silencing it, I'd convert it into the trace point.

A trace point would be an ok option too.  P2P is something I don't want
to lose, but it has never worked in this particular case and I don't
think it's worth alarming users with new warnings where nothing has
changed and nobody has requested working P2P in this situation.  Thanks,

Alex
Alexey Kardashevskiy April 3, 2018, 3:30 a.m. UTC | #8
On 29/3/18 9:14 pm, Auger Eric wrote:
> Hi Alexey,
> On 29/03/18 03:55, Alexey Kardashevskiy wrote:
>> On 29/3/18 8:03 am, Auger Eric wrote:
>>> Hi Alexey, Alex,
>>> On 22/03/18 09:18, Alexey Kardashevskiy wrote:
>>>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added
>>>> an error message if a passed memory section address or size is not aligned
>>>> to the minimal IOMMU page size. However although it checks
>>>> offset_within_address_space for the alignment, offset_within_region is
>>>> printed instead which makes it harder to find out what device caused
>>>> the message so this replaces offset_within_region with
>>>> offset_within_address_space.
>>>>
>>>> While we are here, this replaces '..' with 'size=' (as the second number
>>>> is a size, not an end offset) and adds a memory region name.
>>>>
>>>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions"
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> The patch indeed fixes the reported format issues.
>>>
>>> However I have some other concerns with the info that is reported to the
>>> end-user. See below.
>>>
>>> Assigning an e1000e device with a 64kB host, here are the traces I get:
>>>
>>> Region XXX is not aligned to 0x10000 and cannot be mapped for DMA
>>>
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0
>>> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8
>>>
>>> It took me some time to understand what happens but here is now my
>>> understanding:
>>>
>>> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr =
>>> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values:
>>>
>>> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a0000 -> UNMAPPED -> 0x100a0000 ->
>>> UNMAPPED -> 0x100e0000
>>>
>>> vfio_sub_page_bar_update_mapping() create mrs with base bar at
>>> 0x100a0000 and 0x100e0000 successively, hence the
>>> vfio_listener_region_add on 0x100axxxx. Indeed, 0x0-0x50 msix-table mmio
>>> region induces some memory section at 0x100a0050 and 0x100e50 successively.
>>>
>>> However this is confusing for the end-user who only has access to the
>>> final mapping (0x100e0000) through lspi [1].
>>
>>
>> The trace shows that at least at some point the BAR actually was
>> 0x100a0000, I find this info rather useful than confusing as it might
>> expose a bug of some sort, for example.
>>
>> The user also has access to the MR name which is the host PCI address + BAR
>> index, how is that confusing?
> 
> To me it is confusing since it does not match the final location of bar3
> as output by lspci. I couldn't understanding how 0x100axxxx related to bar3.


PCI resource reallocation is not that rate on PPC, at least, so I am kinda
used to it...



>>> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension
>>> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping
>>>> 3) Also it happens that I have a virtio-scsi-pci device that is put just
>>> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has
>>
>> e1000e gets aligned to 64k but this one avoids the alignment for some reason?
> 
> Yes I will enquire about the allocation policy
>>
>>
>>> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to
>>> 64kB (with prio 0), we have those MMIO regions which result in new
>>> memory sections, which cause vfio_listener_region_add calls. This
>>> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the
>>> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3
>>> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between
>>> virtio-scsi-pci msic-table and pba.
>>
>>
>> "info mtree -f" might give a hint how MRs got resolved, could it end up
>> being emulated (==skipped by the vfio listener)?
> 
> Actually that's what is strange as I can see it in info mtree -f output. See at the end of the mail.

Hm. Looks strange. Would be nice to know why...


>>> So at the end of the day, my fear is all those info become really
>>> frightening and confusing for the end-user and even not relevant
>>> (0x100a0000 stuff). So I would rather simply remove the trace in 2.12
>>> until we find a place where we could generate a clear hint for the
>>> end-user, suggesting to relocate the msix bar.
>>>
>>> Thoughts?
>>
>> Please post complete "lspci -v" output for both pci devices and "info mtree
>> -f" (in addition to "info mtree", not instead).
> 
> see at the end
>>
>> In general, the error_report() could be removed as we did not have any
>> indication of not mapping before so we do not have to start now, I am just
>> missing the point here - the message exposes potentially not-working P2P
>> which is useful for people who care about that and do not know if actually
>> might work. Rather than silencing it, I'd convert it into the trace point.
> 
> But typically output that
> Region "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 cannot be DMA mapped
> is not strictly correct (by chance it was not re-allocated to something else, right?)


I get the point that it might be confusing and not matching "lspci -vb" but
still - what is not correct about it? At the time when the message
appeared, BAR was 0x100a0000.
Eric Auger April 3, 2018, 7:06 a.m. UTC | #9
Hi Alexey,

On 03/04/18 05:30, Alexey Kardashevskiy wrote:
> I get the point that it might be confusing and not matching "lspci -vb" but
> still - what is not correct about it? At the time when the message
> appeared, BAR was 0x100a0000.

Yes they were correct at the time the BAR was located at 0x100a0000 but.
But still I suspect the end-user most probably will not be able to
understand this and will interpret it as an error and report a bug;
whereas it works as before. So to me we should rather use a trace-event
for debug purpose until we find a way to output a usable hint for the
end-user.

Thanks

Eric
diff mbox series

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5e84716..e2db596 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -548,10 +548,11 @@  static void vfio_listener_region_add(MemoryListener *listener,
         hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 
         if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
-            error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
+            error_report("Region \"%s\" 0x%"HWADDR_PRIx" size=0x%"HWADDR_PRIx
                          " is not aligned to 0x%"HWADDR_PRIx
                          " and cannot be mapped for DMA",
-                         section->offset_within_region,
+                         memory_region_name(section->mr),
+                         section->offset_within_address_space,
                          int128_getlo(section->size),
                          pgmask + 1);
             return;