diff mbox series

Please help me understand VIRTIO_F_IOMMU_PLATFORM

Message ID D2F8C14D-1B82-4BDF-A1AC-AF1BF6D4CDD7@me.com
State New
Headers show
Series Please help me understand VIRTIO_F_IOMMU_PLATFORM | expand

Commit Message

Jason Thorpe July 31, 2021, 3:41 p.m. UTC
Hey folks —

I’d like to be able to use VirtIO with qemu-system-alpha but, at least on a NetBSD x86_64 host, it does not currently work.  This is because virtio_bus_device_plugged() in hw/virtio/virtio-bus.c ends up picking address_space_memory as the DMA address space for the VirtIODevice.  This does not work for alpha because the CPU and PCI have different views of system memory.  All that’s needed to fix it is for virtio_bus_device_plugged() to call klass->get_dma_as(qbus->parent), but the code only does that if:

	bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM); 

So, obviously, VIRTIO_F_IOMMU_PLATFORM is not getting set for an emulated alpha system, despite the alpha platform having one[*].  But it’s not clear to me that it’s appropriate for alpha to use VIRTIO_F_IOMMU_PLATFORM, at least from my reading of how it’s used.

In any case, the following extremely simple change allows me to use VirtIO devices in qemu-system-alpha with a NetBSD/alpha guest (and I’m told this also fixes using VirtIO devices in qemu-system-sparc64 for a NetBSD/sparc64 guest):


So, VirtIO experts, please weigh in on the correctness of this change… if it is, I’ll post the patch formally.

[*] The way the alpha platform works is that the IOMMU is used if the PCI device performs a memory access to a DMA window where SGMAPs are enabled.  If SGMAPs are not enabled in the DMA window the PCI device is accessing, the translation is performed directly by subtracting the address from the window’s Window Base and appending the result to the window’s Translated Base.  A typical alpha PCI platform has a 1GB DMA window at 1GB from the PCI’s perspective, which maps to 0-1GB in the system address map, and an alpha system with 1GB or less of RAM would thus not need to use the IOMMU, but the translation take place regardless.

-- thorpej

Comments

Mark Cave-Ayland Aug. 18, 2021, 7:58 a.m. UTC | #1
On 31/07/2021 16:41, Jason Thorpe wrote:

(added Michael on CC)

> Hey folks —
> 
> I’d like to be able to use VirtIO with qemu-system-alpha but, at least on a NetBSD x86_64 host, it does not currently work.  This is because virtio_bus_device_plugged() in hw/virtio/virtio-bus.c ends up picking address_space_memory as the DMA address space for the VirtIODevice.  This does not work for alpha because the CPU and PCI have different views of system memory.  All that’s needed to fix it is for virtio_bus_device_plugged() to call klass->get_dma_as(qbus->parent), but the code only does that if:
> 
> 	bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> 
> So, obviously, VIRTIO_F_IOMMU_PLATFORM is not getting set for an emulated alpha system, despite the alpha platform having one[*].  But it’s not clear to me that it’s appropriate for alpha to use VIRTIO_F_IOMMU_PLATFORM, at least from my reading of how it’s used.
> 
> In any case, the following extremely simple change allows me to use VirtIO devices in qemu-system-alpha with a NetBSD/alpha guest (and I’m told this also fixes using VirtIO devices in qemu-system-sparc64 for a NetBSD/sparc64 guest):
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index 859978d248..c083e8d737 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -85,6 +85,8 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>       if (klass->get_dma_as != NULL && has_iommu) {
>           virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
>           vdev->dma_as = klass->get_dma_as(qbus->parent);
> +    } else if (klass->get_dma_as != NULL) {
> +        vdev->dma_as = klass->get_dma_as(qbus->parent);
>       } else {
>           vdev->dma_as = &address_space_memory;
>       }
> 
> So, VirtIO experts, please weigh in on the correctness of this change… if it is, I’ll post the patch formally.
> 
> [*] The way the alpha platform works is that the IOMMU is used if the PCI device performs a memory access to a DMA window where SGMAPs are enabled.  If SGMAPs are not enabled in the DMA window the PCI device is accessing, the translation is performed directly by subtracting the address from the window’s Window Base and appending the result to the window’s Translated Base.  A typical alpha PCI platform has a 1GB DMA window at 1GB from the PCI’s perspective, which maps to 0-1GB in the system address map, and an alpha system with 1GB or less of RAM would thus not need to use the IOMMU, but the translation take place regardless.
> 
> -- thorpej

Hi Jason,

Thanks for looking at this! I've had previous discussions with Martin trying to 
figure out why virtio-blk-pci doesn't work with Netbsd/sparc64 so glad that you've 
been able to find the underlying cause.

My read on VIRTIO_F_IOMMU_PLATFORM is that this is declaring host IOMMU support so 
the implementation on the guest should not be relevant here.

Testing Linux/sparc64 boot from a virtio-blk-pci device on current git master shows I 
can boot from a virtio-blk-pci device without any problem, even though the existing 
code fails the has_iommu check and vdev->dma_as gets set to address_space_memory 
which is why I haven't spotted this before.

Stepping through the code shows that klass->get_dma_as() is pointing to 
virtio_pci_get_dma_as() which in turn returns pci_get_address_space(dev) which looks 
correct to me. I think that the logic to set vdev->dma_as is incorrect here since 
qemu-system-sparc64 has an emulated IOMMU with its own address space independent of 
whether the host has an IOMMU.

I modified your patch slightly as below and confirmed that I can still boot 
Linux/sparc64 here:

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 859978d248..ee178b8223 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -82,9 +82,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
          return;
      }

-    if (klass->get_dma_as != NULL && has_iommu) {
-        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+    if (klass->get_dma_as != NULL) {
          vdev->dma_as = klass->get_dma_as(qbus->parent);
+        if (has_iommu) {
+            virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
+        }
      } else {
          vdev->dma_as = &address_space_memory;
      }

Michael: can you comment further on whether the analysis and patch above are correct?


ATB,

Mark.
Jason Thorpe Aug. 18, 2021, 12:35 p.m. UTC | #2
> On Aug 18, 2021, at 12:58 AM, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote:
> 
> On 31/07/2021 16:41, Jason Thorpe wrote:
> 
> (added Michael on CC)
> 
> Hi Jason,
> 
> Thanks for looking at this! I've had previous discussions with Martin trying to figure out why virtio-blk-pci doesn't work with Netbsd/sparc64 so glad that you've been able to find the underlying cause.
> 
> My read on VIRTIO_F_IOMMU_PLATFORM is that this is declaring host IOMMU support so the implementation on the guest should not be relevant here.

That’s basically the conclusion I came to, as well, but I still wasn’t quite sure how the host IOMMU was relevant so I wasn’t quite willing to stick my neck out :-)

> Testing Linux/sparc64 boot from a virtio-blk-pci device on current git master shows I can boot from a virtio-blk-pci device without any problem, even though the existing code fails the has_iommu check and vdev->dma_as gets set to address_space_memory which is why I haven't spotted this before.
> 
> Stepping through the code shows that klass->get_dma_as() is pointing to virtio_pci_get_dma_as() which in turn returns pci_get_address_space(dev) which looks correct to me. I think that the logic to set vdev->dma_as is incorrect here since qemu-system-sparc64 has an emulated IOMMU with its own address space independent of whether the host has an IOMMU.

Right, this more-or-less the same situation as qemu-system-alpha.  I’m curious why the Linux/sparc64 guest is able to use VirtIO without the patch, though.  I guess that VirtIO client implementation is skipping the normal platform DMA mapping step and is just poking physical addresses in directly?  The NetBSD VirtIO client code behaves like all other NetBSD PCI drivers and does NOT skip the platform DMA mapping step.

> I modified your patch slightly as below and confirmed that I can still boot Linux/sparc64 here:
> 
> diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> index 859978d248..ee178b8223 100644
> --- a/hw/virtio/virtio-bus.c
> +++ b/hw/virtio/virtio-bus.c
> @@ -82,9 +82,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
>         return;
>     }
> 
> -    if (klass->get_dma_as != NULL && has_iommu) {
> -        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +    if (klass->get_dma_as != NULL) {
>         vdev->dma_as = klass->get_dma_as(qbus->parent);
> +        if (has_iommu) {
> +            virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> +        }
>     } else {
>         vdev->dma_as = &address_space_memory;
>     }
> 
> Michael: can you comment further on whether the analysis and patch above are correct?
> 
> 
> ATB,
> 
> Mark.

-- thorpej
Michael S. Tsirkin Oct. 6, 2021, 8:26 a.m. UTC | #3
On Wed, Aug 18, 2021 at 05:35:24AM -0700, Jason Thorpe wrote:
> 
> > On Aug 18, 2021, at 12:58 AM, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote:
> > 
> > On 31/07/2021 16:41, Jason Thorpe wrote:
> > 
> > (added Michael on CC)
> > 
> > Hi Jason,
> > 
> > Thanks for looking at this! I've had previous discussions with Martin trying to figure out why virtio-blk-pci doesn't work with Netbsd/sparc64 so glad that you've been able to find the underlying cause.
> > 
> > My read on VIRTIO_F_IOMMU_PLATFORM is that this is declaring host IOMMU support so the implementation on the guest should not be relevant here.

The name is confusing. The spec renamed it to VIRTIO_F_ACCESS_PLATFORM
now, with this explanation:

  \item[VIRTIO_F_ACCESS_PLATFORM(33)] This feature indicates that
  the device can be used on a platform where device access to data
  in memory is limited and/or translated. E.g. this is the case if the device can be located
  behind an IOMMU that translates bus addresses from the device into physical
  addresses in memory, if the device can be limited to only access
  certain memory addresses or if special commands such as
  a cache flush can be needed to synchronise data in memory with
  the device. Whether accesses are actually limited or translated
  is described by platform-specific means.
  If this feature bit is set to 0, then the device
  has same access to memory addresses supplied to it as the
  driver has.
  In particular, the device will always use physical addresses
  matching addresses used by the driver (typically meaning
  physical addresses used by the CPU)
  and not translated further, and can access any address supplied to it by
  the driver. When clear, this overrides any platform-specific description of
  whether device access is limited or translated in any way, e.g.
  whether an IOMMU may be present.



> That’s basically the conclusion I came to, as well, but I still wasn’t quite sure how the host IOMMU was relevant so I wasn’t quite willing to stick my neck out :-)
> 
> > Testing Linux/sparc64 boot from a virtio-blk-pci device on current git master shows I can boot from a virtio-blk-pci device without any problem, even though the existing code fails the has_iommu check and vdev->dma_as gets set to address_space_memory which is why I haven't spotted this before.
> > 
> > Stepping through the code shows that klass->get_dma_as() is pointing to virtio_pci_get_dma_as() which in turn returns pci_get_address_space(dev) which looks correct to me. I think that the logic to set vdev->dma_as is incorrect here since qemu-system-sparc64 has an emulated IOMMU with its own address space independent of whether the host has an IOMMU.
> 
> Right, this more-or-less the same situation as qemu-system-alpha.  I’m curious why the Linux/sparc64 guest is able to use VirtIO without the patch, though.  I guess that VirtIO client implementation is skipping the normal platform DMA mapping step and is just poking physical addresses in directly?  The NetBSD VirtIO client code behaves like all other NetBSD PCI drivers and does NOT skip the platform DMA mapping step.

How did it work then?
If the driver wants to use DMA addresses normally it has to negotiate VIRTIO_F_ACCESS_PLATFORM.
Without it the assumption is that it uses physical addresses.

> > I modified your patch slightly as below and confirmed that I can still boot Linux/sparc64 here:
> > 
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index 859978d248..ee178b8223 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -82,9 +82,11 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
> >         return;
> >     }
> > 
> > -    if (klass->get_dma_as != NULL && has_iommu) {
> > -        virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > +    if (klass->get_dma_as != NULL) {
> >         vdev->dma_as = klass->get_dma_as(qbus->parent);
> > +        if (has_iommu) {
> > +            virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> > +        }
> >     } else {
> >         vdev->dma_as = &address_space_memory;
> >     }
> > 
> > Michael: can you comment further on whether the analysis and patch above are correct?
> > 
> > 
> > ATB,
> > 
> > Mark.
> 
> -- thorpej

Maybe this platform simply has to mandate VIRTIO_F_ACCESS_PLATFORM then.

I would be very reluctant to add to the pile of hacks around legacy
devices.
diff mbox series

Patch

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index 859978d248..c083e8d737 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -85,6 +85,8 @@  void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
     if (klass->get_dma_as != NULL && has_iommu) {
         virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
         vdev->dma_as = klass->get_dma_as(qbus->parent);
+    } else if (klass->get_dma_as != NULL) {
+        vdev->dma_as = klass->get_dma_as(qbus->parent);
     } else {
         vdev->dma_as = &address_space_memory;
     }