diff mbox

[07/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers

Message ID 1340087992-2399-8-git-send-email-benh@kernel.crashing.org
State New
Headers show

Commit Message

Benjamin Herrenschmidt June 19, 2012, 6:39 a.m. UTC
From: David Gibson <david@gibson.dropbear.id.au>

The USB UHCI and EHCI drivers were converted some time ago to use the
pci_dma_*() helper functions.  However, this conversion was not complete
because in some places both these drivers do DMA via the usb_packet_map()
function in usb-libhw.c.  That function directly used
cpu_physical_memory_map().

Now that the sglist code uses DMA wrappers properly, we can convert the
functions in usb-libhw.c, thus conpleting the conversion of UHCI and EHCI
to use the DMA wrappers.

Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
callback function.  When IOMMU support is added, this will mean that
usb_packet_map() and the corresponding usb_packet_unmap() must be called in
close proximity without dropping the qemu device lock - otherwise the guest
might invalidate IOMMU mappings while they are still in use by the device
code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 hw/usb.h          |    2 +-
 hw/usb/hcd-ehci.c |    4 ++--
 hw/usb/hcd-uhci.c |    2 +-
 hw/usb/libhw.c    |   21 +++++++++++----------
 4 files changed, 15 insertions(+), 14 deletions(-)

Comments

Gerd Hoffmann June 19, 2012, 1:42 p.m. UTC | #1
Hi,

> Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
> callback function.  When IOMMU support is added, this will mean that
> usb_packet_map() and the corresponding usb_packet_unmap() must be called in
> close proximity without dropping the qemu device lock

Well, that isn't guaranteed ...

> - otherwise the guest
> might invalidate IOMMU mappings while they are still in use by the device
> code.

Guest tearing down mapping while usb packets using them are still in
flight would be a guest bug.  Still not impossible to happen though. How
is this case supposed to be handled?

cheers,
  Gerd
Benjamin Herrenschmidt June 19, 2012, 8:23 p.m. UTC | #2
On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote:
> Well, that isn't guaranteed ...
> 
> > - otherwise the guest
> > might invalidate IOMMU mappings while they are still in use by the device
> > code.
> 
> Guest tearing down mapping while usb packets using them are still in
> flight would be a guest bug.  Still not impossible to happen though. How
> is this case supposed to be handled? 

Like with any other device, it's hard ... what would happen on real
hardware is that the USB controller will get a target abort, which will
result in the controller reporting an error (typically in the PCI status
register) and stopping.

In qemu we tend not to deal with DMA failures at all.

If the scenario above happens, we will potentially continue accessing
the guest memory after it has been unmapped. While this is bad, in
practice, it's not a huge deal because the USB controller is only
accessed by the guest kernel so it's a matter of the guest kernel
shooting itself in the foot.

So we don't have to fix it as a pre-req to merging the patches, though
it would be nice if we did in the long run.

The way to fix it is to register a cancel callback
(dma_memory_map_with_cancel), which will be called by the iommu code
when the translation is invalidated, and which can be used to cancel
pending transactions etc... and generally prevent further access to the
memory.

However the current implementation never calls cancel.

Cheers.
Ben.
David Gibson June 20, 2012, 3:14 a.m. UTC | #3
On Wed, Jun 20, 2012 at 06:23:58AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote:
> > Well, that isn't guaranteed ...
> > 
> > > - otherwise the guest
> > > might invalidate IOMMU mappings while they are still in use by the device
> > > code.
> > 
> > Guest tearing down mapping while usb packets using them are still in
> > flight would be a guest bug.  Still not impossible to happen though. How
> > is this case supposed to be handled? 
> 
> Like with any other device, it's hard ... what would happen on real
> hardware is that the USB controller will get a target abort, which will
> result in the controller reporting an error (typically in the PCI status
> register) and stopping.
> 
> In qemu we tend not to deal with DMA failures at all.
> 
> If the scenario above happens, we will potentially continue accessing
> the guest memory after it has been unmapped. While this is bad, in
> practice, it's not a huge deal because the USB controller is only
> accessed by the guest kernel so it's a matter of the guest kernel
> shooting itself in the foot.
> 
> So we don't have to fix it as a pre-req to merging the patches, though
> it would be nice if we did in the long run.
> 
> The way to fix it is to register a cancel callback
> (dma_memory_map_with_cancel), which will be called by the iommu code
> when the translation is invalidated, and which can be used to cancel
> pending transactions etc... and generally prevent further access to the
> memory.

So, in fact the original comment is a bit out of date.  With the
current version of this series, then a guest attempt to invalidate
will be delayed until the unmap occurs.  If we discover that leads to
delays which are too long then we can add the cancel callback to
handle this.  However, the USB case should be ok - it may not be
theoretically guaranteed that the calls are close, but it's certainly
the case at the moment.
Benjamin Herrenschmidt June 20, 2012, 3:52 a.m. UTC | #4
On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote:
> So, in fact the original comment is a bit out of date.  With the
> current version of this series, then a guest attempt to invalidate
> will be delayed until the unmap occurs. 

No, this code was dropped, including the tracking of the maps, following
comments from Anthony and others. The API for providing a cancel
callback is still there but nothing will call it unless the backend does
its own tracking and decides to do so.

As it is, the race exist but:

 - It will only hurt the guest

 - And only for a very buggy guest

So the worst case is that it hurts something like kdump.


I plan to re-introduce some of the mechanisms for cancellation
eventually, but we agreed that it wasn't going to be a show stopper and
that we could work on getting that sorted in a second phase. I'm looking
at a more efficient way to deal with the tracking of the maps as well
since some devices uses them often.

Cheers,
Ben.
Gerd Hoffmann June 20, 2012, 6:25 a.m. UTC | #5
Hi,

>> Like with any other device, it's hard ... what would happen on real
>> hardware is that the USB controller will get a target abort, which will
>> result in the controller reporting an error (typically in the PCI status
>> register) and stopping.

Not that hard, code to cancel in-flight transactions is in place already
as this can happen for other reasons too.

> handle this.  However, the USB case should be ok - it may not be
> theoretically guaranteed that the calls are close, but it's certainly
> the case at the moment.

Depends on the device.  For the usb hid devices (which is the most
important use case for power I think) packets will be processed
synchronously, so there is no problem here.

usb-storage can keep packets in flight without holding the qemu lock
(waiting for async block I/O finish).  Shouldn't be too long though.

usb-host keeps pretty much every packet in flight without holding the
qemu lock as it passes on the requests to the hosts usbfs, then waits
asynchronously for the request finish before returning the result to the
guest.  Depending on the kind of device you are passing though this can
be *very* long (minutes).

cheers,
  Gerd
Benjamin Herrenschmidt June 20, 2012, 9:25 a.m. UTC | #6
On Wed, 2012-06-20 at 08:25 +0200, Gerd Hoffmann wrote:
> Hi,
> 
> >> Like with any other device, it's hard ... what would happen on real
> >> hardware is that the USB controller will get a target abort, which will
> >> result in the controller reporting an error (typically in the PCI status
> >> register) and stopping.
> 
> Not that hard, code to cancel in-flight transactions is in place already
> as this can happen for other reasons too.

Ok, good,.

> > handle this.  However, the USB case should be ok - it may not be
> > theoretically guaranteed that the calls are close, but it's certainly
> > the case at the moment.
> 
> Depends on the device.  For the usb hid devices (which is the most
> important use case for power I think) packets will be processed
> synchronously, so there is no problem here.
> 
> usb-storage can keep packets in flight without holding the qemu lock
> (waiting for async block I/O finish).  Shouldn't be too long though.
> 
> usb-host keeps pretty much every packet in flight without holding the
> qemu lock as it passes on the requests to the hosts usbfs, then waits
> asynchronously for the request finish before returning the result to the
> guest.  Depending on the kind of device you are passing though this can
> be *very* long (minutes).

Right so with the initial patch series I sent, nothing will happen in
that we don't actually keep track of mappings, don't call the cancel
callback and anyways, OHCI/EHCI don't register a cancel callback.

As I wrote earlier, this is not very harmful so it's good to get merged,
and we can look into improving it and add the cancellation mechanism on
top. There was some original invalidation code from David that was
trying to wait on all pending maps but that had issues, Anthony wasn't
too happy about it, so I decided to attempt to submit/merge the patch
series without solving that issue.

To properly implement cancel without too much overhead, we need some
tracking of qemu maps and we need a quick way to know when the guest
invalidates a translation, if that translation has maps associated with
it.

The best way to do that, from my little experience messing around with
it, is going to essentially be implementation specific (ie depends on
the actual iommu backend).

For example, on TCEs, I could keep a parallel bitmap indicating when a
map is present for a given entry. That could be very efficient if I know
that there won't be more than one qemu map at a time for a given entry
though, so we should discuss whether that's an acceptable limitation.

There's also some "interesting" issues due to the fact that we populate
the TCE tables directly from KVM in "real mode" for speed, so that
bitmap would need to be some kind of shared memory with the kernel
(without locks !) and the kernel would have to be updated to fallback to
sending the invalidation hypercalls to qemu when it collides with a
populated map entry.

It's all doable, it's also a bit tricky, potentially quite a bit of
code, new KVM/qemu interfaces, etc... for a problem that's going to be a
non-issue pretty much 99.9% of the time :-) We still need to address it,
but I haven't convinced myself yet that I have come up with the best
solution :-)

Cheers,
Ben.
Gerd Hoffmann June 20, 2012, 9:54 a.m. UTC | #7
Hi,

> [ lots of interesting background info snipped ]

> It's all doable, it's also a bit tricky, potentially quite a bit of
> code, new KVM/qemu interfaces, etc... for a problem that's going to be a
> non-issue pretty much 99.9% of the time :-) We still need to address it,
> but I haven't convinced myself yet that I have come up with the best
> solution :-)

Yea, sure, it's in the "nice-to-have" not "must-have" category.  And
adding complex code which almost never actually runs needs some care
indeed.  Not having that for the initial merge is perfectly fine with
me, just wanted to know what dragons might be lurking there ;)

Oh, and: Acked-by: Gerd Hoffmann <kraxel@redhat.com>

cheers,
  Gerd
David Gibson June 21, 2012, 1:42 a.m. UTC | #8
On Wed, Jun 20, 2012 at 01:52:12PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote:
> > So, in fact the original comment is a bit out of date.  With the
> > current version of this series, then a guest attempt to invalidate
> > will be delayed until the unmap occurs. 
> 
> No, this code was dropped, including the tracking of the maps, following
> comments from Anthony and others. The API for providing a cancel
> callback is still there but nothing will call it unless the backend does
> its own tracking and decides to do so.

Ah, right.
diff mbox

Patch

diff --git a/hw/usb.h b/hw/usb.h
index 2a56fe5..a5623d3 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -345,7 +345,7 @@  void usb_packet_check_state(USBPacket *p, USBPacketState expected);
 void usb_packet_setup(USBPacket *p, int pid, USBEndpoint *ep);
 void usb_packet_addbuf(USBPacket *p, void *ptr, size_t len);
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl);
-void usb_packet_unmap(USBPacket *p);
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl);
 void usb_packet_copy(USBPacket *p, void *ptr, size_t bytes);
 void usb_packet_skip(USBPacket *p, size_t bytes);
 void usb_packet_cleanup(USBPacket *p);
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 5298204..81bbc54 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1422,8 +1422,8 @@  static void ehci_execute_complete(EHCIQueue *q)
         set_field(&q->qh.token, p->tbytes, QTD_TOKEN_TBYTES);
     }
     ehci_finish_transfer(q, p->usb_status);
+    usb_packet_unmap(&p->packet, &p->sgl);
     qemu_sglist_destroy(&p->sgl);
-    usb_packet_unmap(&p->packet);
 
     q->qh.token ^= QTD_TOKEN_DTOGGLE;
     q->qh.token &= ~QTD_TOKEN_ACTIVE;
@@ -1547,7 +1547,7 @@  static int ehci_process_itd(EHCIState *ehci,
                 usb_packet_map(&ehci->ipacket, &ehci->isgl);
                 ret = usb_handle_packet(dev, &ehci->ipacket);
                 assert(ret != USB_RET_ASYNC);
-                usb_packet_unmap(&ehci->ipacket);
+                usb_packet_unmap(&ehci->ipacket, &ehci->isgl);
             } else {
                 DPRINTF("ISOCH: attempt to addess non-iso endpoint\n");
                 ret = USB_RET_NAK;
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 9871e24..86888ce 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -871,7 +871,7 @@  static int uhci_handle_td(UHCIState *s, uint32_t addr, UHCI_TD *td,
 
 done:
     len = uhci_complete_td(s, td, async, int_mask);
-    usb_packet_unmap(&async->packet);
+    usb_packet_unmap(&async->packet, &async->sgl);
     uhci_async_free(async);
     return len;
 }
diff --git a/hw/usb/libhw.c b/hw/usb/libhw.c
index 2462351..c0de30e 100644
--- a/hw/usb/libhw.c
+++ b/hw/usb/libhw.c
@@ -26,15 +26,15 @@ 
 
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
 {
-    int is_write = (p->pid == USB_TOKEN_IN);
-    target_phys_addr_t len;
+    DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+        DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
+    dma_addr_t len;
     void *mem;
     int i;
 
     for (i = 0; i < sgl->nsg; i++) {
         len = sgl->sg[i].len;
-        mem = cpu_physical_memory_map(sgl->sg[i].base, &len,
-                                      is_write);
+        mem = dma_memory_map(sgl->dma, sgl->sg[i].base, &len, dir);
         if (!mem) {
             goto err;
         }
@@ -46,18 +46,19 @@  int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
     return 0;
 
 err:
-    usb_packet_unmap(p);
+    usb_packet_unmap(p, sgl);
     return -1;
 }
 
-void usb_packet_unmap(USBPacket *p)
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl)
 {
-    int is_write = (p->pid == USB_TOKEN_IN);
+    DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+        DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
     int i;
 
     for (i = 0; i < p->iov.niov; i++) {
-        cpu_physical_memory_unmap(p->iov.iov[i].iov_base,
-                                  p->iov.iov[i].iov_len, is_write,
-                                  p->iov.iov[i].iov_len);
+        dma_memory_unmap(sgl->dma, p->iov.iov[i].iov_base,
+                         p->iov.iov[i].iov_len, dir,
+                         p->iov.iov[i].iov_len);
     }
 }