Patchwork [09/26] ehci: Use uframe precision for interrupt threshold checking

login
register
mail settings
Submitter Hans de Goede
Date Dec. 14, 2012, 1:35 p.m.
Message ID <1355492147-5023-10-git-send-email-hdegoede@redhat.com>
Download mbox | patch
Permalink /patch/206486/
State New
Headers show

Comments

Hans de Goede - Dec. 14, 2012, 1:35 p.m.
Before this patch, the following could happen:
1) Transfer completes, raises interrupt
2) .5 ms later we check if the guest has queued up any new transfers
3) We find and execute a new transfer
4) .2 ms later the new transfer completes
5) We re-run our frame_timer to write back the completion, but less then
   1 ms has passed since our last run, so frindex is not changed, so the
   interrupt threshold code delays the interrupt
6) 1 ms from the re-run our frame-timer runs again and finally delivers
   the interrupt

This leads to unnecessary large delays of interrupts, this code fixes this
by keeping a shadow frindex variable which is incremented with uframe
precision and using that for interrupt threshold control, making the
interrupt fire at step 5 for guest which have low interrupt threshold
settings (like Linux).

Note that a shadow variable is used instead of changing frindex to
uframe accuracy because we must send a frindex which is a multiple of 8
during migration for migration compatibility, and rounding it down to
a multiple of 8 pre-migration, can lead to frindex going backwards from
the guest pov.

This boosts Linux read speed of a simple cheap USB thumb drive by 6 %.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 hw/usb/hcd-ehci.c | 69 ++++++++++++++++++++++++++++++++++++-------------------
 hw/usb/hcd-ehci.h |  3 +++
 2 files changed, 48 insertions(+), 24 deletions(-)
Gerd Hoffmann - Dec. 17, 2012, 1:16 p.m.
On 12/14/12 14:35, Hans de Goede wrote:
> Note that a shadow variable is used instead of changing frindex to
> uframe accuracy because we must send a frindex which is a multiple of 8
> during migration for migration compatibility, and rounding it down to
> a multiple of 8 pre-migration, can lead to frindex going backwards from
> the guest pov.

Jumping forward instead?

cheers,
  Gerd
Hans de Goede - Dec. 17, 2012, 2:23 p.m.
Hi,

On 12/17/2012 02:16 PM, Gerd Hoffmann wrote:
> On 12/14/12 14:35, Hans de Goede wrote:
>> Note that a shadow variable is used instead of changing frindex to
>> uframe accuracy because we must send a frindex which is a multiple of 8
>> during migration for migration compatibility, and rounding it down to
>> a multiple of 8 pre-migration, can lead to frindex going backwards from
>> the guest pov.
>
> Jumping forward instead?

You mean rounding the send frindex up pre-migration, I didn't really consider
that as it will cause us skipping processing an entry in the periodic frame
list. I guess doing that on migration isn't too bad. OTOH giving the guest
only frame accuracy like we've been doing till now also works fine...

Your choice :)

Regards,

Hans
Gerd Hoffmann - Dec. 17, 2012, 2:39 p.m.
On 12/17/12 15:23, Hans de Goede wrote:
> Hi,
> 
> On 12/17/2012 02:16 PM, Gerd Hoffmann wrote:
>> On 12/14/12 14:35, Hans de Goede wrote:
>>> Note that a shadow variable is used instead of changing frindex to
>>> uframe accuracy because we must send a frindex which is a multiple of 8
>>> during migration for migration compatibility, and rounding it down to
>>> a multiple of 8 pre-migration, can lead to frindex going backwards from
>>> the guest pov.
>>
>> Jumping forward instead?
> 
> You mean rounding the send frindex up pre-migration, I didn't really
> consider
> that as it will cause us skipping processing an entry in the periodic frame
> list. I guess doing that on migration isn't too bad. OTOH giving the guest
> only frame accuracy like we've been doing till now also works fine...
> 
> Your choice :)

I'm looking for a way to avoid the shadow variable, but of course
without breaking migration.  giving the guest only frame accuracy looks
good to me.

cheers,
  Gerd
Hans de Goede - Dec. 17, 2012, 2:47 p.m.
Hi,

On 12/17/2012 03:39 PM, Gerd Hoffmann wrote:
> On 12/17/12 15:23, Hans de Goede wrote:
>> Hi,
>>
>> On 12/17/2012 02:16 PM, Gerd Hoffmann wrote:
>>> On 12/14/12 14:35, Hans de Goede wrote:
>>>> Note that a shadow variable is used instead of changing frindex to
>>>> uframe accuracy because we must send a frindex which is a multiple of 8
>>>> during migration for migration compatibility, and rounding it down to
>>>> a multiple of 8 pre-migration, can lead to frindex going backwards from
>>>> the guest pov.
>>>
>>> Jumping forward instead?
>>
>> You mean rounding the send frindex up pre-migration, I didn't really
>> consider
>> that as it will cause us skipping processing an entry in the periodic frame
>> list. I guess doing that on migration isn't too bad. OTOH giving the guest
>> only frame accuracy like we've been doing till now also works fine...
>>
>> Your choice :)
>
> I'm looking for a way to avoid the shadow variable, but of course
> without breaking migration.  giving the guest only frame accuracy looks
> good to me.

Ok, but then we need the shadow variable, iow then the patch stays as is ...

Regards,

Hans
Gerd Hoffmann - Dec. 17, 2012, 2:51 p.m.
On 12/17/12 15:47, Hans de Goede wrote:
> Hi,
> 
> On 12/17/2012 03:39 PM, Gerd Hoffmann wrote:
>> On 12/17/12 15:23, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 12/17/2012 02:16 PM, Gerd Hoffmann wrote:
>>>> On 12/14/12 14:35, Hans de Goede wrote:
>>>>> Note that a shadow variable is used instead of changing frindex to
>>>>> uframe accuracy because we must send a frindex which is a multiple
>>>>> of 8
>>>>> during migration for migration compatibility, and rounding it down to
>>>>> a multiple of 8 pre-migration, can lead to frindex going backwards
>>>>> from
>>>>> the guest pov.
>>>>
>>>> Jumping forward instead?
>>>
>>> You mean rounding the send frindex up pre-migration, I didn't really
>>> consider
>>> that as it will cause us skipping processing an entry in the periodic
>>> frame
>>> list. I guess doing that on migration isn't too bad. OTOH giving the
>>> guest
>>> only frame accuracy like we've been doing till now also works fine...
>>>
>>> Your choice :)
>>
>> I'm looking for a way to avoid the shadow variable, but of course
>> without breaking migration.  giving the guest only frame accuracy looks
>> good to me.
> 
> Ok, but then we need the shadow variable, iow then the patch stays as is
> ...

Can't we (a) switch frindex to microframe resolution, (b) round to frame
resolution in pre_save and (c) return frindex & ~7 on guest register reads?

cheers,
  Gerd
Hans de Goede - Dec. 18, 2012, 10:20 a.m.
Hi,

On 12/17/2012 03:51 PM, Gerd Hoffmann wrote:
> On 12/17/12 15:47, Hans de Goede wrote:
>> Hi,
>>
>> On 12/17/2012 03:39 PM, Gerd Hoffmann wrote:
>>> On 12/17/12 15:23, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 12/17/2012 02:16 PM, Gerd Hoffmann wrote:
>>>>> On 12/14/12 14:35, Hans de Goede wrote:
>>>>>> Note that a shadow variable is used instead of changing frindex to
>>>>>> uframe accuracy because we must send a frindex which is a multiple
>>>>>> of 8
>>>>>> during migration for migration compatibility, and rounding it down to
>>>>>> a multiple of 8 pre-migration, can lead to frindex going backwards
>>>>>> from
>>>>>> the guest pov.
>>>>>
>>>>> Jumping forward instead?
>>>>
>>>> You mean rounding the send frindex up pre-migration, I didn't really
>>>> consider
>>>> that as it will cause us skipping processing an entry in the periodic
>>>> frame
>>>> list. I guess doing that on migration isn't too bad. OTOH giving the
>>>> guest
>>>> only frame accuracy like we've been doing till now also works fine...
>>>>
>>>> Your choice :)
>>>
>>> I'm looking for a way to avoid the shadow variable, but of course
>>> without breaking migration.  giving the guest only frame accuracy looks
>>> good to me.
>>
>> Ok, but then we need the shadow variable, iow then the patch stays as is
>> ...
>
> Can't we (a) switch frindex to microframe resolution, (b) round to frame
> resolution in pre_save and (c) return frindex & ~7 on guest register reads?

Ah yes we can. I thought about that myself, but I was under the impression
that we were using mmap tricks to allow the guest to read the ioregs directly
without going through a vmexit. But it turns out we've:

static uint64_t ehci_opreg_read(void *ptr, hwaddr addr,
                                 unsigned size)
{
     EHCIState *s = ptr;
     uint32_t val;

     val = s->opreg[addr >> 2];
     trace_usb_ehci_opreg_read(addr + s->opregbase, addr2str(addr), val);
     return val;
}

Can qemu not handle an mmio range where writes are trapped, but reads are
not? That would force the use of the shadow variable, but should otherwise
provide a nice speedup.

Regards,

Hans
Gerd Hoffmann - Dec. 18, 2012, 11:03 a.m.
Hi,

> Can qemu not handle an mmio range where writes are trapped, but reads are
> not? That would force the use of the shadow variable, but should otherwise
> provide a nice speedup.

No.  vmexit is needed anyway btw, but the round-trip to qemu userspace
could be short-cutted in theory.  It's non-trivial though.  Alex had a
talk about it at kvm forum (covering ide).

First a simple read directly + write via qemu isn't that useful.  You
need a policy per register.

For most reads it would work, but there are exceptions.  Registers
holding timers for example.  frindex is actually an example of that.
With async_stepdown active frindex updates are quite jumpy.  We might
want to update the register on guest reads (and maybe also reset async
stepdown in that case).

Likewise the other way around: Not all register writes have some effect
which qemu must emulate, some are just storage (like ehci frame list
address).

Locking is an unsolved issue (in-kernel register reads/writes don't grab
the qemu lock and thus would race with iothread accessing the register
variables).

cheers,
  Gerd
Hans de Goede - Dec. 18, 2012, 11:30 a.m.
Hi,

On 12/18/2012 12:03 PM, Gerd Hoffmann wrote:
>    Hi,
>
>> Can qemu not handle an mmio range where writes are trapped, but reads are
>> not? That would force the use of the shadow variable, but should otherwise
>> provide a nice speedup.
>
> No.  vmexit is needed anyway btw, but the round-trip to qemu userspace
> could be short-cutted in theory.  It's non-trivial though.  Alex had a
> talk about it at kvm forum (covering ide).

Ok, then I'll respin the patch getting rid of the shadow variable.

Regards,

Hans

Patch

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 5d314a0..ef3ab97 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -109,12 +109,13 @@ 
 
 #define FRAME_TIMER_FREQ 1000
 #define FRAME_TIMER_NS   (1000000000 / FRAME_TIMER_FREQ)
+#define UFRAME_TIMER_NS  (FRAME_TIMER_NS / 8)
 
 #define NB_MAXINTRATE    8        // Max rate at which controller issues ints
 #define BUFF_SIZE        5*4096   // Max bytes to transfer per transaction
 #define MAX_QH           100      // Max allowable queue heads in a chain
-#define MIN_FR_PER_TICK  3        // Min frames to process when catching up
-#define PERIODIC_ACTIVE  64
+#define MIN_UFR_PER_TICK 24       /* Min frames to process when catching up */
+#define PERIODIC_ACTIVE  512      /* Micro-frames */
 
 /*  Internal periodic / asynchronous schedule state machine states
  */
@@ -277,7 +278,7 @@  static inline void ehci_update_irq(EHCIState *s)
         level = 1;
     }
 
-    trace_usb_ehci_irq(level, s->frindex, s->usbsts, s->usbintr);
+    trace_usb_ehci_irq(level, s->ufrindex, s->usbsts, s->usbintr);
     qemu_set_irq(s->irq, level);
 }
 
@@ -303,14 +304,14 @@  static inline void ehci_commit_irq(EHCIState *s)
     if (!s->usbsts_pending) {
         return;
     }
-    if (s->usbsts_frindex > s->frindex) {
+    if (s->usbsts_frindex > s->ufrindex) {
         return;
     }
 
     itc = (s->usbcmd >> 16) & 0xff;
     s->usbsts |= s->usbsts_pending;
     s->usbsts_pending = 0;
-    s->usbsts_frindex = s->frindex + itc;
+    s->usbsts_frindex = s->ufrindex + itc;
     ehci_update_irq(s);
 }
 
@@ -921,6 +922,7 @@  static void ehci_reset(void *opaque)
     s->usbsts = USBSTS_HALT;
     s->usbsts_pending = 0;
     s->usbsts_frindex = 0;
+    s->ufrindex = 0;
 
     s->astate = EST_INACTIVE;
     s->pstate = EST_INACTIVE;
@@ -1107,6 +1109,8 @@  static void ehci_opreg_write(void *ptr, hwaddr addr,
 
     case FRINDEX:
         val &= 0x00003ff8; /* frindex is 14bits and always a multiple of 8 */
+        s->usbsts_frindex = val;
+        s->ufrindex = val;
         break;
 
     case CONFIGFLAG:
@@ -2219,16 +2223,21 @@  static void ehci_advance_periodic_state(EHCIState *ehci)
     }
 }
 
-static void ehci_update_frindex(EHCIState *ehci, int frames)
+static void ehci_update_frindex(EHCIState *ehci, int uframes)
 {
     int i;
 
-    if (!ehci_enabled(ehci)) {
+    if (!ehci_enabled(ehci) && ehci->pstate == EST_INACTIVE) {
         return;
     }
 
-    for (i = 0; i < frames; i++) {
-        ehci->frindex += 8;
+    for (i = 0; i < uframes; i++) {
+        ehci->ufrindex++;
+
+        if (ehci->ufrindex & 7) {
+            continue;
+        }
+        ehci->frindex = ehci->ufrindex;
 
         if (ehci->frindex == 0x00002000) {
             ehci_raise_irq(ehci, USBSTS_FLR);
@@ -2237,6 +2246,7 @@  static void ehci_update_frindex(EHCIState *ehci, int frames)
         if (ehci->frindex == 0x00004000) {
             ehci_raise_irq(ehci, USBSTS_FLR);
             ehci->frindex = 0;
+            ehci->ufrindex = 0;
             if (ehci->usbsts_frindex >= 0x00004000) {
                 ehci->usbsts_frindex -= 0x00004000;
             } else {
@@ -2252,33 +2262,33 @@  static void ehci_frame_timer(void *opaque)
     int need_timer = 0;
     int64_t expire_time, t_now;
     uint64_t ns_elapsed;
-    int frames, skipped_frames;
+    int uframes, skipped_uframes;
     int i;
 
     t_now = qemu_get_clock_ns(vm_clock);
     ns_elapsed = t_now - ehci->last_run_ns;
-    frames = ns_elapsed / FRAME_TIMER_NS;
+    uframes = ns_elapsed / UFRAME_TIMER_NS;
 
     if (ehci_periodic_enabled(ehci) || ehci->pstate != EST_INACTIVE) {
         need_timer++;
 
-        if (frames > ehci->maxframes) {
-            skipped_frames = frames - ehci->maxframes;
-            ehci_update_frindex(ehci, skipped_frames);
-            ehci->last_run_ns += FRAME_TIMER_NS * skipped_frames;
-            frames -= skipped_frames;
-            DPRINTF("WARNING - EHCI skipped %d frames\n", skipped_frames);
+        if (uframes > (ehci->maxframes * 8)) {
+            skipped_uframes = uframes - (ehci->maxframes * 8);
+            ehci_update_frindex(ehci, skipped_uframes);
+            ehci->last_run_ns += UFRAME_TIMER_NS * skipped_uframes;
+            uframes -= skipped_uframes;
+            DPRINTF("WARNING - EHCI skipped %d uframes\n", skipped_uframes);
         }
 
-        for (i = 0; i < frames; i++) {
+        for (i = 0; i < uframes; i++) {
             /*
              * If we're running behind schedule, we should not catch up
              * too fast, as that will make some guests unhappy:
-             * 1) We must process a minimum of MIN_FR_PER_TICK frames,
+             * 1) We must process a minimum of MIN_UFR_PER_TICK frames,
              *    otherwise we will never catch up
              * 2) Process frames until the guest has requested an irq (IOC)
              */
-            if (i >= MIN_FR_PER_TICK) {
+            if (i >= MIN_UFR_PER_TICK) {
                 ehci_commit_irq(ehci);
                 if ((ehci->usbsts & USBINTR_MASK) & ehci->usbintr) {
                     break;
@@ -2288,13 +2298,15 @@  static void ehci_frame_timer(void *opaque)
                 ehci->periodic_sched_active--;
             }
             ehci_update_frindex(ehci, 1);
-            ehci_advance_periodic_state(ehci);
-            ehci->last_run_ns += FRAME_TIMER_NS;
+            if ((ehci->ufrindex & 7) == 0) {
+                ehci_advance_periodic_state(ehci);
+            }
+            ehci->last_run_ns += UFRAME_TIMER_NS;
         }
     } else {
         ehci->periodic_sched_active = 0;
-        ehci_update_frindex(ehci, frames);
-        ehci->last_run_ns += FRAME_TIMER_NS * frames;
+        ehci_update_frindex(ehci, uframes);
+        ehci->last_run_ns += UFRAME_TIMER_NS * uframes;
     }
 
     if (ehci->periodic_sched_active) {
@@ -2373,6 +2385,14 @@  static USBBusOps ehci_bus_ops = {
     .wakeup_endpoint = ehci_wakeup_endpoint,
 };
 
+static void usb_ehci_pre_save(void *opaque)
+{
+    EHCIState *ehci = opaque;
+
+    ehci->last_run_ns -= (ehci->ufrindex - ehci->frindex) * UFRAME_TIMER_NS;
+    ehci->ufrindex = ehci->frindex;
+}
+
 static int usb_ehci_post_load(void *opaque, int version_id)
 {
     EHCIState *s = opaque;
@@ -2423,6 +2443,7 @@  const VMStateDescription vmstate_ehci = {
     .name        = "ehci-core",
     .version_id  = 2,
     .minimum_version_id  = 1,
+    .pre_save    = usb_ehci_pre_save,
     .post_load   = usb_ehci_post_load,
     .fields      = (VMStateField[]) {
         /* mmio registers */
diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 3c888a6..d8fad41 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -300,6 +300,9 @@  struct EHCIState {
     USBPort *companion_ports[NB_PORTS];
     uint32_t usbsts_pending;
     uint32_t usbsts_frindex;
+    /* frindex shadow in uframes precision, frindex moves in steps of 8
+     * (so whole frames) to avoid it going backwards when migrating */
+    uint32_t ufrindex;
     EHCIQueueHead aqueues;
     EHCIQueueHead pqueues;