Patchwork [V4,10/12] SD card users: optimize access to SDClass methods

login
register
mail settings
Submitter Mitsyanko Igor
Date July 27, 2012, 7:29 p.m.
Message ID <1343417387-13953-11-git-send-email-i.mitsyanko@samsung.com>
Download mbox | patch
Permalink /patch/173778/
State New
Headers show

Comments

Mitsyanko Igor - July 27, 2012, 7:29 p.m.
Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before
a loop starts.

Signed-off-by: Igor Mitsyanko <i.mitsyanko@samsung.com>
---
 hw/omap_mmc.c    |    9 +++++----
 hw/pl181.c       |    7 ++++---
 hw/pxa2xx_mmci.c |    6 ++++--
 3 files changed, 13 insertions(+), 9 deletions(-)
Peter Maydell - July 31, 2012, 3:43 p.m.
On 27 July 2012 20:29, Igor Mitsyanko <i.mitsyanko@samsung.com> wrote:
> Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before
> a loop starts.

Anthony claims that SD_GET_CLASS should be cheap enough that we don't
need to hoist it out of loops like this. Do you have profiling data
or similar that caused you to write this patch?

-- PMM
Mitsyanko Igor - July 31, 2012, 5:33 p.m.
On 07/31/2012 07:43 PM, Peter Maydell wrote:
> On 27 July 2012 20:29, Igor Mitsyanko <i.mitsyanko@samsung.com> wrote:
>> Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before
>> a loop starts.
> Anthony claims that SD_GET_CLASS should be cheap enough that we don't
> need to hoist it out of loops like this. Do you have profiling data
> or similar that caused you to write this patch?
>
> -- PMM
>
Well, I've tested it by measuring an execution time of a 4Kb write to SD 
card, results showed that arithmetic mean of time for one 4k write was 
less by ~300us in sequence with SD_GET_CLASS extracted from the loop. 
Although I ran this test several times, I have little faith in test 
methodology and results, it obviously showed significant dispersion 
between measured time of distinct 4K writes (200-300% if I recall 
correctly). I really have no objection no objection to dropping this patch.
Peter Maydell - July 31, 2012, 5:47 p.m.
On 31 July 2012 18:33, Igor Mitsyanko <i.mitsyanko@samsung.com> wrote:
> On 07/31/2012 07:43 PM, Peter Maydell wrote:
>> Anthony claims that SD_GET_CLASS should be cheap enough that we don't
>> need to hoist it out of loops like this. Do you have profiling data
>> or similar that caused you to write this patch?

> Well, I've tested it by measuring an execution time of a 4Kb write to SD
> card, results showed that arithmetic mean of time for one 4k write was less
> by ~300us in sequence with SD_GET_CLASS extracted from the loop.

How much is that as a % of the total time for the write ?

-- PMM
Mitsyanko Igor - July 31, 2012, 6:03 p.m.
On 07/31/2012 09:47 PM, Peter Maydell wrote:
> On 31 July 2012 18:33, Igor Mitsyanko <i.mitsyanko@samsung.com> wrote:
>> On 07/31/2012 07:43 PM, Peter Maydell wrote:
>>> Anthony claims that SD_GET_CLASS should be cheap enough that we don't
>>> need to hoist it out of loops like this. Do you have profiling data
>>> or similar that caused you to write this patch?
>> Well, I've tested it by measuring an execution time of a 4Kb write to SD
>> card, results showed that arithmetic mean of time for one 4k write was less
>> by ~300us in sequence with SD_GET_CLASS extracted from the loop.
> How much is that as a % of the total time for the write ?
>
> -- PMM
>
total write time was ~3.5-4.0 ms I think, so its ~0.08%
Anthony Liguori - July 31, 2012, 6:15 p.m.
Igor Mitsyanko <i.mitsyanko@samsung.com> writes:

> On 07/31/2012 07:43 PM, Peter Maydell wrote:
>> On 27 July 2012 20:29, Igor Mitsyanko <i.mitsyanko@samsung.com> wrote:
>>> Rather that repeatedly call SD_GET_CLASS() in a loop, call it once before
>>> a loop starts.
>> Anthony claims that SD_GET_CLASS should be cheap enough that we don't
>> need to hoist it out of loops like this. Do you have profiling data
>> or similar that caused you to write this patch?
>>
>> -- PMM
>>
> Well, I've tested it by measuring an execution time of a 4Kb write to SD 
> card, results showed that arithmetic mean of time for one 4k write was 
> less by ~300us in sequence with SD_GET_CLASS extracted from the loop.

How many loop iterations that?  300us is a huge amount of time, unless
you were looping on every byte, I have a hard time understanding that
delta.

Regards,

Anthony Liguori

> Although I ran this test several times, I have little faith in test 
> methodology and results, it obviously showed significant dispersion 
> between measured time of distinct 4K writes (200-300% if I recall 
> correctly). I really have no objection no objection to dropping this patch.
Mitsyanko Igor - July 31, 2012, 6:33 p.m.
On 07/31/2012 10:15 PM, Anthony Liguori wrote:
> How many loop iterations that?  300us is a huge amount of time, unless
> you were looping on every byte, I have a hard time understanding that
> delta.

I'm not sure I understand what you mean, I did loop on every byte, 
that's how our SD model works. So, for 4K bytes qemu have to call 
SD_GET_CLASS >4K times.
Or you're asking about write sequence length? It was 10 writes of 4096 
bytes (not statistically reliable, I have to agree).

Patch

diff --git a/hw/omap_mmc.c b/hw/omap_mmc.c
index aec0285..8ceb25b 100644
--- a/hw/omap_mmc.c
+++ b/hw/omap_mmc.c
@@ -219,6 +219,7 @@  static void omap_mmc_command(struct omap_mmc_s *host, int cmd, int dir,
 
 static void omap_mmc_transfer(struct omap_mmc_s *host)
 {
+    SDClass *sd_class = SD_GET_CLASS(host->card);
     uint8_t value;
 
     if (!host->transfer)
@@ -229,10 +230,10 @@  static void omap_mmc_transfer(struct omap_mmc_s *host)
             if (host->fifo_len > host->af_level)
                 break;
 
-            value = sd_read_data(host->card);
+            value = sd_class->read_data(host->card);
             host->fifo[(host->fifo_start + host->fifo_len) & 31] = value;
             if (-- host->blen_counter) {
-                value = sd_read_data(host->card);
+                value = sd_class->read_data(host->card);
                 host->fifo[(host->fifo_start + host->fifo_len) & 31] |=
                         value << 8;
                 host->blen_counter --;
@@ -244,10 +245,10 @@  static void omap_mmc_transfer(struct omap_mmc_s *host)
                 break;
 
             value = host->fifo[host->fifo_start] & 0xff;
-            sd_write_data(host->card, value);
+            sd_class->write_data(host->card, value);
             if (-- host->blen_counter) {
                 value = host->fifo[host->fifo_start] >> 8;
-                sd_write_data(host->card, value);
+                sd_class->write_data(host->card, value);
                 host->blen_counter --;
             }
 
diff --git a/hw/pl181.c b/hw/pl181.c
index 7d91fbb..8cd898c 100644
--- a/hw/pl181.c
+++ b/hw/pl181.c
@@ -209,18 +209,19 @@  error:
 
 static void pl181_fifo_run(pl181_state *s)
 {
+    SDClass *sd_class = SD_GET_CLASS(s->card);
     uint32_t bits;
     uint32_t value = 0;
     int n;
     int is_read;
 
     is_read = (s->datactrl & PL181_DATA_DIRECTION) != 0;
-    if (s->datacnt != 0 && (!is_read || sd_data_ready(s->card))
+    if (s->datacnt != 0 && (!is_read || sd_class->data_ready(s->card))
             && !s->linux_hack) {
         if (is_read) {
             n = 0;
             while (s->datacnt && s->fifo_len < PL181_FIFO_LEN) {
-                value |= (uint32_t)sd_read_data(s->card) << (n * 8);
+                value |= (uint32_t)sd_class->read_data(s->card) << (n * 8);
                 s->datacnt--;
                 n++;
                 if (n == 4) {
@@ -241,7 +242,7 @@  static void pl181_fifo_run(pl181_state *s)
                 }
                 n--;
                 s->datacnt--;
-                sd_write_data(s->card, value & 0xff);
+                sd_class->write_data(s->card, value & 0xff);
                 value >>= 8;
             }
         }
diff --git a/hw/pxa2xx_mmci.c b/hw/pxa2xx_mmci.c
index b505a4c..af19c36 100644
--- a/hw/pxa2xx_mmci.c
+++ b/hw/pxa2xx_mmci.c
@@ -117,12 +117,14 @@  static void pxa2xx_mmci_int_update(PXA2xxMMCIState *s)
 
 static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s)
 {
+    SDClass *sd_class = SD_GET_CLASS(s->card);
+
     if (!s->active)
         return;
 
     if (s->cmdat & CMDAT_WR_RD) {
         while (s->bytesleft && s->tx_len) {
-            sd_write_data(s->card, s->tx_fifo[s->tx_start ++]);
+            sd_class->write_data(s->card, s->tx_fifo[s->tx_start ++]);
             s->tx_start &= 0x1f;
             s->tx_len --;
             s->bytesleft --;
@@ -132,7 +134,7 @@  static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s)
     } else
         while (s->bytesleft && s->rx_len < 32) {
             s->rx_fifo[(s->rx_start + (s->rx_len ++)) & 0x1f] =
-                sd_read_data(s->card);
+                sd_class->read_data(s->card);
             s->bytesleft --;
             s->intreq |= INT_RXFIFO_REQ;
         }