[ovs-dev,v3,06/12] cmap: Remove prefetching in cmap_find_batch().

Message ID 1476455835-77641-7-git-send-email-bhanuprakash.bodireddy@intel.com
State Superseded
Delegated to: Daniele Di Proietto

Commit Message

Bodireddy, Bhanuprakash Oct. 14, 2016, 2:37 p.m. UTC
Prefetching the data into the caches isn't improving performance in
cmap_find_batch(). Moreover, a slight improvement in performance was
observed without prefetching.

This patch removes prefetching from cmap_find_batch().

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
---
 lib/cmap.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Daniele Di Proietto Oct. 18, 2016, 3:07 a.m. UTC | #1
2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy
<bhanuprakash.bodireddy@intel.com>:

> Prefetching the data into the caches isn't improving performance in
> cmap_find_batch(). Moreover, a slight improvement in performance was
> observed without prefetching.
>
> This patch removes prefetching from cmap_find_batch().
>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
> Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
>

I tested this patch in isolation. On my system I didn't notice any
improvement for a single flow (with EMC disabled), and with 128 flows in
the classifier I noticed a slight drop instead.

This is probably because I haven't yet applied the first patch of the
series (the one that increases the batch size to 32), so I'll defer this
patch until we can apply the rest of the series.

Also, if you guys see an improvement (and since you got some evidence with
VTune), I don't think it matters that on one particular system (mine) I
can't see any benefit.

Thanks,

Daniele
Bodireddy, Bhanuprakash Oct. 18, 2016, 4:20 p.m. UTC | #2
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiettod@ovn.org]
>Sent: Tuesday, October 18, 2016 4:07 AM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in
>cmap_find_batch().
>
>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy
><bhanuprakash.bodireddy@intel.com>:
>Prefetching the data into the caches isn't improving performance in
>cmap_find_batch(). Moreover, a slight improvement in performance was
>observed without prefetching.
>
>This patch removes prefetching from cmap_find_batch().
>
>Signed-off-by: Bhanuprakash Bodireddy
><bhanuprakash.bodireddy@intel.com>
>Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
>Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
>
>I tested this patch in isolation. On my system I didn't notice any
>improvement for a single flow (with EMC disabled), and with 128 flows in
>the classifier I noticed a slight drop instead.
>
>This is probably because I haven't yet applied the first patch of the
>series (the one that increases the batch size to 32), so I'll defer this
>patch until we can apply the rest of the series.
>
>Also, if you guys see an improvement (and since you got some evidence with
>VTune), I don't think it matters that on one particular system (mine) I
>can't see any benefit.

I am testing this on Haswell, and VTune confirmed our observation.
Prefetching is done at four places in cmap_find_batch(), and at two of
them it is done just before the data is accessed.

Since the prefetch instruction has some overhead, a prefetch should be
issued far enough in advance to yield a performance gain. Prefetching too
early can also have a negative effect, as the prefetched data can be
evicted by other accesses. We played around a bit and found that removing
the prefetching doesn't impact performance, hence this patch.
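
To illustrate the point about prefetch distance, here is a minimal
standalone sketch. It is not code from lib/cmap.c. It assumes GCC or
Clang for __builtin_prefetch(); SKETCH_PREFETCH (a stand-in for
OVS_PREFETCH), sum_indirect() and the 'dist' parameter are hypothetical
names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OVS_PREFETCH.  Assumes GCC or Clang; on other
 * compilers it only evaluates its argument. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define SKETCH_PREFETCH(addr) ((void) (addr))
#endif

/* Returns the sum of table[idx[i]] for i in [0, n).
 *
 * While processing element 'i', issues a prefetch for the element that
 * will be needed 'dist' iterations later.  'dist' == 0 disables
 * prefetching entirely.  The result is the same for every 'dist'; only
 * the memory access timing can differ. */
static uint64_t
sum_indirect(const uint32_t *table, const size_t *idx, size_t n, size_t dist)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (dist && i + dist < n) {
            SKETCH_PREFETCH(&table[idx[i + dist]]);
        }
        sum += table[idx[i]];
    }
    return sum;
}
```

A very small 'dist' resembles the case described above where the
prefetch is issued just before the access: the instruction is paid for,
but the load has little time to complete. A very large 'dist' risks the
line being evicted before it is used. Where the useful range lies, if
there is one, depends on the hardware and the workload and has to be
measured.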

Regards,
Bhanu Prakash. 

>
>Thanks,
>Daniele
>

Patch

diff --git a/lib/cmap.c b/lib/cmap.c
index 8c7312d..8097b56 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -393,11 +393,10 @@  cmap_find_batch(const struct cmap *cmap, unsigned long map,
     const struct cmap_bucket *b2s[sizeof map * CHAR_BIT];
     uint32_t c1s[sizeof map * CHAR_BIT];
 
-    /* Compute hashes and prefetch 1st buckets. */
+    /* Compute hashes. */
     ULLONG_FOR_EACH_1(i, map) {
         h1s[i] = rehash(impl, hashes[i]);
         b1s[i] = &impl->buckets[h1s[i] & impl->mask];
-        OVS_PREFETCH(b1s[i]);
     }
     /* Lookups, Round 1. Only look up at the first bucket. */
     ULLONG_FOR_EACH_1(i, map) {
@@ -411,15 +410,13 @@  cmap_find_batch(const struct cmap *cmap, unsigned long map,
         } while (OVS_UNLIKELY(counter_changed(b1, c1)));
 
         if (!node) {
-            /* Not found (yet); Prefetch the 2nd bucket. */
+            /* Not found (yet). */
             b2s[i] = &impl->buckets[other_hash(h1s[i]) & impl->mask];
-            OVS_PREFETCH(b2s[i]);
             c1s[i] = c1; /* We may need to check this after Round 2. */
             continue;
         }
         /* Found. */
         ULLONG_SET0(map, i); /* Ignore this on round 2. */
-        OVS_PREFETCH(node);
         nodes[i] = node;
     }
     /* Round 2. Look into the 2nd bucket, if needed. */
@@ -453,7 +450,6 @@  cmap_find_batch(const struct cmap *cmap, unsigned long map,
             continue;
         }
 found:
-        OVS_PREFETCH(node);
         nodes[i] = node;
     }
     return result;