[ovs-dev,1/4] compiler: Introduce OVS_PREFETCH variants.

Message ID 1515778879-60075-1-git-send-email-bhanuprakash.bodireddy@intel.com
State New
Delegated to: Ian Stokes

Commit Message

Bodireddy, Bhanuprakash Jan. 12, 2018, 5:41 p.m.
This commit introduces prefetch variants by using the GCC built-in
prefetch function.

The prefetch variants give the user better control over the data
caching strategy, in order to increase cache efficiency and minimize
cache pollution. Data reference patterns can be classified into:

 - Non-temporal (NT) - Data that is referenced once and not reused in
                       the immediate future.
 - Temporal          - Data that will be used again soon.

The macro variants are useful where:
 - Memory access patterns are predictable.
 - The execution pipeline can stall if data isn't available.
 - Loops are time consuming.

For example:

  OVS_PREFETCH_CACHE(addr, OPCH_LTR)
    - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 1)
    - Prefetch data into the L3 cache for read-only use.

  OVS_PREFETCH_CACHE(addr, OPCH_HTW)
    - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
    - __builtin_prefetch(addr, 1, 3)
    - Prefetch data into all caches in anticipation of a write. In doing
      so, it invalidates other cached copies to gain 'exclusive'
      access.

  OVS_PREFETCH(addr)
    - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 3)
    - Prefetch data into all caches in anticipation of a read, when the
      data will be used again soon (HTR - High Temporal Read).
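
The mapping from hints to builtin arguments described above can be sketched outside the OVS tree as follows. This is a minimal illustration, not the patch's implementation: `SKETCH_PREFETCH`, `struct sketch_stats` and `sketch_account` are hypothetical names, and the only real interface used is GCC's `__builtin_prefetch(addr, rw, locality)`, whose second and third arguments must be integer constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper, for illustration only; the patch's own names are
 * OVS_PREFETCH_CACHE() and OPCH_*.  'rw' is 0 (read) or 1 (write) and
 * 'locality' is 0..3; both must be integer constants. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr, rw, locality) \
    __builtin_prefetch((addr), (rw), (locality))
#else
#define SKETCH_PREFETCH(addr, rw, locality) ((void) 0)
#endif

struct sketch_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Sums the payload bytes and accounts for them in 'stats'.  The prefetches
 * are hints only: they never change the result. */
static uint64_t
sketch_account(struct sketch_stats *stats, const uint8_t *payload, size_t len)
{
    uint64_t sum = 0;

    assert(stats != NULL && (payload != NULL || len == 0));

    /* Like OPCH_HTW: 'stats' is written below and likely again soon. */
    SKETCH_PREFETCH(stats, 1, 3);
    /* Like OPCH_LTR: 'payload' is only read, with low expected reuse. */
    SKETCH_PREFETCH(payload, 0, 1);

    for (size_t i = 0; i < len; i++) {
        sum += payload[i];
    }
    stats->packets++;
    stats->bytes += len;
    return sum;
}
```

Because the sketch expands to a single expression, it can be used as an ordinary statement, including in an unbraced if/else. Issuing the prefetch immediately before the access, as here, only shows the call shape; a real caller would issue it earlier so the data has time to arrive.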

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 139 insertions(+), 8 deletions(-)
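
The commit message names time-consuming loops with predictable access patterns as a use case. A minimal sketch of that pattern follows, using plain `__builtin_prefetch` rather than the new macros. `sketch_sum_pointed` and `SKETCH_PREFETCH_DISTANCE` are hypothetical, and the distance of 8 is an arbitrary assumption that would need tuning by measurement on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: how many iterations ahead to prefetch.  Not from the patch. */
#define SKETCH_PREFETCH_DISTANCE 8

/* Sums the values that 'items' points to.  The pointer array itself is
 * walked sequentially, but the pointed-to values may be scattered in
 * memory, so a later element is requested while the current one is being
 * processed. */
static uint64_t
sketch_sum_pointed(const uint32_t *const *items, size_t n)
{
    uint64_t sum = 0;

    assert(items != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + SKETCH_PREFETCH_DISTANCE < n) {
            /* Read hint with no temporal locality (what the patch calls
             * OPCH_NTR): each value is read once and not reused. */
            __builtin_prefetch(items[i + SKETCH_PREFETCH_DISTANCE], 0, 0);
        }
#endif
        sum += *items[i];
    }
    return sum;
}
```

The result is the same with or without the hint; only the timing of the memory accesses can differ.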

Comments

Ben Pfaff Jan. 12, 2018, 6:20 p.m. | #1
Hi Bhanu, who do you think should review this series?  Is it something
that Ian should pick up for dpdk_merge?
Bodireddy, Bhanuprakash Jan. 12, 2018, 7:38 p.m. | #2
>-----Original Message-----
>From: Ben Pfaff [mailto:blp@ovn.org]
>Sent: Friday, January 12, 2018 6:20 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
>variants.
>
>Hi Bhanu, who do you think should review this series?  Is it something that Ian
>should pick up for dpdk_merge?

Hi Ben,

I will check with Ian if he has time to review this. As the patch series doesn't
change any functionality at this point it shouldn't take much time.

-Bhanuprakash.
Ben Pfaff Jan. 12, 2018, 9:01 p.m. | #3
On Fri, Jan 12, 2018 at 07:38:49PM +0000, Bodireddy, Bhanuprakash wrote:
> >-----Original Message-----
> >From: Ben Pfaff [mailto:blp@ovn.org]
> >Sent: Friday, January 12, 2018 6:20 PM
> >To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
> >Cc: dev@openvswitch.org
> >Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
> >variants.
> >
> >Hi Bhanu, who do you think should review this series?  Is it something that Ian
> >should pick up for dpdk_merge?
> 
> Hi Ben,
> 
> I will check with Ian if he has time to review this. As the patch series doesn't
> change any functionality at this point it shouldn't take much time.

OK.

Let me know if someone else should review the series.

Patch

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index c7cb930..94bb24d 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -222,18 +222,149 @@ 
     static void f(void)
 #endif
 
-/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
- * line containing the given address to a CPU cache.
- * OVS_PREFETCH_WRITE() should be used when the memory is going to be
- * written to.  Depending on the target CPU, this can generate the same
- * instruction as OVS_PREFETCH(), or bring the data into the cache in an
- * exclusive state. */
 #if __GNUC__
-#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
-#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+enum cache_locality {
+    NON_TEMPORAL_LOCALITY,
+    LOW_TEMPORAL_LOCALITY,
+    MODERATE_TEMPORAL_LOCALITY,
+    HIGH_TEMPORAL_LOCALITY
+};
+
+enum cache_rw {
+    PREFETCH_READ,
+    PREFETCH_WRITE
+};
+
+/* The prefetch variants give the user better control over the data
+ * caching strategy, in order to increase cache efficiency and minimize
+ * cache pollution. Data reference patterns can be classified into:
+ *
+ *   Non-temporal (NT) - Data that is referenced once and not reused in
+ *                       the immediate future.
+ *   Temporal          - Data that will be used again soon.
+ *
+ * The macro variants are useful where:
+ *   o Memory access patterns are predictable.
+ *   o The execution pipeline can stall if data isn't available.
+ *   o Loops are time consuming.
+ *
+ * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
+ * line containing the given address to a CPU cache. The second argument,
+ * OPCH_XXR or OPCH_XXW, hints whether the prefetched data is going to be
+ * read or written by the core.
+ *
+ * Example Usage:
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
+ *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 1)
+ *       - Prefetch data into the L3 cache for read-only use.
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
+ *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
+ *       - __builtin_prefetch(addr, 1, 3)
+ *       - Prefetch data into all caches in anticipation of a write. In
+ *         doing so, it invalidates other cached copies to gain 'exclusive'
+ *         access.
+ *
+ *   OVS_PREFETCH(addr)
+ *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 3)
+ *       - Prefetch data into all caches in anticipation of a read, when the
+ *         data will be used again soon (HTR - High Temporal Read).
+ *
+ * Implementation details of prefetch hint instructions may vary across
+ * different processors and microarchitectures.
+ *
+ * OPCH_NTW, OPCH_LTW and OPCH_MTW use the prefetchwt1 instruction and
+ * OPCH_HTW uses the prefetchw instruction, when available. Refer to the
+ * documentation for how to enable the prefetchwt1 instruction.
+ *
+ * PREFETCH HINT    Instruction  GCC builtin function
+ * ---------------------------------------------------------
+ *   OPCH_NTR       prefetchnta  __builtin_prefetch(a, 0, 0)
+ *   OPCH_LTR       prefetcht2   __builtin_prefetch(a, 0, 1)
+ *   OPCH_MTR       prefetcht1   __builtin_prefetch(a, 0, 2)
+ *   OPCH_HTR       prefetcht0   __builtin_prefetch(a, 0, 3)
+ *
+ *   OPCH_NTW       prefetchwt1  __builtin_prefetch(a, 1, 0)
+ *   OPCH_LTW       prefetchwt1  __builtin_prefetch(a, 1, 1)
+ *   OPCH_MTW       prefetchwt1  __builtin_prefetch(a, 1, 2)
+ *   OPCH_HTW       prefetchw    __builtin_prefetch(a, 1, 3)
+ *
+ * */
+#define OVS_PREFETCH_CACHE_HINT                                             \
+    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,                    \
+         "Fetch data to non-temporal cache close to processor "             \
+         "to minimize cache pollution")                                     \
+    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,                    \
+         "Fetch data to L2 and L3 cache")                                   \
+    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,               \
+         "Fetch data to L2 and L3 caches, same as LTR on "                  \
+         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")    \
+    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,                   \
+         "Fetch data in to all cache levels L1, L2 and L3")                 \
+    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2 and L3 cache in exclusive state "                \
+         "in anticipation of write")                                        \
+    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2 and L3 cache in exclusive state")                \
+    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,              \
+         "Fetch data in to L2 and L3 caches in exclusive state")            \
+    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,                  \
+         "Fetch data in to all cache levels in exclusive state")
+
+/* Indexes for cache prefetch types. */
+enum {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+/* Cache prefetch types. */
+enum ovs_prefetch_type {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \
+{                                                                             \
+    case OPCH_NTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_LTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_MTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ,                             \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_NTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_LTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_MTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE,                            \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY);   \
+        break;                                                                \
+}
+
+/* Retain this for backward compatibility. */
+#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
+#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
 #else
 #define OVS_PREFETCH(addr)
 #define OVS_PREFETCH_WRITE(addr)
+#define OVS_PREFETCH_CACHE(addr, OP)
 #endif
 
 /* Build assertions.
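
The patch builds its two enums by expanding one hint list twice with different definitions of OPCH(). That list-expansion technique can be sketched in isolation as below. All names (`SKETCH_HINTS`, `HINT`, `SK_*`, `sketch_hint_locality`, `sketch_hint_is_write`) are hypothetical, and the list is shortened to four entries. The third expansion into lookup functions is an addition for illustration and is not something the patch does.

```c
#include <assert.h>

/* Hypothetical hint list for illustration: HINT(name, rw, locality). */
#define SKETCH_HINTS      \
    HINT(SK_NTR, 0, 0)    \
    HINT(SK_HTR, 0, 3)    \
    HINT(SK_NTW, 1, 0)    \
    HINT(SK_HTW, 1, 3)

/* First expansion: consecutive indexes 0, 1, 2, ... */
enum {
#define HINT(ENUM, RW, LOCALITY) ENUM##_INDEX,
    SKETCH_HINTS
#undef HINT
    SK_N_HINTS
};

/* Second expansion: one distinct bit per hint. */
enum sketch_hint {
#define HINT(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    SKETCH_HINTS
#undef HINT
};

/* Third expansion: recover the locality argument of a hint, or -1 if
 * 'hint' is not exactly one of the listed values. */
static int
sketch_hint_locality(enum sketch_hint hint)
{
    switch (hint) {
#define HINT(ENUM, RW, LOCALITY) case ENUM: return LOCALITY;
    SKETCH_HINTS
#undef HINT
    }
    return -1;
}

/* Same idea for the read/write argument: 1 for write, 0 for read, -1 if
 * 'hint' is not exactly one of the listed values. */
static int
sketch_hint_is_write(enum sketch_hint hint)
{
    switch (hint) {
#define HINT(ENUM, RW, LOCALITY) case ENUM: return RW;
    SKETCH_HINTS
#undef HINT
    }
    return -1;
}
```

Keeping a single list means the indexes, the bit values and any lookups stay in sync when a hint is added or removed.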