[ovs-dev,RFC,1/5] compiler: Introduce OVS_PREFETCH variants.

Message ID 1512418610-84032-1-git-send-email-bhanuprakash.bodireddy@intel.com
State New

Commit Message

Bodireddy, Bhanuprakash Dec. 4, 2017, 8:16 p.m.
This commit introduces prefetch variants by using the GCC built-in
prefetch function.

The prefetch variants give the user better control over the data
caching strategy, in order to increase cache efficiency and minimize
cache pollution. Data reference patterns can be classified into:

 - Non-temporal (NT) - Data that is referenced once and not reused in
                       the immediate future.
 - Temporal          - Data that will be used again soon.

The macro variants can be used where:
 - Memory access patterns are predictable.
 - The execution pipeline can stall if data isn't available.
 - Loops are time consuming.

For example:

  OVS_PREFETCH_CACHE(addr, OPCH_LTR)
    - OPCH_LTR : OVS PREFETCH CACHE HINT - LOW TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 1)
    - Prefetch data into the L3 cache for read-only use.

  OVS_PREFETCH_CACHE(addr, OPCH_HTW)
    - OPCH_HTW : OVS PREFETCH CACHE HINT - HIGH TEMPORAL WRITE.
    - __builtin_prefetch(addr, 1, 3)
    - Prefetch data into all cache levels in anticipation of a write.
      Doing so invalidates other cached copies in order to gain
      'exclusive' access.

  OVS_PREFETCH(addr)
    - Same as OVS_PREFETCH_CACHE(addr, OPCH_HTR).
    - OPCH_HTR : OVS PREFETCH CACHE HINT - HIGH TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 3)
    - Prefetch data into all cache levels in anticipation of a read of
      data that will be used again soon.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 include/openvswitch/compiler.h | 90 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 87 insertions(+), 3 deletions(-)
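
As an illustration of the hints described in the commit message, here is a
minimal sketch of prefetching ahead in loops with predictable access
patterns. It does not use the OVS macros: SKETCH_PREFETCH,
SKETCH_PREFETCH_DISTANCE and both functions are hypothetical names invented
for this example, and the distance of 8 elements is an assumed value that
would need tuning per workload and CPU. The (rw, locality) pairs are the ones
the patch associates with OPCH_NTR (0, 0) and OPCH_HTW (1, 3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's macros; not the OVS definition. */
#if defined(__GNUC__)
#define SKETCH_PREFETCH(addr, rw, locality) \
    __builtin_prefetch((addr), (rw), (locality))
#else
#define SKETCH_PREFETCH(addr, rw, locality) ((void) 0)
#endif

/* How many elements ahead to prefetch.  Assumed value, needs tuning. */
#define SKETCH_PREFETCH_DISTANCE 8

/* Sums 'n' values that are read once and not reused, so each upcoming
 * element is requested with a read hint (rw = 0) and no temporal locality
 * (locality = 0), the pair the patch names OPCH_NTR. */
static uint64_t
sketch_sum_streaming(const uint32_t *values, size_t n)
{
    uint64_t sum = 0;

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_PREFETCH_DISTANCE < n) {
            SKETCH_PREFETCH(&values[i + SKETCH_PREFETCH_DISTANCE], 0, 0);
        }
        sum += values[i];
    }
    return sum;
}

/* Increments 'n' counters that are expected to be touched again soon, so
 * each upcoming element is requested with a write hint (rw = 1) and high
 * temporal locality (locality = 3), the pair the patch names OPCH_HTW. */
static void
sketch_bump_counters(uint64_t *counters, size_t n)
{
    assert(counters != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + SKETCH_PREFETCH_DISTANCE < n) {
            SKETCH_PREFETCH(&counters[i + SKETCH_PREFETCH_DISTANCE], 1, 3);
        }
        counters[i]++;
    }
}
```

A prefetch is only a hint, so both functions return the same results with or
without it. Whether either loop actually gets faster depends on the hardware
prefetcher and the working set, and has to be measured.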

Comments

Ben Pfaff Dec. 4, 2017, 8:31 p.m. | #1
The information in this commit message seems like it could also be
useful as part of a code comment.

I didn't review the details of the patch.  I will leave that for others.
Bodireddy, Bhanuprakash Dec. 4, 2017, 8:43 p.m. | #2
Hi Ben,

>The information in this commit message seems like it could also be useful as
>part of a code comment.

This makes sense and I can include this in the code comments with some examples of usage.

- Bhanuprakash.

Patch

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index c7cb930..5d5553a 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -229,11 +229,95 @@ 
  * instruction as OVS_PREFETCH(), or bring the data into the cache in an
  * exclusive state. */
 #if __GNUC__
-#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
-#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+enum cache_locality {
+    NON_TEMPORAL_LOCALITY,
+    LOW_TEMPORAL_LOCALITY,
+    MODERATE_TEMPORAL_LOCALITY,
+    HIGH_TEMPORAL_LOCALITY
+};
+
+enum cache_rw {
+    PREFETCH_READ,
+    PREFETCH_WRITE
+};
+
+/* Implementation details of prefetch hint instructions may vary across
+ * different processors and microarchitectures.
+ *
+ * OPCH_NTW, OPCH_LTW and OPCH_MTW use the prefetchwt1 instruction and
+ * OPCH_HTW uses the prefetchw instruction when available.
+ */
+#define OVS_PREFETCH_CACHE_HINT                                             \
+    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,                    \
+         "Fetch data to non-temporal cache to minimize cache pollution")    \
+    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,                    \
+         "Fetch data to L2 and L3 cache")                                   \
+    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,               \
+         "Fetch data to L2 and L3 caches, same as LTR on "                  \
+         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")    \
+    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,                   \
+         "Fetch data in to all cache levels L1, L2 and L3")                 \
+    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2 and L3 caches in exclusive state "               \
+         "in anticipation of write")                                        \
+    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2, and L3 cache in exclusive state")               \
+    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,              \
+         "Fetch data in to L2 and L3 caches in exclusive state")            \
+    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,                  \
+         "Fetch data in to all cache levels in exclusive state")
+
+/* Indexes for cache prefetch types. */
+enum {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+/* Cache prefetch types. */
+enum ovs_prefetch_type {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+#define OVS_PREFETCH_CACHE(addr, TYPE) do {                                   \
+    switch (TYPE) {                                                           \
+    case OPCH_NTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_LTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_MTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ,                             \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_NTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_LTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_MTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE,                            \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY);   \
+        break;                                                                \
+    }                                                                         \
+} while (0)
+
+/* Retain this for compatibility. */
+#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
+#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
 #else
 #define OVS_PREFETCH(addr)
-#define OVS_PREFETCH_WRITE(addr)
+#define OVS_PREFETCH_CACHE(addr, OP)
 #endif
 
 /* Build assertions.
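
The OVS_PREFETCH_CACHE_HINT list in the patch is expanded more than once with
different definitions of OPCH, a pattern commonly called an X-macro. The
following is a minimal sketch of that pattern, not the OVS code: the
three-entry list and every SKETCH_* and sketch_* name are hypothetical and
chosen only for illustration. It shows how one list can yield consecutive
indexes, one-bit-per-entry values (1 << index, as in the patch's
enum ovs_prefetch_type), and lookups of the per-entry fields.

```c
#include <assert.h>

/* Hypothetical list in the same shape as the patch's hint list:
 * X(name, rw, locality). */
#define SKETCH_HINTS            \
    X(SKETCH_NTR, 0, 0)         \
    X(SKETCH_HTR, 0, 3)         \
    X(SKETCH_HTW, 1, 3)

/* First expansion: consecutive indexes 0, 1, 2, then the entry count. */
enum {
#define X(NAME, RW, LOCALITY) NAME##_INDEX,
    SKETCH_HINTS
#undef X
    SKETCH_N_HINTS
};

/* Second expansion: one distinct bit per hint, 1 << index. */
enum sketch_hint {
#define X(NAME, RW, LOCALITY) NAME = 1 << NAME##_INDEX,
    SKETCH_HINTS
#undef X
};

static_assert(SKETCH_N_HINTS == 3, "the list above has three entries");

/* Third expansion: returns the locality field of 'hint', or -1 if 'hint'
 * is not one of the listed values. */
static int
sketch_hint_locality(enum sketch_hint hint)
{
    switch (hint) {
#define X(NAME, RW, LOCALITY) case NAME: return LOCALITY;
    SKETCH_HINTS
#undef X
    }
    return -1;
}

/* Fourth expansion: returns 1 for a write hint, 0 for a read hint, or -1
 * if 'hint' is not one of the listed values. */
static int
sketch_hint_is_write(enum sketch_hint hint)
{
    switch (hint) {
#define X(NAME, RW, LOCALITY) case NAME: return RW;
    SKETCH_HINTS
#undef X
    }
    return -1;
}
```

With this list, SKETCH_NTR, SKETCH_HTR and SKETCH_HTW evaluate to 1, 2 and 4.
Adding an entry to SKETCH_HINTS updates every expansion at once, which is the
main reason to keep the list in a single macro.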