[ovs-dev,1/4] compiler: Introduce OVS_PREFETCH variants.

Message ID 1515778879-60075-1-git-send-email-bhanuprakash.bodireddy@intel.com
State Changes Requested
Delegated to: Ian Stokes
Series
  • [ovs-dev,1/4] compiler: Introduce OVS_PREFETCH variants.

Commit Message

Bodireddy, Bhanuprakash Jan. 12, 2018, 5:41 p.m.
This commit introduces prefetch variants by using the GCC built-in
prefetch function.

The prefetch variants give the user better control over the data
caching strategy, in order to increase cache efficiency and minimize
cache pollution. Data reference patterns here can be classified into:

 - Non-temporal (NT) - Data that is referenced once and not reused in
                       the immediate future.
 - Temporal          - Data that will be used again soon.

The macro variants can be used where:
 - Memory access patterns are predictable.
 - The execution pipeline can stall if data isn't available.
 - Loops are time consuming.

For example:

  OVS_PREFETCH_CACHE(addr, OPCH_LTR)
    - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 1)
    - Prefetch data into the L3 cache for read-only access.

  OVS_PREFETCH_CACHE(addr, OPCH_HTW)
    - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
    - __builtin_prefetch(addr, 1, 3)
    - Prefetch data into all caches in anticipation of a write. In doing
      so, it invalidates other cached copies so as to gain 'exclusive'
      access.

  OVS_PREFETCH(addr)
    - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
    - __builtin_prefetch(addr, 0, 3)
    - Prefetch data into all caches in anticipation of a read, where the
      data will be used again soon (HTR - High Temporal Read).

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 include/openvswitch/compiler.h | 147 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 139 insertions(+), 8 deletions(-)
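
As a rough illustration of the "predictable memory access pattern" case
described in the commit message, the sketch below calls the GCC built-in
directly rather than the OVS macros. The function name sum_with_prefetch and
the prefetch distance PREFETCH_AHEAD are hypothetical and exist only for this
example; a useful distance depends on the workload and the microarchitecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements; not taken from the patch. */
#define PREFETCH_AHEAD 16

/* Sums 'n' values while hinting that the element a few iterations ahead
 * will be read once and not reused (rw = 0, locality = 0). */
static uint64_t
sum_with_prefetch(const uint32_t *values, size_t n)
{
    uint64_t sum = 0;

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
#if defined(__GNUC__)
            /* Only a hint: the result is the same with or without it. */
            __builtin_prefetch(&values[i + PREFETCH_AHEAD], 0, 0);
#endif
        }
        sum += values[i];
    }
    return sum;
}
```

The rw and locality arguments are written as literal constants here because
the built-in expects compile-time constants for both.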

Comments

Ben Pfaff Jan. 12, 2018, 6:20 p.m. | #1
Hi Bhanu, who do you think should review this series?  Is it something
that Ian should pick up for dpdk_merge?
Bodireddy, Bhanuprakash Jan. 12, 2018, 7:38 p.m. | #2
>-----Original Message-----
>From: Ben Pfaff [mailto:blp@ovn.org]
>Sent: Friday, January 12, 2018 6:20 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
>variants.
>
>Hi Bhanu, who do you think should review this series?  Is it something that Ian
>should pick up for dpdk_merge?

Hi Ben,

I will check with Ian if he has time to review this. As the patch series doesn't
change any functionality at this point it shouldn't take much time.

-Bhanuprakash.
Ben Pfaff Jan. 12, 2018, 9:01 p.m. | #3
On Fri, Jan 12, 2018 at 07:38:49PM +0000, Bodireddy, Bhanuprakash wrote:
> >-----Original Message-----
> >From: Ben Pfaff [mailto:blp@ovn.org]
> >Sent: Friday, January 12, 2018 6:20 PM
> >To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
> >Cc: dev@openvswitch.org
> >Subject: Re: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH
> >variants.
> >
> >Hi Bhanu, who do you think should review this series?  Is it something that Ian
> >should pick up for dpdk_merge?
> 
> Hi Ben,
> 
> I will check with Ian if he has time to review this. As the patch series doesn't
> change any functionality at this point it shouldn't take much time.

OK.

Let me know if someone else should review the series.
Stokes, Ian March 13, 2018, 10:37 a.m. | #4
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
> Sent: Friday, January 12, 2018 5:41 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
> 
> This commit introduces prefetch variants by using the GCC built-in
> prefetch function.
> 
> The prefetch variants gives the user better control on designing data
> caching strategy in order to increase cache efficiency and minimize cache
> pollution. Data reference patterns here can be classified in to
> 
>  - Non-temporal(NT) - Data that is referenced once and not reused in
>                       immediate future.
>  - Temporal         - Data will be used again soon.
> 
> The Macro variants can be used where there are
>  - Predictable memory access patterns.
>  - Execution pipeline can stall if data isn't available.
>  - Time consuming loops.
> 
> For example:
> 
>   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>     - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>     - __builtin_prefetch(addr, 0, 1)
>     - Prefetch data in to L3 cache for readonly purpose.
> 
>   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>     - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>     - __builtin_prefetch(addr, 1, 3)
>     - Prefetch data in to all caches in anticipation of write. In doing
>       so it invalidates other cached copies so as to gain 'exclusive'
>       access.
> 
>   OVS_PREFETCH(addr)
>     - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>     - __builtin_prefetch(addr, 0, 3)
>     - Prefetch data in to all caches in anticipation of read and that
>       data will be used again soon (HTR - High Temporal Read).
> 
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---
>  include/openvswitch/compiler.h | 147
> ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 139 insertions(+), 8 deletions(-)
> 
> diff --git a/include/openvswitch/compiler.h
> b/include/openvswitch/compiler.h index c7cb930..94bb24d 100644
> --- a/include/openvswitch/compiler.h
> +++ b/include/openvswitch/compiler.h
> @@ -222,18 +222,149 @@
>      static void f(void)
>  #endif
> 
> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
> - * line containing the given address to a CPU cache.
> - * OVS_PREFETCH_WRITE() should be used when the memory is going to be
> - * written to.  Depending on the target CPU, this can generate the same
> - * instruction as OVS_PREFETCH(), or bring the data into the cache in an
> - * exclusive state. */
>  #if __GNUC__
> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
> +enum cache_locality {
> +    NON_TEMPORAL_LOCALITY,
> +    LOW_TEMPORAL_LOCALITY,
> +    MODERATE_TEMPORAL_LOCALITY,
> +    HIGH_TEMPORAL_LOCALITY
> +};
> +
> +enum cache_rw {
> +    PREFETCH_READ,
> +    PREFETCH_WRITE
> +};
> +
> +/* The prefetch variants gives the user better control on designing
> +data
> + * caching strategy in order to increase cache efficiency and minimize
> + * cache pollution. Data reference patterns here can be classified in
> +to
> + *
> + *   Non-temporal(NT) - Data that is referenced once and not reused in
> + *                      immediate future.
> + *   Temporal         - Data will be used again soon.
> + *
> + * The Macro variants can be used where there are
> + *   o Predictable memory access patterns.
> + *   o Execution pipeline can stall if data isn't available.
> + *   o Time consuming loops.
> + *
> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the
> +cache
> + * line containing the given address to a CPU cache. The second
> +argument
> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is
> +going
> + * to be read or written to by core.
> + *
> + * Example Usage:
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 1)
> + *       - Prefetch data in to L3 cache for readonly purpose.
> + *
> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
> + *       - __builtin_prefetch(addr, 1, 3)
> + *       - Prefetch data in to all caches in anticipation of write. In
> doing
> + *         so it invalidates other cached copies so as to gain
> 'exclusive'
> + *         access.
> + *
> + *   OVS_PREFETCH(addr)
> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
> + *       - __builtin_prefetch(addr, 0, 3)
> + *       - Prefetch data in to all caches in anticipation of read and
> that
> + *         data will be used again soon (HTR - High Temporal Read).
> + *
> + * Implementation details of prefetch hint instructions may vary across
> + * different processors and microarchitectures.

Herein lies a potential problem: have you tested this on systems that interpret the prefetch hints differently? What about systems that don't support them?

In some cases OVS will be compiled on one system but then deployed on another, which might not be the same HW platform. What happens in that case?

Will it behave as expected, i.e. in a similar fashion to how prefetch currently behaves?

> + *
> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and
> +OPCH_HTW
> + * uses prefetchw instruction when available. Refer Documentation on
> +how
> + * to enable prefetchwt1 instruction.

Just to clarify, is it the HW documentation for a user's setup that they must refer to?
Are there any extra setup steps for compilers etc. for these instructions?

I would expect something like this to be added to the OVS docs.

> + *
> + * PREFETCH HINT    Instruction     GCC builtin function
> + * -------------------------------------------------------
> + *   OPCH_NTR       prefetchnta  __builtin_prefetch(a, 0, 0)
> + *   OPCH_LTR       prefetcht2   __builtin_prefetch(a, 0, 1)
> + *   OPCH_MTR       prefetcht1   __builtin_prefetch(a, 0, 2)
> + *   OPCH_HTR       prefetcht0   __builtin_prefetch(a, 0, 3)
> + *
> + *   OPCH_NTW       prefetchwt1  __builtin_prefetch(a, 1, 0)
> + *   OPCH_LTW       prefetchwt1  __builtin_prefetch(a, 1, 1)
> + *   OPCH_MTW       prefetchwt1  __builtin_prefetch(a, 1, 2)
> + *   OPCH_HTW       prefetchw    __builtin_prefetch(a, 1, 3)
> + *
> + * */
> +#define OVS_PREFETCH_CACHE_HINT
> \
> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,
> \
> +         "Fetch data to non-temporal cache close to processor"
> \
> +         "to minimize cache pollution")
> \
> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,
> \
> +         "Fetch data to L2 and L3 cache")
> \
> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,
> \
> +         "Fetch data to L2 and L3 caches, same as LTR on"
> \
> +         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")
> \
> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,
> \
> +         "Fetch data in to all cache levels L1, L2 and L3")
> \
> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,
> \
> +         "Fetch data to L2 and L3 cache in exclusive state"
> \
> +         "in anticipation of write")
> \
> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,
> \
> +         "Fetch data to L2 and L3 cache in exclusive state")
> \
> +    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,
> \
> +         "Fetch data in to L2 and L3 caches in exclusive state")
> \
> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,
> \
> +         "Fetch data in to all cache levels in exclusive state")
> +
> +/* Indexes for cache prefetch types. */
> +enum {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +/* Cache prefetch types. */
> +enum ovs_prefetch_type {
> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
> +    OVS_PREFETCH_CACHE_HINT
> +#undef OPCH
> +};
> +
> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)

Checkpatch caught the following:

ERROR: Improper whitespace around control block
#164 FILE: include/openvswitch/compiler.h:331:
#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \

Lines checked: 204, Warnings: 0, Errors: 1

> \
> +{
> \
> +    case OPCH_NTR:
> \
> +        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);
> \
> +        break;
> \
> +    case OPCH_LTR:
> \
> +        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);
> \
> +        break;
> \
> +    case OPCH_MTR:
> \
> +        __builtin_prefetch((addr), PREFETCH_READ,
> \
> +                           MODERATE_TEMPORAL_LOCALITY);
> \
> +        break;
> \
> +    case OPCH_HTR:
> \
> +        __builtin_prefetch((addr), PREFETCH_READ,
> HIGH_TEMPORAL_LOCALITY);    \
> +        break;
> \
> +    case OPCH_NTW:
> \
> +        __builtin_prefetch((addr), PREFETCH_WRITE,
> NON_TEMPORAL_LOCALITY);    \
> +        break;
> \
> +    case OPCH_LTW:
> \
> +        __builtin_prefetch((addr), PREFETCH_WRITE,
> LOW_TEMPORAL_LOCALITY);    \
> +        break;
> \
> +    case OPCH_MTW:
> \
> +        __builtin_prefetch((addr), PREFETCH_WRITE,
> \
> +                           MODERATE_TEMPORAL_LOCALITY);
> \
> +        break;
> \
> +    case OPCH_HTW:
> \
> +        __builtin_prefetch((addr), PREFETCH_WRITE,
> HIGH_TEMPORAL_LOCALITY);   \
> +        break;
> \
> +}
> +
> +/* Retain this for backward compatibility. */
> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>  #else
>  #define OVS_PREFETCH(addr)
>  #define OVS_PREFETCH_WRITE(addr)
> +#define OVS_PREFETCH_CACHE(addr, OP)
>  #endif
> 
>  /* Build assertions.
> --
> 2.4.11
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Bodireddy, Bhanuprakash March 13, 2018, 2:52 p.m. | #5
>
>> -----Original Message-----
>> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
>> bounces@openvswitch.org] On Behalf Of Bhanuprakash Bodireddy
>> Sent: Friday, January 12, 2018 5:41 PM
>> To: dev@openvswitch.org
>> Subject: [ovs-dev] [PATCH 1/4] compiler: Introduce OVS_PREFETCH variants.
>>
>> This commit introduces prefetch variants by using the GCC built-in
>> prefetch function.
>>
>> The prefetch variants gives the user better control on designing data
>> caching strategy in order to increase cache efficiency and minimize
>> cache pollution. Data reference patterns here can be classified in to
>>
>>  - Non-temporal(NT) - Data that is referenced once and not reused in
>>                       immediate future.
>>  - Temporal         - Data will be used again soon.
>>
>> The Macro variants can be used where there are
>>  - Predictable memory access patterns.
>>  - Execution pipeline can stall if data isn't available.
>>  - Time consuming loops.
>>
>> For example:
>>
>>   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>>     - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>>     - __builtin_prefetch(addr, 0, 1)
>>     - Prefetch data in to L3 cache for readonly purpose.
>>
>>   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>     - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>>     - __builtin_prefetch(addr, 1, 3)
>>     - Prefetch data in to all caches in anticipation of write. In doing
>>       so it invalidates other cached copies so as to gain 'exclusive'
>>       access.
>>
>>   OVS_PREFETCH(addr)
>>     - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>>     - __builtin_prefetch(addr, 0, 3)
>>     - Prefetch data in to all caches in anticipation of read and that
>>       data will be used again soon (HTR - High Temporal Read).
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> <bhanuprakash.bodireddy@intel.com>
>> ---
>>  include/openvswitch/compiler.h | 147
>> ++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 139 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/openvswitch/compiler.h
>> b/include/openvswitch/compiler.h index c7cb930..94bb24d 100644
>> --- a/include/openvswitch/compiler.h
>> +++ b/include/openvswitch/compiler.h
>> @@ -222,18 +222,149 @@
>>      static void f(void)
>>  #endif
>>
>> -/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
>> - * line containing the given address to a CPU cache.
>> - * OVS_PREFETCH_WRITE() should be used when the memory is going to
>be
>> - * written to.  Depending on the target CPU, this can generate the
>> same
>> - * instruction as OVS_PREFETCH(), or bring the data into the cache in
>> an
>> - * exclusive state. */
>>  #if __GNUC__
>> -#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
>> -#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
>> +enum cache_locality {
>> +    NON_TEMPORAL_LOCALITY,
>> +    LOW_TEMPORAL_LOCALITY,
>> +    MODERATE_TEMPORAL_LOCALITY,
>> +    HIGH_TEMPORAL_LOCALITY
>> +};
>> +
>> +enum cache_rw {
>> +    PREFETCH_READ,
>> +    PREFETCH_WRITE
>> +};
>> +
>> +/* The prefetch variants gives the user better control on designing
>> +data
>> + * caching strategy in order to increase cache efficiency and
>> +minimize
>> + * cache pollution. Data reference patterns here can be classified in
>> +to
>> + *
>> + *   Non-temporal(NT) - Data that is referenced once and not reused in
>> + *                      immediate future.
>> + *   Temporal         - Data will be used again soon.
>> + *
>> + * The Macro variants can be used where there are
>> + *   o Predictable memory access patterns.
>> + *   o Execution pipeline can stall if data isn't available.
>> + *   o Time consuming loops.
>> + *
>> + * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the
>> +cache
>> + * line containing the given address to a CPU cache. The second
>> +argument
>> + * OPCH_XXR (or) OPCH_XXW is used to hint if the prefetched data is
>> +going
>> + * to be read or written to by core.
>> + *
>> + * Example Usage:
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
>> + *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 1)
>> + *       - Prefetch data in to L3 cache for readonly purpose.
>> + *
>> + *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>> + *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
>> + *       - __builtin_prefetch(addr, 1, 3)
>> + *       - Prefetch data in to all caches in anticipation of write. In
>> doing
>> + *         so it invalidates other cached copies so as to gain
>> 'exclusive'
>> + *         access.
>> + *
>> + *   OVS_PREFETCH(addr)
>> + *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
>> + *       - __builtin_prefetch(addr, 0, 3)
>> + *       - Prefetch data in to all caches in anticipation of read and
>> that
>> + *         data will be used again soon (HTR - High Temporal Read).
>> + *
>> + * Implementation details of prefetch hint instructions may vary
>> + across
>> + * different processors and microarchitectures.
>
>Herein lies a potential problem, have you tested this on systems that have
>different interpretations of the prefetch hints? What about systems that
>don't support it?

[BHANU]
I have tested it on different Intel microarchitectures (Haswell, Broadwell, Skylake).
I understand that you are concerned about the ARM platform. ARM does support the prefetch variants, and they have the same functionality as on x86_64.

For example, consider the code snippet below, compiled on ARM64 with gcc 5.4:

void pref(void *p) {

  /* Read prefetches (rw = 0), locality 0 to 3. */
  __builtin_prefetch(p,0,0);
  __builtin_prefetch(p,0,1);
  __builtin_prefetch(p,0,2);
  __builtin_prefetch(p,0,3);

  /* Write prefetches (rw = 1), locality 0 to 3. */
  __builtin_prefetch(p,1,0);
  __builtin_prefetch(p,1,1);
  __builtin_prefetch(p,1,2);
  __builtin_prefetch(p,1,3);
}

On ARM64 (gcc 5.4):

pref:
        prfm    PLDL1STRM, [x0]
        prfm    PLDL3KEEP, [x0]
        prfm    PLDL2KEEP, [x0]
        prfm    PLDL1KEEP, [x0]
        prfm    PSTL1STRM, [x0]
        prfm    PSTL3KEEP, [x0]
        prfm    PSTL2KEEP, [x0]
        prfm    PSTL1KEEP, [x0]
        ret

For instruction details, see: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/PRFM_imm.html

The best way to verify different platforms and compiler versions is to use https://gcc.godbolt.org/

>
>In some cases OVS will be compiled on one system but then deployed on
>another, they might not be the same HW platform. What happens in that
>case?

If the target doesn't support the prefetch, it might be a NOP on that platform, and it doesn't cause any application crashes or performance penalties.
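
The compile-time side of this can be sketched as follows. This is an
illustration only, not the code from the patch: MY_PREFETCH_READ,
MY_PREFETCH_WRITE and touch_buffer are hypothetical names. With
GCC-compatible compilers the macros expand to __builtin_prefetch; elsewhere
they expand to a no-op, so callers build unchanged. The do { } while (0)
wrapper is an assumption added here so that each macro behaves like a single
statement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__)
#define MY_PREFETCH_READ(addr) \
    do { __builtin_prefetch((addr), 0, 3); } while (0)
#define MY_PREFETCH_WRITE(addr) \
    do { __builtin_prefetch((addr), 1, 3); } while (0)
#else
/* No prefetch support assumed: evaluate nothing, emit nothing. */
#define MY_PREFETCH_READ(addr)  do { (void) (addr); } while (0)
#define MY_PREFETCH_WRITE(addr) do { (void) (addr); } while (0)
#endif

/* Fills 'len' bytes of 'buf' with 'byte' after hinting an upcoming write.
 * The prefetch is only a hint, so the buffer contents are the same whether
 * or not the hardware acts on it. */
static size_t
touch_buffer(unsigned char *buf, size_t len, unsigned char byte)
{
    assert(buf != NULL);
    if (len > 0) {
        MY_PREFETCH_WRITE(buf);
    }
    memset(buf, byte, len);
    return len;
}
```

Which instruction, if any, the compiler emits for each hint still depends on
the target it is asked to compile for.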

>
>Will it behave as expected i.e. similar fashion to how prefetch currently
>behaves?

Yes.

>
>> + *
>> + * OPCH_NTW, OPCH_LTW, OPCH_MTW uses prefetchwt1 instruction and
>> +OPCH_HTW
>> + * uses prefetchw instruction when available. Refer Documentation on
>> +how
>> + * to enable prefetchwt1 instruction.
>
>Just to clarify, Is it HW documentation for a user's setup they must refer to?

[BHANU] 
Nope, I meant the OvS Documentation in this patch.
https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343101.html


>Are there any extra setup steps for compilers etc. for these instructions?

[BHANU] True, this is covered in the Documentation at the link above.
>
>I would expect something like this to be added to the OVS docs.
>
>> + *
>> + * PREFETCH HINT    Instruction     GCC builtin function
>> + * -------------------------------------------------------
>> + *   OPCH_NTR       prefetchnta  __builtin_prefetch(a, 0, 0)
>> + *   OPCH_LTR       prefetcht2   __builtin_prefetch(a, 0, 1)
>> + *   OPCH_MTR       prefetcht1   __builtin_prefetch(a, 0, 2)
>> + *   OPCH_HTR       prefetcht0   __builtin_prefetch(a, 0, 3)
>> + *
>> + *   OPCH_NTW       prefetchwt1  __builtin_prefetch(a, 1, 0)
>> + *   OPCH_LTW       prefetchwt1  __builtin_prefetch(a, 1, 1)
>> + *   OPCH_MTW       prefetchwt1  __builtin_prefetch(a, 1, 2)
>> + *   OPCH_HTW       prefetchw    __builtin_prefetch(a, 1, 3)
>> + *
>> + * */
>> +#define OVS_PREFETCH_CACHE_HINT
>> \
>> +    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data to non-temporal cache close to processor"
>> \
>> +         "to minimize cache pollution")
>> \
>> +    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data to L2 and L3 cache")
>> \
>> +    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data to L2 and L3 caches, same as LTR on"
>> \
>> +         "Nehalem, Westmere, Sandy Bridge and newer
>> + microarchitectures")
>> \
>> +    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data in to all cache levels L1, L2 and L3")
>> \
>> +    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data to L2 and L3 cache in exclusive state"
>> \
>> +         "in anticipation of write")
>> \
>> +    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data to L2 and L3 cache in exclusive state")
>> \
>> +    OPCH(OPCH_MTW, PREFETCH_WRITE,
>MODERATE_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data in to L2 and L3 caches in exclusive state")
>> \
>> +    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,
>> \
>> +         "Fetch data in to all cache levels in exclusive state")
>> +
>> +/* Indexes for cache prefetch types. */
>> +enum {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +/* Cache prefetch types. */
>> +enum ovs_prefetch_type {
>> +#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 <<
>ENUM##_INDEX,
>> +    OVS_PREFETCH_CACHE_HINT
>> +#undef OPCH
>> +};
>> +
>> +#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)
>
>Checkpatch caught the following:
>
>ERROR: Improper whitespace around control block
>#164 FILE: include/openvswitch/compiler.h:331:
>#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \
>
>Lines checked: 204, Warnings: 0, Errors: 1
>
>> \

[BHANU]
I will fix this.

>> +{
>> \
>> +    case OPCH_NTR:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_READ,
>> + NON_TEMPORAL_LOCALITY);
>> \
>> +        break;
>> \
>> +    case OPCH_LTR:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_READ,
>> + LOW_TEMPORAL_LOCALITY);
>> \
>> +        break;
>> \
>> +    case OPCH_MTR:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_READ,
>> \
>> +                           MODERATE_TEMPORAL_LOCALITY);
>> \
>> +        break;
>> \
>> +    case OPCH_HTR:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_READ,
>> HIGH_TEMPORAL_LOCALITY);    \
>> +        break;
>> \
>> +    case OPCH_NTW:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE,
>> NON_TEMPORAL_LOCALITY);    \
>> +        break;
>> \
>> +    case OPCH_LTW:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE,
>> LOW_TEMPORAL_LOCALITY);    \
>> +        break;
>> \
>> +    case OPCH_MTW:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE,
>> \
>> +                           MODERATE_TEMPORAL_LOCALITY);
>> \
>> +        break;
>> \
>> +    case OPCH_HTW:
>> \
>> +        __builtin_prefetch((addr), PREFETCH_WRITE,
>> HIGH_TEMPORAL_LOCALITY);   \
>> +        break;
>> \
>> +}
>> +
>> +/* Retain this for backward compatibility. */
>> +#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
>> +#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
>>  #else
>>  #define OVS_PREFETCH(addr)
>>  #define OVS_PREFETCH_WRITE(addr)
>> +#define OVS_PREFETCH_CACHE(addr, OP)
>>  #endif
>>
>>  /* Build assertions.
>> --
>> 2.4.11
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Patch

diff --git a/include/openvswitch/compiler.h b/include/openvswitch/compiler.h
index c7cb930..94bb24d 100644
--- a/include/openvswitch/compiler.h
+++ b/include/openvswitch/compiler.h
@@ -222,18 +222,149 @@ 
     static void f(void)
 #endif
 
-/* OVS_PREFETCH() can be used to instruct the CPU to fetch the cache
- * line containing the given address to a CPU cache.
- * OVS_PREFETCH_WRITE() should be used when the memory is going to be
- * written to.  Depending on the target CPU, this can generate the same
- * instruction as OVS_PREFETCH(), or bring the data into the cache in an
- * exclusive state. */
 #if __GNUC__
-#define OVS_PREFETCH(addr) __builtin_prefetch((addr))
-#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
+enum cache_locality {
+    NON_TEMPORAL_LOCALITY,
+    LOW_TEMPORAL_LOCALITY,
+    MODERATE_TEMPORAL_LOCALITY,
+    HIGH_TEMPORAL_LOCALITY
+};
+
+enum cache_rw {
+    PREFETCH_READ,
+    PREFETCH_WRITE
+};
+
+/* The prefetch variants give the user better control over the data
+ * caching strategy, in order to increase cache efficiency and minimize
+ * cache pollution. Data reference patterns here can be classified into:
+ *
+ *   Non-temporal (NT) - Data that is referenced once and not reused in
+ *                       the immediate future.
+ *   Temporal          - Data that will be used again soon.
+ *
+ * The macro variants can be used where:
+ *   o Memory access patterns are predictable.
+ *   o The execution pipeline can stall if data isn't available.
+ *   o Loops are time consuming.
+ *
+ * OVS_PREFETCH_CACHE() can be used to instruct the CPU to fetch the cache
+ * line containing the given address to a CPU cache. The second argument,
+ * OPCH_XXR or OPCH_XXW, hints whether the prefetched data is going to be
+ * read or written by the core.
+ *
+ * Example Usage:
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_LTR)
+ *       - OPCH_LTR : OVS PREFETCH CACHE HINT-LOW TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 1)
+ *       - Prefetch data into the L3 cache for read-only access.
+ *
+ *   OVS_PREFETCH_CACHE(addr, OPCH_HTW)
+ *       - OPCH_HTW : OVS PREFETCH CACHE HINT-HIGH TEMPORAL WRITE.
+ *       - __builtin_prefetch(addr, 1, 3)
+ *       - Prefetch data into all caches in anticipation of a write. In
+ *         doing so, it invalidates other cached copies so as to gain
+ *         'exclusive' access.
+ *
+ *   OVS_PREFETCH(addr)
+ *       - OPCH_HTR : OVS PREFETCH CACHE HINT-HIGH TEMPORAL READ.
+ *       - __builtin_prefetch(addr, 0, 3)
+ *       - Prefetch data into all caches in anticipation of a read, where
+ *         the data will be used again soon (HTR - High Temporal Read).
+ *
+ * Implementation details of prefetch hint instructions may vary across
+ * different processors and microarchitectures.
+ *
+ * OPCH_NTW, OPCH_LTW and OPCH_MTW use the prefetchwt1 instruction and
+ * OPCH_HTW uses the prefetchw instruction, when available. Refer to the
+ * Documentation on how to enable the prefetchwt1 instruction.
+ *
+ * PREFETCH HINT    Instruction     GCC builtin function
+ * -------------------------------------------------------
+ *   OPCH_NTR       prefetchnta  __builtin_prefetch(a, 0, 0)
+ *   OPCH_LTR       prefetcht2   __builtin_prefetch(a, 0, 1)
+ *   OPCH_MTR       prefetcht1   __builtin_prefetch(a, 0, 2)
+ *   OPCH_HTR       prefetcht0   __builtin_prefetch(a, 0, 3)
+ *
+ *   OPCH_NTW       prefetchwt1  __builtin_prefetch(a, 1, 0)
+ *   OPCH_LTW       prefetchwt1  __builtin_prefetch(a, 1, 1)
+ *   OPCH_MTW       prefetchwt1  __builtin_prefetch(a, 1, 2)
+ *   OPCH_HTW       prefetchw    __builtin_prefetch(a, 1, 3)
+ *
+ * */
+#define OVS_PREFETCH_CACHE_HINT                                             \
+    OPCH(OPCH_NTR, PREFETCH_READ, NON_TEMPORAL_LOCALITY,                    \
+         "Fetch data to non-temporal cache close to processor "             \
+         "to minimize cache pollution")                                     \
+    OPCH(OPCH_LTR, PREFETCH_READ, LOW_TEMPORAL_LOCALITY,                    \
+         "Fetch data to L2 and L3 cache")                                   \
+    OPCH(OPCH_MTR, PREFETCH_READ, MODERATE_TEMPORAL_LOCALITY,               \
+         "Fetch data to L2 and L3 caches, same as LTR on "                  \
+         "Nehalem, Westmere, Sandy Bridge and newer microarchitectures")    \
+    OPCH(OPCH_HTR, PREFETCH_READ, HIGH_TEMPORAL_LOCALITY,                   \
+         "Fetch data in to all cache levels L1, L2 and L3")                 \
+    OPCH(OPCH_NTW, PREFETCH_WRITE, NON_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2 and L3 cache in exclusive state "                \
+         "in anticipation of write")                                        \
+    OPCH(OPCH_LTW, PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY,                   \
+         "Fetch data to L2 and L3 cache in exclusive state")                \
+    OPCH(OPCH_MTW, PREFETCH_WRITE, MODERATE_TEMPORAL_LOCALITY,              \
+         "Fetch data in to L2 and L3 caches in exclusive state")            \
+    OPCH(OPCH_HTW, PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY,                  \
+         "Fetch data in to all cache levels in exclusive state")
+
+/* Indexes for cache prefetch types. */
+enum {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+/* Cache prefetch types. */
+enum ovs_prefetch_type {
+#define OPCH(ENUM, RW, LOCALITY, EXPLANATION) ENUM = 1 << ENUM##_INDEX,
+    OVS_PREFETCH_CACHE_HINT
+#undef OPCH
+};
+
+#define OVS_PREFETCH_CACHE(addr, TYPE) switch(TYPE)                           \
+{                                                                             \
+    case OPCH_NTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, NON_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_LTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, LOW_TEMPORAL_LOCALITY);     \
+        break;                                                                \
+    case OPCH_MTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ,                             \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTR:                                                            \
+        __builtin_prefetch((addr), PREFETCH_READ, HIGH_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_NTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, NON_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_LTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, LOW_TEMPORAL_LOCALITY);    \
+        break;                                                                \
+    case OPCH_MTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE,                            \
+                           MODERATE_TEMPORAL_LOCALITY);                       \
+        break;                                                                \
+    case OPCH_HTW:                                                            \
+        __builtin_prefetch((addr), PREFETCH_WRITE, HIGH_TEMPORAL_LOCALITY);   \
+        break;                                                                \
+}
+
+/* Retain this for backward compatibility. */
+#define OVS_PREFETCH(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTR)
+#define OVS_PREFETCH_WRITE(addr) OVS_PREFETCH_CACHE(addr, OPCH_HTW)
 #else
 #define OVS_PREFETCH(addr)
 #define OVS_PREFETCH_WRITE(addr)
+#define OVS_PREFETCH_CACHE(addr, OP)
 #endif
 
 /* Build assertions.
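
The index and bit-flag enums in the patch are generated with an X-macro: one
list is expanded several times under different definitions of OPCH. A reduced
sketch of that technique, with hypothetical names (DEMO_HINTS, DEMO_NTR,
demo_prefetch and so on) and only three entries, might look like the
following. The third expansion, which generates the switch cases, is an
assumption made for illustration; the patch writes its switch out by hand.

```c
#include <assert.h>

/* Hypothetical three-entry list; the list in the patch has eight entries.
 * Each entry carries a name, an rw flag and a locality level. */
#define DEMO_HINTS            \
    DEMO(DEMO_NTR, 0, 0)      \
    DEMO(DEMO_LTR, 0, 1)      \
    DEMO(DEMO_HTW, 1, 3)

/* First expansion: consecutive indexes 0, 1, 2, then the entry count. */
enum {
#define DEMO(ENUM, RW, LOCALITY) ENUM##_INDEX,
    DEMO_HINTS
#undef DEMO
    DEMO_N_HINTS
};

/* Second expansion: one bit per entry, derived from its index. */
enum demo_hint {
#define DEMO(ENUM, RW, LOCALITY) ENUM = 1 << ENUM##_INDEX,
    DEMO_HINTS
#undef DEMO
};

/* Third expansion (illustrative): one switch case per entry, so that rw and
 * locality reach the built-in as compile-time constants. */
static inline void
demo_prefetch(const void *addr, enum demo_hint hint)
{
    (void) addr;
    switch (hint) {
#if defined(__GNUC__)
#define DEMO(ENUM, RW, LOCALITY)                    \
    case ENUM:                                      \
        __builtin_prefetch(addr, RW, LOCALITY);     \
        break;
    DEMO_HINTS
#undef DEMO
#endif
    default:
        break;
    }
}
```

Adding a hint then means adding one line to the list, and every expansion
picks it up.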