diff mbox series

[ovs-dev] packets: Prefetch the packet metadata in cacheline1.

Message ID 1510914362-103044-1-git-send-email-bhanuprakash.bodireddy@intel.com
State Superseded
Delegated to: Ian Stokes
Headers show
Series [ovs-dev] packets: Prefetch the packet metadata in cacheline1. | expand

Commit Message

Bodireddy, Bhanuprakash Nov. 17, 2017, 10:26 a.m. UTC
pkt_metadata_prefetch_init() is used to prefetch the packet metadata
before initializing the metadata in pkt_metadata_init(). This is done
for every packet in userspace datapath and is performance critical.

Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the metadata
part of respective cachelines will be initialized by pkt_metadata_init().

However in VXLAN case when popping the vxlan header, netdev_vxlan_pop_header()
invokes pkt_metadata_init_tnl() which zeroes out metadata part of
cacheline1 that wasn't prefetched earlier and causes performance
degradation.

By prefetching cacheline1, 9% performance improvement is observed.

CC: Ben Pfaff <blp@ovn.org>
Fixes: 99fc16c0 ("Reorganize the pkt_metadata structure.")
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
 lib/packets.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Aaron Conole Nov. 20, 2017, 3:48 p.m. UTC | #1
Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:

> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
> before initializing the metadata in pkt_metadata_init(). This is done
> for every packet in userspace datapath and is performance critical.
>
> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the metadata
> part of respective cachelines will be initialized by pkt_metadata_init().
>
> However in VXLAN case when popping the vxlan header, netdev_vxlan_pop_header()
> invokes pkt_metadata_init_tnl() which zeroes out metadata part of
> cacheline1 that wasn't prefetched earlier and causes performance
> degradation.
>
> By prefetching cacheline1, 9% performance improvement is observed.

Do we see a degredation in the non-vxlan case?  If not, then I don't see
any reason not to apply this patch.

> CC: Ben Pfaff <blp@ovn.org>
> Fixes: 99fc16c0 ("Reorganize the pkt_metadata structure.")
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
> ---
Bodireddy, Bhanuprakash Nov. 20, 2017, 4:23 p.m. UTC | #2
>
>Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:
>
>> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
>> before initializing the metadata in pkt_metadata_init(). This is done
>> for every packet in userspace datapath and is performance critical.
>>
>> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the
>> metadata part of respective cachelines will be initialized by
>pkt_metadata_init().
>>
>> However in VXLAN case when popping the vxlan header,
>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which zeroes
>> out metadata part of
>> cacheline1 that wasn't prefetched earlier and causes performance
>> degradation.
>>
>> By prefetching cacheline1, 9% performance improvement is observed.
>
>Do we see a degredation in the non-vxlan case?  If not, then I don't see any
>reason not to apply this patch.

This patch doesn't impact the performance of non-vxlan cases and only have a positive impact in vxlan case.

- Bhanuprakash.
Bodireddy, Bhanuprakash Nov. 27, 2017, 4:35 p.m. UTC | #3
>>Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:
>>
>>> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
>>> before initializing the metadata in pkt_metadata_init(). This is done
>>> for every packet in userspace datapath and is performance critical.
>>>
>>> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the
>>> metadata part of respective cachelines will be initialized by
>>pkt_metadata_init().
>>>
>>> However in VXLAN case when popping the vxlan header,
>>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which
>>> zeroes out metadata part of
>>> cacheline1 that wasn't prefetched earlier and causes performance
>>> degradation.
>>>
>>> By prefetching cacheline1, 9% performance improvement is observed.
>>
>>Do we see a degredation in the non-vxlan case?  If not, then I don't
>>see any reason not to apply this patch.
>
>This patch doesn't impact the performance of non-vxlan cases and only have a
>positive impact in vxlan case.

The commit message claims that the performance improvement was 9% with this patch
but when Sugesh was checking he wasn't getting that performance improvement on his Haswell.

I was chatting to Sugesh this afternoon on this patch and we found some interesting details and much
of this boils down to how the OvS is built .( Apart from HW, BIOS settings - TB disabled).

The test case here measure the VXLAN de capsulation performance alone for packet sizes of 118 bytes.
The OvS CFLAGS and throughput numbers are as below.

CFLAGS="-O2"
    Master              4.667 Mpps  
    With Patch       5.045 Mpps

CFLAGS="-O2 -msse4.2"
    Master              4.710 Mpps
    With Patch       5.097 Mpps

CFLAGS="-O2 -march=native"
    Master              5.072 Mpps
    With Patch       5.193 Mpps

CFLAGS="-Ofast -march=native"
    Master              5.349 Mpps
    With Patch       5.378 Mpps

This means the performance measurements/claims are difficult to assess and as one can see above with "-Ofast, -march=native"
the improvement is insignificant but this is very platform dependent due to "march=native" flag. Also the optimization flags seems to
make significant difference.

- Bhanuprakash.
Chandran, Sugesh Nov. 27, 2017, 5:57 p.m. UTC | #4
Hi Bhanu,

Regards
_Sugesh

> -----Original Message-----
> From: Bodireddy, Bhanuprakash
> Sent: Monday, November 27, 2017 4:35 PM
> To: 'Aaron Conole' <aconole@redhat.com>
> Cc: 'dev@openvswitch.org' <dev@openvswitch.org>; Ben Pfaff <blp@ovn.org>;
> Chandran, Sugesh <sugesh.chandran@intel.com>
> Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata in
> cacheline1.
> 
> >>Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:
> >>
> >>> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
> >>> before initializing the metadata in pkt_metadata_init(). This is
> >>> done for every packet in userspace datapath and is performance critical.
> >>>
> >>> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the
> >>> metadata part of respective cachelines will be initialized by
> >>pkt_metadata_init().
> >>>
> >>> However in VXLAN case when popping the vxlan header,
> >>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which
> >>> zeroes out metadata part of
> >>> cacheline1 that wasn't prefetched earlier and causes performance
> >>> degradation.
> >>>
> >>> By prefetching cacheline1, 9% performance improvement is observed.
> >>
> >>Do we see a degredation in the non-vxlan case?  If not, then I don't
> >>see any reason not to apply this patch.
> >
> >This patch doesn't impact the performance of non-vxlan cases and only
> >have a positive impact in vxlan case.
> 
> The commit message claims that the performance improvement was 9% with
> this patch but when Sugesh was checking he wasn't getting that performance
> improvement on his Haswell.
> 
> I was chatting to Sugesh this afternoon on this patch and we found some
> interesting details and much of this boils down to how the OvS is built .( Apart
> from HW, BIOS settings - TB disabled).
> 
> The test case here measure the VXLAN de capsulation performance alone for
> packet sizes of 118 bytes.
> The OvS CFLAGS and throughput numbers are as below.
> 
> CFLAGS="-O2"
>     Master              4.667 Mpps
>     With Patch       5.045 Mpps
> 
> CFLAGS="-O2 -msse4.2"
>     Master              4.710 Mpps
>     With Patch       5.097 Mpps
> 
> CFLAGS="-O2 -march=native"
>     Master              5.072 Mpps
>     With Patch       5.193 Mpps
> 
> CFLAGS="-Ofast -march=native"
>     Master              5.349 Mpps
>     With Patch       5.378 Mpps
> 
> This means the performance measurements/claims are difficult to assess and as
> one can see above with "-Ofast, -march=native"
> the improvement is insignificant but this is very platform dependent due to
> "march=native" flag. Also the optimization flags seems to make significant
> difference.
[Sugesh] I also tested on my board with same set of configuration and getting the same result as yours.
So this patch offers performance improvement based on the compiler option. I am not sure whats the most preferred/used 
compiler option out there.
I always build OVS with CFLAGS="-Ofast -march=native" and the patch doesn't have a great improvement in it.

I don't mind Acking the patch, if you could re-send the patch with these results and options in the commit message. 
Atleast it will offer performance improvement for other build options.

> 
> - Bhanuprakash.
Bodireddy, Bhanuprakash Nov. 28, 2017, 2:24 p.m. UTC | #5
>-----Original Message-----
>From: Chandran, Sugesh
>Sent: Monday, November 27, 2017 5:58 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>; 'Aaron
>Conole' <aconole@redhat.com>
>Cc: 'dev@openvswitch.org' <dev@openvswitch.org>; Ben Pfaff
><blp@ovn.org>
>Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata in
>cacheline1.
>
>Hi Bhanu,
>
>Regards
>_Sugesh
>
>> -----Original Message-----
>> From: Bodireddy, Bhanuprakash
>> Sent: Monday, November 27, 2017 4:35 PM
>> To: 'Aaron Conole' <aconole@redhat.com>
>> Cc: 'dev@openvswitch.org' <dev@openvswitch.org>; Ben Pfaff
>> <blp@ovn.org>; Chandran, Sugesh <sugesh.chandran@intel.com>
>> Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata
>> in cacheline1.
>>
>> >>Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com> writes:
>> >>
>> >>> pkt_metadata_prefetch_init() is used to prefetch the packet
>> >>> metadata before initializing the metadata in pkt_metadata_init().
>> >>> This is done for every packet in userspace datapath and is performance
>critical.
>> >>>
>> >>> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the
>> >>> metadata part of respective cachelines will be initialized by
>> >>pkt_metadata_init().
>> >>>
>> >>> However in VXLAN case when popping the vxlan header,
>> >>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which
>> >>> zeroes out metadata part of
>> >>> cacheline1 that wasn't prefetched earlier and causes performance
>> >>> degradation.
>> >>>
>> >>> By prefetching cacheline1, 9% performance improvement is observed.
>> >>
>> >>Do we see a degredation in the non-vxlan case?  If not, then I don't
>> >>see any reason not to apply this patch.
>> >
>> >This patch doesn't impact the performance of non-vxlan cases and only
>> >have a positive impact in vxlan case.
>>
>> The commit message claims that the performance improvement was 9%
>with
>> this patch but when Sugesh was checking he wasn't getting that
>> performance improvement on his Haswell.
>>
>> I was chatting to Sugesh this afternoon on this patch and we found
>> some interesting details and much of this boils down to how the OvS is
>> built .( Apart from HW, BIOS settings - TB disabled).
>>
>> The test case here measure the VXLAN de capsulation performance alone
>> for packet sizes of 118 bytes.
>> The OvS CFLAGS and throughput numbers are as below.
>>
>> CFLAGS="-O2"
>>     Master              4.667 Mpps
>>     With Patch       5.045 Mpps
>>
>> CFLAGS="-O2 -msse4.2"
>>     Master              4.710 Mpps
>>     With Patch       5.097 Mpps
>>
>> CFLAGS="-O2 -march=native"
>>     Master              5.072 Mpps
>>     With Patch       5.193 Mpps
>>
>> CFLAGS="-Ofast -march=native"
>>     Master              5.349 Mpps
>>     With Patch       5.378 Mpps
>>
>> This means the performance measurements/claims are difficult to assess
>> and as one can see above with "-Ofast, -march=native"
>> the improvement is insignificant but this is very platform dependent
>> due to "march=native" flag. Also the optimization flags seems to make
>> significant difference.
>[Sugesh] I also tested on my board with same set of configuration and getting
>the same result as yours.
>So this patch offers performance improvement based on the compiler option.
>I am not sure whats the most preferred/used compiler option out there.
>I always build OVS with CFLAGS="-Ofast -march=native" and the patch
>doesn't have a great improvement in it.
>
>I don't mind Acking the patch, if you could re-send the patch with these
>results and options in the commit message.
>Atleast it will offer performance improvement for other build options.

Thanks Sugesh for testing this out. I will send out v2 of this with the information I mentioned in
the earlier mail included in the commit message.

Bhanuprakash.
diff mbox series

Patch

diff --git a/lib/packets.h b/lib/packets.h
index 461f488..2e8c0f1 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -159,7 +159,8 @@  pkt_metadata_init(struct pkt_metadata *md, odp_port_t port)
 }
 
 /* This function prefetches the cachelines touched by pkt_metadata_init()
- * For performance reasons the two functions should be kept in sync. */
+ * and pkt_metadata_init_tnl().  For performance reasons the two functions
+ * should be kept in sync. */
 static inline void
 pkt_metadata_prefetch_init(struct pkt_metadata *md)
 {
@@ -167,6 +168,10 @@  pkt_metadata_prefetch_init(struct pkt_metadata *md)
      * be initialized later in pkt_metadata_init(). */
     OVS_PREFETCH(md->cacheline0);
 
+    /* Prefetch cacheline1 as members of this cacheline will be zeroed out
+     * in pkt_metadata_init_tnl(). */
+    OVS_PREFETCH(md->cacheline1);
+
     /* Prefetch cachline2 as ip_dst & ipv6_dst fields will be initialized. */
     OVS_PREFETCH(md->cacheline2);
 }