[ovs-dev,RFC] dpcls: Avoid one 8-byte chunk in subtable mask.

Message ID BCFD2BB875535045A5368D9ADBF2969909DFDCFD@IRSMSX101.ger.corp.intel.com
State Superseded

Commit Message

Fischetti, Antonio Jan. 4, 2017, 3:04 p.m. UTC
Thanks Jarno, we’ve just posted a new patch based on your feedback.


From: Jarno Rajahalme [mailto:jarno@ovn.org]

Sent: Wednesday, January 4, 2017 1:41 AM
To: Fischetti, Antonio <antonio.fischetti@intel.com>
Cc: dev@openvswitch.org; Bodireddy, Bhanuprakash <bhanuprakash.bodireddy@intel.com>
Subject: Re: [PATCH RFC] dpcls: Avoid one 8-byte chunk in subtable mask.

On Dec 22, 2016, at 4:02 AM, Fischetti, Antonio <antonio.fischetti@intel.com> wrote:

Thanks Jarno for your feedback, please find below my replies inline.

-----Original Message-----
From: Jarno Rajahalme [mailto:jarno@ovn.org]

Sent: Tuesday, December 20, 2016 11:23 PM
To: Fischetti, Antonio <antonio.fischetti@intel.com>
Cc: <dev@openvswitch.org>; Bodireddy, Bhanuprakash
Subject: Re: [PATCH RFC] dpcls: Avoid one 8-byte chunk in subtable mask.

On Dec 19, 2016, at 2:43 PM, antonio.fischetti@intel.com wrote:

This patch skips the chunk comprising dp_hash and in_port in the
subtable mask when the dpcls refers to a physical port.  This slightly
speeds up the hash computation, as one expensive call to hash_add64()
can be skipped.

Do you have any performance test results to share?
[Antonio F]
We got a performance improvement of approximately 4-5% with the following tests.
We used commit ID ba7283e97fe80920a222249eb9f6f4211ccb4ccf
and an IXIA generator sending monodirectional traffic with 64-byte packets.

Test 1.
Flow: in_port=1 actions=output:2

Case 1a. Original + EMC disabled:              7.40 Mpps
Case 1b. Original + EMC disabled + this patch: 7.76 Mpps  => +0.36 Mpps (+4.9%)

Test 2.
Flow: ip,nw_src=,nw_dst= actions=output:2

Case 2a. Original + EMC disabled:              6.83 Mpps
Case 2b. Original + EMC disabled + this patch: 7.10 Mpps  => +0.27 Mpps (+4.0%)

Test 3.
Flow: dl_vlan=5 actions=output:2

Case 3a. Original + EMC disabled:              7.47 Mpps
Case 3b. Original + EMC disabled + this patch: 7.76 Mpps  => +0.29 Mpps (+3.9%)

Test 4.
4 IXIA streams mixed in equal proportions.
Flows to catch all IXIA streams:
ip,nw_src=,nw_dst= actions=output:2
ip,nw_dst= actions=output:2
tcp,nw_dst=,tp_dst=4369 actions=output:2
dl_vlan=5 actions=output:2

Case 4a. Original + EMC disabled:              3.9 Mpps
Case 4b. Original + EMC disabled + this patch: 4.1 Mpps   => +0.2 Mpps (+5.1%)

Nice results for such a small change.


The idea is that packets coming from a physical port will have the same
8-byte chunk in the dp_hash/in_port portion of their miniflow.  When the
hash is computed in the NIC, dp_hash is zero, leaving only in_port in
the chunk.  This chunk therefore does not contribute effectively to
spreading hash values and avoiding collisions.
This approach could be extended to other port types.

At first this seems too hacky for me. However, since the dpcls is
explicitly selected for the in_port, it makes sense to wildcard the
in_port in the dpcls lookup itself. This has nothing to do with the port
type. For dp_hash, it would typically not be matched, so there is no need
to worry about it. Also, when it is matched (after recirculation), the
in_port value would be the same as before recirculation, so it is not safe
to assume anything about dp_hash based on the in_port value.

Proposals for improvement below.

Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
lib/dpif-netdev.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

   FLOWMAP_FOR_EACH_INDEX(idx, fmap) {
+        if (phy_port && idx == DPHASH_INPORT_MAP_IDX) {
+            /* This chunk contains Sw-Hash and Port Nr.  As all packets
+             * from the same physical port have the same in_port value
+             * and dp_hash set to 0, this 8-bytes chunk will be
+             * ignored. */
+            continue;
+        }
        uint64_t mask_u64 = flow_u64_value(&match->wc.masks, idx);

        if (mask_u64) {
@@ -2278,7 +2294,15 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread
    struct dpcls *cls;
    odp_port_t in_port = match->flow.in_port.odp_port;

-    netdev_flow_mask_init(&mask, match);
+    struct dpif_port dpifport;
+    if (pmd && pmd->dp) {
+        const struct dpif *pdpif = pmd->dp->dpif;
+        if (pdpif) {
+            dpif_netdev_port_query_by_number(pdpif, in_port, &dpifport);
+        }
+    }
+    netdev_flow_mask_init(&mask, match, dpifport.type);

Instead of the above, you could move the later in_port.odp_port assert in
the function to just before the netdev_flow_mask_init() call, zero the
field out after the assert, and set it back to all-ones after
netdev_flow_mask_init().

This has the effect of always wildcarding in_port in the dpcls, which is
explicitly selected based on the port number. Then, in the typical case
where dp_hash is also wildcarded, the end result is the same as what you
intended with this patch.

[Antonio F]
If I understand your suggestion correctly:

1. netdev_flow_mask_init() shouldn't be changed at all.

2. Inside dp_netdev_flow_add() I should do
     match->wc.masks.in_port.odp_port = 0;
     netdev_flow_mask_init(&mask, match);
     match->wc.masks.in_port.odp_port = ODPP_NONE;

3. This change could apply to both dpdk and vhostuser ports, so there's no
   need to check the port type. Is that correct?

Yes, for me this would be more obviously correct, especially if you move the later assert on the odp_port value to before the code in step 2.  Add a comment that we wildcard the odp_port value in the mask because we select the classifier based on the port number, and each flow in the selected classifier has this same port number.  Maybe also explain that this is done for performance reasons, so that someone does not change it back to simplify the code.


   /* Make sure wc does not have metadata. */
   ovs_assert(!FLOWMAP_HAS_FIELD(&mask.mf.map, metadata)
              && !FLOWMAP_HAS_FIELD(&mask.mf.map, regs));


diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1a47a45..cd8715f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1856,18 +1856,34 @@ netdev_flow_key_from_flow(struct netdev_flow_key

/* Initialize a netdev_flow_key 'mask' from 'match'. */
static inline void
netdev_flow_mask_init(struct netdev_flow_key *mask,
-                      const struct match *match)
+                      const struct match *match,
+                      char *port_type)

It’s better to handle all this in the caller and not modify this function
at all.

   uint64_t *dst = miniflow_values(&mask->mf);
   struct flowmap fmap;
   uint32_t hash = 0;
   size_t idx;
+    bool phy_port = false;
+    if (port_type && !strcmp(port_type, "dpdk")) {
+        phy_port = true;
+    }

   /* Only check masks that make sense for the flow. */
   flow_wc_map(&match->flow, &fmap);
+    /* Check that dp_hash and in_port are in the same structure chunk. */
+    BUILD_ASSERT_DECL(offsetof(struct flow, dp_hash)/sizeof(*dst) ==
+            offsetof(struct flow, in_port)/sizeof(*dst));
+#define DPHASH_INPORT_MAP_IDX (offsetof(struct flow,