
[ovs-dev,v9,14/16] dpif-netdev: Optimize dp output action

Message ID 20210212171718.2189798-15-harry.van.haaren@intel.com
State New
Series DPIF Framework + Optimizations

Commit Message

Harry van Haaren Feb. 12, 2021, 5:17 p.m. UTC
This commit optimizes the output action by reducing code complexity,
which gives the compiler more room to optimize the packet copy loop.

The core concept of this optimization is that the array-length checks
have already been performed above the copying code, so the per-packet
checks can be removed. Removing the per-packet length checks allows the
compiler to auto-vectorize the stores using SIMD registers.
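
As an illustration of the idea only, the sketch below contrasts a copy
loop that tests for a full batch on every packet with one that checks
capacity once and then copies without branches. All names here
(sketch_packet, sketch_batch, SKETCH_BATCH_MAX, sketch_append_checked,
sketch_append_hoisted) are hypothetical stand-ins, not the OVS types or
helpers, and whether a given compiler vectorizes the second loop depends
on the compiler, flags, and target.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OVS batch types. The names and the
 * capacity are illustrative assumptions, not taken from lib/dp-packet.h. */
#define SKETCH_BATCH_MAX 32

struct sketch_packet {
    int id;
};

struct sketch_batch {
    size_t count;
    struct sketch_packet *packets[SKETCH_BATCH_MAX];
};

/* Checked variant: the batch-full test runs once per packet, so the loop
 * body contains a data-dependent early exit. */
static bool
sketch_append_checked(struct sketch_batch *dst,
                      const struct sketch_batch *src)
{
    for (size_t i = 0; i < src->count; i++) {
        if (dst->count >= SKETCH_BATCH_MAX) {
            return false;
        }
        dst->packets[dst->count++] = src->packets[i];
    }
    return true;
}

/* Hoisted variant: capacity is checked once before the loop. The loop is
 * then a plain pointer copy with a known trip count, which is the shape
 * an auto-vectorizer can typically handle. Assumes dst->count never
 * exceeds SKETCH_BATCH_MAX. */
static bool
sketch_append_hoisted(struct sketch_batch *dst,
                      const struct sketch_batch *src)
{
    size_t n = src->count;
    size_t idx = dst->count;

    if (n > SKETCH_BATCH_MAX - idx) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        dst->packets[idx + i] = src->packets[i];
    }
    dst->count = idx + n;
    return true;
}
```

Both variants append after any packets already in the destination and
give the same result when there is room. They differ only when the
destination would overflow: the checked sketch stops partway, while the
hoisted sketch rejects the whole batch up front.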

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

v8: Add NEWS entry.
---
 NEWS              |  1 +
 lib/dpif-netdev.c | 23 ++++++++++++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

Patch

diff --git a/NEWS b/NEWS
index 5f1e3b5e0..2ffc155f9 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,7 @@  Post-v2.15.0
      * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
        CPU supports it. This enhances performance by using the native vpopcount
        instructions, instead of the emulated version of vpopcount.
+     * Optimize dp_netdev_output by enhancing compiler optimization potential.
 
 v2.15.0 - xx xxx xxxx
 ---------------------
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5e83755d7..b2cf1bd46 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7254,12 +7254,25 @@  dp_execute_output_action(struct dp_netdev_pmd_thread *pmd,
         pmd->n_output_batches++;
     }
 
-    struct dp_packet *packet;
-    DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
-        p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] =
-            pmd->ctx.last_rxq;
-        dp_packet_batch_add(&p->output_pkts, packet);
+    /* The checks above ensure that there is enough space in the output batch.
+     * dp_packet_batch_add() contains a branch that checks if the batch is
+     * full, which reduces the compiler's ability to optimize the loop. The
+     * code below moves packets between batches without that check, while
+     * still appending after any packets already in the output batch.
+     */
+    int batch_size = dp_packet_batch_size(packets_);
+    int out_batch_idx = dp_packet_batch_size(&p->output_pkts);
+    struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
+    struct dp_packet_batch *output_batch = &p->output_pkts;
+
+    for (int i = 0; i < batch_size; i++) {
+        struct dp_packet *packet = packets_->packets[i];
+        p->output_pkts_rxqs[out_batch_idx] = rxq;
+        output_batch->packets[out_batch_idx] = packet;
+        out_batch_idx++;
     }
+    output_batch->count += batch_size;
+
     return true;
 }