[ovs-dev] netdev-dpdk: Fix Tx queue false sharing.

Message ID 20190826145404.3111-1-i.maximets@samsung.com
State Accepted
Series [ovs-dev] netdev-dpdk: Fix Tx queue false sharing.

Commit Message

Ilya Maximets Aug. 26, 2019, 2:54 p.m. UTC
The 'tx_q' array is allocated for each DPDK netdev.  'struct dpdk_tx_queue'
is 8 bytes long, so 8 tx queues share the same cache line on a CPU with
64-byte cache lines.  This causes a 'false sharing' issue in the
multiqueue case, because taking the spinlock implies a write to memory,
i.e. a cache line invalidation.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

I didn't really test the performance impact yet, so testing is very
welcome.  A relevant test case could be PVP with 8 queues on the HW NIC,
8 queues on the vhost-user interface, and 8 PMD threads.

 lib/netdev-dpdk.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)
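
For illustration only (not part of the patch): a minimal stand-alone C
sketch of the layout problem described above.  The struct and field names
below are hypothetical stand-ins; with an 8-byte queue struct, eight queues
pack into a single 64-byte cache line, while padding each queue to a full
cache line keeps them apart:

#include <stdio.h>

/* Hypothetical unpadded queue: 4-byte lock stand-in + 4-byte map = 8 bytes,
 * so eight queues share one 64-byte cache line and a write to one queue's
 * lock invalidates the line for the other seven. */
struct txq_unpadded {
    int lock;                   /* stand-in for rte_spinlock_t */
    int map;
};

/* Hypothetical padded queue: each queue occupies its own 64-byte cache
 * line, so PMD threads locking different queues never touch the same
 * line. */
struct txq_padded {
    int lock;
    int map;
    char pad[64 - 2 * sizeof(int)];
};

int main(void)
{
    printf("unpadded: %zu bytes -> %zu queues per 64B line\n",
           sizeof(struct txq_unpadded), 64 / sizeof(struct txq_unpadded));
    printf("padded:   %zu bytes -> %zu queues per 64B line\n",
           sizeof(struct txq_padded), 64 / sizeof(struct txq_padded));
    return 0;
}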

Comments

Eelco Chaudron Sept. 12, 2019, 1:07 p.m. UTC | #1
On 26 Aug 2019, at 16:54, Ilya Maximets wrote:

> The 'tx_q' array is allocated for each DPDK netdev.  'struct dpdk_tx_queue'
> is 8 bytes long, so 8 tx queues share the same cache line on a CPU with
> 64-byte cache lines.  This causes a 'false sharing' issue in the
> multiqueue case, because taking the spinlock implies a write to memory,
> i.e. a cache line invalidation.
>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>
> I didn't really test the performance impact yet, so testing is very
> welcome.  A relevant test case could be PVP with 8 queues on the HW NIC,
> 8 queues on the vhost-user interface, and 8 PMD threads.

I did a quick test on my setup, but I do not have enough cores to run
8 PMD threads and 8 VM cores.  So I did a 4-core PMD/VM test running PV
(not PVP, as we do not care about receiving traffic from the VHOST)
using 64-byte packets and 100 L3 flows.

Without the patch, I get 17,408,585 pps and with the patch 17,573,089 
pps.

The patch looks good to me…

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ilya Maximets Oct. 23, 2019, 10:01 a.m. UTC | #2
On 12.09.2019 15:07, Eelco Chaudron wrote:
> 
> 
> On 26 Aug 2019, at 16:54, Ilya Maximets wrote:
> 
>> The 'tx_q' array is allocated for each DPDK netdev.  'struct dpdk_tx_queue'
>> is 8 bytes long, so 8 tx queues share the same cache line on a CPU with
>> 64-byte cache lines.  This causes a 'false sharing' issue in the
>> multiqueue case, because taking the spinlock implies a write to memory,
>> i.e. a cache line invalidation.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>
>> I didn't really test the performance impact yet, so testing is very
>> welcome.  A relevant test case could be PVP with 8 queues on the HW NIC,
>> 8 queues on the vhost-user interface, and 8 PMD threads.
> 
> I did a quick test on my setup, but I do not have enough cores to run 8 PMD threads and 8 VM cores.  So I did a 4-core PMD/VM test running PV (not PVP, as we do not care about receiving traffic from the VHOST) using 64-byte packets and 100 L3 flows.
> 
> Without the patch, I get 17,408,585 pps and with the patch 17,573,089 pps.
> 
> The patch looks good to me…
> 
> Acked-by: Eelco Chaudron <echaudro@redhat.com>

Thanks!  I'm expecting better results with more queues, but
it's a good improvement for 4 queues too.

Applied to master.

Best regards, Ilya Maximets.

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bc20d6843..adfe76a63 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -298,14 +298,17 @@  struct dpdk_mp {
  };
 
 /* There should be one 'struct dpdk_tx_queue' created for
- * each cpu core. */
+ * each netdev tx queue. */
 struct dpdk_tx_queue {
-    rte_spinlock_t tx_lock;        /* Protects the members and the NIC queue
-                                    * from concurrent access.  It is used only
-                                    * if the queue is shared among different
-                                    * pmd threads (see 'concurrent_txq'). */
-    int map;                       /* Mapping of configured vhost-user queues
-                                    * to enabled by guest. */
+    /* Padding to make dpdk_tx_queue exactly one cache line long. */
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        /* Protects the members and the NIC queue from concurrent access.
+         * It is used only if the queue is shared among different pmd threads
+         * (see 'concurrent_txq'). */
+        rte_spinlock_t tx_lock;
+        /* Mapping of configured vhost-user queue to enabled by guest. */
+        int map;
+    );
 };
 
 /* dpdk has no way to remove dpdk ring ethernet devices
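
For reference: PADDED_MEMBERS() is an OVS helper macro (from lib/util.h).
Conceptually it places the listed members in an anonymous union together
with a byte array whose size is rounded up to the requested unit, so the
enclosing struct occupies a whole cache line.  A simplified, hypothetical
stand-in (not the actual OVS macro) could look like this:

#include <stdint.h>

#define MY_CACHE_LINE_SIZE 64
#define MY_ROUND_UP(X, Y)  ((((X) + (Y) - 1) / (Y)) * (Y))

/* Simplified stand-in for PADDED_MEMBERS(): the members share an anonymous
 * union with a byte array whose size is the members' combined size rounded
 * up to UNIT, so the enclosing struct always occupies a multiple of UNIT
 * bytes. */
#define MY_PADDED_MEMBERS(UNIT, MEMBERS)                                  \
    union {                                                               \
        struct { MEMBERS };                                               \
        uint8_t pad_[MY_ROUND_UP(sizeof(struct { MEMBERS }), UNIT)];      \
    }

struct my_tx_queue {
    MY_PADDED_MEMBERS(MY_CACHE_LINE_SIZE,
        int tx_lock;            /* stand-in for rte_spinlock_t */
        int map;
    );
};

/* With 64-byte cache lines, sizeof(struct my_tx_queue) == 64, so each
 * element of a 'tx_q'-style array sits on its own cache line. */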