[ovs-dev] afxdp: Reduce afxdp's batch size to match kernel's xdp batch size

Message ID 1576890223-24011-1-git-send-email-pkusunyifeng@gmail.com
State New

Commit Message

Yifeng Sun Dec. 21, 2019, 1:03 a.m. UTC
William reported an iperf TCP issue between two afxdp ports:

[  3] local 10.1.1.2 port 40384 connected with 10.1.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  17.0 MBytes   143 Mbits/sec
[  3]  1.0- 2.0 sec  9.62 MBytes  80.7 Mbits/sec
[  3]  2.0- 3.0 sec  6.75 MBytes  56.6 Mbits/sec
[  3]  3.0- 4.0 sec  11.0 MBytes  92.3 Mbits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3] 10.0-11.0 sec  0.00 Bytes  0.00 bits/sec

The reason is that netdev-afxdp's batch size is currently 32, while the
kernel's XDP batch size is only 16. This can exhaust the socket's wmem if
netdev-afxdp keeps sending a large number of packets. Later, when the ARP
entry expires on one side of the TCP connection, ARP packets can be
delayed or even dropped because the socket's wmem is already full.
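To make the mismatch concrete, here is a toy model in C. It is an illustration under stated assumptions, not OVS or kernel code: the sender enqueues one batch per iteration and then kicks the kernel once, each kick transmits at most 16 descriptors, and the socket's wmem is treated as a fixed-size buffer. `TX_CAPACITY`, `KERNEL_XMIT_PER_KICK` and `simulate_drops()` are hypothetical names, and the capacity of 2048 is arbitrary.

```c
#include <assert.h>

/* Toy model, not OVS or kernel code.  All constants are assumptions. */
#define TX_CAPACITY          2048  /* Stand-in for the socket's wmem. */
#define KERNEL_XMIT_PER_KICK 16    /* Descriptors sent per kick. */

/* Each iteration, user space tries to enqueue `batch` packets and then
 * kicks the kernel once.  Returns how many packets were rejected because
 * the buffer was full. */
static int
simulate_drops(int batch, int iterations)
{
    int in_flight = 0;  /* Enqueued but not yet transmitted. */
    int dropped = 0;

    assert(batch > 0 && iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        int room = TX_CAPACITY - in_flight;
        int queued = batch < room ? batch : room;

        in_flight += queued;
        dropped += batch - queued;

        /* One kick: the kernel transmits at most KERNEL_XMIT_PER_KICK. */
        in_flight -= in_flight < KERNEL_XMIT_PER_KICK
                     ? in_flight : KERNEL_XMIT_PER_KICK;
    }
    return dropped;
}
```

With these constants, `simulate_drops(16, 1000)` returns 0, while `simulate_drops(32, 1000)` returns 13968: the backlog grows by 16 descriptors per iteration until the buffer fills after 127 iterations, and from then on half of every batch is rejected. In this model, that is where a late ARP packet would be lost.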

This patch fixes the issue by reducing netdev-afxdp's batch size to
match the kernel's XDP batch size. With this change, iperf TCP works correctly:

[  3] local 10.1.1.2 port 57770 connected with 10.1.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   262 MBytes  2.20 Gbits/sec
[  3]  1.0- 2.0 sec   299 MBytes  2.51 Gbits/sec
[  3]  2.0- 3.0 sec   271 MBytes  2.27 Gbits/sec
[  3]  3.0- 4.0 sec   247 MBytes  2.07 Gbits/sec
[  3]  4.0- 5.0 sec   290 MBytes  2.43 Gbits/sec
[  3]  5.0- 6.0 sec   292 MBytes  2.45 Gbits/sec
[  3]  6.0- 7.0 sec   223 MBytes  1.87 Gbits/sec
[  3]  7.0- 8.0 sec   243 MBytes  2.04 Gbits/sec
[  3]  8.0- 9.0 sec   234 MBytes  1.97 Gbits/sec
[  3]  9.0-10.0 sec   238 MBytes  2.00 Gbits/sec

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/365076.html
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
---
 lib/netdev-afxdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ilya Maximets Dec. 23, 2019, 8:21 a.m. UTC | #1
On Sat, Dec 21, 2019 at 2:03 AM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> The reason is that netdev-afxdp's batch size is currently 32, while the
> kernel's XDP batch size is only 16. This can exhaust the socket's wmem if
> netdev-afxdp keeps sending a large number of packets. Later, when the ARP
> entry expires on one side of the TCP connection, ARP packets can be
> delayed or even dropped because the socket's wmem is already full.
>
> This patch fixes the issue by reducing netdev-afxdp's batch size to
> match the kernel's XDP batch size. With this change, iperf TCP works correctly.

I didn't look at the veth driver implementation yet, but if your issue
analysis is correct, the driver doesn't process all the packets we're
trying to send.  In this case, changing the batch size should not fully
fix the issue, since we could still push packets fast enough to fill
queues that the kernel will not drain, or some packets could get stuck
inside the queues if we don't send other packets.  This sounds more like
missing NAPI rescheduling or incorrect handling of the need-wakeup
feature inside the veth driver.  I could look at it next week
(travelling now).
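This hypothesis can be illustrated with another toy model. It makes assumptions and is not the veth driver's actual logic: a wakeup triggers polls that each transmit at most 16 descriptors, and a driver that fails to reschedule itself stops after a single poll. `DRAIN_PER_POLL` and `wakeup_and_drain()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DRAIN_PER_POLL 16   /* Assumed per-poll transmit budget. */

/* Hypothetical driver model: a wakeup runs one poll that transmits up to
 * DRAIN_PER_POLL descriptors.  If `reschedules` is true, polling repeats
 * until the queue is empty (as if NAPI rescheduled itself); otherwise it
 * stops after one poll.  Returns the descriptors left in the queue. */
static int
wakeup_and_drain(int queued, bool reschedules)
{
    assert(queued >= 0);
    do {
        queued -= queued < DRAIN_PER_POLL ? queued : DRAIN_PER_POLL;
    } while (reschedules && queued > 0);
    return queued;
}
```

In this model, `wakeup_and_drain(32, false)` leaves 16 descriptors stuck, and a 16-packet batch appears to work only because a single poll covers it. If three 16-packet batches accumulate before the wakeup, `wakeup_and_drain(48, false)` still leaves 32 behind, which matches the concern that a smaller batch size alone does not fully fix the problem.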

Anyway, we should not change the batch size globally, because it will
affect performance in all modes and with all drivers.  Since your
workaround fixes the issue at least partially, the same multi-kick
workaround that we have for generic mode might work here too.  Could
you please check?
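A multi-kick workaround can be sketched as issuing enough wakeups to cover all outstanding descriptors, assuming that each kick transmits at most 16. In a real AF_XDP program, a TX kick is typically a `sendto()` call on the socket; here `kick_once()` is a hypothetical stand-in for it, and `KERNEL_TX_BATCH` and `multi_kick()` are illustrative names, not the OVS code for generic mode.

```c
#include <assert.h>

#define KERNEL_TX_BATCH 16  /* Assumed descriptors transmitted per kick. */

/* Hypothetical stand-in for one wakeup syscall: transmits at most
 * KERNEL_TX_BATCH of the outstanding descriptors and returns how many. */
static int
kick_once(int outstanding)
{
    return outstanding < KERNEL_TX_BATCH ? outstanding : KERNEL_TX_BATCH;
}

/* Multi-kick sketch: instead of shrinking the user space batch, issue
 * ceil(outstanding / KERNEL_TX_BATCH) kicks.  Stores the number of kicks
 * used in *n_kicks and returns the descriptors still outstanding. */
static int
multi_kick(int outstanding, int *n_kicks)
{
    assert(outstanding >= 0 && n_kicks);

    int retries = (outstanding + KERNEL_TX_BATCH - 1) / KERNEL_TX_BATCH;

    *n_kicks = 0;
    while (retries-- > 0 && outstanding > 0) {
        outstanding -= kick_once(outstanding);
        (*n_kicks)++;
    }
    return outstanding;
}
```

For 32 outstanding descriptors this issues two kicks, so in this model the user space batch can stay at `NETDEV_MAX_BURST` while the kernel still drains it in 16-descriptor steps.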

Best regards, Ilya Maximets.
Yifeng Sun Dec. 23, 2019, 7:03 p.m. UTC | #2
Thanks Ilya. This patch is actually a quick fix.
Sure, I will check generic mode later.

Thanks,
Yifeng

Patch

diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 58365ed483e3..38bbbeb055cc 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -82,7 +82,7 @@  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * enough for most corner cases.
  */
 #define NUM_FRAMES          (4 * (PROD_NUM_DESCS + CONS_NUM_DESCS))
-#define BATCH_SIZE          NETDEV_MAX_BURST
+#define BATCH_SIZE          16
 
 BUILD_ASSERT_DECL(IS_POW2(NUM_FRAMES));
 BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);