Message ID: 1576890223-24011-1-git-send-email-pkusunyifeng@gmail.com
State: Rejected
Series: [ovs-dev] afxdp: Reduce afxdp's batch size to match kernel's xdp batch size
On Sat, Dec 21, 2019 at 2:03 AM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
>
> William reported that there is an iperf TCP issue between two afxdp ports:
>
> [ 3] local 10.1.1.2 port 40384 connected with 10.1.1.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [ 3]  0.0- 1.0 sec  17.0 MBytes   143 Mbits/sec
> [ 3]  1.0- 2.0 sec  9.62 MBytes  80.7 Mbits/sec
> [ 3]  2.0- 3.0 sec  6.75 MBytes  56.6 Mbits/sec
> [ 3]  3.0- 4.0 sec  11.0 MBytes  92.3 Mbits/sec
> [ 3]  5.0- 6.0 sec  0.00 Bytes   0.00 bits/sec
> [ 3]  6.0- 7.0 sec  0.00 Bytes   0.00 bits/sec
> [ 3]  7.0- 8.0 sec  0.00 Bytes   0.00 bits/sec
> [ 3]  8.0- 9.0 sec  0.00 Bytes   0.00 bits/sec
> [ 3]  9.0-10.0 sec  0.00 Bytes   0.00 bits/sec
> [ 3] 10.0-11.0 sec  0.00 Bytes   0.00 bits/sec
>
> The reason is that netdev-afxdp's batch size is currently 32, while the
> kernel's xdp batch size is only 16.  This can exhaust the sock wmem if
> netdev-afxdp keeps sending a large number of packets.  Later on, when the
> ARP entry expires on one side of the TCP connection, ARP packets can be
> delayed or even dropped because the sock wmem is already full.
>
> This patch fixes the issue by reducing netdev-afxdp's batch size to match
> the kernel's xdp batch size.  Now iperf TCP works correctly.

I didn't look at the veth driver implementation yet, but if your analysis
of the issue is correct, the driver doesn't process all the packets we're
trying to send.  In that case, changing the batch size should not fully
fix the issue: we could still push packets fast enough to fill queues that
the kernel will not drain, and some packets could remain stuck inside the
queues if we don't send further packets.  This sounds more like a missing
napi rescheduling or incorrect handling of the need-wakeup feature inside
the veth driver.  I could look at it next week (travelling now).

Anyway, we should not just change the batch size, because it will affect
performance in all modes and with all drivers.  Since your workaround
fixes the issue at least partially, the same multi-kick workaround that we
have for generic mode might work for this case too.  Could you please
check?

Best regards, Ilya Maximets.
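For reference, the "multi-kick" idea mentioned above can be sketched roughly as
follows.  This is only an illustration, assuming the libbpf AF_XDP helpers from
<bpf/xsk.h> (newer setups use <xdp/xsk.h> from libxdp); the helper name
kick_tx_until_done() is hypothetical and this is not the actual OVS
netdev-afxdp code:

#include <errno.h>
#include <sys/socket.h>
#include <bpf/xsk.h>

static int
kick_tx_until_done(struct xsk_socket *xsk, struct xsk_ring_cons *cq,
                   unsigned int outstanding, int max_kicks)
{
    __u32 idx;

    while (outstanding > 0 && max_kicks-- > 0) {
        /* Each kick lets the kernel process another (small) TX batch. */
        if (sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0) < 0
            && errno != EAGAIN && errno != EBUSY && errno != ENOBUFS) {
            return -errno;
        }

        /* Reclaim whatever the kernel has completed so far. */
        unsigned int done = xsk_ring_cons__peek(cq, outstanding, &idx);
        if (done > 0) {
            /* Frame addresses at cq[idx .. idx + done) can now be reused. */
            xsk_ring_cons__release(cq, done);
            outstanding -= done;
        }
    }

    return outstanding ? -EAGAIN : 0;
}

The point is simply that one sendto() per burst may not be enough if the
kernel only handles a fraction of the ring per kick, so the kick is retried
until the outstanding descriptors have completed (or a retry budget runs out).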
Thanks Ilya.  This patch is indeed just a quick fix.  Sure, I will check the
generic-mode multi-kick workaround later.

Thanks,
Yifeng

On Mon, Dec 23, 2019 at 12:22 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On Sat, Dec 21, 2019 at 2:03 AM Yifeng Sun <pkusunyifeng@gmail.com> wrote:
> >
> > [...]
>
> I didn't look at the veth driver implementation yet, but if your analysis
> of the issue is correct, the driver doesn't process all the packets we're
> trying to send.  In that case, changing the batch size should not fully
> fix the issue: we could still push packets fast enough to fill queues that
> the kernel will not drain, and some packets could remain stuck inside the
> queues if we don't send further packets.  This sounds more like a missing
> napi rescheduling or incorrect handling of the need-wakeup feature inside
> the veth driver.  I could look at it next week (travelling now).
>
> Anyway, we should not just change the batch size, because it will affect
> performance in all modes and with all drivers.  Since your workaround
> fixes the issue at least partially, the same multi-kick workaround that we
> have for generic mode might work for this case too.  Could you please
> check?
>
> Best regards, Ilya Maximets.
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 58365ed483e3..38bbbeb055cc 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -82,7 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
  * enough for most corner cases. */
 #define NUM_FRAMES (4 * (PROD_NUM_DESCS + CONS_NUM_DESCS))
-#define BATCH_SIZE NETDEV_MAX_BURST
+#define BATCH_SIZE 16
 
 BUILD_ASSERT_DECL(IS_POW2(NUM_FRAMES));
 BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
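For context, BATCH_SIZE caps how many TX descriptors are pushed per send call.
The following is a minimal sketch of such a capped TX burst, assuming the
libbpf xsk ring API; the function tx_burst() and its parameters are
hypothetical and this is not the actual netdev-afxdp send path:

#include <sys/socket.h>
#include <bpf/xsk.h>

#define BATCH_SIZE 16   /* the kernel's xdp batch size, per the commit message */

static unsigned int
tx_burst(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
         const __u64 *addrs, const __u32 *lens, unsigned int n)
{
    __u32 idx;

    if (n > BATCH_SIZE) {
        n = BATCH_SIZE;                     /* push at most one batch at a time */
    }
    if (xsk_ring_prod__reserve(tx, n, &idx) != n) {
        return 0;                           /* TX ring full; caller retries later */
    }
    for (unsigned int i = 0; i < n; i++) {
        struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

        desc->addr = addrs[i];              /* umem offset of the frame to send */
        desc->len = lens[i];
    }
    xsk_ring_prod__submit(tx, n);

    if (xsk_ring_prod__needs_wakeup(tx)) {  /* kick the kernel only when asked */
        sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
    }
    return n;
}

Capping the burst at the kernel's batch size keeps each kick within what the
kernel will actually process, which is exactly what the one-line change to
BATCH_SIZE above does for netdev-afxdp.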
William reported that there is an iperf TCP issue between two afxdp ports:

[ 3] local 10.1.1.2 port 40384 connected with 10.1.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0- 1.0 sec  17.0 MBytes   143 Mbits/sec
[ 3]  1.0- 2.0 sec  9.62 MBytes  80.7 Mbits/sec
[ 3]  2.0- 3.0 sec  6.75 MBytes  56.6 Mbits/sec
[ 3]  3.0- 4.0 sec  11.0 MBytes  92.3 Mbits/sec
[ 3]  5.0- 6.0 sec  0.00 Bytes   0.00 bits/sec
[ 3]  6.0- 7.0 sec  0.00 Bytes   0.00 bits/sec
[ 3]  7.0- 8.0 sec  0.00 Bytes   0.00 bits/sec
[ 3]  8.0- 9.0 sec  0.00 Bytes   0.00 bits/sec
[ 3]  9.0-10.0 sec  0.00 Bytes   0.00 bits/sec
[ 3] 10.0-11.0 sec  0.00 Bytes   0.00 bits/sec

The reason is that netdev-afxdp's batch size is currently 32, while the
kernel's xdp batch size is only 16.  This can exhaust the sock wmem if
netdev-afxdp keeps sending a large number of packets.  Later on, when the
ARP entry expires on one side of the TCP connection, ARP packets can be
delayed or even dropped because the sock wmem is already full.

This patch fixes the issue by reducing netdev-afxdp's batch size to match
the kernel's xdp batch size.  Now iperf TCP works correctly:

[ 3] local 10.1.1.2 port 57770 connected with 10.1.1.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 3]  0.0- 1.0 sec   262 MBytes  2.20 Gbits/sec
[ 3]  1.0- 2.0 sec   299 MBytes  2.51 Gbits/sec
[ 3]  2.0- 3.0 sec   271 MBytes  2.27 Gbits/sec
[ 3]  3.0- 4.0 sec   247 MBytes  2.07 Gbits/sec
[ 3]  4.0- 5.0 sec   290 MBytes  2.43 Gbits/sec
[ 3]  5.0- 6.0 sec   292 MBytes  2.45 Gbits/sec
[ 3]  6.0- 7.0 sec   223 MBytes  1.87 Gbits/sec
[ 3]  7.0- 8.0 sec   243 MBytes  2.04 Gbits/sec
[ 3]  8.0- 9.0 sec   234 MBytes  1.97 Gbits/sec
[ 3]  9.0-10.0 sec   238 MBytes  2.00 Gbits/sec

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-November/365076.html
Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
---
 lib/netdev-afxdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
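The wmem analysis above hinges on completions: a transmitted frame is only
returned to the application through the AF_XDP completion ring, so if
completions lag behind the submit rate, the socket's send buffer accounting
stays charged and later packets (such as the ARP replies mentioned above) can
be delayed or dropped.  A rough sketch of draining the completion ring,
assuming the libbpf xsk API and a hypothetical helper name:

#include <bpf/xsk.h>

static unsigned int
reclaim_completions(struct xsk_ring_cons *cq, __u64 *free_addrs,
                    unsigned int max)
{
    __u32 idx;
    unsigned int done = xsk_ring_cons__peek(cq, max, &idx);

    for (unsigned int i = 0; i < done; i++) {
        /* Each completion entry is a umem address the kernel is done with. */
        free_addrs[i] = *xsk_ring_cons__comp_addr(cq, idx + i);
    }
    xsk_ring_cons__release(cq, done);

    return done;    /* these frames can now back new TX descriptors */
}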