Message ID | 1533283098-2397-1-git-send-email-makita.toshiaki@lab.ntt.co.jp |
---|---|
Headers | show |
Series | veth: Driver XDP | expand |
On Fri, 3 Aug 2018 16:58:08 +0900 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > This patch set introduces driver XDP for veth. > Basically this is used in conjunction with redirect action of another XDP > program. > > NIC -----------> veth===veth > (XDP) (redirect) (XDP) > I'm was playing with V7 on my testlab yesterday and I noticed one fundamental issue. You are not updating the "ifconfig" stats counters, when in XDP mode. This makes receive or send via XDP invisible to sysadm/management tools. This for-sure is going to cause confusion... I took a closer look at other driver. The ixgbe driver is doing the right thing. Driver i40e have a bug, where RX/TX stats are swapped getting (strange!). The mlx5 driver is not updating the regular RX/TX counters, but A LOT of other ethtool stats counters (which are the ones I usually monitor when testing). So, given other drivers also didn't get this right, we need to have a discussion outside your/this patchset. Thus, I don't want to stop/stall this patchset, but this is something we need to fixup in a followup patchset to other drivers as well. Thus, I'm acking the patchset, but I request that we do a joint effort of fixing this as followup patches. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
On 18/08/03 (金) 18:45, Jesper Dangaard Brouer wrote: > On Fri, 3 Aug 2018 16:58:08 +0900 > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > >> This patch set introduces driver XDP for veth. >> Basically this is used in conjunction with redirect action of another XDP >> program. >> >> NIC -----------> veth===veth >> (XDP) (redirect) (XDP) >> > > I'm was playing with V7 on my testlab yesterday and I noticed one > fundamental issue. You are not updating the "ifconfig" stats counters, > when in XDP mode. This makes receive or send via XDP invisible to > sysadm/management tools. This for-sure is going to cause confusion... Yes, I did not update stats on ndo_xdp_xmit. My intention was that I'm going to make another patch set to make stats nice after this, but did not state that in the cover letter. Sorry about that. > I took a closer look at other driver. The ixgbe driver is doing the > right thing. Driver i40e have a bug, where RX/TX stats are swapped > getting (strange!). The mlx5 driver is not updating the regular RX/TX > counters, but A LOT of other ethtool stats counters (which are the ones > I usually monitor when testing). > > So, given other drivers also didn't get this right, we need to have a > discussion outside your/this patchset. Thus, I don't want to > stop/stall this patchset, but this is something we need to fixup in a > followup patchset to other drivers as well. One of the reason why I did not include the stats patches in this series is that as you say basically stats in many drivers do not look correct and I thought the correctness is not strictly required for now. In fact I recently fixed virtio_net stats which only updated packets counter but not bytes counter on XDP_DROP. Another reason is that it will hurt the performance without more aggressive stats structure change. Drop counter is currently atomic so it would cause heavy cache contention on multiqueue env. The plan is to make this per-cpu or per-queue first. Also I want to introduce per-queue stats for ethtool, so the change would be relatively big and probably not fit in this series all together. > Thus, I'm acking the patchset, but I request that we do a joint effort > of fixing this as followup patches. Sure, at least for veth I'm going to make a followup patches. > Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Thank you for your thorough review! Toshiaki Makita
On 2018-08-03 11:45, Jesper Dangaard Brouer wrote: > On Fri, 3 Aug 2018 16:58:08 +0900 > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote: > >> This patch set introduces driver XDP for veth. >> Basically this is used in conjunction with redirect action of another XDP >> program. >> >> NIC -----------> veth===veth >> (XDP) (redirect) (XDP) >> > > I'm was playing with V7 on my testlab yesterday and I noticed one > fundamental issue. You are not updating the "ifconfig" stats counters, > when in XDP mode. This makes receive or send via XDP invisible to > sysadm/management tools. This for-sure is going to cause confusion... > > I took a closer look at other driver. The ixgbe driver is doing the > right thing. Driver i40e have a bug, where RX/TX stats are swapped > getting (strange!). Indeed! Thanks for finding/reporting this! I'll have look! Björn > The mlx5 driver is not updating the regular RX/TX > counters, but A LOT of other ethtool stats counters (which are the ones > I usually monitor when testing). > > So, given other drivers also didn't get this right, we need to have a > discussion outside your/this patchset. Thus, I don't want to > stop/stall this patchset, but this is something we need to fixup in a > followup patchset to other drivers as well. > > Thus, I'm acking the patchset, but I request that we do a joint effort > of fixing this as followup patches. > > Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> >
On 08/03/2018 09:58 AM, Toshiaki Makita wrote: > This patch set introduces driver XDP for veth. > Basically this is used in conjunction with redirect action of another XDP > program. > > NIC -----------> veth===veth > (XDP) (redirect) (XDP) > > In this case xdp_frame can be forwarded to the peer veth without > modification, so we can expect far better performance than generic XDP. > > > Envisioned use-cases > -------------------- > > * Container managed XDP program > Container host redirects frames to containers by XDP redirect action, and > privileged containers can deploy their own XDP programs. > > * XDP program cascading > Two or more XDP programs can be called for each packet by redirecting > xdp frames to veth. > > * Internal interface for an XDP bridge > When using XDP redirection to create a virtual bridge, veth can be used > to create an internal interface for the bridge. > > > Implementation > -------------- > > This changeset is making use of NAPI to implement ndo_xdp_xmit and > XDP_TX/REDIRECT. This is mainly because XDP heavily relies on NAPI > context. > - patch 1: Export a function needed for veth XDP. > - patch 2-3: Basic implementation of veth XDP. > - patch 4-6: Add ndo_xdp_xmit. > - patch 7-9: Add XDP_TX and XDP_REDIRECT. > - patch 10: Performance optimization for multi-queue env. > > > Tests and performance numbers > ----------------------------- > > Tested with a simple XDP program which only redirects packets between > NIC and veth. I used i40e 25G NIC (XXV710) for the physical NIC. The > server has 20 of Xeon Silver 2.20 GHz cores. > > pktgen --(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP) > > The rightmost veth loads XDP progs and just does DROP or TX. The number > of packets is measured in the XDP progs. The leftmost pktgen sends > packets at 37.1 Mpps (almost 25G wire speed). > > veth XDP action Flows Mpps > ================================ > DROP 1 10.6 > DROP 2 21.2 > DROP 100 36.0 > TX 1 5.0 > TX 2 10.0 > TX 100 31.0 > > I also measured netperf TCP_STREAM but was not so great performance due > to lack of tx/rx checksum offload and TSO, etc. > > netperf <--(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP PASS) > > Direction Flows Gbps > ============================== > external->veth 1 20.8 > external->veth 2 23.5 > external->veth 100 23.6 > veth->external 1 9.0 > veth->external 2 17.8 > veth->external 100 22.9 > > Also tested doing ifup/down or load/unload a XDP program repeatedly > during processing XDP packets in order to check if enabling/disabling > NAPI is working as expected, and found no problems. > > v8: > - Don't use xdp_frame pointer address to calculate skb->head, headroom, > and xdp_buff.data_hard_start. > > v7: > - Introduce xdp_scrub_frame() to clear kernel pointers in xdp_frame and > use it instead of memset(). > > v6: > - Check skb->len only if reallocation is needed. > - Add __GFP_NOWARN to alloc_page() since it can be triggered by external > events. > - Fix sparse warning around EXPORT_SYMBOL. > > v5: > - Fix broken SOBs. > > v4: > - Don't adjust MTU automatically. > - Skip peer IFF_UP check on .ndo_xdp_xmit() because it is unnecessary. > Add comments to explain that. > - Use redirect_info instead of xdp_mem_info for storing no_direct flag > to avoid per packet copy cost. > > v3: > - Drop skb bulk xmit patch since it makes little performance > difference. The hotspot in TCP skb xmit at this point is checksum > computation in skb_segment and packet copy on XDP_REDIRECT due to > cloned/nonlinear skb. > - Fix race on closing device. > - Add extack messages in ndo_bpf. > > v2: > - Squash NAPI patch with "Add driver XDP" patch. > - Remove conversion from xdp_frame to skb when NAPI is not enabled. > - Introduce per-queue XDP ring (patch 8). > - Introduce bulk skb xmit when XDP is enabled on the peer (patch 9). > > Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> > > Toshiaki Makita (10): > net: Export skb_headers_offset_update > veth: Add driver XDP > veth: Avoid drops by oversized packets when XDP is enabled > xdp: Helper function to clear kernel pointers in xdp_frame > veth: Handle xdp_frames in xdp napi ring > veth: Add ndo_xdp_xmit > bpf: Make redirect_info accessible from modules > xdp: Helpers for disabling napi_direct of xdp_return_frame > veth: Add XDP TX and REDIRECT > veth: Support per queue XDP ring > > drivers/net/veth.c | 750 ++++++++++++++++++++++++++++++++++++++++++++++++- > include/linux/filter.h | 35 +++ > include/linux/skbuff.h | 1 + > include/net/xdp.h | 7 + > net/core/filter.c | 29 +- > net/core/skbuff.c | 3 +- > net/core/xdp.c | 6 +- > 7 files changed, 801 insertions(+), 30 deletions(-) > Applied to bpf-next, thanks Toshiaki!
This patch set introduces driver XDP for veth. Basically this is used in conjunction with redirect action of another XDP program. NIC -----------> veth===veth (XDP) (redirect) (XDP) In this case xdp_frame can be forwarded to the peer veth without modification, so we can expect far better performance than generic XDP. Envisioned use-cases -------------------- * Container managed XDP program Container host redirects frames to containers by XDP redirect action, and privileged containers can deploy their own XDP programs. * XDP program cascading Two or more XDP programs can be called for each packet by redirecting xdp frames to veth. * Internal interface for an XDP bridge When using XDP redirection to create a virtual bridge, veth can be used to create an internal interface for the bridge. Implementation -------------- This changeset is making use of NAPI to implement ndo_xdp_xmit and XDP_TX/REDIRECT. This is mainly because XDP heavily relies on NAPI context. - patch 1: Export a function needed for veth XDP. - patch 2-3: Basic implementation of veth XDP. - patch 4-6: Add ndo_xdp_xmit. - patch 7-9: Add XDP_TX and XDP_REDIRECT. - patch 10: Performance optimization for multi-queue env. Tests and performance numbers ----------------------------- Tested with a simple XDP program which only redirects packets between NIC and veth. I used i40e 25G NIC (XXV710) for the physical NIC. The server has 20 of Xeon Silver 2.20 GHz cores. pktgen --(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP) The rightmost veth loads XDP progs and just does DROP or TX. The number of packets is measured in the XDP progs. The leftmost pktgen sends packets at 37.1 Mpps (almost 25G wire speed). veth XDP action Flows Mpps ================================ DROP 1 10.6 DROP 2 21.2 DROP 100 36.0 TX 1 5.0 TX 2 10.0 TX 100 31.0 I also measured netperf TCP_STREAM but was not so great performance due to lack of tx/rx checksum offload and TSO, etc. netperf <--(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP PASS) Direction Flows Gbps ============================== external->veth 1 20.8 external->veth 2 23.5 external->veth 100 23.6 veth->external 1 9.0 veth->external 2 17.8 veth->external 100 22.9 Also tested doing ifup/down or load/unload a XDP program repeatedly during processing XDP packets in order to check if enabling/disabling NAPI is working as expected, and found no problems. v8: - Don't use xdp_frame pointer address to calculate skb->head, headroom, and xdp_buff.data_hard_start. v7: - Introduce xdp_scrub_frame() to clear kernel pointers in xdp_frame and use it instead of memset(). v6: - Check skb->len only if reallocation is needed. - Add __GFP_NOWARN to alloc_page() since it can be triggered by external events. - Fix sparse warning around EXPORT_SYMBOL. v5: - Fix broken SOBs. v4: - Don't adjust MTU automatically. - Skip peer IFF_UP check on .ndo_xdp_xmit() because it is unnecessary. Add comments to explain that. - Use redirect_info instead of xdp_mem_info for storing no_direct flag to avoid per packet copy cost. v3: - Drop skb bulk xmit patch since it makes little performance difference. The hotspot in TCP skb xmit at this point is checksum computation in skb_segment and packet copy on XDP_REDIRECT due to cloned/nonlinear skb. - Fix race on closing device. - Add extack messages in ndo_bpf. v2: - Squash NAPI patch with "Add driver XDP" patch. - Remove conversion from xdp_frame to skb when NAPI is not enabled. - Introduce per-queue XDP ring (patch 8). - Introduce bulk skb xmit when XDP is enabled on the peer (patch 9). Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Toshiaki Makita (10): net: Export skb_headers_offset_update veth: Add driver XDP veth: Avoid drops by oversized packets when XDP is enabled xdp: Helper function to clear kernel pointers in xdp_frame veth: Handle xdp_frames in xdp napi ring veth: Add ndo_xdp_xmit bpf: Make redirect_info accessible from modules xdp: Helpers for disabling napi_direct of xdp_return_frame veth: Add XDP TX and REDIRECT veth: Support per queue XDP ring drivers/net/veth.c | 750 ++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/filter.h | 35 +++ include/linux/skbuff.h | 1 + include/net/xdp.h | 7 + net/core/filter.c | 29 +- net/core/skbuff.c | 3 +- net/core/xdp.c | 6 +- 7 files changed, 801 insertions(+), 30 deletions(-)