[v8,bpf-next,00/10] veth: Driver XDP

Message ID 1533283098-2397-1-git-send-email-makita.toshiaki@lab.ntt.co.jp

Message

Toshiaki Makita Aug. 3, 2018, 7:58 a.m. UTC
This patch set introduces driver XDP for veth.
Basically this is used in conjunction with the redirect action of another
XDP program.

  NIC -----------> veth===veth
 (XDP) (redirect)        (XDP)

In this case xdp_frame can be forwarded to the peer veth without
modification, so we can expect far better performance than generic XDP.


Envisioned use-cases
--------------------

* Container managed XDP program
Container host redirects frames to containers by XDP redirect action, and
privileged containers can deploy their own XDP programs.

* XDP program cascading
Two or more XDP programs can be called for each packet by redirecting
xdp frames to veth.

* Internal interface for an XDP bridge
When using XDP redirection to create a virtual bridge, veth can be used
to create an internal interface for the bridge.


Implementation
--------------

This patch set uses NAPI to implement ndo_xdp_xmit and
XDP_TX/XDP_REDIRECT. This is mainly because XDP heavily relies on the
NAPI context.
 - patch 1: Export a function needed for veth XDP.
 - patch 2-3: Basic implementation of veth XDP.
 - patch 4-6: Add ndo_xdp_xmit.
 - patch 7-9: Add XDP_TX and XDP_REDIRECT.
 - patch 10: Performance optimization for multi-queue env.


Tests and performance numbers
-----------------------------

Tested with a simple XDP program which only redirects packets between
the NIC and veth. I used an i40e 25G NIC (XXV710) as the physical NIC.
The server has 20 Xeon Silver 2.20 GHz cores.

  pktgen --(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP)

The rightmost veth loads XDP progs that just do DROP or TX. The number
of packets is measured in the XDP progs. The leftmost pktgen sends
packets at 37.1 Mpps (almost 25G wire speed).

veth XDP action    Flows    Mpps
================================
DROP                   1    10.6
DROP                   2    21.2
DROP                 100    36.0
TX                     1     5.0
TX                     2    10.0
TX                   100    31.0

I also measured netperf TCP_STREAM, but the performance was not so great
due to the lack of tx/rx checksum offload, TSO, etc.

  netperf <--(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP PASS)

Direction         Flows   Gbps
==============================
external->veth        1   20.8
external->veth        2   23.5
external->veth      100   23.6
veth->external        1    9.0
veth->external        2   17.8
veth->external      100   22.9

I also tested doing ifup/ifdown and loading/unloading an XDP program
repeatedly while processing XDP packets, in order to check that
enabling/disabling NAPI works as expected, and found no problems.

v8:
- Don't use xdp_frame pointer address to calculate skb->head, headroom,
  and xdp_buff.data_hard_start.

v7:
- Introduce xdp_scrub_frame() to clear kernel pointers in xdp_frame and
  use it instead of memset().

v6:
- Check skb->len only if reallocation is needed.
- Add __GFP_NOWARN to alloc_page() since it can be triggered by external
  events.
- Fix sparse warning around EXPORT_SYMBOL.

v5:
- Fix broken SOBs.

v4:
- Don't adjust MTU automatically.
- Skip peer IFF_UP check on .ndo_xdp_xmit() because it is unnecessary.
  Add comments to explain that.
- Use redirect_info instead of xdp_mem_info for storing no_direct flag
  to avoid per packet copy cost.

v3:
- Drop the skb bulk xmit patch since it makes little performance
  difference. The hotspots in TCP skb xmit at this point are checksum
  computation in skb_segment and packet copy on XDP_REDIRECT due to
  cloned/nonlinear skb.
- Fix race on closing device.
- Add extack messages in ndo_bpf.

v2:
- Squash NAPI patch with "Add driver XDP" patch.
- Remove conversion from xdp_frame to skb when NAPI is not enabled.
- Introduce per-queue XDP ring (patch 8).
- Introduce bulk skb xmit when XDP is enabled on the peer (patch 9).

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Toshiaki Makita (10):
  net: Export skb_headers_offset_update
  veth: Add driver XDP
  veth: Avoid drops by oversized packets when XDP is enabled
  xdp: Helper function to clear kernel pointers in xdp_frame
  veth: Handle xdp_frames in xdp napi ring
  veth: Add ndo_xdp_xmit
  bpf: Make redirect_info accessible from modules
  xdp: Helpers for disabling napi_direct of xdp_return_frame
  veth: Add XDP TX and REDIRECT
  veth: Support per queue XDP ring

 drivers/net/veth.c     | 750 ++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/filter.h |  35 +++
 include/linux/skbuff.h |   1 +
 include/net/xdp.h      |   7 +
 net/core/filter.c      |  29 +-
 net/core/skbuff.c      |   3 +-
 net/core/xdp.c         |   6 +-
 7 files changed, 801 insertions(+), 30 deletions(-)

Comments

Jesper Dangaard Brouer Aug. 3, 2018, 9:45 a.m. UTC | #1
On Fri,  3 Aug 2018 16:58:08 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> This patch set introduces driver XDP for veth.
> Basically this is used in conjunction with redirect action of another XDP
> program.
> 
>   NIC -----------> veth===veth
>  (XDP) (redirect)        (XDP)
> 

I was playing with v7 on my testlab yesterday and I noticed one
fundamental issue.  You are not updating the "ifconfig" stats counters
when in XDP mode.  This makes receive or send via XDP invisible to
sysadmin/management tools.  This is for sure going to cause confusion...

I took a closer look at other drivers. The ixgbe driver is doing the
right thing.  The i40e driver has a bug where the RX/TX stats get
swapped (strange!).  The mlx5 driver is not updating the regular RX/TX
counters, but A LOT of other ethtool stats counters (which are the ones
I usually monitor when testing).

So, given that other drivers also didn't get this right, we need to have
a discussion outside your/this patchset.  Thus, I don't want to
stop/stall this patchset, but this is something we need to fix up in
followup patchsets to other drivers as well.

Thus, I'm acking the patchset, but I request that we do a joint effort
of fixing this as followup patches.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Toshiaki Makita Aug. 3, 2018, 11:14 a.m. UTC | #2
On 2018/08/03 (Fri) 18:45, Jesper Dangaard Brouer wrote:
> On Fri,  3 Aug 2018 16:58:08 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> 
>> This patch set introduces driver XDP for veth.
>> Basically this is used in conjunction with redirect action of another XDP
>> program.
>>
>>    NIC -----------> veth===veth
>>   (XDP) (redirect)        (XDP)
>>
> 
> I'm was playing with V7 on my testlab yesterday and I noticed one
> fundamental issue.  You are not updating the "ifconfig" stats counters,
> when in XDP mode.  This makes receive or send via XDP invisible to
> sysadm/management tools.  This for-sure is going to cause confusion...

Yes, I did not update stats on ndo_xdp_xmit. My intention was to send
another patch set to clean up the stats after this one, but I did not
state that in the cover letter. Sorry about that.

> I took a closer look at other driver. The ixgbe driver is doing the
> right thing.  Driver i40e have a bug, where RX/TX stats are swapped
> getting (strange!).  The mlx5 driver is not updating the regular RX/TX
> counters, but A LOT of other ethtool stats counters (which are the ones
> I usually monitor when testing).
> 
> So, given other drivers also didn't get this right, we need to have a
> discussion outside your/this patchset.  Thus, I don't want to
> stop/stall this patchset, but this is something we need to fixup in a
> followup patchset to other drivers as well.

One of the reasons why I did not include the stats patches in this
series is that, as you say, stats in many drivers do not look correct,
and I thought strict correctness was not required for now.
In fact I recently fixed virtio_net stats, which only updated the
packets counter but not the bytes counter on XDP_DROP.

Another reason is that it would hurt performance without a more
aggressive change to the stats structures. The drop counter is currently
atomic, so it would cause heavy cache contention in a multiqueue
environment. The plan is to make it per-cpu or per-queue first. I also
want to introduce per-queue stats for ethtool, so the change would be
relatively big and probably would not all fit in this series.

> Thus, I'm acking the patchset, but I request that we do a joint effort
> of fixing this as followup patches.

Sure, at least for veth I'm going to make follow-up patches.

> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thank you for your thorough review!

Toshiaki Makita
Björn Töpel Aug. 6, 2018, 8:22 a.m. UTC | #3
On 2018-08-03 11:45, Jesper Dangaard Brouer wrote:
> On Fri,  3 Aug 2018 16:58:08 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> 
>> This patch set introduces driver XDP for veth.
>> Basically this is used in conjunction with redirect action of another XDP
>> program.
>>
>>    NIC -----------> veth===veth
>>   (XDP) (redirect)        (XDP)
>>
> 
> I'm was playing with V7 on my testlab yesterday and I noticed one
> fundamental issue.  You are not updating the "ifconfig" stats counters,
> when in XDP mode.  This makes receive or send via XDP invisible to
> sysadm/management tools.  This for-sure is going to cause confusion...
> 
> I took a closer look at other driver. The ixgbe driver is doing the
> right thing.  Driver i40e have a bug, where RX/TX stats are swapped
> getting (strange!).  

Indeed! Thanks for finding/reporting this! I'll have a look!


Björn

> [...]
Daniel Borkmann Aug. 10, 2018, 2:24 p.m. UTC | #4
On 08/03/2018 09:58 AM, Toshiaki Makita wrote:
> This patch set introduces driver XDP for veth.
> Basically this is used in conjunction with redirect action of another XDP
> program.
> 
>   NIC -----------> veth===veth
>  (XDP) (redirect)        (XDP)
> 
> In this case xdp_frame can be forwarded to the peer veth without
> modification, so we can expect far better performance than generic XDP.
> 
> [...]

Applied to bpf-next, thanks Toshiaki!