[bpf-next,V4,00/14] xdp: new XDP rx-queue info concept

Message ID 151497504273.18176.10177133999720101758.stgit@firesoul

Message

Jesper Dangaard Brouer Jan. 3, 2018, 10:25 a.m. UTC
V4:
* Added reviewers/acks to patches
* Fix patch desc in i40e that got out-of-sync with code
* Add SPDX license headers for the two new files added in patch 14

V3:
* Fixed bug in virtio_net driver
* Removed export of xdp_rxq_info_init()

V2:
* Changed API exposed to drivers
  - Removed invocation of "init" in drivers, and only call "reg"
    (Suggested by Saeed)
  - Allow "reg" to fail and handle this in drivers
    (Suggested by David Ahern)
* Removed the SINKQ qtype; instead allow registering as "unused"
* Also fixed some drivers during testing on actual HW (noted in patches)

There is a need for XDP to know more about the RX-queue a given XDP
frame has arrived on, for both the XDP bpf-prog and the kernel side.

Instead of extending struct xdp_buff each time new info is needed,
this patchset takes a different approach: struct xdp_buff is only
extended with a pointer to a struct xdp_rxq_info (which makes later
extensions easier).  This xdp_rxq_info contains information related
to how the driver has set up the individual RX-queues.  This is
read-mostly information, and all xdp_buff frames (in the driver's
napi_poll) point to the same xdp_rxq_info (per RX-queue).
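
A minimal sketch of the idea (kernel-context code; the real
definitions live in patch 1's include/net/xdp.h, and the xdp_buff
fields besides rxq are only reproduced here for illustration):

/* Read-mostly, one instance per RX-queue, shared by all xdp_buff
 * frames processed in that queue's napi_poll.
 */
struct xdp_rxq_info {
	struct net_device *dev;	/* dev->ifindex is the ingress ifindex */
	u32 queue_index;	/* RX-queue number within dev */
	u32 reg_state;		/* registration bookkeeping */
} ____cacheline_aligned;	/* perf critical, avoid false-sharing */

struct xdp_buff {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct xdp_rxq_info *rxq;	/* new: a pointer, not inline data */
};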

We stress that this data/cache-line is for read-mostly info.  It is
NOT for dynamic per-packet info; use data_meta for such use-cases.
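
For contrast, per-packet info goes in front of the packet via the
pre-existing bpf_xdp_adjust_meta() helper.  A minimal sketch (the
struct and the mark value are purely illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct my_meta {			/* hypothetical per-packet tag */
	__u32 mark;
};

SEC("xdp")
int xdp_tag(struct xdp_md *ctx)
{
	struct my_meta *meta;
	void *data;

	/* Grow the metadata area in front of the packet data */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_PASS;

	data = (void *)(long)ctx->data;
	meta = (void *)(long)ctx->data_meta;
	if ((void *)(meta + 1) > data)	/* verifier bounds check */
		return XDP_PASS;

	meta->mark = 42;		/* dynamic, per-packet info */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";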

This patchset starts out small, and only exposes ingress_ifindex and
the RX-queue index to the XDP/BPF program.  Access to tangible info
like the ingress ifindex and RX queue index is fairly easy to
comprehend.  Future use-cases could allow XDP frames to be recycled
back to the originating device driver, by providing info on the RX
device and queue number.
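
On the bpf-prog side this shows up as two new read-only fields in
struct xdp_md (patch 13).  A minimal sketch of a program reading them
(the drop-queue-0 policy is just for illustration; the real sample is
patch 14's xdp_rxq_info):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_show_rxq(struct xdp_md *ctx)
{
	/* These context accesses are rewritten at prog load time
	 * into loads via the new xdp_buff->rxq pointer.
	 */
	__u32 ifindex = ctx->ingress_ifindex;	/* rxq->dev->ifindex */
	__u32 queue = ctx->rx_queue_index;	/* rxq->queue_index */

	if (queue == 0)		/* illustrative per-queue policy */
		return XDP_DROP;

	bpf_printk("ifindex=%u queue=%u\n", ifindex, queue);
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";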

As XDP doesn't have driver feature flags, and eBPF code (due to
bpf-tail-calls) cannot determine which XDP driver invoked it, this
patchset has to update every driver that supports XDP.

For driver developers (review individual driver patches!):

The xdp_rxq_info is tied to the driver's RX-ring(s).  Whenever an
RX-ring modification requires (temporarily) stopping RX frames, the
xdp_rxq_info should (likely) also be unregistered and re-registered,
especially if the pages in the ring are reallocated.  Make sure
ethtool set_channels does the right thing.  When replacing the XDP
prog, re-register the xdp_rxq_info if and only if the RX-ring needs
to be changed.
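
A hedged sketch of the intended driver pattern (xdp_rxq_info_reg()
and xdp_rxq_info_unreg() are the real API from patch 1; the ring
struct and setup/teardown hooks are placeholders for a driver's own
code):

struct my_rx_ring {			/* hypothetical driver ring */
	struct xdp_rxq_info xdp_rxq;	/* embedded, one per RX-ring */
	/* ... driver ring state ... */
};

static int my_ring_setup(struct my_rx_ring *ring,
			 struct net_device *dev, u32 queue_index)
{
	/* "reg" can fail, and the driver must handle it (V2 change) */
	int err = xdp_rxq_info_reg(&ring->xdp_rxq, dev, queue_index);

	if (err)
		return err;
	/* ... allocate ring pages, enable RX ... */
	return 0;
}

static void my_ring_teardown(struct my_rx_ring *ring)
{
	/* ... stop RX, free/reallocate ring pages ... */
	xdp_rxq_info_unreg(&ring->xdp_rxq);
}

In the napi_poll RX loop the driver then only needs to set
xdp.rxq = &ring->xdp_rxq for each frame.  The ethtool set_channels /
set_ringparam paths funnel through the same teardown+setup, which is
what test steps (3) and (4) below exercise.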

I'm Cc'ing the individual driver patches to the registered maintainers.

Testing:

I've only tested the NIC drivers I have hardware for.  The general
test procedure (DUT = Device Under Test) is to:
 (1) run pktgen script pktgen_sample04_many_flows.sh       (against DUT)
 (2) run samples/bpf program xdp_rxq_info --dev $DEV       (on DUT)
 (3) runtime modify number of NIC queues via ethtool -L    (on DUT)
 (4) runtime modify number of NIC ring-size via ethtool -G (on DUT)

Patch based on git tree bpf-next (at commit fb982666e380c1632a):
 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/

---

Jesper Dangaard Brouer (14):
      xdp: base API for new XDP rx-queue info concept
      xdp/mlx5: setup xdp_rxq_info
      i40e: setup xdp_rxq_info
      ixgbe: setup xdp_rxq_info
      xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg
      mlx4: setup xdp_rxq_info
      bnxt_en: setup xdp_rxq_info
      nfp: setup xdp_rxq_info
      thunderx: setup xdp_rxq_info
      tun: setup xdp_rxq_info
      virtio_net: setup xdp_rxq_info
      xdp: generic XDP handling of xdp_rxq_info
      bpf: finally expose xdp_rxq_info to XDP bpf-programs
      samples/bpf: program demonstrating access to xdp_rxq_info


 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   10 
 drivers/net/ethernet/broadcom/bnxt/bnxt.h          |    2 
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |    1 
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   11 
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |    4 
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |    2 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |    2 
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   18 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe.h           |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c   |    4 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   10 
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |    3 
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   13 
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |    4 
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    4 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |    9 
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |    1 
 drivers/net/ethernet/netronome/nfp/nfp_net.h       |    5 
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |   10 
 drivers/net/ethernet/qlogic/qede/qede.h            |    2 
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |    1 
 drivers/net/ethernet/qlogic/qede/qede_main.c       |   10 
 drivers/net/tun.c                                  |   24 +
 drivers/net/virtio_net.c                           |   14 -
 include/linux/filter.h                             |    2 
 include/linux/netdevice.h                          |    2 
 include/net/xdp.h                                  |   48 ++
 include/uapi/linux/bpf.h                           |    3 
 net/core/Makefile                                  |    2 
 net/core/dev.c                                     |   69 ++-
 net/core/filter.c                                  |   19 +
 net/core/xdp.c                                     |   73 +++
 samples/bpf/Makefile                               |    4 
 samples/bpf/xdp_rxq_info_kern.c                    |   96 ++++
 samples/bpf/xdp_rxq_info_user.c                    |  531 ++++++++++++++++++++
 36 files changed, 990 insertions(+), 28 deletions(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 net/core/xdp.c
 create mode 100644 samples/bpf/xdp_rxq_info_kern.c
 create mode 100644 samples/bpf/xdp_rxq_info_user.c

--

Comments

Alexei Starovoitov Jan. 5, 2018, 11:59 p.m. UTC | #1
On Wed, Jan 03, 2018 at 11:25:08AM +0100, Jesper Dangaard Brouer wrote:
> [...]

Applied, thank you Jesper.

I think Michael's suggested micro-optimization for patch 7 can be
done as a follow-up.