Message ID | 151316396600.14967.5648145904814220715.stgit@firesoul |
---|---|
State | RFC, archived |
Delegated to: | BPF Maintainers |
Headers | show |
Series | xdp: new XDP rx-queue info concept | expand |
On 12/13/17 4:19 AM, Jesper Dangaard Brouer wrote: > + > +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) > +{ > + xdp_rxq->reg_state = REG_STATE_UNREGISTRED; > +} > +EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); > + > +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq) > +{ > + if (xdp_rxq->reg_state == REG_STATE_REGISTRED) { > + WARN(1, "Missing unregister, handled but fix driver\n"); > + xdp_rxq_info_unreg(xdp_rxq); > + } > + memset(xdp_rxq, 0, sizeof(*xdp_rxq)); > + xdp_rxq->queue_index = U32_MAX; > + xdp_rxq->reg_state = REG_STATE_NEW; > +} > +EXPORT_SYMBOL_GPL(xdp_rxq_info_init); > + > +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq) > +{ > + WARN(!xdp_rxq->dev, "Missing net_device from driver"); > + WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver"); > + WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init"); > + xdp_rxq->reg_state = REG_STATE_REGISTRED; > +} > +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg); > Rather than WARN()'s why not make the _reg and _init functions return an int that indicates an error? For example you don't want to continue if the dev is expected but missing.
On Wed, 13 Dec 2017 19:34:40 -0700 David Ahern <dsahern@gmail.com> wrote: > On 12/13/17 4:19 AM, Jesper Dangaard Brouer wrote: > > + > > +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) > > +{ > > + xdp_rxq->reg_state = REG_STATE_UNREGISTRED; > > +} > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); > > + > > +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq) > > +{ > > + if (xdp_rxq->reg_state == REG_STATE_REGISTRED) { > > + WARN(1, "Missing unregister, handled but fix driver\n"); > > + xdp_rxq_info_unreg(xdp_rxq); > > + } > > + memset(xdp_rxq, 0, sizeof(*xdp_rxq)); > > + xdp_rxq->queue_index = U32_MAX; > > + xdp_rxq->reg_state = REG_STATE_NEW; > > +} > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_init); > > + > > +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq) > > +{ > > + WARN(!xdp_rxq->dev, "Missing net_device from driver"); > > + WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver"); > > + WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init"); > > + xdp_rxq->reg_state = REG_STATE_REGISTRED; > > +} > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg); > > > > Rather than WARN()'s why not make the _reg and _init functions return an > int that indicates an error? For example you don't want to continue if > the dev is expected but missing. Handling return-errors in the drivers complicated the driver code, as it involves unraveling and deallocating other RX-rings etc (that were already allocated) if the reg fails. (Also notice next patch will allow dev == NULL, if right ptype is set). I'm not completely rejecting you idea, as this is a good optimization trick, which is to move validation checks to setup-time, thus allowing less validation checks at runtime. I sort-of actually already did this, as I allow bpf to deref dev without NULL check. I would argue this is good enough, as we will crash in a predictable way, as above WARN will point to which driver violated the API. If people think it is valuable I can change this API to return an err? I guess, it would be more future-proof to do this, as we (Bjørn, Michael, Andy) want to extend this to implement a XDP frame/mem return code-path. And the register call will likely have to allocate some resource that could fail, which need to be handled... If we do this, we might as well (slab) alloc the xdp_rxq_info structure to reduce the bloat in the drivers RX-rings to a single pointer (and a pointer to xdp_rxq_info is what xdp_buff.rxq need).
On 12/18/17 3:55 AM, Jesper Dangaard Brouer wrote: > > Handling return-errors in the drivers complicated the driver code, as it > involves unraveling and deallocating other RX-rings etc (that were > already allocated) if the reg fails. (Also notice next patch will allow > dev == NULL, if right ptype is set). > > I'm not completely rejecting you idea, as this is a good optimization > trick, which is to move validation checks to setup-time, thus allowing > less validation checks at runtime. I sort-of actually already did > this, as I allow bpf to deref dev without NULL check. I would argue > this is good enough, as we will crash in a predictable way, as above > WARN will point to which driver violated the API. > > If people think it is valuable I can change this API to return an err? Saeed's suggested API in a comment on patch 12 also removes most of the WARN_ONs as it sets the device and index: xdp_rxq_info_reg(netdev, rxq_index) { rxqueue = dev->_rx + rxq_index; xdp_rxq = rxqueue.xdp_rxq; xdp_rxq_info_init(xdp_rxq); xdp_rxq.dev = netdev; xdp_rxq.queue_index = rxq_index; } xdp_rxq_info_unreg(netdev, rxq_index) { ... }
On Mon, 18 Dec 2017 06:23:40 -0700 David Ahern <dsahern@gmail.com> wrote: > On 12/18/17 3:55 AM, Jesper Dangaard Brouer wrote: > > > > Handling return-errors in the drivers complicated the driver code, as it > > involves unraveling and deallocating other RX-rings etc (that were > > already allocated) if the reg fails. (Also notice next patch will allow > > dev == NULL, if right ptype is set). > > > > I'm not completely rejecting you idea, as this is a good optimization > > trick, which is to move validation checks to setup-time, thus allowing > > less validation checks at runtime. I sort-of actually already did > > this, as I allow bpf to deref dev without NULL check. I would argue > > this is good enough, as we will crash in a predictable way, as above > > WARN will point to which driver violated the API. > > > > If people think it is valuable I can change this API to return an err? > > Saeed's suggested API in a comment on patch 12 also removes most of the > WARN_ONs as it sets the device and index: > > xdp_rxq_info_reg(netdev, rxq_index) > { > rxqueue = netdev->_rx + rxq_index; > xdp_rxq = rxqueue.xdp_rxq; > xdp_rxq_info_init(xdp_rxq); > xdp_rxq.dev = netdev; > xdp_rxq.queue_index = rxq_index; > } > > xdp_rxq_info_unreg(netdev, rxq_index) > { > ... > } No, we still need the other WARN_ON's. I don't understand why you think above API is better. In case netdev==NULL the system will simply crash on deref of netdev. That case happened for both drivers i40e and mlx5, when I was adding this. The WARN_ON help me quickly identify the issue, and in both drivers it was a non-critical error, as these queues are not used by XDP. IHMO a better experience for the driver developer. IHMO WARN_ON's are a good thing. For example the: if (xdp_rxq->reg_state == REG_STATE_REGISTERED) WARN(1, "Missing unregister, handled but fix driver\n"); Just helped me identify a bug in i40e driver. It turns out that changing the RX-ring queue size via ethtool <-G|--set-ring> (_not_ the number of RX-rings, but frames per RX-ring). Then i40e_set_ringparam() allocates some temp RX-rings and copy-around struct contents, causing this strange issue. It will not crash with our currently simple content, but later this would cause a hard-to-debug issue. I'm happy I could catch this now, instead of later as a strange crash. The WARN's are there to assist driver developers when using this API in their drivers (better than crash/BUG_ON as they don't have to dig-up their serial cable console). For me it is also part of the documentation, as it document the API assumptions/assertions together with a small text field.
On Mon, 18 Dec 2017 11:55:01 +0100 Jesper Dangaard Brouer <brouer@redhat.com> wrote: > On Wed, 13 Dec 2017 19:34:40 -0700 > David Ahern <dsahern@gmail.com> wrote: > > > On 12/13/17 4:19 AM, Jesper Dangaard Brouer wrote: > > > + > > > +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) > > > +{ > > > + xdp_rxq->reg_state = REG_STATE_UNREGISTRED; > > > +} > > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); > > > + > > > +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq) > > > +{ > > > + if (xdp_rxq->reg_state == REG_STATE_REGISTRED) { > > > + WARN(1, "Missing unregister, handled but fix driver\n"); > > > + xdp_rxq_info_unreg(xdp_rxq); > > > + } > > > + memset(xdp_rxq, 0, sizeof(*xdp_rxq)); > > > + xdp_rxq->queue_index = U32_MAX; > > > + xdp_rxq->reg_state = REG_STATE_NEW; > > > +} > > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_init); > > > + > > > +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq) > > > +{ > > > + WARN(!xdp_rxq->dev, "Missing net_device from driver"); > > > + WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver"); > > > + WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init"); > > > + xdp_rxq->reg_state = REG_STATE_REGISTRED; > > > +} > > > +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg); > > > > > > > Rather than WARN()'s why not make the _reg and _init functions return an > > int that indicates an error? For example you don't want to continue if > > the dev is expected but missing. > > Handling return-errors in the drivers complicated the driver code, as it > involves unraveling and deallocating other RX-rings etc (that were > already allocated) if the reg fails. (Also notice next patch will allow > dev == NULL, if right ptype is set). > > I'm not completely rejecting you idea, as this is a good optimization > trick, which is to move validation checks to setup-time, thus allowing > less validation checks at runtime. I sort-of actually already did > this, as I allow bpf to deref dev without NULL check. I would argue > this is good enough, as we will crash in a predictable way, as above > WARN will point to which driver violated the API. > > If people think it is valuable I can change this API to return an err? I will take Ahern's suggestion of returning an err-code, but only from xdp_rxq_info_reg(). And I'm going to move xdp_rxq_info_init to be an internal function (which Saeed also implicitly suggested). I'm working through the drivers now, and only two drivers don't have a proper error-return for handling xdp_rxq_info_reg() could fail. I've also extended xdp_rxq_info_reg() to take args dev + idx, to reduce the code-lines (given we now also have to check return code, this got too big). Thus, reg is a single call with if-return-check. > I guess, it would be more future-proof to do this, as we (Bjørn, > Michael, Andy) want to extend this to implement a XDP frame/mem return > code-path. And the register call will likely have to allocate some > resource that could fail, which need to be handled... I'm mostly doing it for above reason, as I'm hoping to avoid touching every XDP driver once again. It is a real pain. > If we do this, we might as well (slab) alloc the xdp_rxq_info > structure to reduce the bloat in the drivers RX-rings to a single > pointer (and a pointer to xdp_rxq_info is what xdp_buff.rxq need). I've dropped my idea of (slab) allocating the xdp_rxq_info structure. I started coding this up, but realized the number of lines added per driver got too excessive for no apparent gain. (e.g. I also needed to take the numa-node into account in some drivers).
diff --git a/include/linux/filter.h b/include/linux/filter.h index 5feb441d3dd9..111107fcace6 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -19,6 +19,7 @@ #include <linux/cryptohash.h> #include <linux/set_memory.h> +#include <net/xdp.h> #include <net/sch_generic.h> #include <uapi/linux/filter.h> @@ -496,6 +497,7 @@ struct xdp_buff { void *data_end; void *data_meta; void *data_hard_start; + struct xdp_rxq_info *rxq; }; /* Compute the linear packet data range [data, data_end) which diff --git a/include/net/xdp.h b/include/net/xdp.h new file mode 100644 index 000000000000..e4acd198fd60 --- /dev/null +++ b/include/net/xdp.h @@ -0,0 +1,45 @@ +/* include/net/xdp.h + * + * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. + * Released under terms in GPL version 2. See COPYING. + */ +#ifndef __LINUX_NET_XDP_H__ +#define __LINUX_NET_XDP_H__ + +/** + * DOC: XDP RX-queue information + * + * The XDP RX-queue info (xdp_rxq_info) is associated with the driver + * level RX-ring queues. It is information that is specific to how + * the driver have configured a given RX-ring queue. + * + * Each xdp_buff frame received in the driver carry a (pointer) + * reference to this xdp_rxq_info structure. This provides the XDP + * data-path read-access to RX-info for both kernel and bpf-side + * (limited subset). + * + * For now, direct access is only safe while running in NAPI/softirq + * context. + * + * The driver usage API is an init, register and unregister API. + * + * The struct is not directly tied to the XDP prog. A new XDP prog + * can be attached as long as it doesn't change the underlying + * RX-ring. If the RX-ring does change significantly, the NIC driver + * naturally need to stop the RX-ring before purging and reallocating + * memory. In that process the driver MUST call unregistor (which + * also apply for driver shutdown and unload). The init and register + * API is also mandatory during RX-ring setup. + */ + +struct xdp_rxq_info { + struct net_device *dev; + u32 queue_index; + u32 reg_state; +} ____cacheline_aligned; /* perf critical, avoid false-sharing */ + +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq); +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq); +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq); + +#endif /* __LINUX_NET_XDP_H__ */ diff --git a/net/core/Makefile b/net/core/Makefile index 1fd0a9c88b1b..6dbbba8c57ae 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -11,7 +11,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o obj-y += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \ neighbour.o rtnetlink.o utils.o link_watch.o filter.o \ sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \ - fib_notifier.o + fib_notifier.o xdp.o obj-y += net-sysfs.o obj-$(CONFIG_PROC_FS) += net-procfs.o diff --git a/net/core/xdp.c b/net/core/xdp.c new file mode 100644 index 000000000000..a9d2dd7b1ede --- /dev/null +++ b/net/core/xdp.c @@ -0,0 +1,40 @@ +/* net/core/xdp.c + * + * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc. + * Released under terms in GPL version 2. See COPYING. + */ +#include <linux/types.h> +#include <linux/mm.h> + +#include <net/xdp.h> + +#define REG_STATE_NEW 0x0 +#define REG_STATE_REGISTRED 0x1 +#define REG_STATE_UNREGISTRED 0x2 + +void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) +{ + xdp_rxq->reg_state = REG_STATE_UNREGISTRED; +} +EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); + +void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq) +{ + if (xdp_rxq->reg_state == REG_STATE_REGISTRED) { + WARN(1, "Missing unregister, handled but fix driver\n"); + xdp_rxq_info_unreg(xdp_rxq); + } + memset(xdp_rxq, 0, sizeof(*xdp_rxq)); + xdp_rxq->queue_index = U32_MAX; + xdp_rxq->reg_state = REG_STATE_NEW; +} +EXPORT_SYMBOL_GPL(xdp_rxq_info_init); + +void xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq) +{ + WARN(!xdp_rxq->dev, "Missing net_device from driver"); + WARN(xdp_rxq->queue_index == U32_MAX, "Miss queue_index from driver"); + WARN(!(xdp_rxq->reg_state == REG_STATE_NEW),"API violation, miss init"); + xdp_rxq->reg_state = REG_STATE_REGISTRED; +} +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg);
This patch only introduce the core data structures and API functions. All XDP enabled drivers must use the API before this info can used. There is a need for XDP to know more about the RX-queue a given XDP frames have arrived on. For both the XDP bpf-prog and kernel side. Instead of extending xdp_buff each time new info is needed, the patch creates a separate read-mostly struct xdp_rxq_info, that contains this info. We stress this data/cache-line is for read-only info. This is NOT for dynamic per packet info, use the data_meta for such use-cases. The performance advantage is this info can be setup at RX-ring init time, instead of updating N-members in xdp_buff. A possible (driver level) micro optimization is that xdp_buff->rxq assignment could be done once per XDP/NAPI loop. The extra pointer deref only happens for program needing access to this info (thus, no slowdown to existing use-cases). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> --- include/linux/filter.h | 2 ++ include/net/xdp.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ net/core/Makefile | 2 +- net/core/xdp.c | 40 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 88 insertions(+), 1 deletion(-) create mode 100644 include/net/xdp.h create mode 100644 net/core/xdp.c