From patchwork Fri Dec 7 11:44:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= X-Patchwork-Id: 1009380 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43B9cn0gd6z9rxp for ; Fri, 7 Dec 2018 22:44:57 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726085AbeLGLoz (ORCPT ); Fri, 7 Dec 2018 06:44:55 -0500 Received: from mga04.intel.com ([192.55.52.120]:49448 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725988AbeLGLoy (ORCPT ); Fri, 7 Dec 2018 06:44:54 -0500 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 07 Dec 2018 03:44:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,326,1539673200"; d="scan'208";a="96922156" Received: from yhameiri-mobl1.ger.corp.intel.com (HELO btopel-mobl.ger.intel.com) ([10.255.41.173]) by orsmga007.jf.intel.com with ESMTP; 07 Dec 2018 03:44:50 -0800 From: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org Cc: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com Subject: [PATCH bpf-next 1/7] xsk: simplify AF_XDP socket teardown Date: Fri, 7 Dec 2018 12:44:25 +0100 
Message-Id: <20181207114431.18038-2-bjorn.topel@gmail.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com> References: <20181207114431.18038-1-bjorn.topel@gmail.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org
From: Björn Töpel

Prior to this commit, when the struct socket object was released, the UMEM reference count was not decreased there. Instead, that was done in the struct sock sk_destruct function. There is no reason to keep the UMEM reference around when the socket is being orphaned, so in this patch xdp_put_umem is called in the xsk_release function. As a result, the xsk_destruct function can be removed.

Note that a struct xsk_sock reference might still linger in the XSKMAP after the UMEM is released, e.g. if a user does not clear the XSKMAP prior to closing the process. Such a sock will be in a "released", zombie-like state until the XSKMAP entry is removed.
Signed-off-by: Björn Töpel
---
net/xdp/xsk.c | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 07156f43d295..a03268454a27 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c
@@ -366,6 +366,7 @@ static int xsk_release(struct socket *sock) xskq_destroy(xs->rx); xskq_destroy(xs->tx); + xdp_put_umem(xs->umem); sock_orphan(sk); sock->sk = NULL;
@@ -713,18 +714,6 @@ static const struct proto_ops xsk_proto_ops = { .sendpage = sock_no_sendpage, }; -static void xsk_destruct(struct sock *sk) -{ - struct xdp_sock *xs = xdp_sk(sk); - - if (!sock_flag(sk, SOCK_DEAD)) - return; - - xdp_put_umem(xs->umem); - - sk_refcnt_debug_dec(sk); -} - static int xsk_create(struct net *net, struct socket *sock, int protocol, int kern) {
@@ -751,9 +740,6 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol, sk->sk_family = PF_XDP; - sk->sk_destruct = xsk_destruct; - sk_refcnt_debug_inc(sk); - sock_set_flag(sk, SOCK_RCU_FREE); xs = xdp_sk(sk);

From patchwork Fri Dec 7 11:44:26 2018
X-Patchwork-Id: 1009381
From: Björn Töpel
To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org
Cc: brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com
Subject: [PATCH bpf-next 2/7] xsk: add XDP_ATTACH bind() flag
Date: Fri, 7 Dec 2018 12:44:26 +0100
Message-Id: <20181207114431.18038-3-bjorn.topel@gmail.com>
In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com>
References: <20181207114431.18038-1-bjorn.topel@gmail.com>

In this commit the XDP_ATTACH bind() flag is introduced. When an XDP socket is bound with this flag set, the socket is associated with a certain netdev Rx queue. The idea is that XDP socket users should not have to deal with the XSKMAP, or even with an XDP program. Instead, XDP_ATTACH "attaches" an XDP socket to a queue and loads a builtin XDP program that forwards all packets received on the attached queue to the socket. An XDP socket bound with this option performs better, since the BPF program is smaller and the kernel code path also has fewer instructions.
This commit only introduces the first part of XDP_ATTACH, namely associating the XDP socket with a netdev Rx queue. To redirect XDP frames to an attached socket, the XDP program must use the bpf_xsk_redirect function that will be introduced in the next commit.

Signed-off-by: Björn Töpel
---
include/linux/netdevice.h | 1 + include/net/xdp_sock.h | 2 ++ include/uapi/linux/if_xdp.h | 1 + net/xdp/xsk.c | 50 +++++++++++++++++++++++++++++-------- 4 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 94fb2e12f117..a6cc68d2504c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h
@@ -743,6 +743,7 @@ struct netdev_rx_queue { struct xdp_rxq_info xdp_rxq; #ifdef CONFIG_XDP_SOCKETS struct xdp_umem *umem; + struct xdp_sock *xsk; #endif } ____cacheline_aligned_in_smp;
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 13acb9803a6d..95315eb0410a 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h
@@ -72,7 +72,9 @@ struct xdp_sock { struct xdp_buff; #ifdef CONFIG_XDP_SOCKETS +int xsk_generic_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); +int xsk_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); void xsk_flush(struct xdp_sock *xs); bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index caed8b1614ff..bd76235c2749 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h
@@ -16,6 +16,7 @@ #define XDP_SHARED_UMEM (1 << 0) #define XDP_COPY (1 << 1) /* Force copy-mode */ #define XDP_ZEROCOPY (1 << 2) /* Force zero-copy mode */ +#define XDP_ATTACH (1 << 3) struct sockaddr_xdp { __u16 sxdp_family;
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index a03268454a27..1eff7ac8596d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c
@@ -100,17 +100,20 @@ static int
__xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) return err; } -int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +int xsk_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { - u32 len; + u32 len = xdp->data_end - xdp->data; + + return (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) ? + __xsk_rcv_zc(xs, xdp, len) : __xsk_rcv(xs, xdp, len); +} +int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) return -EINVAL; - len = xdp->data_end - xdp->data; - - return (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY) ? - __xsk_rcv_zc(xs, xdp, len) : __xsk_rcv(xs, xdp, len); + return xsk_attached_rcv(xs, xdp); } void xsk_flush(struct xdp_sock *xs) @@ -119,7 +122,7 @@ void xsk_flush(struct xdp_sock *xs) xs->sk.sk_data_ready(&xs->sk); } -int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +int xsk_generic_attached_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) { u32 metalen = xdp->data - xdp->data_meta; u32 len = xdp->data_end - xdp->data; @@ -127,9 +130,6 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) u64 addr; int err; - if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) - return -EINVAL; - if (!xskq_peek_addr(xs->umem->fq, &addr) || len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) { xs->rx_dropped++; @@ -152,6 +152,14 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) return err; } +int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) + return -EINVAL; + + return xsk_generic_attached_rcv(xs, xdp); +} + void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) { xskq_produce_flush_addr_n(umem->cq, nb_entries); @@ -339,6 +347,18 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue, return 0; } +static void xsk_detach(struct xdp_sock *xs) +{ + WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL); +} + +static int 
xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid) +{ + WRITE_ONCE(dev->_rx[qid].xsk, xs); + + return 0; +} + static int xsk_release(struct socket *sock) { struct sock *sk = sock->sk;
@@ -359,6 +379,7 @@ static int xsk_release(struct socket *sock) /* Wait for driver to stop using the xdp socket. */ xdp_del_sk_umem(xs->umem, xs); + xsk_detach(xs); xs->dev = NULL; synchronize_net(); dev_put(dev);
@@ -432,7 +453,8 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct xdp_sock *umem_xs; struct socket *sock; - if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY)) { + if ((flags & XDP_COPY) || (flags & XDP_ZEROCOPY) || + (flags & XDP_ATTACH)) { /* Cannot specify flags for shared sockets. */ err = -EINVAL; goto out_unlock;
@@ -478,6 +500,12 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) err = xdp_umem_assign_dev(xs->umem, dev, qid, flags); if (err) goto out_unlock; + + if (flags & XDP_ATTACH) { + err = xsk_attach(xs, dev, qid); + if (err) + goto out_unlock; + } } xs->dev = dev;

From patchwork Fri Dec 7 11:44:27 2018
X-Patchwork-Id: 1009382
From: Björn Töpel
To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org
Cc: brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com
Subject: [PATCH bpf-next 3/7] bpf: add bpf_xsk_redirect function
Date: Fri, 7 Dec 2018 12:44:27 +0100
Message-Id: <20181207114431.18038-4-bjorn.topel@gmail.com>
In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com>
References: <20181207114431.18038-1-bjorn.topel@gmail.com>

The bpf_xsk_redirect function is a new redirect BPF function, in addition to bpf_redirect/bpf_redirect_map. If an XDP socket has been attached to a netdev Rx queue via the XDP_ATTACH bind() option and bpf_xsk_redirect is called, the packet is redirected to the attached socket. bpf_xsk_redirect returns XDP_REDIRECT if there is a socket attached to the originating queue, otherwise XDP_PASS. This commit also adds the corresponding tracepoints for the redirect call.
Signed-off-by: Björn Töpel --- include/linux/filter.h | 4 ++ include/trace/events/xdp.h | 61 ++++++++++++++++++++++ include/uapi/linux/bpf.h | 14 +++++- net/core/filter.c | 100 +++++++++++++++++++++++++++++++++++++ 4 files changed, 178 insertions(+), 1 deletion(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index d16deead65c6..691b5c1003c8 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -525,6 +525,10 @@ struct bpf_redirect_info { u32 flags; struct bpf_map *map; struct bpf_map *map_to_flush; +#ifdef CONFIG_XDP_SOCKETS + struct xdp_sock *xsk; + struct xdp_sock *xsk_to_flush; +#endif u32 kern_flags; }; diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index e95cb86b65cf..30f399bd462b 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -158,6 +158,67 @@ struct _bpf_dtab_netdev { trace_xdp_redirect_map_err(dev, xdp, devmap_ifindex(fwd, map), \ err, map, idx) +DECLARE_EVENT_CLASS(xsk_redirect_template, + + TP_PROTO(const struct net_device *dev, + const struct bpf_prog *xdp, + int err, + struct xdp_buff *xbuff), + + TP_ARGS(dev, xdp, err, xbuff), + + TP_STRUCT__entry( + __field(int, prog_id) + __field(u32, act) + __field(int, ifindex) + __field(int, err) + __field(u32, queue_index) + __field(enum xdp_mem_type, mem_type) + ), + + TP_fast_assign( + __entry->prog_id = xdp->aux->id; + __entry->act = XDP_REDIRECT; + __entry->ifindex = dev->ifindex; + __entry->err = err; + __entry->queue_index = xbuff->rxq->queue_index; + __entry->mem_type = xbuff->rxq->mem.type; + ), + + TP_printk("prog_id=%d action=%s ifindex=%d err=%d queue_index=%d" + " mem_type=%d", + __entry->prog_id, + __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB), + __entry->ifindex, + __entry->err, + __entry->queue_index, + __entry->mem_type) +); + +DEFINE_EVENT(xsk_redirect_template, xsk_redirect, + TP_PROTO(const struct net_device *dev, + const struct bpf_prog *xdp, + int err, + struct xdp_buff *xbuff), + + TP_ARGS(dev, xdp, err, 
xbuff) +); + +DEFINE_EVENT(xsk_redirect_template, xsk_redirect_err, + TP_PROTO(const struct net_device *dev, + const struct bpf_prog *xdp, + int err, + struct xdp_buff *xbuff), + + TP_ARGS(dev, xdp, err, xbuff) +); + +#define _trace_xsk_redirect(dev, xdp, xbuff) \ + trace_xsk_redirect(dev, xdp, 0, xbuff) + +#define _trace_xsk_redirect_err(dev, xdp, xbuff, err) \ + trace_xsk_redirect_err(dev, xdp, err, xbuff) + TRACE_EVENT(xdp_cpumap_kthread, TP_PROTO(int map_id, unsigned int processed, unsigned int drops, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a84fd232d934..2912d87a39ba 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2298,6 +2298,17 @@ union bpf_attr { * payload and/or *pop* value being to large. * Return * 0 on success, or a negative error in case of failure. + * + * int bpf_xsk_redirect(struct xdp_buff *xdp_md) + * Description + * Redirect the packet to the attached XDP socket, if any. + * An XDP socket can be attached to a network interface Rx + * queue by passing the XDP_ATTACH option at bind point of + * the socket. + * + * Return + * **XDP_REDIRECT** if there is an XDP socket attached to the Rx + * queue receiving the frame, otherwise **XDP_PASS**. 
*/ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2391,7 +2402,8 @@ union bpf_attr { FN(map_pop_elem), \ FN(map_peek_elem), \ FN(msg_push_data), \ - FN(msg_pop_data), + FN(msg_pop_data), \ + FN(xsk_redirect), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/net/core/filter.c b/net/core/filter.c index 3d54af4c363d..86c5fe5a9ec0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3415,6 +3415,17 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, return 0; } +static void xdp_do_flush_xsk(struct bpf_redirect_info *ri) +{ +#ifdef CONFIG_XDP_SOCKETS + struct xdp_sock *xsk = ri->xsk_to_flush; + + ri->xsk_to_flush = NULL; + if (xsk) + xsk_flush(xsk); +#endif +} + void xdp_do_flush_map(void) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); @@ -3436,6 +3447,8 @@ void xdp_do_flush_map(void) break; } } + + xdp_do_flush_xsk(ri); } EXPORT_SYMBOL_GPL(xdp_do_flush_map); @@ -3501,6 +3514,30 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, return err; } +#ifdef CONFIG_XDP_SOCKETS +static int xdp_do_xsk_redirect(struct net_device *dev, struct xdp_buff *xdp, + struct bpf_prog *xdp_prog, + struct bpf_redirect_info *ri) +{ + struct xdp_sock *xsk = ri->xsk; + int err; + + ri->xsk = NULL; + ri->xsk_to_flush = xsk; + + err = xsk_attached_rcv(xsk, xdp); + if (unlikely(err)) + goto err; + + _trace_xsk_redirect(dev, xdp_prog, xdp); + return 0; + +err: + _trace_xsk_redirect_err(dev, xdp_prog, xdp, err); + return err; +} +#endif /* CONFIG_XDP_SOCKETS */ + int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { @@ -3510,6 +3547,10 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, if (likely(map)) return xdp_do_redirect_map(dev, xdp, xdp_prog, map, ri); +#ifdef CONFIG_XDP_SOCKETS + if (ri->xsk) + return xdp_do_xsk_redirect(dev, xdp, xdp_prog, ri); +#endif return 
xdp_do_redirect_slow(dev, xdp, xdp_prog, ri); } EXPORT_SYMBOL_GPL(xdp_do_redirect); @@ -3560,6 +3601,33 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, return err; } +#ifdef CONFIG_XDP_SOCKETS +static int xdp_do_generic_xsk_redirect(struct net_device *dev, + struct xdp_buff *xdp, + struct bpf_prog *xdp_prog, + struct sk_buff *skb) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct xdp_sock *xsk = ri->xsk; + int err; + + ri->xsk = NULL; + ri->xsk_to_flush = NULL; + + err = xsk_generic_attached_rcv(xsk, xdp); + if (err) + goto err; + + consume_skb(skb); + _trace_xsk_redirect(dev, xdp_prog, xdp); + return 0; + +err: + _trace_xsk_redirect_err(dev, xdp_prog, xdp, err); + return err; +} +#endif /* CONFIG_XDP_SOCKETS */ + int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { @@ -3572,6 +3640,11 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, if (map) return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog, map); +#ifdef CONFIG_XDP_SOCKETS + if (ri->xsk) + return xdp_do_generic_xsk_redirect(dev, xdp, xdp_prog, skb); +#endif + ri->ifindex = 0; fwd = dev_get_by_index_rcu(dev_net(dev), index); if (unlikely(!fwd)) { @@ -3639,6 +3712,29 @@ static const struct bpf_func_proto bpf_xdp_redirect_map_proto = { .arg3_type = ARG_ANYTHING, }; +#ifdef CONFIG_XDP_SOCKETS +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp) +{ + struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct xdp_sock *xsk; + + xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk); + if (xsk) { + ri->xsk = xsk; + return XDP_REDIRECT; + } + + return XDP_PASS; +} + +static const struct bpf_func_proto bpf_xdp_xsk_redirect_proto = { + .func = bpf_xdp_xsk_redirect, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; +#endif /* CONFIG_XDP_SOCKETS */ + static unsigned long bpf_skb_copy(void *dst_buff, const void 
*skb, unsigned long off, unsigned long len) { @@ -5510,6 +5606,10 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_xdp_sk_lookup_tcp_proto; case BPF_FUNC_sk_release: return &bpf_sk_release_proto; +#endif +#ifdef CONFIG_XDP_SOCKETS + case BPF_FUNC_xsk_redirect: + return &bpf_xdp_xsk_redirect_proto; #endif default: return bpf_base_func_proto(func_id);

From patchwork Fri Dec 7 11:44:28 2018
X-Patchwork-Id: 1009383
From: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org Cc: =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= , brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com Subject: [PATCH bpf-next 4/7] bpf: prepare for builtin bpf program Date: Fri, 7 Dec 2018 12:44:28 +0100 Message-Id: <20181207114431.18038-5-bjorn.topel@gmail.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com> References: <20181207114431.18038-1-bjorn.topel@gmail.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Björn Töpel Break up bpf_prog_load into one function that allocates, initializes and verifies a bpf program, and one that allocates a file descriptor. The former function will be used in a later commit to load a builtin BPF program. Signed-off-by: Björn Töpel --- kernel/bpf/syscall.c | 59 ++++++++++++++++++++++++++------------------ 1 file changed, 35 insertions(+), 24 deletions(-) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index aa05aa38f4a8..ee1328625330 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1441,7 +1441,8 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type, /* last field in 'union bpf_attr' used by this command */ #define BPF_PROG_LOAD_LAST_FIELD func_info_cnt -static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr) +static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr, + union bpf_attr __user *uattr) { enum bpf_prog_type type = attr->prog_type; struct bpf_prog *prog; @@ -1450,45 +1451,45 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr) bool is_gpl; if (CHECK_ATTR(BPF_PROG_LOAD)) - return -EINVAL; + return ERR_PTR(-EINVAL); if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT | BPF_F_ANY_ALIGNMENT)) - return -EINVAL; + return ERR_PTR(-EINVAL); if 
(!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && (attr->prog_flags & BPF_F_ANY_ALIGNMENT) && !capable(CAP_SYS_ADMIN)) - return -EPERM; + return ERR_PTR(-EPERM); /* copy eBPF program license from user space */ if (strncpy_from_user(license, u64_to_user_ptr(attr->license), sizeof(license) - 1) < 0) - return -EFAULT; + return ERR_PTR(-EFAULT); license[sizeof(license) - 1] = 0; /* eBPF programs must be GPL compatible to use GPL-ed functions */ is_gpl = license_is_gpl_compatible(license); if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS) - return -E2BIG; + return ERR_PTR(-E2BIG); if (type == BPF_PROG_TYPE_KPROBE && attr->kern_version != LINUX_VERSION_CODE) - return -EINVAL; + return ERR_PTR(-EINVAL); if (type != BPF_PROG_TYPE_SOCKET_FILTER && type != BPF_PROG_TYPE_CGROUP_SKB && !capable(CAP_SYS_ADMIN)) - return -EPERM; + return ERR_PTR(-EPERM); bpf_prog_load_fixup_attach_type(attr); if (bpf_prog_load_check_attach_type(type, attr->expected_attach_type)) - return -EINVAL; + return ERR_PTR(-EINVAL); /* plain bpf_prog allocation */ prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER); if (!prog) - return -ENOMEM; + return ERR_PTR(-ENOMEM); prog->expected_attach_type = attr->expected_attach_type; @@ -1544,20 +1545,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr) if (err) goto free_used_maps; - err = bpf_prog_new_fd(prog); - if (err < 0) { - /* failed to allocate fd. - * bpf_prog_put() is needed because the above - * bpf_prog_alloc_id() has published the prog - * to the userspace and the userspace may - * have refcnt-ed it through BPF_PROG_GET_FD_BY_ID. 
- */ - bpf_prog_put(prog); - return err; - } - bpf_prog_kallsyms_add(prog); - return err; + return prog; free_used_maps: kvfree(prog->aux->func_info);
@@ -1570,7 +1559,29 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr) security_bpf_prog_free(prog->aux); free_prog_nouncharge: bpf_prog_free(prog); - return err; + return ERR_PTR(err); +} + +static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr) +{ + struct bpf_prog *prog = __bpf_prog_load(attr, uattr); + int fd; + + if (IS_ERR(prog)) + return PTR_ERR(prog); + + fd = bpf_prog_new_fd(prog); + if (fd < 0) { + /* failed to allocate fd. + * bpf_prog_put() is needed because the above + * bpf_prog_alloc_id() has published the prog + * to the userspace and the userspace may + * have refcnt-ed it through BPF_PROG_GET_FD_BY_ID. + */ + bpf_prog_put(prog); + } + + return fd; } #define BPF_OBJ_LAST_FIELD file_flags

From patchwork Fri Dec 7 11:44:29 2018
X-Patchwork-Id: 1009384
From: Björn Töpel
To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org
Cc: brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com
Subject: [PATCH bpf-next 5/7] bpf: add function to load builtin BPF program
Date: Fri, 7 Dec 2018 12:44:29 +0100
Message-Id: <20181207114431.18038-6-bjorn.topel@gmail.com>
In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com>
References: <20181207114431.18038-1-bjorn.topel@gmail.com>

The added bpf_prog_load_builtin function can be used to load and verify a BPF program that originates from the kernel. We call this a "builtin BPF program". A builtin program is useful for convenience; e.g. it allows the kernel to use the BPF infrastructure for internal tasks. This functionality will be used by AF_XDP sockets in a later commit.
Signed-off-by: Björn Töpel
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/syscall.c | 32 ++++++++++++++++++++++++--------
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e82b7039fc66..e810bfeb6239 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -563,6 +563,8 @@ static inline int bpf_map_attr_numa_node(const union bpf_attr *attr)
 struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type);
 int array_map_alloc_check(union bpf_attr *attr);
+struct bpf_prog *bpf_prog_load_builtin(union bpf_attr *attr);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ee1328625330..323831e1a1e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1461,10 +1461,16 @@ static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr,
 	    !capable(CAP_SYS_ADMIN))
 		return ERR_PTR(-EPERM);
 
-	/* copy eBPF program license from user space */
-	if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
-			      sizeof(license) - 1) < 0)
-		return ERR_PTR(-EFAULT);
+	/* NB! If uattr is NULL, a builtin BPF is being loaded.
+	 */
+	if (uattr) {
+		/* copy eBPF program license from user space */
+		if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
+				      sizeof(license) - 1) < 0)
+			return ERR_PTR(-EFAULT);
+	} else {
+		strncpy(license, (const char *)(unsigned long)attr->license,
+			sizeof(license) - 1);
+	}
 	license[sizeof(license) - 1] = 0;
 
 	/* eBPF programs must be GPL compatible to use GPL-ed functions */
@@ -1505,10 +1511,15 @@ static struct bpf_prog *__bpf_prog_load(union bpf_attr *attr,
 	prog->len = attr->insn_cnt;
 
-	err = -EFAULT;
-	if (copy_from_user(prog->insns, u64_to_user_ptr(attr->insns),
-			   bpf_prog_insn_size(prog)) != 0)
-		goto free_prog;
+	if (uattr) {
+		err = -EFAULT;
+		if (copy_from_user(prog->insns, u64_to_user_ptr(attr->insns),
+				   bpf_prog_insn_size(prog)) != 0)
+			goto free_prog;
+	} else {
+		memcpy(prog->insns, (void *)(unsigned long)attr->insns,
+		       bpf_prog_insn_size(prog));
+	}
 
 	prog->orig_prog = NULL;
 	prog->jited = 0;
@@ -1584,6 +1595,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	return fd;
 }
 
+struct bpf_prog *bpf_prog_load_builtin(union bpf_attr *attr)
+{
+	return __bpf_prog_load(attr, NULL);
+}
+
 #define BPF_OBJ_LAST_FIELD file_flags
 
 static int bpf_obj_pin(const union bpf_attr *attr)

From patchwork Fri Dec 7 11:44:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Björn Töpel
X-Patchwork-Id: 1009385
X-Patchwork-Delegate: bpf@iogearbox.net
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id
43B9dC0q03z9rxp for ; Fri, 7 Dec 2018 22:45:19 +1100 (AEDT)
From: Björn Töpel
To: bjorn.topel@gmail.com, magnus.karlsson@intel.com, magnus.karlsson@gmail.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org
Cc: Björn Töpel, brouer@redhat.com, u9012063@gmail.com, qi.z.zhang@intel.com
Subject: [PATCH bpf-next 6/7] xsk: load a builtin XDP program on XDP_ATTACH
Date: Fri, 7 Dec 2018 12:44:30 +0100
Message-Id: <20181207114431.18038-7-bjorn.topel@gmail.com>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181207114431.18038-1-bjorn.topel@gmail.com>
References: <20181207114431.18038-1-bjorn.topel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Björn Töpel

This commit extends the XDP_ATTACH bind option by loading a builtin
XDP program. The builtin program is the simplest possible program that
redirects a frame to an attached socket. In restricted C it would look
like this:

  SEC("xdp")
  int xdp_prog(struct xdp_md *ctx)
  {
          return bpf_xsk_redirect(ctx);
  }

For many XDP socket users, this program would be the most common one.

The builtin program loaded via XDP_ATTACH behaves differently from
regular XDP programs from an install-to-netdev/uninstall-from-netdev
point of view. The easiest way to look at it is as a two-level
hierarchy, where regular XDP programs have precedence over the builtin
one. If no regular XDP program is installed on the netdev, the builtin
one will be installed. If a regular program is installed alongside the
builtin one, the regular program takes precedence. Further, if a
regular program is installed and later removed, the builtin one will
automatically be installed again.

The sxdp_flags field of struct sockaddr_xdp gets two new options,
XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which map to the
corresponding XDP netlink install flags.

Signed-off-by: Björn Töpel
---
 include/linux/netdevice.h   | 10 +++++
 include/uapi/linux/if_xdp.h | 10 +++--
 net/core/dev.c              | 84 ++++++++++++++++++++++++++++++++---
 net/xdp/xsk.c               | 88 +++++++++++++++++++++++++++++++++++--
 4 files changed, 179 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a6cc68d2504c..a3094f1a9fcb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2039,6 +2039,13 @@ struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+
+#ifdef CONFIG_XDP_SOCKETS
+	struct bpf_prog		*xsk_prog;
+	u32			xsk_prog_flags;
+	bool			xsk_prog_running;
+	int			xsk_prog_ref;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
@@ -3638,6 +3645,9 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags);
+void dev_xsk_prog_uninstall(struct net_device *dev);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, u32 flags);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index bd76235c2749..b8fb3200f640 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -13,10 +13,12 @@
 #include 
 
 /* Options for the sxdp_flags field */
-#define XDP_SHARED_UMEM	(1 << 0)
-#define XDP_COPY	(1 << 1) /* Force copy-mode */
-#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
-#define XDP_ATTACH	(1 << 3)
+#define XDP_SHARED_UMEM		(1 << 0)
+#define XDP_COPY		(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY		(1 << 2) /* Force zero-copy mode */
+#define XDP_ATTACH		(1 << 3)
+#define XDP_BUILTIN_SKB_MODE	(1 << 4)
+#define XDP_BUILTIN_DRV_MODE	(1 << 5)
 
 struct sockaddr_xdp {
 	__u16 sxdp_family;
diff --git a/net/core/dev.c b/net/core/dev.c
index abe50c424b29..0a1c30da2f87 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7879,6 +7879,70 @@ static void dev_xdp_uninstall(struct net_device *dev)
 					NULL));
 }
 
+#ifdef CONFIG_XDP_SOCKETS
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags)
+{
+	ASSERT_RTNL();
+
+	if (dev->xsk_prog) {
+		if (prog != dev->xsk_prog)
+			return -EINVAL;
+		if (flags && flags != dev->xsk_prog_flags)
+			return -EINVAL;
+	}
+
+	if (dev->xsk_prog) {
+		dev->xsk_prog_ref++;
+		return 0;
+	}
+
+	dev->xsk_prog = bpf_prog_inc(prog);
+	dev->xsk_prog_flags = flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
+	dev->xsk_prog_ref = 1;
+	(void)dev_change_xdp_fd(dev, NULL, -1, dev->xsk_prog_flags);
+	return 0;
+}
+
+void dev_xsk_prog_uninstall(struct net_device *dev)
+{
+	ASSERT_RTNL();
+
+	if (--dev->xsk_prog_ref == 0) {
+		bpf_prog_put(dev->xsk_prog);
+		dev->xsk_prog = NULL;
+		if (dev->xsk_prog_running)
+			(void)dev_change_xdp_fd(dev, NULL, -1,
+						dev->xsk_prog_flags);
+		dev->xsk_prog_flags = 0;
+		dev->xsk_prog_running = false;
+	}
+}
+#endif
+
+static void dev_xsk_prog_pre_load(struct net_device *dev, int fd,
+				  struct bpf_prog **prog, u32 *flags)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	if (fd >= 0)
+		return;
+
+	if (dev->xsk_prog) {
+		*prog = bpf_prog_inc(dev->xsk_prog);
+		*flags = dev->xsk_prog_flags;
+		dev->xsk_prog_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+	}
+#endif
+}
+
+static void dev_xsk_prog_post_load(struct net_device *dev, int err,
+				   struct bpf_prog *prog)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	dev->xsk_prog_running = prog && prog == dev->xsk_prog && err >= 0;
+#endif
+}
+
 /**
  * dev_change_xdp_fd - set or clear a bpf program for a device rx path
  * @dev: device
@@ -7909,16 +7973,25 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (bpf_op == bpf_chk)
 		bpf_chk = generic_xdp_install;
 
-	if (fd >= 0) {
+	dev_xsk_prog_pre_load(dev, fd, &prog, &flags);
+
+	if (fd >= 0 || prog) {
 		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EEXIST;
+		}
 		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
-		    __dev_xdp_query(dev, bpf_op, query))
+		    __dev_xdp_query(dev, bpf_op, query)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EBUSY;
+		}
 
-		prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
-					     bpf_op == ops->ndo_bpf);
+		if (!prog)
+			prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
+						     bpf_op == ops->ndo_bpf);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
@@ -7934,6 +8007,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
+	dev_xsk_prog_post_load(dev, err, prog);
 	return err;
 }
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1eff7ac8596d..0d15d25694c4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -30,6 +30,15 @@
 
 #define TX_BATCH_SIZE 16
 
+static const struct bpf_insn xsk_redirect_prog_insn[] = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_xsk_redirect),
+	BPF_EXIT_INSN(),
+};
+
+/* Synchronized via rtnl lock */
+static struct bpf_prog *xsk_redirect_prog; /* builtin XDP program */
+static int xsk_redirect_prog_ref;
+
 static struct xdp_sock *xdp_sk(struct sock *sk)
 {
 	return (struct xdp_sock *)sk;
@@ -347,16 +356,87 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 	return 0;
 }
 
+static struct bpf_prog *xsk_builtin_prog_get(void)
+{
+	union bpf_attr attr = {};
+	struct bpf_prog *prog;
+
+	if (xsk_redirect_prog) {
+		xsk_redirect_prog_ref++;
+		return xsk_redirect_prog;
+	}
+
+	attr.prog_type = BPF_PROG_TYPE_XDP;
+	attr.insn_cnt = ARRAY_SIZE(xsk_redirect_prog_insn);
+	attr.insns = (uintptr_t)xsk_redirect_prog_insn;
+	attr.license = (uintptr_t)"GPL";
+	memcpy(attr.prog_name, "AF_XDP_BUILTIN",
+	       min_t(size_t, sizeof("AF_XDP_BUILTIN") - 1,
+		     BPF_OBJ_NAME_LEN - 1));
+
+	prog = bpf_prog_load_builtin(&attr);
+	if (IS_ERR(prog)) {
+		WARN(1, "Failed (%d) to load builtin XDP program!\n",
+		     (int)PTR_ERR(prog));
+		return prog;
+	}
+
+	xsk_redirect_prog = prog;
+	xsk_redirect_prog_ref = 1;
+	return xsk_redirect_prog;
+}
+
+static void xsk_builtin_prog_put(void)
+{
+	if (--xsk_redirect_prog_ref == 0) {
+		bpf_prog_put(xsk_redirect_prog);
+		xsk_redirect_prog = NULL;
+	}
+}
+
 static void xsk_detach(struct xdp_sock *xs)
 {
-	WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	rtnl_lock();
+	if (READ_ONCE(xs->dev->_rx[xs->queue_id].xsk)) {
+		dev_xsk_prog_uninstall(xs->dev);
+		xsk_builtin_prog_put();
+		WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	}
+	rtnl_unlock();
 }
 
-static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid)
+static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid,
+		      u32 bind_flags)
 {
-	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	struct bpf_prog *prog;
+	u32 flags = 0;
+	int err;
+
+	rtnl_lock();
+	prog = xsk_builtin_prog_get();
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto out;
+	}
+	if (bind_flags & XDP_BUILTIN_SKB_MODE)
+		flags |= XDP_FLAGS_SKB_MODE;
+	if (bind_flags & XDP_BUILTIN_DRV_MODE)
+		flags |= XDP_FLAGS_DRV_MODE;
+
+	err = dev_xsk_prog_install(dev, prog, flags);
+	if (err)
+		goto out_put;
+
+	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	rtnl_unlock();
 	return 0;
+
+out_put:
+	xsk_builtin_prog_put();
+out:
+	rtnl_unlock();
+	return err;
 }
 
 static int xsk_release(struct socket *sock)
@@ -502,7 +582,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		goto out_unlock;
 
 	if (flags & XDP_ATTACH) {
-		err = xsk_attach(xs, dev, qid);
+		err = xsk_attach(xs, dev, qid, flags);
 		if (err)
 			goto out_unlock;
 	}