diff mbox series

[bpf-next,6/7] xsk: load a builtin XDP program on XDP_ATTACH

Message ID 20181207114431.18038-7-bjorn.topel@gmail.com
State Changes Requested, archived
Delegated to: BPF Maintainers
Headers show
Series Add XDP_ATTACH bind() flag to AF_XDP sockets | expand

Commit Message

Björn Töpel Dec. 7, 2018, 11:44 a.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

This commit extends the XDP_ATTACH bind option by loading a builtin
XDP program.

The builtin program is the simplest program possible to redirect a
frame to an attached socket. In restricted C it would look like this:

  SEC("xdp")
  int xdp_prog(struct xdp_md *ctx)
  {
  	return bpf_xsk_redirect(ctx);
  }

For many XDP socket users, this program would be the most common one.

The builtin program loaded via XDP_ATTACH behaves, from an
install-to-netdev/uninstall-from-netdev point of view, different from
regular XDP programs. The easiest way to look at it is as a 2-level
hierarchy, where regular XDP programs has precedence over the builtin
one.

If no regular XDP program is installed to the netdev, the builtin will
be install. If the builtin program is installed, and a regular is
installed, the regular XDP will have precedence over the builtin one.

Further, if a regular program is installed, and later removed, the
builtin one will automatically be installed.

The sxdp_flags field of struct sockaddr_xdp gets two new options
XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
corresponding XDP netlink install flags.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 include/linux/netdevice.h   | 10 +++++
 include/uapi/linux/if_xdp.h | 10 +++--
 net/core/dev.c              | 84 ++++++++++++++++++++++++++++++++---
 net/xdp/xsk.c               | 88 +++++++++++++++++++++++++++++++++++--
 4 files changed, 179 insertions(+), 13 deletions(-)

Comments

Jakub Kicinski Dec. 10, 2018, 2:17 a.m. UTC | #1
On Fri,  7 Dec 2018 12:44:30 +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> This commit extends the XDP_ATTACH bind option by loading a builtin
> XDP program.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
> 
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>   	return bpf_xsk_redirect(ctx);
>   }
> 
> For many XDP socket users, this program would be the most common one.
> 
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, different from
> regular XDP programs. The easiest way to look at it is as a 2-level
> hierarchy, where regular XDP programs has precedence over the builtin
> one.
> 
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, the regular XDP will have precedence over the builtin one.
> 
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
> 
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Why not add this functionality to libbpf?  Libbpf would really benefit
from better AF_XDP support, this would be a trivial part of it.

Are we going to stop here or take the next logical step of not invoking
the BPF program (and paying retpoline cost) at all if we know its the
"built-in/just redirect to the xsk" program?  Because if the answer is
"yes" we could just cut to the chase we can avoid all this unnecessary
complexity.
Björn Töpel Dec. 10, 2018, 7:29 a.m. UTC | #2
Den mån 10 dec. 2018 kl 03:17 skrev Jakub Kicinski
<jakub.kicinski@netronome.com>:
>
> On Fri,  7 Dec 2018 12:44:30 +0100, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > This commit extends the XDP_ATTACH bind option by loading a builtin
> > XDP program.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >       return bpf_xsk_redirect(ctx);
> >   }
> >
> > For many XDP socket users, this program would be the most common one.
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, different from
> > regular XDP programs. The easiest way to look at it is as a 2-level
> > hierarchy, where regular XDP programs has precedence over the builtin
> > one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, the regular XDP will have precedence over the builtin one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Why not add this functionality to libbpf?  Libbpf would really benefit
> from better AF_XDP support, this would be a trivial part of it.
>

Hmm, maybe this would be a better intermediate step (given the
discussions on per-queue programs/builtins). I.e. the first 3 patches
just assigning an AF_XDP socket to a Rx queue, and the
bpf_xsk_redirect, and supply the builtin program as part of libbpf.

Regarding AF_XDP and libbpf -- we're planning to post the first AF_XDP
patches this week.

> Are we going to stop here or take the next logical step of not invoking
> the BPF program (and paying retpoline cost) at all if we know its the
> "built-in/just redirect to the xsk" program?  Because if the answer is
> "yes" we could just cut to the chase we can avoid all this unnecessary
> complexity.

A next step could be avoiding the retpoline by a builtin check in
bpf_prog_run_xdp (unless you're HW offloaded). However, we do not want
to bypass the BPF control-plane! The indirect call in bpf_prog_run_xdp
is really a performance killer. A batching bpf_prog_run_xdp version
would help some, but it wouldn't be as good as removing the call.

What I don't like about my series is having raw BPF-programs (flow of
instructions instructions) in the kernel. It's messy and hard to read.
Maybe it would be possible to teach the verifier to detect that the
program is "a trivial pass everything to a socket". :-)

As you say, making the XDP loading more complex does leave a bad taste
in the mouth.


Björn
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a6cc68d2504c..a3094f1a9fcb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2039,6 +2039,13 @@  struct net_device {
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
 	unsigned		wol_enabled:1;
+
+#ifdef CONFIG_XDP_SOCKETS
+	struct bpf_prog		*xsk_prog;
+	u32			xsk_prog_flags;
+	bool			xsk_prog_running;
+	int			xsk_prog_ref;
+#endif
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -3638,6 +3645,9 @@  struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags);
+void dev_xsk_prog_uninstall(struct net_device *dev);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, u32 flags);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index bd76235c2749..b8fb3200f640 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -13,10 +13,12 @@ 
 #include <linux/types.h>
 
 /* Options for the sxdp_flags field */
-#define XDP_SHARED_UMEM	(1 << 0)
-#define XDP_COPY	(1 << 1) /* Force copy-mode */
-#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
-#define XDP_ATTACH	(1 << 3)
+#define XDP_SHARED_UMEM		(1 << 0)
+#define XDP_COPY		(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY		(1 << 2) /* Force zero-copy mode */
+#define XDP_ATTACH		(1 << 3)
+#define XDP_BUILTIN_SKB_MODE	(1 << 4)
+#define XDP_BUILTIN_DRV_MODE	(1 << 5)
 
 struct sockaddr_xdp {
 	__u16 sxdp_family;
diff --git a/net/core/dev.c b/net/core/dev.c
index abe50c424b29..0a1c30da2f87 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7879,6 +7879,70 @@  static void dev_xdp_uninstall(struct net_device *dev)
 					NULL));
 }
 
+#ifdef CONFIG_XDP_SOCKETS
+int dev_xsk_prog_install(struct net_device *dev, struct bpf_prog *prog,
+			 u32 flags)
+{
+	ASSERT_RTNL();
+
+	if (dev->xsk_prog) {
+		if (prog != dev->xsk_prog)
+			return -EINVAL;
+		if (flags && flags != dev->xsk_prog_flags)
+			return -EINVAL;
+	}
+
+	if (dev->xsk_prog) {
+		dev->xsk_prog_ref++;
+		return 0;
+	}
+
+	dev->xsk_prog = bpf_prog_inc(prog);
+	dev->xsk_prog_flags = flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
+	dev->xsk_prog_ref = 1;
+	(void)dev_change_xdp_fd(dev, NULL, -1, dev->xsk_prog_flags);
+	return 0;
+}
+
+void dev_xsk_prog_uninstall(struct net_device *dev)
+{
+	ASSERT_RTNL();
+
+	if (--dev->xsk_prog_ref == 0) {
+		bpf_prog_put(dev->xsk_prog);
+		dev->xsk_prog = NULL;
+		if (dev->xsk_prog_running)
+			(void)dev_change_xdp_fd(dev, NULL, -1,
+						dev->xsk_prog_flags);
+		dev->xsk_prog_flags = 0;
+		dev->xsk_prog_running = false;
+	}
+}
+#endif
+
+static void dev_xsk_prog_pre_load(struct net_device *dev, int fd,
+				  struct bpf_prog **prog, u32 *flags)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	if (fd >= 0)
+		return;
+
+	if (dev->xsk_prog) {
+		*prog = bpf_prog_inc(dev->xsk_prog);
+		*flags = dev->xsk_prog_flags;
+		dev->xsk_prog_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
+	}
+#endif
+}
+
+static void dev_xsk_prog_post_load(struct net_device *dev, int err,
+				   struct bpf_prog *prog)
+{
+#ifdef CONFIG_XDP_SOCKETS
+	dev->xsk_prog_running = prog && prog == dev->xsk_prog && err >= 0;
+#endif
+}
+
 /**
  *	dev_change_xdp_fd - set or clear a bpf program for a device rx path
  *	@dev: device
@@ -7909,16 +7973,25 @@  int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (bpf_op == bpf_chk)
 		bpf_chk = generic_xdp_install;
 
-	if (fd >= 0) {
+	dev_xsk_prog_pre_load(dev, fd, &prog, &flags);
+
+	if (fd >= 0 || prog) {
 		if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
-		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EEXIST;
+		}
 		if ((flags & XDP_FLAGS_UPDATE_IF_NOEXIST) &&
-		    __dev_xdp_query(dev, bpf_op, query))
+		    __dev_xdp_query(dev, bpf_op, query)) {
+			if (prog)
+				bpf_prog_put(prog);
 			return -EBUSY;
+		}
 
-		prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
-					     bpf_op == ops->ndo_bpf);
+		if (!prog)
+			prog = bpf_prog_get_type_dev(fd, BPF_PROG_TYPE_XDP,
+						     bpf_op == ops->ndo_bpf);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
@@ -7934,6 +8007,7 @@  int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
+	dev_xsk_prog_post_load(dev, err, prog);
 	return err;
 }
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1eff7ac8596d..0d15d25694c4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -30,6 +30,15 @@ 
 
 #define TX_BATCH_SIZE 16
 
+static const struct bpf_insn xsk_redirect_prog_insn[] = {
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_xsk_redirect),
+	BPF_EXIT_INSN(),
+};
+
+/* Synchronized via rtnl lock */
+static struct bpf_prog *xsk_redirect_prog; /*builtin XDP program */
+static int xsk_redirect_prog_ref;
+
 static struct xdp_sock *xdp_sk(struct sock *sk)
 {
 	return (struct xdp_sock *)sk;
@@ -347,16 +356,87 @@  static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 	return 0;
 }
 
+static struct bpf_prog *xsk_builtin_prog_get(void)
+{
+	union bpf_attr attr = {};
+	struct bpf_prog *prog;
+
+	if (xsk_redirect_prog) {
+		xsk_redirect_prog_ref++;
+		return xsk_redirect_prog;
+	}
+
+	attr.prog_type = BPF_PROG_TYPE_XDP;
+	attr.insn_cnt = ARRAY_SIZE(xsk_redirect_prog_insn);
+	attr.insns = (uintptr_t)xsk_redirect_prog_insn;
+	attr.license = (uintptr_t)"GPL";
+	memcpy(attr.prog_name, "AF_XDP_BUILTIN",
+	       min_t(size_t, sizeof("AF_XDP_BUILTIN") - 1,
+		     BPF_OBJ_NAME_LEN - 1));
+
+	prog = bpf_prog_load_builtin(&attr);
+	if (IS_ERR(prog)) {
+		WARN(1, "Failed (%d) to load builtin XDP program!\n",
+		     (int)PTR_ERR(prog));
+		return prog;
+	}
+
+	xsk_redirect_prog = prog;
+	xsk_redirect_prog_ref = 1;
+	return xsk_redirect_prog;
+}
+
+static void xsk_builtin_prog_put(void)
+{
+	if (--xsk_redirect_prog_ref == 0) {
+		bpf_prog_put(xsk_redirect_prog);
+		xsk_redirect_prog = NULL;
+	}
+}
+
 static void xsk_detach(struct xdp_sock *xs)
 {
-	WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	rtnl_lock();
+	if (READ_ONCE(xs->dev->_rx[xs->queue_id].xsk)) {
+		dev_xsk_prog_uninstall(xs->dev);
+		xsk_builtin_prog_put();
+		WRITE_ONCE(xs->dev->_rx[xs->queue_id].xsk, NULL);
+	}
+	rtnl_unlock();
 }
 
-static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid)
+static int xsk_attach(struct xdp_sock *xs, struct net_device *dev, u16 qid,
+		      u32 bind_flags)
 {
-	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	struct bpf_prog *prog;
+	u32 flags = 0;
+	int err;
+
+	rtnl_lock();
+	prog = xsk_builtin_prog_get();
+	if (IS_ERR(prog)) {
+		err = PTR_ERR(prog);
+		goto out;
+	}
 
+	if (bind_flags & XDP_BUILTIN_SKB_MODE)
+		flags |= XDP_FLAGS_SKB_MODE;
+	if (bind_flags & XDP_BUILTIN_DRV_MODE)
+		flags |= XDP_FLAGS_DRV_MODE;
+
+	err = dev_xsk_prog_install(dev, prog, flags);
+	if (err)
+		goto out_put;
+
+	WRITE_ONCE(dev->_rx[qid].xsk, xs);
+	rtnl_unlock();
 	return 0;
+
+out_put:
+	xsk_builtin_prog_put();
+out:
+	rtnl_unlock();
+	return err;
 }
 
 static int xsk_release(struct socket *sock)
@@ -502,7 +582,7 @@  static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 			goto out_unlock;
 
 		if (flags & XDP_ATTACH) {
-			err = xsk_attach(xs, dev, qid);
+			err = xsk_attach(xs, dev, qid, flags);
 			if (err)
 				goto out_unlock;
 		}