[bpf-next,0/7] Add XDP_ATTACH bind() flag to AF_XDP sockets

Message ID 20181207114431.18038-1-bjorn.topel@gmail.com

Message

Björn Töpel Dec. 7, 2018, 11:44 a.m. UTC
From: Björn Töpel <bjorn.topel@intel.com>

Hi!

This patch set adds support for a new XDP socket bind option,
XDP_ATTACH.

The rationale behind attach is performance and ease of use. Many XDP
socket users just need a simple way of creating/binding a socket and
receiving frames right away without loading an XDP program.

XDP_ATTACH adds a mechanism we call "builtin XDP program": a kernel-
provided XDP program that is installed on the netdev when XDP_ATTACH
is passed as a bind() flag.

The builtin program is the simplest program possible to redirect a
frame to an attached socket. In restricted C it would look like this:
    
  SEC("xdp")
  int xdp_prog(struct xdp_md *ctx)
  {
        return bpf_xsk_redirect(ctx);
  }
    
The builtin program loaded via XDP_ATTACH behaves, from an
install-to-netdev/uninstall-from-netdev point of view, differently
from regular XDP programs. The easiest way to look at it is as a
2-level hierarchy, where regular XDP programs has precedence over the
builtin one.
    
If no regular XDP program is installed on the netdev, the builtin one
will be installed. If the builtin program is installed and a regular
program is then installed, the regular XDP program takes precedence
over the builtin one.
    
Further, if a regular program is installed and later removed, the
builtin one will automatically be reinstalled.
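The install rules above boil down to a tiny two-level lookup. As an
illustrative sketch (plain C with hypothetical names; in the series the
actual state lives in struct net_device):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the 2-level hierarchy: a user-installed ("regular") XDP
 * program, when present, shadows the kernel builtin, and removing the
 * regular program exposes the builtin again. Names are illustrative,
 * not the kernel's. */
struct xdp_progs {
	const char *regular; /* user-installed program, or NULL */
	const char *builtin; /* builtin installed via XDP_ATTACH, or NULL */
};

static const char *effective_prog(const struct xdp_progs *p)
{
	return p->regular ? p->regular : p->builtin;
}
```

Note how "regular removed" needs no special handling: the builtin
simply becomes visible again, which is the auto-reinstall behavior
described above.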
    
The sxdp_flags field of struct sockaddr_xdp gets two new options,
XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which map to the
corresponding XDP netlink install flags.
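As a sketch of that flag translation (the XDP_FLAGS_* values below are
the existing ones from <linux/if_link.h>; the XDP_BUILTIN_* values are
placeholders here, the real ones are defined by the patch's if_xdp.h
changes):

```c
#include <assert.h>
#include <stdint.h>

/* Real netlink install flags from <linux/if_link.h>. */
#define XDP_FLAGS_SKB_MODE   (1U << 1)
#define XDP_FLAGS_DRV_MODE   (1U << 2)

/* Hypothetical bind() flag values, standing in for the patch's uapi. */
#define XDP_BUILTIN_SKB_MODE (1U << 4)
#define XDP_BUILTIN_DRV_MODE (1U << 5)

/* Map the new sxdp_flags bits onto the XDP netlink install flags. */
static uint32_t builtin_to_netlink_flags(uint16_t sxdp_flags)
{
	uint32_t flags = 0;

	if (sxdp_flags & XDP_BUILTIN_SKB_MODE)
		flags |= XDP_FLAGS_SKB_MODE;
	if (sxdp_flags & XDP_BUILTIN_DRV_MODE)
		flags |= XDP_FLAGS_DRV_MODE;
	return flags;
}
```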

The builtin XDP program admittedly adds even more complexity to the
already hard-to-read dev_change_xdp_fd. Maybe it would be simpler to
store the program in the struct net_device together with the install
flags, instead of calling ndo_bpf multiple times?

The outline of the series is as follows:
  patch 1-2: Introduce the first part of XDP_ATTACH, simply adding
             the socket to the netdev structure.
  patch 3:   Add a new BPF function, bpf_xsk_redirect, that 
             redirects a frame to an attached socket.
  patch 4-5: Preparatory commits for builtin BPF programs
  patch 6:   Make XDP_ATTACH load a builtin XDP program
  patch 7:   Extend the samples application with XDP_ATTACH
             support

Patches 1 through 3 give the performance boost and make it possible to
use AF_XDP sockets without an XSKMAP, but they still require an
explicit XDP program to be loaded.

Patches 4 through 6 make it possible to use an XDP socket without
explicitly loading an XDP program.

The performance numbers for rxdrop (Intel(R) Xeon(R) Gold 6154 CPU @
3.00GHz):

XDP_SKB:
XSKMAP:     2.8 Mpps
XDP_ATTACH: 2.9 Mpps

XDP_DRV - copy:
XSKMAP:     8.5 Mpps
XDP_ATTACH: 9.3 Mpps

XDP_DRV - zero-copy:
XSKMAP:     15.1 Mpps
XDP_ATTACH: 17.3 Mpps

Thanks!
Björn


Björn Töpel (7):
  xsk: simplify AF_XDP socket teardown
  xsk: add XDP_ATTACH bind() flag
  bpf: add bpf_xsk_redirect function
  bpf: prepare for builtin bpf program
  bpf: add function to load builtin BPF program
  xsk: load a builtin XDP program on XDP_ATTACH
  samples: bpf: add support for XDP_ATTACH to xdpsock

 include/linux/bpf.h         |   2 +
 include/linux/filter.h      |   4 +
 include/linux/netdevice.h   |  11 +++
 include/net/xdp_sock.h      |   2 +
 include/trace/events/xdp.h  |  61 +++++++++++++++
 include/uapi/linux/bpf.h    |  14 +++-
 include/uapi/linux/if_xdp.h |   9 ++-
 kernel/bpf/syscall.c        |  91 ++++++++++++++--------
 net/core/dev.c              |  84 +++++++++++++++++++--
 net/core/filter.c           | 100 ++++++++++++++++++++++++
 net/xdp/xsk.c               | 146 +++++++++++++++++++++++++++++-------
 samples/bpf/xdpsock_user.c  | 108 ++++++++++++++++----------
 12 files changed, 524 insertions(+), 108 deletions(-)

Comments

Jesper Dangaard Brouer Dec. 7, 2018, 1:42 p.m. UTC | #1
On Fri,  7 Dec 2018 12:44:24 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>     
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>     
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.
>     
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, regular XDP program will have precedence over the builtin
> one.
>     
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
>     
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
> 
> The builtin XDP program functionally adds even more complexity to the
> already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> store the program in the struct net_device together with the install
> flags instead of calling the ndo_bpf multiple times?

(As far as I can see from reading the code, correct me if I'm wrong.)

If an AF_XDP program uses XDP_ATTACH, then it installs the
builtin-program as the XDP program on the "entire" device.  That means
all RX-queues will call this XDP-bpf program (indirect call), and it is
actually only relevant for the specific queue_index.  Yes, the helper
call does check 'xdp->rxq->queue_index' for an attached 'xsk'
and returns XDP_PASS if it is NULL:

+BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	struct xdp_sock *xsk;
+
+	xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
+	if (xsk) {
+		ri->xsk = xsk;
+		return XDP_REDIRECT;
+	}
+
+	return XDP_PASS;
+}

Why does every normal XDP_PASS packet have to pay this overhead
(indirect call) when someone loads an AF_XDP socket program?  The
AF_XDP socket is tied hard to, and only relevant for, a specific
RX-queue (which is why we get a performance boost due to SPSC queues).

I acknowledge there is a need for this, but this use-case shows there
is a need for attaching XDP programs on a per-RX-queue basis.
Björn Töpel Dec. 7, 2018, 2:01 p.m. UTC | #2
Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

Yes, you are correct. The builtin XDP program, just like a regular XDP
program, affects the whole netdev. So, yes, the non-AF_XDP queues would
get a performance hit from this. Just to reiterate -- this isn't new
with this series; it has always been the case for XDP when acting on
just one queue.

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +       struct xdp_sock *xsk;
> +
> +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +       if (xsk) {
> +               ri->xsk = xsk;
> +               return XDP_REDIRECT;
> +       }
> +
> +       return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>

From my AF_XDP perspective, having a program per queue would make
sense. The discussion of a per-queue program has come up before, and I
think the conclusion was that it would be too complex from a
configuration/tooling point of view. Again, for AF_XDP this would be
great.

When we started to hack on AF_PACKET v4, we had some ideas of doing
the "queue slicing" on a netdev level. So, e.g. take a netdev, and
create, say, macvlans that took over parts of the parent's queues
(something in line of what John did with NETIF_F_HW_L2FW_DOFFLOAD for
macvlan) and then use the macvlan interface as the dedicated AF_XDP
interface.

Personally, I like the current queue slicing model, and having a way
of loading an XDP program per queue would be nice -- unless the UX for
the poor sysadmin will be terrible. :-)


Björn

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Björn Töpel Dec. 7, 2018, 3:27 p.m. UTC | #3
Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri,  7 Dec 2018 12:44:24 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
> >
> > If no regular XDP program is installed to the netdev, the builtin will
> > be install. If the builtin program is installed, and a regular is
> > installed, regular XDP program will have precedence over the builtin
> > one.
> >
> > Further, if a regular program is installed, and later removed, the
> > builtin one will automatically be installed.
> >
> > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > corresponding XDP netlink install flags.
> >
> > The builtin XDP program functionally adds even more complexity to the
> > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > store the program in the struct net_device together with the install
> > flags instead of calling the ndo_bpf multiple times?
>
> (As far as I can see from reading the code, correct me if I'm wrong.)
>
> If an AF_XDP program uses XDP_ATTACH, then it installs the
> builtin-program as the XDP program on the "entire" device.  That means
> all RX-queues will call this XDP-bpf program (indirect call), and it is
> actually only relevant for the specific queue_index.  Yes, the helper
> call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> and return XDP_PASS if it is NULL:
>

A side-note: What one can do, which I did for the Plumbers work, is
bypass the indirect call in bpf_prog_run_xdp by doing "if the XDP
program is a builtin, call internal_bpf_xsk_redirect directly". Then
the XDP_PASS path won't be taxed by the indirect call -- but only for
this special XDP_ATTACH program. And you'll still get an additional
if-statement in the skb-path.
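A minimal sketch of that devirtualization trick (plain C stand-in for a
modified bpf_prog_run_xdp; the function names are illustrative, and the
return values mirror enum xdp_action):

```c
#include <assert.h>

/* Values from the uapi enum xdp_action. */
#define XDP_PASS     2
#define XDP_REDIRECT 4

typedef int (*xdp_prog_fn)(int has_xsk);

/* Stand-in for the builtin program: redirect when a socket is
 * attached to the queue, otherwise pass. */
static int builtin_xsk_redirect(int has_xsk)
{
	return has_xsk ? XDP_REDIRECT : XDP_PASS;
}

/* Stand-in for an arbitrary user-loaded XDP program. */
static int user_pass_all(int has_xsk)
{
	(void)has_xsk;
	return XDP_PASS;
}

/* If the installed program is recognized as the builtin, call it
 * directly (a predictable direct call); otherwise fall back to the
 * generic indirect call through the function pointer. */
static int run_xdp(xdp_prog_fn prog, int has_xsk)
{
	if (prog == builtin_xsk_redirect)
		return builtin_xsk_redirect(has_xsk); /* direct call */
	return prog(has_xsk);                         /* indirect call */
}
```

Both paths return identical results; the point is only that the common
XDP_ATTACH case avoids the indirect branch.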


Björn

> +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> +{
> +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +       struct xdp_sock *xsk;
> +
> +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> +       if (xsk) {
> +               ri->xsk = xsk;
> +               return XDP_REDIRECT;
> +       }
> +
> +       return XDP_PASS;
> +}
>
> Why do every normal XDP_PASS packet have to pay this overhead (indirect
> call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> is tied hard and only relevant to a specific RX-queue (which is why we
> get a performance boost due to SPSC queues).
>
> I acknowledge there is a need for this, but this use-case shows there
> is a need for attaching XDP programs per RX-queue basis.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Alexei Starovoitov Dec. 7, 2018, 9:21 p.m. UTC | #4
On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Hi!
> 
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
> 
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
> 
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
> 
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>     
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>     
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.

The feature makes sense to me.
Maybe XDP_ATTACH_BUILTIN would be a better name?
Also, I think it needs another parameter to say which builtin
program to use.
This unconditional xsk_redirect is fine for performance
benchmarking, but for production I suspect users would want
an easy way to stay safe when they're playing with AF_XDP.
So another builtin program that redirects ssh and ping traffic
back to the kernel would be a nice addition.
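Such a "safe" builtin could, as a sketch, look like the following
(plain C over a byte buffer rather than real restricted C/BPF; the
function name is made up, only untagged IPv4 is handled, and the return
values mirror enum xdp_action):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XDP_PASS     2
#define XDP_REDIRECT 4

/* Pass SSH (TCP dst port 22) up to the kernel stack, redirect
 * everything else to the attached socket. Also pass anything too
 * short to parse, erring on the safe side. */
static int safe_builtin(const uint8_t *pkt, size_t len)
{
	const uint8_t *ip, *tcp;
	uint16_t ethertype, dport;
	size_t ihl;

	if (len < 14 + 20 + 4)
		return XDP_PASS; /* too short to parse: stay safe */

	ethertype = (uint16_t)(pkt[12] << 8 | pkt[13]);
	if (ethertype != 0x0800)
		return XDP_REDIRECT; /* non-IPv4: hand to the socket */

	ip = pkt + 14;
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ip[9] != 6 || len < 14 + ihl + 4)
		return XDP_REDIRECT; /* not TCP, or truncated header */

	tcp = ip + ihl;
	dport = (uint16_t)(tcp[2] << 8 | tcp[3]);
	return dport == 22 ? XDP_PASS : XDP_REDIRECT;
}
```

As Jesper notes further down in the thread, every redirected packet
would pay for this parsing, which is the tension between "safe" and
"fast" builtins.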
Björn Töpel Dec. 8, 2018, 9:31 a.m. UTC | #5
Den fre 7 dec. 2018 kl 22:21 skrev Alexei Starovoitov
<alexei.starovoitov@gmail.com>:
>
> On Fri, Dec 07, 2018 at 12:44:24PM +0100, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > Hi!
> >
> > This patch set adds support for a new XDP socket bind option,
> > XDP_ATTACH.
> >
> > The rationale behind attach is performance and ease of use. Many XDP
> > socket users just need a simple way of creating/binding a socket and
> > receiving frames right away without loading an XDP program.
> >
> > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > is a kernel provided XDP program that is installed to the netdev when
> > XDP_ATTACH is being passed as a bind() flag.
> >
> > The builtin program is the simplest program possible to redirect a
> > frame to an attached socket. In restricted C it would look like this:
> >
> >   SEC("xdp")
> >   int xdp_prog(struct xdp_md *ctx)
> >   {
> >         return bpf_xsk_redirect(ctx);
> >   }
> >
> > The builtin program loaded via XDP_ATTACH behaves, from an
> > install-to-netdev/uninstall-from-netdev point of view, differently
> > from regular XDP programs. The easiest way to look at it is as a
> > 2-level hierarchy, where regular XDP programs has precedence over the
> > builtin one.
>
> The feature makes sense to me.
> May be XDP_ATTACH_BUILTIN would be a better name ?

Yes, agree, or maybe XDP_BUILTIN_ATTACH? Regardless, I'll change the
name for the next revision.

> Also I think it needs another parameter to say which builtin
> program to use.

Yup, I had planned to add the parameter once there's actually more than
*one* builtin, but you're right, let's do it right away. I'll add a
builtin prog enum field to struct sockaddr_xdp.

> This unconditional xsk_redirect is fine for performance
> benchmarking, but for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.

For setups that don't direct the flows explicitly by HW filters,  yes!

> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.
>

I suspect AF_XDP users would prefer redirecting packets to the kernel
via the CPUMAP instead of XDP_PASS -- not paying for the IP stack on
the AF_XDP core. Another builtin would be a tcpdump-like behavior, but
that would require an XDP clone (which Magnus is actually
experimenting with!).

I'll address your input and get back with a new revision. Thanks for
spending time on the series!


Björn
Qi Zhang Dec. 8, 2018, 10:08 a.m. UTC | #6
Hi:

There seem to be only 6 patches of the patch set in patchwork:

https://patchwork.ozlabs.org/project/netdev/list/?submitter=70569
https://patchwork.ozlabs.org/project/netdev/list/?series=80389

Can anyone help check why patch 7/7 is missing?

Thanks
Qi

> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
> Sent: Friday, December 7, 2018 7:45 PM
> To: bjorn.topel@gmail.com; Karlsson, Magnus <magnus.karlsson@intel.com>;
> magnus.karlsson@gmail.com; ast@kernel.org; daniel@iogearbox.net;
> netdev@vger.kernel.org
> Cc: Topel, Bjorn <bjorn.topel@intel.com>; brouer@redhat.com;
> u9012063@gmail.com; Zhang, Qi Z <qi.z.zhang@intel.com>
> Subject: [PATCH bpf-next 7/7] samples: bpf: add support for XDP_ATTACH to
> xdpsock
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Teach the sample xdpsock application about XDP_ATTACH.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  samples/bpf/xdpsock_user.c | 108 +++++++++++++++++++++++--------------
>  1 file changed, 67 insertions(+), 41 deletions(-)
> 
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c index
> 57ecadc58403..12f908b60468 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -633,7 +633,8 @@ static void int_exit(int sig)  {
>  	(void)sig;
>  	dump_stats();
> -	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH))
> +		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
>  	exit(EXIT_SUCCESS);
>  }
> 
> @@ -650,6 +651,7 @@ static struct option long_options[] = {
>  	{"interval", required_argument, 0, 'n'},
>  	{"zero-copy", no_argument, 0, 'z'},
>  	{"copy", no_argument, 0, 'c'},
> +	{"attach", no_argument, 0, 'a'},
>  	{0, 0, 0, 0}
>  };
> 
> @@ -670,6 +672,7 @@ static void usage(const char *prog)
>  		"  -n, --interval=n	Specify statistics update interval (default 1
> sec).\n"
>  		"  -z, --zero-copy      Force zero-copy mode.\n"
>  		"  -c, --copy           Force copy mode.\n"
> +		"  -a, --attach         XDP_ATTACH mode.\n"
>  		"\n";
>  	fprintf(stderr, str, prog);
>  	exit(EXIT_FAILURE);
> @@ -682,7 +685,7 @@ static void parse_command_line(int argc, char **argv)
>  	opterr = 0;
> 
>  	for (;;) {
> -		c = getopt_long(argc, argv, "rtli:q:psSNn:cz", long_options,
> +		c = getopt_long(argc, argv, "rtli:q:psSNn:cza", long_options,
>  				&option_index);
>  		if (c == -1)
>  			break;
> @@ -725,11 +728,22 @@ static void parse_command_line(int argc, char
> **argv)
>  		case 'c':
>  			opt_xdp_bind_flags |= XDP_COPY;
>  			break;
> +		case 'a':
> +			opt_xdp_bind_flags |= XDP_ATTACH;
> +			break;
>  		default:
>  			usage(basename(argv[0]));
>  		}
>  	}
> 
> +	if (opt_xdp_bind_flags & XDP_ATTACH) {
> +		if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
> +			opt_xdp_bind_flags |= XDP_BUILTIN_DRV_MODE;
> +		if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
> +			opt_xdp_bind_flags |= XDP_BUILTIN_SKB_MODE;
> +
> +	}
> +
>  	opt_ifindex = if_nametoindex(opt_if);
>  	if (!opt_ifindex) {
>  		fprintf(stderr, "ERROR: interface \"%s\" does not exist\n", @@
> -903,7 +917,7 @@ int main(int argc, char **argv)
>  	struct bpf_prog_load_attr prog_load_attr = {
>  		.prog_type	= BPF_PROG_TYPE_XDP,
>  	};
> -	int prog_fd, qidconf_map, xsks_map;
> +	int prog_fd, qidconf_map, xsks_map = -1;
>  	struct bpf_object *obj;
>  	char xdp_filename[256];
>  	struct bpf_map *map;
> @@ -918,59 +932,71 @@ int main(int argc, char **argv)
>  		exit(EXIT_FAILURE);
>  	}
> 
> -	snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
> -	prog_load_attr.file = xdp_filename;
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH)) {
> +		snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o",
> +			 argv[0]);
> +		prog_load_attr.file = xdp_filename;
> 
> -	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> -		exit(EXIT_FAILURE);
> -	if (prog_fd < 0) {
> -		fprintf(stderr, "ERROR: no program found: %s\n",
> -			strerror(prog_fd));
> -		exit(EXIT_FAILURE);
> -	}
> +		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> +			exit(EXIT_FAILURE);
> +		if (prog_fd < 0) {
> +			fprintf(stderr, "ERROR: no program found: %s\n",
> +				strerror(prog_fd));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	map = bpf_object__find_map_by_name(obj, "qidconf_map");
> -	qidconf_map = bpf_map__fd(map);
> -	if (qidconf_map < 0) {
> -		fprintf(stderr, "ERROR: no qidconf map found: %s\n",
> -			strerror(qidconf_map));
> -		exit(EXIT_FAILURE);
> -	}
> +		map = bpf_object__find_map_by_name(obj, "qidconf_map");
> +		qidconf_map = bpf_map__fd(map);
> +		if (qidconf_map < 0) {
> +			fprintf(stderr, "ERROR: no qidconf map found: %s\n",
> +				strerror(qidconf_map));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	map = bpf_object__find_map_by_name(obj, "xsks_map");
> -	xsks_map = bpf_map__fd(map);
> -	if (xsks_map < 0) {
> -		fprintf(stderr, "ERROR: no xsks map found: %s\n",
> -			strerror(xsks_map));
> -		exit(EXIT_FAILURE);
> -	}
> +		map = bpf_object__find_map_by_name(obj, "xsks_map");
> +		xsks_map = bpf_map__fd(map);
> +		if (xsks_map < 0) {
> +			fprintf(stderr, "ERROR: no xsks map found: %s\n",
> +				strerror(xsks_map));
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags) < 0) {
> -		fprintf(stderr, "ERROR: link set xdp fd failed\n");
> -		exit(EXIT_FAILURE);
> -	}
> +		if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd,
> +					opt_xdp_flags) < 0) {
> +			fprintf(stderr, "ERROR: link set xdp fd failed\n");
> +			exit(EXIT_FAILURE);
> +		}
> 
> -	ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
> -	if (ret) {
> -		fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
> -		exit(EXIT_FAILURE);
> +		ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
> +		if (ret) {
> +			fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
> +			exit(EXIT_FAILURE);
> +		}
>  	}
> 
>  	/* Create sockets... */
>  	xsks[num_socks++] = xsk_configure(NULL);
> 
>  #if RR_LB
> +	if (opt_xdp_bind_flags & XDP_ATTACH)
> +		exit(EXIT_FAILURE);
> +
>  	for (i = 0; i < MAX_SOCKS - 1; i++)
>  		xsks[num_socks++] = xsk_configure(xsks[0]->umem);  #endif
> 
> -	/* ...and insert them into the map. */
> -	for (i = 0; i < num_socks; i++) {
> -		key = i;
> -		ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->sfd, 0);
> -		if (ret) {
> -			fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
> -			exit(EXIT_FAILURE);
> +	if (!(opt_xdp_bind_flags & XDP_ATTACH)) {
> +		/* ...and insert them into the map. */
> +		for (i = 0; i < num_socks; i++) {
> +			key = i;
> +			ret = bpf_map_update_elem(xsks_map, &key,
> +						  &xsks[i]->sfd, 0);
> +			if (ret) {
> +				fprintf(stderr,
> +					"ERROR: bpf_map_update_elem %d\n",
> +					i);
> +				exit(EXIT_FAILURE);
> +			}
>  		}
>  	}
> 
> --
> 2.19.1
Jesper Dangaard Brouer Dec. 8, 2018, 2:52 p.m. UTC | #7
On Fri, 7 Dec 2018 15:01:55 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> >
> > On Fri,  7 Dec 2018 12:44:24 +0100
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >  
> > > The rationale behind attach is performance and ease of use. Many XDP
> > > socket users just need a simple way of creating/binding a socket and
> > > receiving frames right away without loading an XDP program.
> > >
> > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > is a kernel provided XDP program that is installed to the netdev when
> > > XDP_ATTACH is being passed as a bind() flag.
> > >
> > > The builtin program is the simplest program possible to redirect a
> > > frame to an attached socket. In restricted C it would look like this:
> > >
> > >   SEC("xdp")
> > >   int xdp_prog(struct xdp_md *ctx)
> > >   {
> > >         return bpf_xsk_redirect(ctx);
> > >   }
> > >
> > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > from regular XDP programs. The easiest way to look at it is as a
> > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > builtin one.
> > >
> > > If no regular XDP program is installed to the netdev, the builtin will
> > > be install. If the builtin program is installed, and a regular is
> > > installed, regular XDP program will have precedence over the builtin
> > > one.
> > >
> > > Further, if a regular program is installed, and later removed, the
> > > builtin one will automatically be installed.
> > >
> > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > corresponding XDP netlink install flags.
> > >
> > > The builtin XDP program functionally adds even more complexity to the
> > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > store the program in the struct net_device together with the install
> > > flags instead of calling the ndo_bpf multiple times?  
> >
> > (As far as I can see from reading the code, correct me if I'm wrong.)
> >
> > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > builtin-program as the XDP program on the "entire" device.  That means
> > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > actually only relevant for the specific queue_index.  Yes, the helper
> > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > and return XDP_PASS if it is NULL:
> >  
> 
> Yes, you are correct. The builtin XDP program, just like a regular XDP
> program, affects the whole netdev. So, yes the non-AF_XDP queues would
> get a performance hit from this. Just to reiterate -- this isn't new
> for this series. This has always been the case for XDP when acting on
> just one queue.
> 
> > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > +{
> > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > +       struct xdp_sock *xsk;
> > +
> > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > +       if (xsk) {
> > +               ri->xsk = xsk;
> > +               return XDP_REDIRECT;
> > +       }
> > +
> > +       return XDP_PASS;
> > +}
> >
> > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > is tied hard and only relevant to a specific RX-queue (which is why we
> > get a performance boost due to SPSC queues).
> >
> > I acknowledge there is a need for this, but this use-case shows there
> > is a need for attaching XDP programs per RX-queue basis.
> >  
> 
> From my AF_XDP perspective, having a program per queue would make
> sense. The discussion of a per-queue has been up before, and I think
> the conclusion was that it would be too complex from a
> configuration/tooling point-of-view. Again, for AF_XDP this would be
> great.

I really think it is about time that we introduce these per-queue XDP
programs.  From a configuration/tooling point of view, your proposed
solution has the same issues that we have to solve (like listing
introspection needing to show the per-queue XDP/BPF programs).  And
you have almost solved it, by keeping the per-queue program in the
net_device struct.

Alexei already requested more types of builtin programs.  But as the
XDP-driver API can only load XDP for the entire NIC, how can you use
two builtin programs at the same time?
Jesper Dangaard Brouer Dec. 8, 2018, 3:12 p.m. UTC | #8
On Fri, 7 Dec 2018 13:21:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> for production I suspect the users would want
> an easy way to stay safe when they're playing with AF_XDP.
> So another builtin program that redirects ssh and ping traffic
> back to the kernel would be a nice addition.

Are you saying a builtin program that needs to parse different kinds
of Eth-type headers (DSA, VLAN, QinQ) and find the TCP port, matching
port 22 to return XDP_PASS, or else calling the AF_XDP redirect?  That
seems to be pure overhead for this fast-path builtin program for
AF_XDP.

Would a solution be to install a NIC hardware filter that redirects
SSH port 22 to another RX-queue, and then have a builtin program that
returns XDP_PASS installed on that RX-queue?  And change Bjørn's
semantics, such that RX-queue programs take precedence over the global
XDP program?  This would also be a good fail-safe in general for XDP.

If the RX-queue programs take precedence, I can use this fail-safe
approach. E.g. when I want to test my new global XDP program, I'll use
ethtool to match my management IP and send that traffic to a specific
RX-queue with my fail-safe BPF program.
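
For reference, the fail-safe setup sketched above could look roughly
like this using ethtool n-tuple filters (the interface name, queue
number and address below are just placeholders):

```shell
# Steer SSH (TCP dst port 22) to RX-queue 2, where the fail-safe
# XDP_PASS program would be installed:
ethtool -N eth0 flow-type tcp4 dst-port 22 action 2

# Likewise, steer traffic to the management IP to that queue:
ethtool -N eth0 flow-type ip4 dst-ip 192.0.2.1 action 2

# List the installed filters to verify:
ethtool -n eth0
```

This only works on NICs whose driver supports n-tuple filtering
(ethtool -K eth0 ntuple on), so it is a sketch of the idea rather than
a universal recipe.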
Andrew Lunn Dec. 8, 2018, 4:55 p.m. UTC | #9
On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
> 
> Are you saying a buildin program that need to parse different kinds of
> Eth-type headers (DSA, VLAN, QinqQ)

Hi Jesper

Parsing DSA headers is quite hard, since most of them are not
discoverable; you need to know they are there, and where they actually
are.

I also don't think there are any use cases for actually trying to
parse them on the master interface. You are more likely to want to run
the eBPF program on the slave interface, once the DSA header has been
removed, and it is clear what interface the frame is actually for.

	 Andrew
Björn Töpel Dec. 8, 2018, 6:43 p.m. UTC | #10
Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri, 7 Dec 2018 15:01:55 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> > >
> > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > Björn Töpel <bjorn.topel@gmail.com> wrote:
> > >
> > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > socket users just need a simple way of creating/binding a socket and
> > > > receiving frames right away without loading an XDP program.
> > > >
> > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > is a kernel provided XDP program that is installed to the netdev when
> > > > XDP_ATTACH is being passed as a bind() flag.
> > > >
> > > > The builtin program is the simplest program possible to redirect a
> > > > frame to an attached socket. In restricted C it would look like this:
> > > >
> > > >   SEC("xdp")
> > > >   int xdp_prog(struct xdp_md *ctx)
> > > >   {
> > > >         return bpf_xsk_redirect(ctx);
> > > >   }
> > > >
> > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > from regular XDP programs. The easiest way to look at it is as a
> > > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > > builtin one.
> > > >
> > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > be install. If the builtin program is installed, and a regular is
> > > > installed, regular XDP program will have precedence over the builtin
> > > > one.
> > > >
> > > > Further, if a regular program is installed, and later removed, the
> > > > builtin one will automatically be installed.
> > > >
> > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > > corresponding XDP netlink install flags.
> > > >
> > > > The builtin XDP program functionally adds even more complexity to the
> > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > store the program in the struct net_device together with the install
> > > > flags instead of calling the ndo_bpf multiple times?
> > >
> > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > >
> > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > builtin-program as the XDP program on the "entire" device.  That means
> > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > actually only relevant for the specific queue_index.  Yes, the helper
> > > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > and return XDP_PASS if it is NULL:
> > >
> >
> > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > get a performance hit from this. Just to reiterate -- this isn't new
> > for this series. This has always been the case for XDP when acting on
> > just one queue.
> >
> > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > +{
> > > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > +       struct xdp_sock *xsk;
> > > +
> > > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > +       if (xsk) {
> > > +               ri->xsk = xsk;
> > > +               return XDP_REDIRECT;
> > > +       }
> > > +
> > > +       return XDP_PASS;
> > > +}
> > >
> > > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > get a performance boost due to SPSC queues).
> > >
> > > I acknowledge there is a need for this, but this use-case shows there
> > > is a need for attaching XDP programs per RX-queue basis.
> > >
> >
> > From my AF_XDP perspective, having a program per queue would make
> > sense. The discussion of a per-queue has been up before, and I think
> > the conclusion was that it would be too complex from a
> > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > great.
>
> I really think it is about time that we introduce these per-queue XDP
> programs.  From a configuration/tooling point-of-view, your proposed
> solution has the same issues that we have to solve (like introspection:
> listing needs to show the per-queue XDP/BPF programs). And you have
> almost solved it, by keeping the per-queue program in the net_device
> struct.
>

I just want to emphasize: my series is only a convenience method for
loading a (sticky/hierarchical) XDP program, plus a new bpf function.
Nothing more. I'm not sure what you mean when you say that the "proposed
solution has the same issues". Do you mean XDP in general?

*I'm* all in for a per-queue XDP program -- but I might be biased due
to AF_XDP :-P Let's explore in this thread what that would look like
and the semantics of that!

> Alexei already requested more types of builtin programs.  But as the
> XDP-driver API can only load XDP for the entire NIC, how can you
> use two builtin programs at the same time?
>

You can't. Again, this is just another way of loading an XDP program.
If socket A loads a builtin B on dev C, and socket X tries to load a
builtin Y on dev C, this will fail/clash. This is no different from
today. My view of builtin has been "an XDP program provided by the
kernel". A related problem is if socket A attaches with XDP_DRV and
socket B attaches with XDP_SKB on the same netdev. (Ick, these are the
messy parts of XDP. :-()

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Björn Töpel Dec. 8, 2018, 6:50 p.m. UTC | #11
Den lör 8 dec. 2018 kl 16:12 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
>
> On Fri, 7 Dec 2018 13:21:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > for production I suspect the users would want
> > an easy way to stay safe when they're playing with AF_XDP.
> > So another builtin program that redirects ssh and ping traffic
> > back to the kernel would be a nice addition.
>
> Are you saying a builtin program that needs to parse different kinds of
> Eth-type headers (DSA, VLAN, QinQ) and find the TCP port, matching port
> 22 to return XDP_PASS, or else calling the AF_XDP redirect? That seems
> to be pure overhead for this fast-path builtin program for AF_XDP.
>
> Would a solution be to install a NIC hardware filter that redirects SSH
> port 22 to another RX-queue, and then have a builtin program that
> returns XDP_PASS installed on that RX-queue?  And change Björn's
> semantics, such that RX-queue programs take precedence over the global
> XDP program. This would also be a good fail-safe in general for XDP.
>

Exactly this; I'd say this is the most common way of using AF_XDP,
i.e. steering a certain flow to an Rx queue, so that *all* packets on
that queue are pulled into the AF_XDP socket. This is why the builtin
is the way it is; it is what is actually being used, not only in
benchmarking scenarios.

> If the RX-queue programs take precedence, I can use this fail-safe
> approach. E.g. when I want to test my new global XDP program, I'll use
> ethtool to match my management IP and send that traffic to a specific
> RX-queue with my fail-safe BPF program.
>

Interesting take to have a *per queue* XDP program that overrides the
regular one. Maybe this is a better way to use builtins?

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Jesper Dangaard Brouer Dec. 8, 2018, 8:37 p.m. UTC | #12
On Sat, 8 Dec 2018 17:55:05 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> On Sat, Dec 08, 2018 at 04:12:12PM +0100, Jesper Dangaard Brouer wrote:
> > On Fri, 7 Dec 2018 13:21:08 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >   
> > > for production I suspect the users would want
> > > an easy way to stay safe when they're playing with AF_XDP.
> > > So another builtin program that redirects ssh and ping traffic
> > > back to the kernel would be a nice addition.  
> > 
> > Are you saying a builtin program that needs to parse different kinds of
> > Eth-type headers (DSA, VLAN, QinQ)  
> 
> Hi Jesper
> 
> Parsing DSA headers is quite hard, since most of them are not
> discoverable; you need to know they are there, and where they actually
> are.
> 
> I also don't think there are any use cases for actually trying to
> parse them on the master interface. You are more likely to want to run
> the eBPF program on the slave interface, once the DSA header has been
> removed, and it is clear what interface the frame is actually for.

That is not how XDP works. XDP must run on the "master" physical driver
interface, as the "slave" interface is a virtual DSA interface.  I did
mention DSA because I'm handling that on the EspressoBin implementation.
Jesper Dangaard Brouer Dec. 8, 2018, 8:42 p.m. UTC | #13
On Sat, 8 Dec 2018 19:43:38 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> Den lör 8 dec. 2018 kl 15:52 skrev Jesper Dangaard Brouer <brouer@redhat.com>:
> >
> > On Fri, 7 Dec 2018 15:01:55 +0100
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >  
> > > Den fre 7 dec. 2018 kl 14:42 skrev Jesper Dangaard Brouer <brouer@redhat.com>:  
> > > >
> > > > On Fri,  7 Dec 2018 12:44:24 +0100
> > > > Björn Töpel <bjorn.topel@gmail.com> wrote:
> > > >  
> > > > > The rationale behind attach is performance and ease of use. Many XDP
> > > > > socket users just need a simple way of creating/binding a socket and
> > > > > receiving frames right away without loading an XDP program.
> > > > >
> > > > > XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> > > > > is a kernel provided XDP program that is installed to the netdev when
> > > > > XDP_ATTACH is being passed as a bind() flag.
> > > > >
> > > > > The builtin program is the simplest program possible to redirect a
> > > > > frame to an attached socket. In restricted C it would look like this:
> > > > >
> > > > >   SEC("xdp")
> > > > >   int xdp_prog(struct xdp_md *ctx)
> > > > >   {
> > > > >         return bpf_xsk_redirect(ctx);
> > > > >   }
> > > > >
> > > > > The builtin program loaded via XDP_ATTACH behaves, from an
> > > > > install-to-netdev/uninstall-from-netdev point of view, differently
> > > > > from regular XDP programs. The easiest way to look at it is as a
> > > > > 2-level hierarchy, where regular XDP programs has precedence over the
> > > > > builtin one.
> > > > >
> > > > > If no regular XDP program is installed to the netdev, the builtin will
> > > > > be install. If the builtin program is installed, and a regular is
> > > > > installed, regular XDP program will have precedence over the builtin
> > > > > one.
> > > > >
> > > > > Further, if a regular program is installed, and later removed, the
> > > > > builtin one will automatically be installed.
> > > > >
> > > > > The sxdp_flags field of struct sockaddr_xdp gets two new options
> > > > > XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> > > > > corresponding XDP netlink install flags.
> > > > >
> > > > > The builtin XDP program functionally adds even more complexity to the
> > > > > already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> > > > > store the program in the struct net_device together with the install
> > > > > flags instead of calling the ndo_bpf multiple times?  
> > > >
> > > > (As far as I can see from reading the code, correct me if I'm wrong.)
> > > >
> > > > If an AF_XDP program uses XDP_ATTACH, then it installs the
> > > > builtin-program as the XDP program on the "entire" device.  That means
> > > > all RX-queues will call this XDP-bpf program (indirect call), and it is
> > > > actually only relevant for the specific queue_index.  Yes, the helper
> > > > call does check that the 'xdp->rxq->queue_index' for an attached 'xsk'
> > > > and return XDP_PASS if it is NULL:
> > > >  
> > >
> > > Yes, you are correct. The builtin XDP program, just like a regular XDP
> > > program, affects the whole netdev. So, yes the non-AF_XDP queues would
> > > get a performance hit from this. Just to reiterate -- this isn't new
> > > for this series. This has always been the case for XDP when acting on
> > > just one queue.
> > >  
> > > > +BPF_CALL_1(bpf_xdp_xsk_redirect, struct xdp_buff *, xdp)
> > > > +{
> > > > +       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > > > +       struct xdp_sock *xsk;
> > > > +
> > > > +       xsk = READ_ONCE(xdp->rxq->dev->_rx[xdp->rxq->queue_index].xsk);
> > > > +       if (xsk) {
> > > > +               ri->xsk = xsk;
> > > > +               return XDP_REDIRECT;
> > > > +       }
> > > > +
> > > > +       return XDP_PASS;
> > > > +}
> > > >
> > > > Why do every normal XDP_PASS packet have to pay this overhead (indirect
> > > > call), when someone loads an AF_XDP socket program?  The AF_XDP socket
> > > > is tied hard and only relevant to a specific RX-queue (which is why we
> > > > get a performance boost due to SPSC queues).
> > > >
> > > > I acknowledge there is a need for this, but this use-case shows there
> > > > is a need for attaching XDP programs per RX-queue basis.
> > > >  
> > >
> > > From my AF_XDP perspective, having a program per queue would make
> > > sense. The discussion of a per-queue has been up before, and I think
> > > the conclusion was that it would be too complex from a
> > > configuration/tooling point-of-view. Again, for AF_XDP this would be
> > > great.  
> >
> > I really think it is about time that we introduce these per-queue XDP
> > programs.  From a configuration/tooling point-of-view, your proposed
> > solution has the same issues that we have to solve (like introspection:
> > listing needs to show the per-queue XDP/BPF programs). And you have
> > almost solved it, by keeping the per-queue program in the net_device
> > struct.
> >  
> 
> I just want to emphasize: my series is only a convenience method for
> loading a (sticky/hierarchical) XDP program, plus a new bpf function.
> Nothing more. I'm not sure what you mean when you say that the "proposed
> solution has the same issues". Do you mean XDP in general?
> 
> *I'm* all in for a per-queue XDP program -- but I might be biased due
> to AF_XDP :-P Let's explore in this thread what that would look like
> and the semantics of that!
>

Yes, please! (p.s. I'm going on vacation next week, which is
unfortunate as I really want to find a good semantics for this).


> > Alexei already requested more types of builtin programs.  But as the
> > XDP-driver API can only load XDP for the entire NIC, how can
> > you use two builtin programs at the same time?
> >  
> 
> You can't. Again, this is just another way of loading an XDP program.
> If socket A loads a builtin B on dev C, and socket X tries to load a
> builtin Y on dev C, this will fail/clash. This is no different from
> today. My view of builtin has been "an XDP program provided by the
> kernel". A related problem is if socket A attaches with XDP_DRV and
> socket B attaches with XDP_SKB on the same netdev. (Ick, these are the
> messy parts of XDP. :-()

This again shows we need per-RX-queue XDP programs.
Andrew Lunn Dec. 8, 2018, 8:48 p.m. UTC | #14
> That is not how XDP works. XDP must run on the "master" physical driver
> interface, as the "slave" interface is a virtual DSA interface.  I did
> mention DSA because I'm handling that on the EspressoBin implementation.

Hi Jesper

It is going to be interesting to see how you do that.

As a user, I want to attach the XDP program to a slave interface. So I
assume you somehow redirect that to the master, with some extra meta
data to allow XDP on the master to detect packets for a specific
slave?

   Andrew
Björn Töpel Dec. 10, 2018, 2:01 p.m. UTC | #15
Den fre 7 dec. 2018 kl 12:44 skrev Björn Töpel <bjorn.topel@gmail.com>:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Hi!
>
> This patch set adds support for a new XDP socket bind option,
> XDP_ATTACH.
>
> The rationale behind attach is performance and ease of use. Many XDP
> socket users just need a simple way of creating/binding a socket and
> receiving frames right away without loading an XDP program.
>
> XDP_ATTACH adds a mechanism we call "builtin XDP program" that simply
> is a kernel provided XDP program that is installed to the netdev when
> XDP_ATTACH is being passed as a bind() flag.
>
> The builtin program is the simplest program possible to redirect a
> frame to an attached socket. In restricted C it would look like this:
>
>   SEC("xdp")
>   int xdp_prog(struct xdp_md *ctx)
>   {
>         return bpf_xsk_redirect(ctx);
>   }
>
> The builtin program loaded via XDP_ATTACH behaves, from an
> install-to-netdev/uninstall-from-netdev point of view, differently
> from regular XDP programs. The easiest way to look at it is as a
> 2-level hierarchy, where regular XDP programs has precedence over the
> builtin one.
>
> If no regular XDP program is installed to the netdev, the builtin will
> be install. If the builtin program is installed, and a regular is
> installed, regular XDP program will have precedence over the builtin
> one.
>
> Further, if a regular program is installed, and later removed, the
> builtin one will automatically be installed.
>
> The sxdp_flags field of struct sockaddr_xdp gets two new options
> XDP_BUILTIN_SKB_MODE and XDP_BUILTIN_DRV_MODE, which maps to the
> corresponding XDP netlink install flags.
>
> The builtin XDP program functionally adds even more complexity to the
> already hard to read dev_change_xdp_fd. Maybe it would be simpler to
> store the program in the struct net_device together with the install
> flags instead of calling the ndo_bpf multiple times?
>
> The outline of the series is as following:
>   patch 1-2: Introduce the first part of XDP_ATTACH, simply adding
>              the socket to the netdev structure.
>   patch 3:   Add a new BPF function, bpf_xsk_redirect, that
>              redirects a frame to an attached socket.
>   patch 4-5: Preparatory commits for built in BPF programs
>   patch 6:   Make XDP_ATTACH load a builtin XDP program
>   patch 7:   Extend the samples application with XDP_ATTACH
>              support
>
> Patches 1 through 3 give the performance boost and make it possible to
> use AF_XDP sockets without an XSKMAP, but still require an explicit
> XDP program to be loaded.
>
> Patches 4 through 6 make it possible to use an XDP socket without
> explicitly loading an XDP program.
>
> The performance numbers for rxdrop (Intel(R) Xeon(R) Gold 6154 CPU @
> 3.00GHz):
>
> XDP_SKB:
> XSKMAP:     2.8 Mpps
> XDP_ATTACH: 2.9 Mpps
>
> XDP_DRV - copy:
> XSKMAP:     8.5 Mpps
> XDP_ATTACH: 9.3 Mpps
>
> XDP_DRV - zero-copy:
> XSKMAP:     15.1 Mpps
> XDP_ATTACH: 17.3 Mpps
>
> Thanks!
> Björn
>

Firstly, thanks for all the feedback on the series.

I will do a respin of the series, but without the builtin
functionality, IOW only the XDP_ATTACH bind() option to assign an XDP
socket to a netdev Rx queue, and the corresponding bpf_xsk_redirect
(patches 1 to 3). This will give a performance boost, and also requires
a less complicated XDP program.

The builtin/sticky XDP program has too many downsides:

* Explicit BPF instructions in the kernel.
* Composability: mixing regular BPF programs with builtin ones is hard.
* The sticky XDP program is not intuitive for a user, e.g. "ip link set
  dev eth0 xdp off" will not disable the sticky one.
* More complex XDP install code.

We'll go this route instead: introduce XDP_ATTACH/bpf_xsk_redirect (as
above) and let libbpf load the trivial AF_XDP program.
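
For the record, such a trivial XSKMAP-based program, in the style of
the xdpsock sample, might look roughly like this (a sketch, not
necessarily the exact program libbpf will ship):

```c
/* Sketch of a minimal XSKMAP-based redirect program. Map sizing and
 * section names follow the samples/bpf convention of this era. */
struct bpf_map_def SEC("maps") xsks_map = {
	.type = BPF_MAP_TYPE_XSKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 64,
};

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
	int index = ctx->rx_queue_index;

	/* Redirect to the socket bound to this Rx queue, if any;
	 * otherwise fall through to the normal network stack. */
	if (bpf_map_lookup_elem(&xsks_map, &index))
		return bpf_redirect_map(&xsks_map, index, 0);
	return XDP_PASS;
}
```

The lookup-before-redirect keeps queues without an attached socket on
the XDP_PASS path, which is the same 2-level behavior the builtin was
trying to provide, just loaded from userspace.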


Björn

>
> Björn Töpel (7):
>   xsk: simplify AF_XDP socket teardown
>   xsk: add XDP_ATTACH bind() flag
>   bpf: add bpf_xsk_redirect function
>   bpf: prepare for builtin bpf program
>   bpf: add function to load builtin BPF program
>   xsk: load a builtin XDP program on XDP_ATTACH
>   samples: bpf: add support for XDP_ATTACH to xdpsock
>
>  include/linux/bpf.h         |   2 +
>  include/linux/filter.h      |   4 +
>  include/linux/netdevice.h   |  11 +++
>  include/net/xdp_sock.h      |   2 +
>  include/trace/events/xdp.h  |  61 +++++++++++++++
>  include/uapi/linux/bpf.h    |  14 +++-
>  include/uapi/linux/if_xdp.h |   9 ++-
>  kernel/bpf/syscall.c        |  91 ++++++++++++++--------
>  net/core/dev.c              |  84 +++++++++++++++++++--
>  net/core/filter.c           | 100 ++++++++++++++++++++++++
>  net/xdp/xsk.c               | 146 +++++++++++++++++++++++++++++-------
>  samples/bpf/xdpsock_user.c  | 108 ++++++++++++++++----------
>  12 files changed, 524 insertions(+), 108 deletions(-)
>
> --
> 2.19.1
>