diff mbox

[RFC,v2,2/5] net: add ndo to set bpf prog in adapter rx

Message ID 1460090930-11219-2-git-send-email-bblanco@plumgrid.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Brenden Blanco April 8, 2016, 4:48 a.m. UTC
Add two new set/get netdev ops for drivers implementing the
BPF_PROG_TYPE_PHYS_DEV filter.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
---
 include/linux/netdevice.h | 13 +++++++++++++
 net/core/dev.c            | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

Comments

Jesper Dangaard Brouer April 8, 2016, 9:38 a.m. UTC | #1
On Thu,  7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:

> Add two new set/get netdev ops for drivers implementing the
> BPF_PROG_TYPE_PHYS_DEV filter.
> 
> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
[...]
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index cb4e508..3acf732 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
[...]
> @@ -1102,6 +1103,14 @@ struct tc_to_netdev {
>   *	appropriate rx headroom value allows avoiding skb head copy on
>   *	forward. Setting a negative value resets the rx headroom to the
>   *	default value.
> + * int  (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog);
> + *	This function is used to set or clear a bpf program used in the
> + *	earliest stages of packet rx. The prog will have been loaded as
> + *	BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling
> + *	bpf_prog_put on any old progs that are stored, but not on the passed
> + *	in prog.
> + * bool (*ndo_bpf_get)(struct net_device *dev);
> + *	This function is used to check if a bpf program is set on the device.
>   *

This interface for the entire device, right.  I can imagine that users
want to attach a eBPF program per RX queue.  Can we adapt the interface
to support this? (could default to adding it all queues)


I'm also wondering if we should add a "flags" parameter.  Or maybe we
can extend 'struct bpf_prog' with I have in mind.

When the eBPF program is attached to a RX queue, I want to know if the
program want to modify packet-data, up-front.

The problem is that most drivers use dma_sync, which means that data is
considered 'read-only' (the "considered" part depend on DMA engine, and
we might find a DMA loop-hole for some configs).
  If the program want to write, the driver have the option of
reconfiguring the driver routine to use dma_unmap, before handing over
the page.  Or driver can also choose to realloc the specific RX ring
queue pages as single pages (using dma_map/unmap consistently).
 This also allow us to give a return code indicating given driver does
not support writable packet-pages (rejecting program).
Brenden Blanco April 8, 2016, 4:39 p.m. UTC | #2
On Fri, Apr 08, 2016 at 11:38:58AM +0200, Jesper Dangaard Brouer wrote:
> 
> On Thu,  7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> 
> > Add two new set/get netdev ops for drivers implementing the
> > BPF_PROG_TYPE_PHYS_DEV filter.
> > 
> > Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> [...]
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index cb4e508..3acf732 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> [...]
> > @@ -1102,6 +1103,14 @@ struct tc_to_netdev {
> >   *	appropriate rx headroom value allows avoiding skb head copy on
> >   *	forward. Setting a negative value resets the rx headroom to the
> >   *	default value.
> > + * int  (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog);
> > + *	This function is used to set or clear a bpf program used in the
> > + *	earliest stages of packet rx. The prog will have been loaded as
> > + *	BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling
> > + *	bpf_prog_put on any old progs that are stored, but not on the passed
> > + *	in prog.
> > + * bool (*ndo_bpf_get)(struct net_device *dev);
> > + *	This function is used to check if a bpf program is set on the device.
> >   *
> 
> This interface for the entire device, right.  I can imagine that users
> want to attach a eBPF program per RX queue.  Can we adapt the interface
> to support this? (could default to adding it all queues)
> 
Currently yes, for the entire device. I don't see rx queue exposed in
common setlink APIs. Wouldn't this be available only through ethtool,
generally? That would be a significant change to the api, but not a lot
of code. I would defer to others on which is cleaner. An alternative
could be to always run the program, but expose the queue number in
struct bpf_phys_dev_md. That is not as flexible since the program is
still shared, but maybe still useful.
> 
> I'm also wondering if we should add a "flags" parameter.  Or maybe we
> can extend 'struct bpf_prog' with I have in mind.
> 
> When the eBPF program is attached to a RX queue, I want to know if the
> program want to modify packet-data, up-front.
> 
> The problem is that most drivers use dma_sync, which means that data is
> considered 'read-only' (the "considered" part depend on DMA engine, and
> we might find a DMA loop-hole for some configs).
>   If the program want to write, the driver have the option of
> reconfiguring the driver routine to use dma_unmap, before handing over
> the page.  Or driver can also choose to realloc the specific RX ring
> queue pages as single pages (using dma_map/unmap consistently).
>  This also allow us to give a return code indicating given driver does
> not support writable packet-pages (rejecting program).
When write-mode is enabled for this prog type, we'll add the flag. I
don't want to add unused flags prematurely. When we add such support, it
should be available in the bpf_prog struct, similar to the cb_access or
dst_needed bool fields.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb4e508..3acf732 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -61,6 +61,7 @@  struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
 struct mpls_dev;
+struct bpf_prog;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1102,6 +1103,14 @@  struct tc_to_netdev {
  *	appropriate rx headroom value allows avoiding skb head copy on
  *	forward. Setting a negative value resets the rx headroom to the
  *	default value.
+ * int  (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog);
+ *	This function is used to set or clear a bpf program used in the
+ *	earliest stages of packet rx. The prog will have been loaded as
+ *	BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling
+ *	bpf_prog_put on any old progs that are stored, but not on the passed
+ *	in prog.
+ * bool (*ndo_bpf_get)(struct net_device *dev);
+ *	This function is used to check if a bpf program is set on the device.
  *
  */
 struct net_device_ops {
@@ -1292,6 +1301,9 @@  struct net_device_ops {
 						       struct sk_buff *skb);
 	void			(*ndo_set_rx_headroom)(struct net_device *dev,
 						       int needed_headroom);
+	int			(*ndo_bpf_set)(struct net_device *dev,
+					       struct bpf_prog *prog);
+	bool			(*ndo_bpf_get)(struct net_device *dev);
 };
 
 /**
@@ -3251,6 +3263,7 @@  int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
 			   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
+int dev_change_bpf_fd(struct net_device *dev, int fd);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 273f10d..7cf749c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -94,6 +94,7 @@ 
 #include <linux/ethtool.h>
 #include <linux/notifier.h>
 #include <linux/skbuff.h>
+#include <linux/bpf.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/busy_poll.h>
@@ -6483,6 +6484,43 @@  int dev_change_proto_down(struct net_device *dev, bool proto_down)
 EXPORT_SYMBOL(dev_change_proto_down);
 
 /**
+ *	dev_change_bpf_fd - set or clear a bpf program for a device
+ *	@dev: device
+ *	@fd: new program fd or negative value to clear
+ *
+ *	Set or clear a bpf program for a device
+ */
+int dev_change_bpf_fd(struct net_device *dev, int fd)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct bpf_prog *prog = NULL;
+	int err;
+
+	if (!ops->ndo_bpf_set)
+		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
+
+	if (fd >= 0) {
+		prog = bpf_prog_get(fd);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+
+		if (prog->type != BPF_PROG_TYPE_PHYS_DEV) {
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+	}
+
+	err = ops->ndo_bpf_set(dev, prog);
+	if (err < 0 && prog)
+		bpf_prog_put(prog);
+
+	return err;
+}
+EXPORT_SYMBOL(dev_change_bpf_fd);
+
+/**
  *	dev_new_index	-	allocate an ifindex
  *	@net: the applicable net namespace
  *