diff mbox

[net-next,2/4] packet: add eBPF fanout mode

Message ID 1439567427-19504-3-git-send-email-willemb@google.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Willem de Bruijn Aug. 14, 2015, 3:50 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Add a fanout mode that accepts an eBPF program to select a socket.

Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/if_packet.h |  1 +
 net/packet/af_packet.c         | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Comments

Alexei Starovoitov Aug. 14, 2015, 5:03 p.m. UTC | #1
On 8/14/15 8:50 AM, Willem de Bruijn wrote:
> +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data,
> +				unsigned int len)
> +{
> +	struct bpf_prog *new;
> +	u32 fd;
> +
> +	if (len != sizeof(fd))
> +		return -EINVAL;
> +	if (copy_from_user(&fd, data, len))
> +		return -EFAULT;
> +
> +	new = bpf_prog_get(fd);
> +	if (IS_ERR(new))
> +		return PTR_ERR(new);
> +
> +	__fanout_set_data_bpf(f, new);
> +	return 0;
> +}

all looks great except in the above the check:
         if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
                 bpf_prog_put(new);
                 return -EINVAL;
         }
is missing. Otherwise user will be able to attach programs
of wrong types to fanout.

Also instead of:
  #define PACKET_FANOUT_BPF		6
  #define PACKET_FANOUT_EBPF		7

I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
This is how bpf manpage distinguishes them.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willem de Bruijn Aug. 14, 2015, 6:47 p.m. UTC | #2
On Fri, Aug 14, 2015 at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 8/14/15 8:50 AM, Willem de Bruijn wrote:
>>
>> +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user
>> *data,
>> +                               unsigned int len)
>> +{
>> +       struct bpf_prog *new;
>> +       u32 fd;
>> +
>> +       if (len != sizeof(fd))
>> +               return -EINVAL;
>> +       if (copy_from_user(&fd, data, len))
>> +               return -EFAULT;
>> +
>> +       new = bpf_prog_get(fd);
>> +       if (IS_ERR(new))
>> +               return PTR_ERR(new);
>> +
>> +       __fanout_set_data_bpf(f, new);
>> +       return 0;
>> +}
>
>
> all looks great except in the above the check:
>         if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
>                 bpf_prog_put(new);
>                 return -EINVAL;
>         }
> is missing. Otherwise user will be able to attach programs
> of wrong types to fanout.

Ai, good point!

> Also instead of:
>  #define PACKET_FANOUT_BPF              6
>  #define PACKET_FANOUT_EBPF             7
>
> I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
> This is how bpf manpage distinguishes them.
>

Sounds good. I'll make both changes in v2. Thanks for reviewing, Alexei.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Aug. 14, 2015, 7:01 p.m. UTC | #3
[ @Willem: RH email doesn't exist anymore, I took it out, otherwise
   every reply gets a bounce. ;) ]

On 08/14/2015 07:03 PM, Alexei Starovoitov wrote:
> On 8/14/15 8:50 AM, Willem de Bruijn wrote:
...
> all looks great except in the above the check:
>          if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
>                  bpf_prog_put(new);
>                  return -EINVAL;
>          }
> is missing. Otherwise user will be able to attach programs
> of wrong types to fanout.
>
> Also instead of:
>   #define PACKET_FANOUT_BPF        6
>   #define PACKET_FANOUT_EBPF        7
>
> I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
> This is how bpf manpage distinguishes them.

We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be
analogous for fanout, if we want to be consistent with the API?

But C/E prefix seems okay too, how you want ...

Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED),
perhaps we should also apply it on fanout?

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willem de Bruijn Aug. 14, 2015, 7:27 p.m. UTC | #4
> [ @Willem: RH email doesn't exist anymore, I took it out, otherwise
>   every reply gets a bounce. ;) ]

Sorry for using the wrong address, Daniel.

>> Also instead of:
>>   #define PACKET_FANOUT_BPF        6
>>   #define PACKET_FANOUT_EBPF        7
>>
>> I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
>> This is how bpf manpage distinguishes them.
>
> We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be
> analogous for fanout, if we want to be consistent with the API?
>
> But C/E prefix seems okay too, how you want ...

I don't feel very strongly, either. But CBPF/EBPF is a bit more
descriptive, so let's do that.

> Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED),
> perhaps we should also apply it on fanout?

Good point. With classic bpf, packet access control is fully
enforced in per-socket filters, but playing with load balancing
filters could allow an adversary to infer some information
about the dropped packets*. With eBPF and maps, access
is even more direct. Let's support locking of fanout filters in
place.

I intend to test the existing socket flag. No need to add a
separate flag for the fanout group, as far as I can see.


(*) I noticed that a similar unintended effect also causes the
PACKET_FANOUT_LB selftest to be flaky: filters on the
sockets ensure that the test only reads expected packets.
But, all traffic makes it through packet_rcv_fanout. Packets
that are later dropped by sk_filter have already incremented
rr_cur. Worst case, with 2 sockets and each accepted packet
interleaved with a dropped packet, all packets are queued on
only one socket. Test flakiness is fixed, e.g., by running in a
private network namespace. The implementation behavior
may be unexpected in other, production, environments.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Aug. 14, 2015, 7:46 p.m. UTC | #5
On 08/14/2015 09:27 PM, Willem de Bruijn wrote:
...
>> Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED),
>> perhaps we should also apply it on fanout?
>
> Good point. With classic bpf, packet access control is fully
> enforced in per-socket filters, but playing with load balancing
> filters could allow an adversary to infer some information
> about the dropped packets*. With eBPF and maps, access
> is even more direct. Let's support locking of fanout filters in
> place.

Right, a process could share a map between the fanout lb filter
and actual sk filter, i.e. to look up how much actually passed
through on the later sk level filter, and use that information
in addition for its lb decisions.

> I intend to test the existing socket flag. No need to add a
> separate flag for the fanout group, as far as I can see.

Agreed, should be okay.

Thanks Willem!

> (*) I noticed that a similar unintended effect also causes the
> PACKET_FANOUT_LB selftest to be flaky: filters on the
> sockets ensure that the test only reads expected packets.
> But, all traffic makes it through packet_rcv_fanout. Packets
> that are later dropped by sk_filter have already incremented
> rr_cur. Worst case, with 2 sockets and each accepted packet
> interleaved with a dropped packet, all packets are queued on
> only one socket. Test flakiness is fixed, e.g., by running in a
> private network namespace. The implementation behavior
> may be unexpected in other, production, environments.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Willem de Bruijn Aug. 15, 2015, 2:28 a.m. UTC | #6
On Fri, Aug 14, 2015 at 3:46 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 08/14/2015 09:27 PM, Willem de Bruijn wrote:
> ...
>>>
>>> Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED),
>>> perhaps we should also apply it on fanout?
>>
>>
>> Good point. With classic bpf, packet access control is fully
>> enforced in per-socket filters, but playing with load balancing
>> filters could allow an adversary to infer some information
>> about the dropped packets*. With eBPF and maps, access
>> is even more direct. Let's support locking of fanout filters in
>> place.
>
>
> Right, a process could share a map between the fanout lb filter
> and actual sk filter, i.e. to look up how much actually passed
> through on the later sk level filter, and use that information
> in addition for its lb decisions.
>
>> I intend to test the existing socket flag. No need to add a
>> separate flag for the fanout group, as far as I can see.
>
>
> Agreed, should be okay.

Great. Thanks for the suggestion, Daniel! I'll send a v2 the
three suggested changes in a minute.

>
> Thanks Willem!
>
>> (*) I noticed that a similar unintended effect also causes the
>> PACKET_FANOUT_LB selftest to be flaky: filters on the
>> sockets ensure that the test only reads expected packets.
>> But, all traffic makes it through packet_rcv_fanout. Packets
>> that are later dropped by sk_filter have already incremented
>> rr_cur. Worst case, with 2 sockets and each accepted packet
>> interleaved with a dropped packet, all packets are queued on
>> only one socket. Test flakiness is fixed, e.g., by running in a
>> private network namespace. The implementation behavior
>> may be unexpected in other, production, environments.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index d41280a..daa0ddd 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -64,6 +64,7 @@  struct sockaddr_ll {
 #define PACKET_FANOUT_RND		4
 #define PACKET_FANOUT_QM		5
 #define PACKET_FANOUT_BPF		6
+#define PACKET_FANOUT_EBPF		7
 #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 80d68c9..6352b7d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1472,6 +1472,7 @@  static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 		idx = fanout_demux_rollover(f, skb, 0, false, num);
 		break;
 	case PACKET_FANOUT_BPF:
+	case PACKET_FANOUT_EBPF:
 		idx = fanout_demux_bpf(f, skb, num);
 		break;
 	}
@@ -1529,6 +1530,7 @@  static void fanout_init_data(struct packet_fanout *f)
 		atomic_set(&f->rr_cur, 0);
 		break;
 	case PACKET_FANOUT_BPF:
+	case PACKET_FANOUT_EBPF:
 		RCU_INIT_POINTER(f->bpf_prog, NULL);
 		break;
 	}
@@ -1569,12 +1571,33 @@  static int fanout_set_data_bpf(struct packet_fanout *f, char __user *data,
 	return 0;
 }
 
+static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data,
+				unsigned int len)
+{
+	struct bpf_prog *new;
+	u32 fd;
+
+	if (len != sizeof(fd))
+		return -EINVAL;
+	if (copy_from_user(&fd, data, len))
+		return -EFAULT;
+
+	new = bpf_prog_get(fd);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+
+	__fanout_set_data_bpf(f, new);
+	return 0;
+}
+
 static int fanout_set_data(struct packet_fanout *f, char __user *data,
 			   unsigned int len)
 {
 	switch (f->type) {
 	case PACKET_FANOUT_BPF:
 		return fanout_set_data_bpf(f, data, len);
+	case PACKET_FANOUT_EBPF:
+		return fanout_set_data_ebpf(f, data, len);
 	default:
 		return -EINVAL;
 	};
@@ -1584,6 +1607,7 @@  static void fanout_release_data(struct packet_fanout *f)
 {
 	switch (f->type) {
 	case PACKET_FANOUT_BPF:
+	case PACKET_FANOUT_EBPF:
 		__fanout_set_data_bpf(f, NULL);
 	};
 }
@@ -1606,6 +1630,7 @@  static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	case PACKET_FANOUT_RND:
 	case PACKET_FANOUT_QM:
 	case PACKET_FANOUT_BPF:
+	case PACKET_FANOUT_EBPF:
 		break;
 	default:
 		return -EINVAL;