Message ID | 1439567427-19504-3-git-send-email-willemb@google.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On 8/14/15 8:50 AM, Willem de Bruijn wrote: > +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data, > + unsigned int len) > +{ > + struct bpf_prog *new; > + u32 fd; > + > + if (len != sizeof(fd)) > + return -EINVAL; > + if (copy_from_user(&fd, data, len)) > + return -EFAULT; > + > + new = bpf_prog_get(fd); > + if (IS_ERR(new)) > + return PTR_ERR(new); > + > + __fanout_set_data_bpf(f, new); > + return 0; > +} all looks great except in the above the check: if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) { bpf_prog_put(new); return -EINVAL; } is missing. Otherwise user will be able to attach programs of wrong types to fanout. Also instead of: #define PACKET_FANOUT_BPF 6 #define PACKET_FANOUT_EBPF 7 I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. This is how bpf manpage distinguishes them. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 14, 2015 at 1:03 PM, Alexei Starovoitov <ast@plumgrid.com> wrote: > On 8/14/15 8:50 AM, Willem de Bruijn wrote: >> >> +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user >> *data, >> + unsigned int len) >> +{ >> + struct bpf_prog *new; >> + u32 fd; >> + >> + if (len != sizeof(fd)) >> + return -EINVAL; >> + if (copy_from_user(&fd, data, len)) >> + return -EFAULT; >> + >> + new = bpf_prog_get(fd); >> + if (IS_ERR(new)) >> + return PTR_ERR(new); >> + >> + __fanout_set_data_bpf(f, new); >> + return 0; >> +} > > > all looks great except in the above the check: > if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) { > bpf_prog_put(new); > return -EINVAL; > } > is missing. Otherwise user will be able to attach programs > of wrong types to fanout. Ai, good point! > Also instead of: > #define PACKET_FANOUT_BPF 6 > #define PACKET_FANOUT_EBPF 7 > > I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. > This is how bpf manpage distinguishes them. > Sounds good. I'll make both changes in v2. Thanks for reviewing, Alexei. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ @Willem: RH email doesn't exist anymore, I took it out, otherwise every reply gets a bounce. ;) ] On 08/14/2015 07:03 PM, Alexei Starovoitov wrote: > On 8/14/15 8:50 AM, Willem de Bruijn wrote: ... > all looks great except in the above the check: > if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) { > bpf_prog_put(new); > return -EINVAL; > } > is missing. Otherwise user will be able to attach programs > of wrong types to fanout. > > Also instead of: > #define PACKET_FANOUT_BPF 6 > #define PACKET_FANOUT_EBPF 7 > > I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. > This is how bpf manpage distinguishes them. We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be analogous for fanout, if we want to be consistent with the API? But C/E prefix seems okay too, how you want ... Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), perhaps we should also apply it on fanout? Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> [ @Willem: RH email doesn't exist anymore, I took it out, otherwise > every reply gets a bounce. ;) ] Sorry for using the wrong address, Daniel. >> Also instead of: >> #define PACKET_FANOUT_BPF 6 >> #define PACKET_FANOUT_EBPF 7 >> >> I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. >> This is how bpf manpage distinguishes them. > > We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be > analogous for fanout, if we want to be consistent with the API? > > But C/E prefix seems okay too, how you want ... I don't feel very strongly, either. But CBPF/EBPF is a bit more descriptive, so let's do that. > Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), > perhaps we should also apply it on fanout? Good point. With classic bpf, packet access control is fully enforced in per-socket filters, but playing with load balancing filters could allow an adversary to infer some information about the dropped packets*. With eBPF and maps, access is even more direct. Let's support locking of fanout filters in place. I intend to test the existing socket flag. No need to add a separate flag for the fanout group, as far as I can see. (*) I noticed that a similar unintended effect also causes the PACKET_FANOUT_LB selftest to be flaky: filters on the sockets ensure that the test only reads expected packets. But, all traffic makes it through packet_rcv_fanout. Packets that are later dropped by sk_filter have already incremented rr_cur. Worst case, with 2 sockets and each accepted packet interleaved with a dropped packet, all packets are queued on only one socket. Test flakiness is fixed, e.g., by running in a private network namespace. The implementation behavior may be unexpected in other, production, environments. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/14/2015 09:27 PM, Willem de Bruijn wrote: ... >> Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), >> perhaps we should also apply it on fanout? > > Good point. With classic bpf, packet access control is fully > enforced in per-socket filters, but playing with load balancing > filters could allow an adversary to infer some information > about the dropped packets*. With eBPF and maps, access > is even more direct. Let's support locking of fanout filters in > place. Right, a process could share a map between the fanout lb filter and actual sk filter, i.e. to look up how much actually passed through on the later sk level filter, and use that information in addition for its lb decisions. > I intend to test the existing socket flag. No need to add a > separate flag for the fanout group, as far as I can see. Agreed, should be okay. Thanks Willem! > (*) I noticed that a similar unintended effect also causes the > PACKET_FANOUT_LB selftest to be flaky: filters on the > sockets ensure that the test only reads expected packets. > But, all traffic makes it through packet_rcv_fanout. Packets > that are later dropped by sk_filter have already incremented > rr_cur. Worst case, with 2 sockets and each accepted packet > interleaved with a dropped packet, all packets are queued on > only one socket. Test flakiness is fixed, e.g., by running in a > private network namespace. The implementation behavior > may be unexpected in other, production, environments. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 14, 2015 at 3:46 PM, Daniel Borkmann <daniel@iogearbox.net> wrote: > On 08/14/2015 09:27 PM, Willem de Bruijn wrote: > ... >>> >>> Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), >>> perhaps we should also apply it on fanout? >> >> >> Good point. With classic bpf, packet access control is fully >> enforced in per-socket filters, but playing with load balancing >> filters could allow an adversary to infer some information >> about the dropped packets*. With eBPF and maps, access >> is even more direct. Let's support locking of fanout filters in >> place. > > > Right, a process could share a map between the fanout lb filter > and actual sk filter, i.e. to look up how much actually passed > through on the later sk level filter, and use that information > in addition for its lb decisions. > >> I intend to test the existing socket flag. No need to add a >> separate flag for the fanout group, as far as I can see. > > > Agreed, should be okay. Great. Thanks for the suggestion, Daniel! I'll send a v2 the three suggested changes in a minute. > > Thanks Willem! > >> (*) I noticed that a similar unintended effect also causes the >> PACKET_FANOUT_LB selftest to be flaky: filters on the >> sockets ensure that the test only reads expected packets. >> But, all traffic makes it through packet_rcv_fanout. Packets >> that are later dropped by sk_filter have already incremented >> rr_cur. Worst case, with 2 sockets and each accepted packet >> interleaved with a dropped packet, all packets are queued on >> only one socket. Test flakiness is fixed, e.g., by running in a >> private network namespace. The implementation behavior >> may be unexpected in other, production, environments. >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h index d41280a..daa0ddd 100644 --- a/include/uapi/linux/if_packet.h +++ b/include/uapi/linux/if_packet.h @@ -64,6 +64,7 @@ struct sockaddr_ll { #define PACKET_FANOUT_RND 4 #define PACKET_FANOUT_QM 5 #define PACKET_FANOUT_BPF 6 +#define PACKET_FANOUT_EBPF 7 #define PACKET_FANOUT_FLAG_ROLLOVER 0x1000 #define PACKET_FANOUT_FLAG_DEFRAG 0x8000 diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 80d68c9..6352b7d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1472,6 +1472,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev, idx = fanout_demux_rollover(f, skb, 0, false, num); break; case PACKET_FANOUT_BPF: + case PACKET_FANOUT_EBPF: idx = fanout_demux_bpf(f, skb, num); break; } @@ -1529,6 +1530,7 @@ static void fanout_init_data(struct packet_fanout *f) atomic_set(&f->rr_cur, 0); break; case PACKET_FANOUT_BPF: + case PACKET_FANOUT_EBPF: RCU_INIT_POINTER(f->bpf_prog, NULL); break; } @@ -1569,12 +1571,33 @@ static int fanout_set_data_bpf(struct packet_fanout *f, char __user *data, return 0; } +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data, + unsigned int len) +{ + struct bpf_prog *new; + u32 fd; + + if (len != sizeof(fd)) + return -EINVAL; + if (copy_from_user(&fd, data, len)) + return -EFAULT; + + new = bpf_prog_get(fd); + if (IS_ERR(new)) + return PTR_ERR(new); + + __fanout_set_data_bpf(f, new); + return 0; +} + static int fanout_set_data(struct packet_fanout *f, char __user *data, unsigned int len) { switch (f->type) { case PACKET_FANOUT_BPF: return fanout_set_data_bpf(f, data, len); + case PACKET_FANOUT_EBPF: + return fanout_set_data_ebpf(f, data, len); default: return -EINVAL; }; @@ -1584,6 +1607,7 @@ static void fanout_release_data(struct packet_fanout *f) { switch (f->type) { case PACKET_FANOUT_BPF: + case PACKET_FANOUT_EBPF: __fanout_set_data_bpf(f, NULL); }; } @@ -1606,6 +1630,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags) case PACKET_FANOUT_RND: case PACKET_FANOUT_QM: case PACKET_FANOUT_BPF: + case PACKET_FANOUT_EBPF: break; default: return -EINVAL;