Message ID | 1398223137-5463-1-git-send-email-ast@plumgrid.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Alexei Starovoitov <ast@plumgrid.com> Date: Tue, 22 Apr 2014 20:18:57 -0700 > exisiting BPF verifier allows uninitialized access to registers, > 'ret A' is considered to be a valid filter. > So initialize A and X to zero to prevent leaking kernel memory > In the future BPF verifier will be rejecting such filters > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Has the code always been like this? Did the eBPF changes introduce this problem either directly or indirectly? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote: > From: Alexei Starovoitov <ast@plumgrid.com> > Date: Tue, 22 Apr 2014 20:18:57 -0700 > >> exisiting BPF verifier allows uninitialized access to registers, >> 'ret A' is considered to be a valid filter. >> So initialize A and X to zero to prevent leaking kernel memory >> In the future BPF verifier will be rejecting such filters >> >> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> > > Has the code always been like this? the previous interpreter had: unsigned int sk_run_filter(const struct sk_buff *skb, const struct sock_filter *fentry) { void *ptr; u32 A = 0; /* Accumulator */ u32 X = 0; /* Index Register */ ... so it wasn't affected. > Did the eBPF changes introduce this problem either directly or > indirectly? this bug is an oversight on my side. I believe in the net-next it should be fixed by making verifier smarter instead of wasting run time cycles to initialize regs. For now the same fix is needed for both net and net-next. We can remove extra assignments when verifier becomes smarter. ebpf verifier that I posted earlier had checks for uninitialized regs and stack. I missed lack of uninit regs part in classic verifier when ebpf verifier and jit were dropped from the patch set. Sorry about this slip. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote: > From: Alexei Starovoitov <ast@plumgrid.com> > Date: Tue, 22 Apr 2014 20:18:57 -0700 > > > exisiting BPF verifier allows uninitialized access to registers, > > 'ret A' is considered to be a valid filter. > > So initialize A and X to zero to prevent leaking kernel memory > > In the future BPF verifier will be rejecting such filters > > > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> > > Has the code always been like this? > > Did the eBPF changes introduce this problem either directly or > indirectly? Original code was fine AFAIK Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set") David, is it possible for you to push net-next tree ? Thank ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/2014 06:59 AM, Alexei Starovoitov wrote: > On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote: >> From: Alexei Starovoitov <ast@plumgrid.com> >> Date: Tue, 22 Apr 2014 20:18:57 -0700 >> >>> exisiting BPF verifier allows uninitialized access to registers, >>> 'ret A' is considered to be a valid filter. >>> So initialize A and X to zero to prevent leaking kernel memory >>> In the future BPF verifier will be rejecting such filters >>> >>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> >> Has the code always been like this? I think it would be much cleaner to just prevent such filters that only contain a 'ret A', or 'ret X' w/o a load into X from attaching. By that, we would need to do slightly more work on the verifier side (which seemed to allow such cases since ever), but therefore can spare us the extra work in fast path. Only 'ret A' doesn't make sense, and similarly a program that never loads anything into X, but wants to return it. We already did a similar thing for the scratch memory store where we checked stores into M[] and loads from M[] into A. I can take a look if you want ... I have some pending BPF cleanups anyway from before the merge window. > the previous interpreter had: > unsigned int sk_run_filter(const struct sk_buff *skb, > const struct sock_filter *fentry) > { > void *ptr; > u32 A = 0; /* Accumulator */ > u32 X = 0; /* Index Register */ > ... > so it wasn't affected. > >> Did the eBPF changes introduce this problem either directly or >> indirectly? > > this bug is an oversight on my side. > I believe in the net-next it should be fixed by making verifier smarter > instead of wasting run time cycles to initialize regs. > For now the same fix is needed for both net and net-next. > We can remove extra assignments when verifier becomes smarter. > ebpf verifier that I posted earlier had checks for uninitialized regs and stack. > I missed lack of uninit regs part in classic verifier when ebpf verifier and jit > were dropped from the patch set. > Sorry about this slip. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/2014 05:18 AM, Alexei Starovoitov wrote: > exisiting BPF verifier allows uninitialized access to registers, > 'ret A' is considered to be a valid filter. > So initialize A and X to zero to prevent leaking kernel memory > In the future BPF verifier will be rejecting such filters > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> > Cc: Daniel Borkmann <dborkman@redhat.com> I gave it some more thought, it's actually more than just 'ret A' from your description, user programs could be used to generate code like 'add X' or 'add #42' instead of a txa or load instruction [where A is previously uninitialized], as it was always initialized to 0 before. We prevent to do a 'ret X', but X could be uninitialized and then transferred over to taint A etc. So yeah, this patch is the most simple way to prevent that w/o huge complexity. Therefore, it's better this way: Acked-by: Daniel Borkmann <dborkman@redhat.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/2014 07:13 AM, Eric Dumazet wrote: > On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote: >> From: Alexei Starovoitov <ast@plumgrid.com> >> Date: Tue, 22 Apr 2014 20:18:57 -0700 >> >>> exisiting BPF verifier allows uninitialized access to registers, >>> 'ret A' is considered to be a valid filter. >>> So initialize A and X to zero to prevent leaking kernel memory >>> In the future BPF verifier will be rejecting such filters >>> >>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> >> Has the code always been like this? >> >> Did the eBPF changes introduce this problem either directly or >> indirectly? > > Original code was fine AFAIK Yep. > Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set") > > David, is it possible for you to push net-next tree ? I think this would need to go into net tree first and then for the upcoming changes, net would need to be merged into net-next. This will create a minor merge conflict with the BPF prandom extension; fix is to remove the defines near __get_random_u32(), as they will be above __sk_run_filter() already. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-04-23 at 13:45 +0200, Daniel Borkmann wrote: > On 04/23/2014 07:13 AM, Eric Dumazet wrote: > > > > David, is it possible for you to push net-next tree ? > > I think this would need to go into net tree first and then > for the upcoming changes, net would need to be merged into > net-next. This will create a minor merge conflict with the > BPF prandom extension; fix is to remove the defines near > __get_random_u32(), as they will be above __sk_run_filter() > already. My suggestion to push net-next was not related to the bpf patch. I needed net-next being fresh for a backport into Google kernels. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 23, 2014 at 12:53 AM, Daniel Borkmann <dborkman@redhat.com> wrote: > On 04/23/2014 05:18 AM, Alexei Starovoitov wrote: >> >> exisiting BPF verifier allows uninitialized access to registers, >> 'ret A' is considered to be a valid filter. >> So initialize A and X to zero to prevent leaking kernel memory >> In the future BPF verifier will be rejecting such filters >> >> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> Cc: Daniel Borkmann <dborkman@redhat.com> > > > I gave it some more thought, it's actually more than just 'ret A' > from your description, user programs could be used to generate code > like 'add X' or 'add #42' instead of a txa or load instruction [where > A is previously uninitialized], as it was always initialized to 0 > before. We prevent to do a 'ret X', but X could be uninitialized and > then transferred over to taint A etc. So yeah, this patch is the most > simple way to prevent that w/o huge complexity. Therefore, it's > better this way: > > Acked-by: Daniel Borkmann <dborkman@redhat.com> Thanks. the uninitialized checking is indeed not as simple as rejecting 'ret A'. verifier needs to consider data flow between registers based on control graph. Just linear scan won't be correct. Also for ebpf there are 10+1 registers to consider, so ebpf verifier will take care of both (classic and extended) at once. classic verifier can stay as-is and eventually removed. For example after 'bpf_call' insn the registers r1-r5 are scratched, so they cannot be read, r10 is frame pointer and is read only, etc. will prepare ebpf verifier patch asap. the steps will be: - classic verifier (as-is today) - bpf->ebpf converter (as-is today) - ebpf verifier after insane amount of testing we will be able to remove classic verifier, since ebpf verifier will be doing it more thorough. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexei Starovoitov <ast@plumgrid.com> Date: Tue, 22 Apr 2014 21:59:32 -0700 > On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote: >> From: Alexei Starovoitov <ast@plumgrid.com> >> Date: Tue, 22 Apr 2014 20:18:57 -0700 >> >>> exisiting BPF verifier allows uninitialized access to registers, >>> 'ret A' is considered to be a valid filter. >>> So initialize A and X to zero to prevent leaking kernel memory >>> In the future BPF verifier will be rejecting such filters >>> >>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> >> Has the code always been like this? > > the previous interpreter had: > unsigned int sk_run_filter(const struct sk_buff *skb, > const struct sock_filter *fentry) > { > void *ptr; > u32 A = 0; /* Accumulator */ > u32 X = 0; /* Index Register */ > ... > so it wasn't affected. > >> Did the eBPF changes introduce this problem either directly or >> indirectly? > > this bug is an oversight on my side. > I believe in the net-next it should be fixed by making verifier smarter > instead of wasting run time cycles to initialize regs. > For now the same fix is needed for both net and net-next. > We can remove extra assignments when verifier becomes smarter. > ebpf verifier that I posted earlier had checks for uninitialized regs and stack. > I missed lack of uninit regs part in classic verifier when ebpf verifier and jit > were dropped from the patch set. I think you will need to add the initializations during validation rather than fail because existing BPF filters would be accepted and run. You can't just make them fail unexpectedly after all of these years. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 22 Apr 2014 22:13:00 -0700 > On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote: >> From: Alexei Starovoitov <ast@plumgrid.com> >> Date: Tue, 22 Apr 2014 20:18:57 -0700 >> >> > exisiting BPF verifier allows uninitialized access to registers, >> > 'ret A' is considered to be a valid filter. >> > So initialize A and X to zero to prevent leaking kernel memory >> > In the future BPF verifier will be rejecting such filters >> > >> > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >> >> Has the code always been like this? >> >> Did the eBPF changes introduce this problem either directly or >> indirectly? > > Original code was fine AFAIK > > Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set") > > David, is it possible for you to push net-next tree ? What exactly are you asking me to do? Put this patch in the net-next tree? Or are you asking me to merge net into net-next after I apply it? It's definitely a 'net' patch. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Daniel Borkmann <dborkman@redhat.com> Date: Wed, 23 Apr 2014 09:02:54 +0200 > On 04/23/2014 06:59 AM, Alexei Starovoitov wrote: >> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> >> wrote: >>> From: Alexei Starovoitov <ast@plumgrid.com> >>> Date: Tue, 22 Apr 2014 20:18:57 -0700 >>> >>>> exisiting BPF verifier allows uninitialized access to registers, >>>> 'ret A' is considered to be a valid filter. >>>> So initialize A and X to zero to prevent leaking kernel memory >>>> In the future BPF verifier will be rejecting such filters >>>> >>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >>> >>> Has the code always been like this? > > I think it would be much cleaner to just prevent such filters that > only contain a 'ret A', or 'ret X' w/o a load into X from attaching. That choice had to be done two decades ago, not now. We have to accept such filters, we always have. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/2014 06:51 PM, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 22 Apr 2014 22:13:00 -0700 > >> On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote: >>> From: Alexei Starovoitov <ast@plumgrid.com> >>> Date: Tue, 22 Apr 2014 20:18:57 -0700 >>> >>>> exisiting BPF verifier allows uninitialized access to registers, >>>> 'ret A' is considered to be a valid filter. >>>> So initialize A and X to zero to prevent leaking kernel memory >>>> In the future BPF verifier will be rejecting such filters >>>> >>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >>> >>> Has the code always been like this? >>> >>> Did the eBPF changes introduce this problem either directly or >>> indirectly? >> >> Original code was fine AFAIK >> >> Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set") >> >> David, is it possible for you to push net-next tree ? > > What exactly are you asking me to do? Put this patch in the net-next tree? > Or are you asking me to merge net into net-next after I apply it? > > It's definitely a 'net' patch. I think Eric already clarified it in [1]. It's definitely against net tree. From my side, it would be awesome, if you could put this into net and then merge net into net-next as I have some pending stuff on top of this fix. That would at least avoid a bigger merge conflict later on. Thanks a lot, Dave. [1] http://patchwork.ozlabs.org/patch/341693/, April 23, 2014, 1:39 p.m. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-04-23 at 12:51 -0400, David Miller wrote: > > David, is it possible for you to push net-next tree ? > > What exactly are you asking me to do? Put this patch in the net-next tree? > Or are you asking me to merge net into net-next after I apply it? > > It's definitely a 'net' patch. This net-next push request had nothing to do with this bpf patch, sorry for the confusion. I even missed a s to the 'Thanks', you see ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/2014 06:52 PM, David Miller wrote: ... >> I think it would be much cleaner to just prevent such filters that >> only contain a 'ret A', or 'ret X' w/o a load into X from attaching. > > That choice had to be done two decades ago, not now. > > We have to accept such filters, we always have. Yep, very true. :/ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alexei Starovoitov <ast@plumgrid.com> Date: Tue, 22 Apr 2014 20:18:57 -0700 > exisiting BPF verifier allows uninitialized access to registers, > 'ret A' is considered to be a valid filter. > So initialize A and X to zero to prevent leaking kernel memory > In the future BPF verifier will be rejecting such filters > > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Applied, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 23, 2014 at 9:50 AM, David Miller <davem@davemloft.net> wrote: > From: Alexei Starovoitov <ast@plumgrid.com> > Date: Tue, 22 Apr 2014 21:59:32 -0700 > >> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote: >>> From: Alexei Starovoitov <ast@plumgrid.com> >>> Date: Tue, 22 Apr 2014 20:18:57 -0700 >>> >>>> exisiting BPF verifier allows uninitialized access to registers, >>>> 'ret A' is considered to be a valid filter. >>>> So initialize A and X to zero to prevent leaking kernel memory >>>> In the future BPF verifier will be rejecting such filters >>>> >>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> >>> >>> Has the code always been like this? >> >> the previous interpreter had: >> unsigned int sk_run_filter(const struct sk_buff *skb, >> const struct sock_filter *fentry) >> { >> void *ptr; >> u32 A = 0; /* Accumulator */ >> u32 X = 0; /* Index Register */ >> ... >> so it wasn't affected. >> >>> Did the eBPF changes introduce this problem either directly or >>> indirectly? >> >> this bug is an oversight on my side. >> I believe in the net-next it should be fixed by making verifier smarter >> instead of wasting run time cycles to initialize regs. >> For now the same fix is needed for both net and net-next. >> We can remove extra assignments when verifier becomes smarter. >> ebpf verifier that I posted earlier had checks for uninitialized regs and stack. >> I missed lack of uninit regs part in classic verifier when ebpf verifier and jit >> were dropped from the patch set. > > I think you will need to add the initializations during validation > rather than fail because existing BPF filters would be accepted and > run. You can't just make them fail unexpectedly after all of these > years. In this particular case we can indeed make verifier smarter to the point that once it detects uninitialized A/X in a filter, it will add explicit init after conversion, so we don't have to pay the penalty at run time. It's a very good suggestion. At the same time I think 'ret A' falls into category of the 20 year old bug. The filters that using undocumented behavior are destined to be broken sooner or later. tcpdump doesn't generate such filters. Valid filters will not be affected by strict checking, but I think users will be actually happy to see kernel rejecting their buggy filters. I think we can have a global flag for bpf verifier whether to reject such filters or accept them by adding explicit inits under 'bpf_permissive_loading' flag. Thoughts? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 23 Apr 2014 06:39:04 -0700 > I needed net-next being fresh for a backport into Google kernels. I try to have this merge taken care of by the end of today. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-04-23 at 13:38 -0700, Alexei Starovoitov wrote: > In this particular case we can indeed make verifier smarter to the point > that once it detects uninitialized A/X in a filter, it will add explicit init > after conversion, so we don't have to pay the penalty at run time. > It's a very good suggestion. > > At the same time I think 'ret A' falls into category of the 20 year old bug. > The filters that using undocumented behavior are destined to be > broken sooner or later. tcpdump doesn't generate such filters. > Valid filters will not be affected by strict checking, but I think users > will be actually happy to see kernel rejecting their buggy filters. > I think we can have a global flag for bpf verifier whether to reject > such filters or accept them by adding explicit inits under > 'bpf_permissive_loading' flag. > Thoughts? I do not really understand the concerns and this discussion over this bug. BPF JIT does basic checks and forces A=0 if first instruction doesn't set A. ( arch/x86/net/bpf_jit_comp.c line 263 ) X is forced to 0 anyway if X is ever used in the JITed filter. ( line 227, and we have a typo in comment : s/leek/leak/ ) This is not a big deal. If you want to 'optimize', fine, but don't break programs. There is no way you add a 'flag' to break a user program, even if _you_ believe its broken. Maybe it assumed initial X / A contents were 0. Even if not documented, we cannot break this 20 years after the facts. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 23, 2014 at 2:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2014-04-23 at 13:38 -0700, Alexei Starovoitov wrote: > >> In this particular case we can indeed make verifier smarter to the point >> that once it detects uninitialized A/X in a filter, it will add explicit init >> after conversion, so we don't have to pay the penalty at run time. >> It's a very good suggestion. >> >> At the same time I think 'ret A' falls into category of the 20 year old bug. >> The filters that using undocumented behavior are destined to be >> broken sooner or later. tcpdump doesn't generate such filters. >> Valid filters will not be affected by strict checking, but I think users >> will be actually happy to see kernel rejecting their buggy filters. >> I think we can have a global flag for bpf verifier whether to reject >> such filters or accept them by adding explicit inits under >> 'bpf_permissive_loading' flag. >> Thoughts? > > > I do not really understand the concerns and this discussion over this > bug. > > BPF JIT does basic checks and forces A=0 if first instruction doesn't > set A. > > ( arch/x86/net/bpf_jit_comp.c line 263 ) > > X is forced to 0 anyway if X is ever used in the JITed filter. > ( line 227, and we have a typo in comment : s/leek/leak/ ) yes, x86 jit is smart to deal around this problem in an efficient way. sparc/arm jits copy pasted the idea correctly. but s390 jit seems to have a bug, since it's not clearing A when 1st insn is BPF_S_LDX_B_MSH That shouldn't be a jit job, especially since it can be done by verifier. classic bpf has 2 regs, ebpf has 10. Checking 1st insn won't work for ebpf. > This is not a big deal. If you want to 'optimize', fine, but don't break > programs. Fine. I'm not saying to break them by default. I'm suggesting to give users a flag to detect invalid programs. > There is no way you add a 'flag' to break a user program, even > if _you_ believe its broken. > > Maybe it assumed initial X / A contents were 0. > > Even if not documented, we cannot break this 20 years after the facts. I'm not suggesting to have this flag by default, but don't you want kernel to say that bpf program is using uninitialized register? imo the more checking can be done the better user experience will be. Just like compiler warnings. Better to have an option to know what's wrong with the filter than not having it at all. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote: > yes, x86 jit is smart to deal around this problem in an efficient way. > sparc/arm jits copy pasted the idea correctly. Well... not really smart, but sufficient enough (even if I am sure we probably have some bugs) > but s390 jit seems to have a bug, since it's not clearing A > when 1st insn is BPF_S_LDX_B_MSH Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they probably can send the needed patch) Its not yet enabled by default for this reason. I am worried that all recent bpf changes raised the bar so high that nobody but you can really understand what is going on, unless spending a _lot_ of time on it. net/core/filter.c was about 880 lines in previous release, its now 1800+ lines. netdev became a list discussing compiler technology, assembly code and ABI level stuff lately. Say I want to check why we have this code, potentially modifying arbitrary kernel memory : BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */ atomic_add((u32) X, (atomic_t *)(unsigned long) (A + insn->off)); CONT; BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */ atomic64_add((u64) X, (atomic64_t *)(unsigned long) (A + insn->off)); CONT; No comments on this scary code, and I couldn't find why it is there. $ git grep -n2 BPF_XADD include/linux/filter.h-19-/* ld/ldx fields */ include/linux/filter.h-20-#define BPF_DW 0x18 /* double word */ include/linux/filter.h:21:#define BPF_XADD 0xc0 /* exclusive add */ include/linux/filter.h-22- include/linux/filter.h-23-/* alu/jmp fields */ -- net/core/filter.c-222- DL(BPF_STX, BPF_MEM, BPF_W), net/core/filter.c-223- DL(BPF_STX, BPF_MEM, BPF_DW), net/core/filter.c:224: DL(BPF_STX, BPF_XADD, BPF_W), net/core/filter.c:225: DL(BPF_STX, BPF_XADD, BPF_DW), net/core/filter.c-226- DL(BPF_ST, BPF_MEM, BPF_B), net/core/filter.c-227- DL(BPF_ST, BPF_MEM, BPF_H), -- net/core/filter.c-479- LDST(BPF_DW, u64) net/core/filter.c-480-#undef LDST net/core/filter.c:481: BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */ net/core/filter.c-482- atomic_add((u32) X, (atomic_t *)(unsigned long) net/core/filter.c-483- (A + insn->off)); net/core/filter.c-484- CONT; net/core/filter.c:485: BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */ net/core/filter.c-486- atomic64_add((u64) X, (atomic64_t *)(unsigned long) net/core/filter.c-487- (A + insn->off)); This looks like premature inclusion, at very minimum. Sure, I probably can find answer reading past months netdev/lkml traffic ? Sorry if this looks like ranting, but I would rather complain before thousand of other lines of code are merged. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 23, 2014 at 7:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote: > >> yes, x86 jit is smart to deal around this problem in an efficient way. >> sparc/arm jits copy pasted the idea correctly. > > Well... not really smart, but sufficient enough (even if I am sure we > probably have some bugs) > >> but s390 jit seems to have a bug, since it's not clearing A >> when 1st insn is BPF_S_LDX_B_MSH > > Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they > probably can send the needed patch) > > Its not yet enabled by default for this reason. > > I am worried that all recent bpf changes raised the bar so high that > nobody but you can really understand what is going on, unless spending > a _lot_ of time on it. I'm trying to explain it as much as I can. If you're willing to listen, I'm happy to answer any questions. > net/core/filter.c was about 880 lines in previous release, its now 1800+ > lines. but the whole sk_encode/decode functions are obsolete. Now we can remove some code too. > netdev became a list discussing compiler technology, > assembly code and ABI level stuff lately. > > Say I want to check why we have this code, potentially modifying > arbitrary kernel memory : > > BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */ > atomic_add((u32) X, (atomic_t *)(unsigned long) > (A + insn->off)); > CONT; > BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */ > atomic64_add((u64) X, (atomic64_t *)(unsigned long) > (A + insn->off)); > CONT; > > > > No comments on this scary code, and I couldn't find why it is there. I'm sorry but this very list forced me to drop all the comments and docs around ebpf verifier that go into details how this is verified and why it is safe. I can repost the whole thing as one big series of patches or send the links to previous patches where approach to safety is explained. I thought the preference was to phase it in gradually. Smallest patch at a time. Don't break anything. That's what I'm trying to do. > $ git grep -n2 BPF_XADD > include/linux/filter.h-19-/* ld/ldx fields */ > include/linux/filter.h-20-#define BPF_DW 0x18 /* double word */ > include/linux/filter.h:21:#define BPF_XADD 0xc0 /* exclusive add */ > include/linux/filter.h-22- > include/linux/filter.h-23-/* alu/jmp fields */ > -- > net/core/filter.c-222- DL(BPF_STX, BPF_MEM, BPF_W), > net/core/filter.c-223- DL(BPF_STX, BPF_MEM, BPF_DW), > net/core/filter.c:224: DL(BPF_STX, BPF_XADD, BPF_W), > net/core/filter.c:225: DL(BPF_STX, BPF_XADD, BPF_DW), > net/core/filter.c-226- DL(BPF_ST, BPF_MEM, BPF_B), > net/core/filter.c-227- DL(BPF_ST, BPF_MEM, BPF_H), > -- > net/core/filter.c-479- LDST(BPF_DW, u64) > net/core/filter.c-480-#undef LDST > net/core/filter.c:481: BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */ > net/core/filter.c-482- atomic_add((u32) X, (atomic_t *)(unsigned long) > net/core/filter.c-483- (A + insn->off)); > net/core/filter.c-484- CONT; > net/core/filter.c:485: BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */ > net/core/filter.c-486- atomic64_add((u64) X, (atomic64_t *)(unsigned long) > net/core/filter.c-487- (A + insn->off)); > > This looks like premature inclusion, at very minimum. > > Sure, I probably can find answer reading past months netdev/lkml traffic ? yes. it's more than one month. It was posted first back in September. ebpf architecture didn't change since then. Just new use cases were found. I thought to apply ebpf to ovs initially, but now it looks like tracing filters will benefit it more, so they took a priority. seccomp, nids, etc > Sorry if this looks like ranting, but I would rather complain before thousand > of other lines of code are merged. It's a valid concern. I need to share and explain more. Point taken. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 23 Apr 2014 19:55:09 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote: > > > yes, x86 jit is smart to deal around this problem in an efficient way. > > sparc/arm jits copy pasted the idea correctly. > > Well... not really smart, but sufficient enough (even if I am sure we > probably have some bugs) > > > but s390 jit seems to have a bug, since it's not clearing A > > when 1st insn is BPF_S_LDX_B_MSH > > Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they > probably can send the needed patch) That should be a one-line patch which removes the BPF_S_LDX_B_MSH opcode from the switch statement in bpf_jit_noleaks.
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 23 Apr 2014 10:14:44 -0700 > This net-next push request had nothing to do with this bpf patch, sorry > for the confusion. I just did the merge, sorry for taking so long. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-04-24 at 13:24 -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 23 Apr 2014 10:14:44 -0700 > > > This net-next push request had nothing to do with this bpf patch, sorry > > for the confusion. > > I just did the merge, sorry for taking so long. No problem, thanks David ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/24/2014 07:24 PM, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 23 Apr 2014 10:14:44 -0700 > >> This net-next push request had nothing to do with this bpf patch, sorry >> for the confusion. > > I just did the merge, sorry for taking so long. Great, thanks Dave! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/24/2014 05:22 AM, Alexei Starovoitov wrote: ... >> net/core/filter.c was about 880 lines in previous release, its now 1800+ >> lines. > > but the whole sk_encode/decode functions are obsolete. > Now we can remove some code too. Indeed, these can eventually be removed; naturally, a bigger portion of these changes go into the conversion functions into the new layout. The interpreter part itself is conceptually the same, seccomp as you might have noticed actually got a lot easier thanks to the BPF changes, and the new instruction layout is much easier to handle for JITs (i.e. the BPF extensions), which is ongoing work. Clearly, there are successive integration steps required. As such, also the documentation will get further improved for all that in the next days. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/filter.c b/net/core/filter.c index cd58614..9d79ca0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -122,6 +122,13 @@ noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) return 0; } +/* Register mappings for user programs. */ +#define A_REG 0 +#define X_REG 7 +#define TMP_REG 8 +#define ARG2_REG 2 +#define ARG3_REG 3 + /** * __sk_run_filter - run a filter on a given context * @ctx: buffer to run the filter on @@ -242,6 +249,8 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn) regs[FP_REG] = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; regs[ARG1_REG] = (u64) (unsigned long) ctx; + regs[A_REG] = 0; + regs[X_REG] = 0; select_insn: goto *jumptable[insn->code]; @@ -643,13 +652,6 @@ static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5) return raw_smp_processor_id(); } -/* Register mappings for user programs. */ -#define A_REG 0 -#define X_REG 7 -#define TMP_REG 8 -#define ARG2_REG 2 -#define ARG3_REG 3 - static bool convert_bpf_extensions(struct sock_filter *fp, struct sock_filter_int **insnp) {
exisiting BPF verifier allows uninitialized access to registers, 'ret A' is considered to be a valid filter. So initialize A and X to zero to prevent leaking kernel memory In the future BPF verifier will be rejecting such filters Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Daniel Borkmann <dborkman@redhat.com> --- net/core/filter.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)