[net] net: filter: initialize A and X registers

Message ID 1398223137-5463-1-git-send-email-ast@plumgrid.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Alexei Starovoitov April 23, 2014, 3:18 a.m.
exisiting BPF verifier allows uninitialized access to registers,
'ret A' is considered to be a valid filter.
So initialize A and X to zero to prevent leaking kernel memory
In the future BPF verifier will be rejecting such filters

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
---
 net/core/filter.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Comments

David Miller April 23, 2014, 3:57 a.m. | #1
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Tue, 22 Apr 2014 20:18:57 -0700

> exisiting BPF verifier allows uninitialized access to registers,
> 'ret A' is considered to be a valid filter.
> So initialize A and X to zero to prevent leaking kernel memory
> In the future BPF verifier will be rejecting such filters
> 
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

Has the code always been like this?

Did the eBPF changes introduce this problem either directly or
indirectly?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 4:59 a.m. | #2
On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Tue, 22 Apr 2014 20:18:57 -0700
>
>> exisiting BPF verifier allows uninitialized access to registers,
>> 'ret A' is considered to be a valid filter.
>> So initialize A and X to zero to prevent leaking kernel memory
>> In the future BPF verifier will be rejecting such filters
>>
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>
> Has the code always been like this?

the previous interpreter had:
unsigned int sk_run_filter(const struct sk_buff *skb,
                           const struct sock_filter *fentry)
{
        void *ptr;
        u32 A = 0;                      /* Accumulator */
        u32 X = 0;                      /* Index Register */
...
so it wasn't affected.

> Did the eBPF changes introduce this problem either directly or
> indirectly?

this bug is an oversight on my side.
I believe in the net-next it should be fixed by making verifier smarter
instead of wasting run time cycles to initialize regs.
For now the same fix is needed for both net and net-next.
We can remove extra assignments when verifier becomes smarter.
ebpf verifier that I posted earlier had checks for uninitialized regs and stack.
I missed lack of uninit regs part in classic verifier when ebpf verifier and jit
were dropped from the patch set.
Sorry about this slip.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 23, 2014, 5:13 a.m. | #3
On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Tue, 22 Apr 2014 20:18:57 -0700
> 
> > exisiting BPF verifier allows uninitialized access to registers,
> > 'ret A' is considered to be a valid filter.
> > So initialize A and X to zero to prevent leaking kernel memory
> > In the future BPF verifier will be rejecting such filters
> > 
> > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> 
> Has the code always been like this?
> 
> Did the eBPF changes introduce this problem either directly or
> indirectly?

Original code was fine AFAIK

Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set")

David, is it possible for you to push net-next tree ?

Thank !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 7:02 a.m. | #4
On 04/23/2014 06:59 AM, Alexei Starovoitov wrote:
> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>
>>> exisiting BPF verifier allows uninitialized access to registers,
>>> 'ret A' is considered to be a valid filter.
>>> So initialize A and X to zero to prevent leaking kernel memory
>>> In the future BPF verifier will be rejecting such filters
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>
>> Has the code always been like this?

I think it would be much cleaner to just prevent such filters that
only contain a 'ret A', or 'ret X' w/o a load into X from attaching.
By that, we would need to do slightly more work on the verifier
side (which seemed to allow such cases since ever), but therefore
can spare us the extra work in fast path.

Only 'ret A' doesn't make sense, and similarly a program that never
loads anything into X, but wants to return it. We already did a
similar thing for the scratch memory store where we checked stores
into M[] and loads from M[] into A. I can take a look if you want ...
I have some pending BPF cleanups anyway from before the merge window.

> the previous interpreter had:
> unsigned int sk_run_filter(const struct sk_buff *skb,
>                             const struct sock_filter *fentry)
> {
>          void *ptr;
>          u32 A = 0;                      /* Accumulator */
>          u32 X = 0;                      /* Index Register */
> ...
> so it wasn't affected.
>
>> Did the eBPF changes introduce this problem either directly or
>> indirectly?
>
> this bug is an oversight on my side.
> I believe in the net-next it should be fixed by making verifier smarter
> instead of wasting run time cycles to initialize regs.
> For now the same fix is needed for both net and net-next.
> We can remove extra assignments when verifier becomes smarter.
> ebpf verifier that I posted earlier had checks for uninitialized regs and stack.
> I missed lack of uninit regs part in classic verifier when ebpf verifier and jit
> were dropped from the patch set.
> Sorry about this slip.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 7:53 a.m. | #5
On 04/23/2014 05:18 AM, Alexei Starovoitov wrote:
> exisiting BPF verifier allows uninitialized access to registers,
> 'ret A' is considered to be a valid filter.
> So initialize A and X to zero to prevent leaking kernel memory
> In the future BPF verifier will be rejecting such filters
>
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> Cc: Daniel Borkmann <dborkman@redhat.com>

I gave it some more thought, it's actually more than just 'ret A'
from your description, user programs could be used to generate code
like 'add X' or 'add #42' instead of a txa or load instruction [where
A is previously uninitialized], as it was always initialized to 0
before. We prevent to do a 'ret X', but X could be uninitialized and
then transferred over to taint A etc. So yeah, this patch is the most
simple way to prevent that w/o huge complexity. Therefore, it's
better this way:

Acked-by: Daniel Borkmann <dborkman@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 11:45 a.m. | #6
On 04/23/2014 07:13 AM, Eric Dumazet wrote:
> On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>
>>> exisiting BPF verifier allows uninitialized access to registers,
>>> 'ret A' is considered to be a valid filter.
>>> So initialize A and X to zero to prevent leaking kernel memory
>>> In the future BPF verifier will be rejecting such filters
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>
>> Has the code always been like this?
>>
>> Did the eBPF changes introduce this problem either directly or
>> indirectly?
>
> Original code was fine AFAIK

Yep.

> Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
>
> David, is it possible for you to push net-next tree ?

I think this would need to go into net tree first and then
for the upcoming changes, net would need to be merged into
net-next. This will create a minor merge conflict with the
BPF prandom extension; fix is to remove the defines near
__get_random_u32(), as they will be above __sk_run_filter()
already.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 23, 2014, 1:39 p.m. | #7
On Wed, 2014-04-23 at 13:45 +0200, Daniel Borkmann wrote:
> On 04/23/2014 07:13 AM, Eric Dumazet wrote:
> >
> > David, is it possible for you to push net-next tree ?
> 
> I think this would need to go into net tree first and then
> for the upcoming changes, net would need to be merged into
> net-next. This will create a minor merge conflict with the
> BPF prandom extension; fix is to remove the defines near
> __get_random_u32(), as they will be above __sk_run_filter()
> already.

My suggestion to push net-next was not related to the bpf patch.

I needed net-next being fresh for a backport into Google kernels.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 4:13 p.m. | #8
On Wed, Apr 23, 2014 at 12:53 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 04/23/2014 05:18 AM, Alexei Starovoitov wrote:
>>
>> exisiting BPF verifier allows uninitialized access to registers,
>> 'ret A' is considered to be a valid filter.
>> So initialize A and X to zero to prevent leaking kernel memory
>> In the future BPF verifier will be rejecting such filters
>>
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>> Cc: Daniel Borkmann <dborkman@redhat.com>
>
>
> I gave it some more thought, it's actually more than just 'ret A'
> from your description, user programs could be used to generate code
> like 'add X' or 'add #42' instead of a txa or load instruction [where
> A is previously uninitialized], as it was always initialized to 0
> before. We prevent to do a 'ret X', but X could be uninitialized and
> then transferred over to taint A etc. So yeah, this patch is the most
> simple way to prevent that w/o huge complexity. Therefore, it's
> better this way:
>
> Acked-by: Daniel Borkmann <dborkman@redhat.com>

Thanks.

the uninitialized checking is indeed not as simple as rejecting 'ret A'.
verifier needs to consider data flow between registers based
on control graph. Just linear scan won't be correct.
Also for ebpf there are 10+1 registers to consider, so ebpf verifier
will take care of both (classic and extended) at once.
classic verifier can stay as-is and eventually removed.
For example after 'bpf_call' insn the registers r1-r5 are scratched,
so they cannot be read, r10 is frame pointer and is read only, etc.
will prepare ebpf verifier patch asap.
the steps will be:
- classic verifier (as-is today)
- bpf->ebpf converter (as-is today)
- ebpf verifier
after insane amount of testing we will be able to remove classic verifier,
since ebpf verifier will be doing it more thorough.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 4:50 p.m. | #9
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Tue, 22 Apr 2014 21:59:32 -0700

> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>
>>> exisiting BPF verifier allows uninitialized access to registers,
>>> 'ret A' is considered to be a valid filter.
>>> So initialize A and X to zero to prevent leaking kernel memory
>>> In the future BPF verifier will be rejecting such filters
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>
>> Has the code always been like this?
> 
> the previous interpreter had:
> unsigned int sk_run_filter(const struct sk_buff *skb,
>                            const struct sock_filter *fentry)
> {
>         void *ptr;
>         u32 A = 0;                      /* Accumulator */
>         u32 X = 0;                      /* Index Register */
> ...
> so it wasn't affected.
> 
>> Did the eBPF changes introduce this problem either directly or
>> indirectly?
> 
> this bug is an oversight on my side.
> I believe in the net-next it should be fixed by making verifier smarter
> instead of wasting run time cycles to initialize regs.
> For now the same fix is needed for both net and net-next.
> We can remove extra assignments when verifier becomes smarter.
> ebpf verifier that I posted earlier had checks for uninitialized regs and stack.
> I missed lack of uninit regs part in classic verifier when ebpf verifier and jit
> were dropped from the patch set.

I think you will need to add the initializations during validation
rather than fail because existing BPF filters would be accepted and
run.  You can't just make them fail unexpectedly after all of these
years.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 4:51 p.m. | #10
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 22 Apr 2014 22:13:00 -0700

> On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>> 
>> > exisiting BPF verifier allows uninitialized access to registers,
>> > 'ret A' is considered to be a valid filter.
>> > So initialize A and X to zero to prevent leaking kernel memory
>> > In the future BPF verifier will be rejecting such filters
>> > 
>> > Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>> 
>> Has the code always been like this?
>> 
>> Did the eBPF changes introduce this problem either directly or
>> indirectly?
> 
> Original code was fine AFAIK
> 
> Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
> 
> David, is it possible for you to push net-next tree ?

What exactly are you asking me to do?  Put this patch in the net-next tree?
Or are you asking me to merge net into net-next after I apply it?

It's definitely a 'net' patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 4:52 p.m. | #11
From: Daniel Borkmann <dborkman@redhat.com>
Date: Wed, 23 Apr 2014 09:02:54 +0200

> On 04/23/2014 06:59 AM, Alexei Starovoitov wrote:
>> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net>
>> wrote:
>>> From: Alexei Starovoitov <ast@plumgrid.com>
>>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>>
>>>> exisiting BPF verifier allows uninitialized access to registers,
>>>> 'ret A' is considered to be a valid filter.
>>>> So initialize A and X to zero to prevent leaking kernel memory
>>>> In the future BPF verifier will be rejecting such filters
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> Has the code always been like this?
> 
> I think it would be much cleaner to just prevent such filters that
> only contain a 'ret A', or 'ret X' w/o a load into X from attaching.

That choice had to be done two decades ago, not now.

We have to accept such filters, we always have.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 5:10 p.m. | #12
On 04/23/2014 06:51 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 22 Apr 2014 22:13:00 -0700
>
>> On Tue, 2014-04-22 at 23:57 -0400, David Miller wrote:
>>> From: Alexei Starovoitov <ast@plumgrid.com>
>>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>>
>>>> exisiting BPF verifier allows uninitialized access to registers,
>>>> 'ret A' is considered to be a valid filter.
>>>> So initialize A and X to zero to prevent leaking kernel memory
>>>> In the future BPF verifier will be rejecting such filters
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> Has the code always been like this?
>>>
>>> Did the eBPF changes introduce this problem either directly or
>>> indirectly?
>>
>> Original code was fine AFAIK
>>
>> Fixes: bd4cf0ed331a2 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
>>
>> David, is it possible for you to push net-next tree ?
>
> What exactly are you asking me to do?  Put this patch in the net-next tree?
> Or are you asking me to merge net into net-next after I apply it?
>
> It's definitely a 'net' patch.

I think Eric already clarified it in [1].

It's definitely against net tree.

 From my side, it would be awesome, if you could put this into net and
then merge net into net-next as I have some pending stuff on top of
this fix. That would at least avoid a bigger merge conflict later on.

Thanks a lot, Dave.

  [1] http://patchwork.ozlabs.org/patch/341693/, April 23, 2014, 1:39 p.m.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 23, 2014, 5:14 p.m. | #13
On Wed, 2014-04-23 at 12:51 -0400, David Miller wrote:

> > David, is it possible for you to push net-next tree ?
> 
> What exactly are you asking me to do?  Put this patch in the net-next tree?
> Or are you asking me to merge net into net-next after I apply it?
> 
> It's definitely a 'net' patch.

This net-next push request had nothing to do with this bpf patch, sorry
for the confusion.

I even missed a s to the 'Thanks', you see ;)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 5:20 p.m. | #14
On 04/23/2014 06:52 PM, David Miller wrote:
...
>> I think it would be much cleaner to just prevent such filters that
>> only contain a 'ret A', or 'ret X' w/o a load into X from attaching.
>
> That choice had to be done two decades ago, not now.
>
> We have to accept such filters, we always have.

Yep, very true. :/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 7:35 p.m. | #15
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Tue, 22 Apr 2014 20:18:57 -0700

> exisiting BPF verifier allows uninitialized access to registers,
> 'ret A' is considered to be a valid filter.
> So initialize A and X to zero to prevent leaking kernel memory
> In the future BPF verifier will be rejecting such filters
> 
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 8:38 p.m. | #16
On Wed, Apr 23, 2014 at 9:50 AM, David Miller <davem@davemloft.net> wrote:
> From: Alexei Starovoitov <ast@plumgrid.com>
> Date: Tue, 22 Apr 2014 21:59:32 -0700
>
>> On Tue, Apr 22, 2014 at 8:57 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Alexei Starovoitov <ast@plumgrid.com>
>>> Date: Tue, 22 Apr 2014 20:18:57 -0700
>>>
>>>> exisiting BPF verifier allows uninitialized access to registers,
>>>> 'ret A' is considered to be a valid filter.
>>>> So initialize A and X to zero to prevent leaking kernel memory
>>>> In the future BPF verifier will be rejecting such filters
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> Has the code always been like this?
>>
>> the previous interpreter had:
>> unsigned int sk_run_filter(const struct sk_buff *skb,
>>                            const struct sock_filter *fentry)
>> {
>>         void *ptr;
>>         u32 A = 0;                      /* Accumulator */
>>         u32 X = 0;                      /* Index Register */
>> ...
>> so it wasn't affected.
>>
>>> Did the eBPF changes introduce this problem either directly or
>>> indirectly?
>>
>> this bug is an oversight on my side.
>> I believe in the net-next it should be fixed by making verifier smarter
>> instead of wasting run time cycles to initialize regs.
>> For now the same fix is needed for both net and net-next.
>> We can remove extra assignments when verifier becomes smarter.
>> ebpf verifier that I posted earlier had checks for uninitialized regs and stack.
>> I missed lack of uninit regs part in classic verifier when ebpf verifier and jit
>> were dropped from the patch set.
>
> I think you will need to add the initializations during validation
> rather than fail because existing BPF filters would be accepted and
> run.  You can't just make them fail unexpectedly after all of these
> years.

In this particular case we can indeed make verifier smarter to the point
that once it detects uninitialized A/X in a filter, it will add explicit init
after conversion, so we don't have to pay the penalty at run time.
It's a very good suggestion.

At the same time I think 'ret A' falls into category of the 20 year old bug.
The filters that using undocumented behavior are destined to be
broken sooner or later. tcpdump doesn't generate such filters.
Valid filters will not be affected by strict checking, but I think users
will be actually happy to see kernel rejecting their buggy filters.
I think we can have a global flag for bpf verifier whether to reject
such filters or accept them by adding explicit inits under
'bpf_permissive_loading' flag.
Thoughts?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2014, 9:07 p.m. | #17
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Apr 2014 06:39:04 -0700

> I needed net-next being fresh for a backport into Google kernels.

I try to have this merge taken care of by the end of today.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 23, 2014, 9:39 p.m. | #18
On Wed, 2014-04-23 at 13:38 -0700, Alexei Starovoitov wrote:

> In this particular case we can indeed make verifier smarter to the point
> that once it detects uninitialized A/X in a filter, it will add explicit init
> after conversion, so we don't have to pay the penalty at run time.
> It's a very good suggestion.
> 
> At the same time I think 'ret A' falls into category of the 20 year old bug.
> The filters that using undocumented behavior are destined to be
> broken sooner or later. tcpdump doesn't generate such filters.
> Valid filters will not be affected by strict checking, but I think users
> will be actually happy to see kernel rejecting their buggy filters.
> I think we can have a global flag for bpf verifier whether to reject
> such filters or accept them by adding explicit inits under
> 'bpf_permissive_loading' flag.
> Thoughts?


I do not really understand the concerns and this discussion over this
bug.

BPF JIT does basic checks and forces A=0 if first instruction doesn't
set A.

( arch/x86/net/bpf_jit_comp.c line 263 )

X is forced to 0 anyway if X is ever used in the JITed filter.
( line 227, and we have a typo in comment : s/leek/leak/ )

This is not a big deal. If you want to 'optimize', fine, but don't break
programs.

There is no way you add a 'flag' to break a user program, even
if _you_ believe its broken.

Maybe it assumed initial X / A contents were 0.

Even if not documented, we cannot break this 20 years after the facts.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 10:19 p.m. | #19
On Wed, Apr 23, 2014 at 2:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-04-23 at 13:38 -0700, Alexei Starovoitov wrote:
>
>> In this particular case we can indeed make verifier smarter to the point
>> that once it detects uninitialized A/X in a filter, it will add explicit init
>> after conversion, so we don't have to pay the penalty at run time.
>> It's a very good suggestion.
>>
>> At the same time I think 'ret A' falls into category of the 20 year old bug.
>> The filters that using undocumented behavior are destined to be
>> broken sooner or later. tcpdump doesn't generate such filters.
>> Valid filters will not be affected by strict checking, but I think users
>> will be actually happy to see kernel rejecting their buggy filters.
>> I think we can have a global flag for bpf verifier whether to reject
>> such filters or accept them by adding explicit inits under
>> 'bpf_permissive_loading' flag.
>> Thoughts?
>
>
> I do not really understand the concerns and this discussion over this
> bug.
>
> BPF JIT does basic checks and forces A=0 if first instruction doesn't
> set A.
>
> ( arch/x86/net/bpf_jit_comp.c line 263 )
>
> X is forced to 0 anyway if X is ever used in the JITed filter.
> ( line 227, and we have a typo in comment : s/leek/leak/ )

yes, x86 jit is smart to deal around this problem in an efficient way.
sparc/arm jits copy pasted the idea correctly.
but s390 jit seems to have a bug, since it's not clearing A
when 1st insn is BPF_S_LDX_B_MSH
That shouldn't be a jit job, especially since it can be done by verifier.
classic bpf has 2 regs, ebpf has 10. Checking 1st insn won't work
for ebpf.

> This is not a big deal. If you want to 'optimize', fine, but don't break
> programs.

Fine. I'm not saying to break them by default. I'm suggesting
to give users a flag to detect invalid programs.

> There is no way you add a 'flag' to break a user program, even
> if _you_ believe its broken.
>
> Maybe it assumed initial X / A contents were 0.
>
> Even if not documented, we cannot break this 20 years after the facts.

I'm not suggesting to have this flag by default, but don't you want
kernel to say that bpf program is using uninitialized register?
imo the more checking can be done the better user experience will be.
Just like compiler warnings. Better to have an option to know
what's wrong with the filter than not having it at all.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 24, 2014, 2:55 a.m. | #20
On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote:

> yes, x86 jit is smart to deal around this problem in an efficient way.
> sparc/arm jits copy pasted the idea correctly.

Well... not really smart, but sufficient enough (even if I am sure we
probably have some bugs)

> but s390 jit seems to have a bug, since it's not clearing A
> when 1st insn is BPF_S_LDX_B_MSH

Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they
probably can send the needed patch)

Its not yet enabled by default for this reason.

I am worried that all recent bpf changes raised the bar so high that
nobody but you can really understand what is going on, unless spending
a _lot_ of time on it.

net/core/filter.c was about 880 lines in previous release, its now 1800+
lines.

netdev became a list discussing compiler technology, 
assembly code and ABI level stuff lately.

Say I want to check why we have this code, potentially modifying
arbitrary kernel memory :

        BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
                atomic_add((u32) X, (atomic_t *)(unsigned long)
                           (A + insn->off));
                CONT;
        BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
                atomic64_add((u64) X, (atomic64_t *)(unsigned long)
                             (A + insn->off));
                CONT;



No comments on this scary code, and I couldn't find why it is there.

$ git grep -n2 BPF_XADD
include/linux/filter.h-19-/* ld/ldx fields */
include/linux/filter.h-20-#define BPF_DW                0x18    /* double word */
include/linux/filter.h:21:#define BPF_XADD      0xc0    /* exclusive add */
include/linux/filter.h-22-
include/linux/filter.h-23-/* alu/jmp fields */
--
net/core/filter.c-222-          DL(BPF_STX, BPF_MEM, BPF_W),
net/core/filter.c-223-          DL(BPF_STX, BPF_MEM, BPF_DW),
net/core/filter.c:224:          DL(BPF_STX, BPF_XADD, BPF_W),
net/core/filter.c:225:          DL(BPF_STX, BPF_XADD, BPF_DW),
net/core/filter.c-226-          DL(BPF_ST, BPF_MEM, BPF_B),
net/core/filter.c-227-          DL(BPF_ST, BPF_MEM, BPF_H),
--
net/core/filter.c-479-  LDST(BPF_DW, u64)
net/core/filter.c-480-#undef LDST
net/core/filter.c:481:  BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
net/core/filter.c-482-          atomic_add((u32) X, (atomic_t *)(unsigned long)
net/core/filter.c-483-                     (A + insn->off));
net/core/filter.c-484-          CONT;
net/core/filter.c:485:  BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
net/core/filter.c-486-          atomic64_add((u64) X, (atomic64_t *)(unsigned long)
net/core/filter.c-487-                       (A + insn->off));

This looks like premature inclusion, at very minimum.

Sure, I probably can find answer reading past months netdev/lkml traffic ?

Sorry if this looks like ranting, but I would rather complain before thousand
of other lines of code are merged.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 24, 2014, 3:22 a.m. | #21
On Wed, Apr 23, 2014 at 7:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote:
>
>> yes, x86 jit is smart to deal around this problem in an efficient way.
>> sparc/arm jits copy pasted the idea correctly.
>
> Well... not really smart, but sufficient enough (even if I am sure we
> probably have some bugs)
>
>> but s390 jit seems to have a bug, since it's not clearing A
>> when 1st insn is BPF_S_LDX_B_MSH
>
> Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they
> probably can send the needed patch)
>
> Its not yet enabled by default for this reason.
>
> I am worried that all recent bpf changes raised the bar so high that
> nobody but you can really understand what is going on, unless spending
> a _lot_ of time on it.

I'm trying to explain it as much as I can. If you're willing to listen,
I'm happy to answer any questions.

> net/core/filter.c was about 880 lines in previous release, its now 1800+
> lines.

but the whole sk_encode/decode functions are obsolete.
Now we can remove some code too.

> netdev became a list discussing compiler technology,
> assembly code and ABI level stuff lately.
>
> Say I want to check why we have this code, potentially modifying
> arbitrary kernel memory :
>
>         BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
>                 atomic_add((u32) X, (atomic_t *)(unsigned long)
>                            (A + insn->off));
>                 CONT;
>         BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
>                 atomic64_add((u64) X, (atomic64_t *)(unsigned long)
>                              (A + insn->off));
>                 CONT;
>
>
>
> No comments on this scary code, and I couldn't find why it is there.

I'm sorry but this very list forced me to drop all the comments and docs
around ebpf verifier that go into details how this is verified and why
it is safe.
I can repost the whole thing as one big series of patches
or send the links to previous patches where approach to safety is explained.
I thought the preference was to phase it in gradually.
Smallest patch at a time. Don't break anything. That's what I'm trying to do.

> $ git grep -n2 BPF_XADD
> include/linux/filter.h-19-/* ld/ldx fields */
> include/linux/filter.h-20-#define BPF_DW                0x18    /* double word */
> include/linux/filter.h:21:#define BPF_XADD      0xc0    /* exclusive add */
> include/linux/filter.h-22-
> include/linux/filter.h-23-/* alu/jmp fields */
> --
> net/core/filter.c-222-          DL(BPF_STX, BPF_MEM, BPF_W),
> net/core/filter.c-223-          DL(BPF_STX, BPF_MEM, BPF_DW),
> net/core/filter.c:224:          DL(BPF_STX, BPF_XADD, BPF_W),
> net/core/filter.c:225:          DL(BPF_STX, BPF_XADD, BPF_DW),
> net/core/filter.c-226-          DL(BPF_ST, BPF_MEM, BPF_B),
> net/core/filter.c-227-          DL(BPF_ST, BPF_MEM, BPF_H),
> --
> net/core/filter.c-479-  LDST(BPF_DW, u64)
> net/core/filter.c-480-#undef LDST
> net/core/filter.c:481:  BPF_STX_BPF_XADD_BPF_W: /* lock xadd *(u32 *)(A + insn->off) += X */
> net/core/filter.c-482-          atomic_add((u32) X, (atomic_t *)(unsigned long)
> net/core/filter.c-483-                     (A + insn->off));
> net/core/filter.c-484-          CONT;
> net/core/filter.c:485:  BPF_STX_BPF_XADD_BPF_DW: /* lock xadd *(u64 *)(A + insn->off) += X */
> net/core/filter.c-486-          atomic64_add((u64) X, (atomic64_t *)(unsigned long)
> net/core/filter.c-487-                       (A + insn->off));
>
> This looks like premature inclusion, at very minimum.
>
> Sure, I probably can find answer reading past months netdev/lkml traffic ?

yes. it's more than one month. It was posted first back in September.
ebpf architecture didn't change since then.
Just new use cases were found.
I thought to apply ebpf to ovs initially, but now it looks like tracing filters
will benefit it more, so they took a priority. seccomp, nids, etc

> Sorry if this looks like ranting, but I would rather complain before thousand
> of other lines of code are merged.

It's a valid concern. I need to share and explain more. Point taken.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin Schwidefsky April 24, 2014, 7:07 a.m. | #22
On Wed, 23 Apr 2014 19:55:09 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2014-04-23 at 15:19 -0700, Alexei Starovoitov wrote:
> 
> > yes, x86 jit is smart to deal around this problem in an efficient way.
> > sparc/arm jits copy pasted the idea correctly.
> 
> Well... not really smart, but sufficient enough (even if I am sure we
> probably have some bugs)
> 
> > but s390 jit seems to have a bug, since it's not clearing A
> > when 1st insn is BPF_S_LDX_B_MSH
> 
> Sure, JIT is hard and quite risky (You forgot to CC Heiko & Martin, they
> probably can send the needed patch)

That should be a one-line patch which removes the BPF_S_LDX_B_MSH opcode
from the switch statement in bpf_jit_noleaks.
David Miller April 24, 2014, 5:24 p.m. | #23
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Apr 2014 10:14:44 -0700

> This net-next push request had nothing to do with this bpf patch, sorry
> for the confusion.

I just did the merge, sorry for taking so long.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 24, 2014, 6:11 p.m. | #24
On Thu, 2014-04-24 at 13:24 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 23 Apr 2014 10:14:44 -0700
> 
> > This net-next push request had nothing to do with this bpf patch, sorry
> > for the confusion.
> 
> I just did the merge, sorry for taking so long.

No problem, thanks David !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 24, 2014, 10:18 p.m. | #25
On 04/24/2014 07:24 PM, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 23 Apr 2014 10:14:44 -0700
>
>> This net-next push request had nothing to do with this bpf patch, sorry
>> for the confusion.
>
> I just did the merge, sorry for taking so long.

Great, thanks Dave!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 25, 2014, 8:23 a.m. | #26
On 04/24/2014 05:22 AM, Alexei Starovoitov wrote:
...
>> net/core/filter.c was about 880 lines in previous release, its now 1800+
>> lines.
>
> but the whole sk_encode/decode functions are obsolete.
> Now we can remove some code too.

Indeed, these can eventually be removed; naturally, a bigger portion
of these changes go into the conversion functions into the new layout.
The interpreter part itself is conceptually the same, seccomp as you
might have noticed actually got a lot easier thanks to the BPF
changes, and the new instruction layout is much easier to handle for
JITs (i.e. the BPF extensions), which is ongoing work. Clearly, there
are successive integration steps required. As such, also the
documentation will get further improved for all that in the next days.

Thanks,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index cd58614..9d79ca0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -122,6 +122,13 @@  noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 	return 0;
 }
 
+/* Register mappings for user programs. */
+#define A_REG		0
+#define X_REG		7
+#define TMP_REG		8
+#define ARG2_REG	2
+#define ARG3_REG	3
+
 /**
  *	__sk_run_filter - run a filter on a given context
  *	@ctx: buffer to run the filter on
@@ -242,6 +249,8 @@  unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
 
 	regs[FP_REG]  = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
 	regs[ARG1_REG] = (u64) (unsigned long) ctx;
+	regs[A_REG] = 0;
+	regs[X_REG] = 0;
 
 select_insn:
 	goto *jumptable[insn->code];
@@ -643,13 +652,6 @@  static u64 __get_raw_cpu_id(u64 ctx, u64 A, u64 X, u64 r4, u64 r5)
 	return raw_smp_processor_id();
 }
 
-/* Register mappings for user programs. */
-#define A_REG		0
-#define X_REG		7
-#define TMP_REG		8
-#define ARG2_REG	2
-#define ARG3_REG	3
-
 static bool convert_bpf_extensions(struct sock_filter *fp,
 				   struct sock_filter_int **insnp)
 {