diff mbox

ebpf: emit correct src_reg for conditional jumps

Message ID 1441931107-17673-1-git-send-email-tycho.andersen@canonical.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Tycho Andersen Sept. 11, 2015, 12:25 a.m. UTC
Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the
source actually is BPF_X. This causes programs generated by the classic
converter to not be importable via bpf(), as the eBPF verifier checks that
the src_reg is correct or 0. While not a problem yet, this will be a
problem when BPF_PROG_DUMP lands, and we can potentially dump and re-import
programs generated by the converter.

Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Alexei Starovoitov <ast@kernel.org>
CC: Daniel Borkmann <daniel@iogearbox.net>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniel Borkmann Sept. 11, 2015, 8:45 a.m. UTC | #1
On 09/11/2015 02:25 AM, Tycho Andersen wrote:
> Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the
> source actually is BPF_X. This causes programs generated by the classic
> converter to not be importable via bpf(), as the eBPF verifier checks that
> the src_reg is correct or 0. While not a problem yet, this will be a
> problem when BPF_PROG_DUMP lands, and we can potentially dump and re-import
> programs generated by the converter.
>
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
> CC: Alexei Starovoitov <ast@kernel.org>
> CC: Daniel Borkmann <daniel@iogearbox.net>

I think the description at the beginning could have been a bit clearer. I.e.
it's safe to zero insn->src_reg for BPF_SRC(fp->code) that is not X, because
we can either have X or K for classic jump compares, and in case of K we're
only interested in the immediate value, but not a possible src_reg, of course,
so eBPF converter should have zeroed it.

Yeah, it was never really the aim of the converter to cover something like
that requirement you seem to have:

   load classic BPF
     -> transform into eBPF
       -> dump that eBPF result to uspace
         -> load that eBPF via bpf(2)

Anyway, it doesn't cause an issue in the current code as stated, but it's good
if we fix it up nevertheless.

Looks good to me:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Sept. 11, 2015, 9:28 a.m. UTC | #2
On 09/11/2015 10:45 AM, Daniel Borkmann wrote:
> On 09/11/2015 02:25 AM, Tycho Andersen wrote:
>> Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the
>> source actually is BPF_X. This causes programs generated by the classic
>> converter to not be importable via bpf(), as the eBPF verifier checks that
>> the src_reg is correct or 0. While not a problem yet, this will be a
>> problem when BPF_PROG_DUMP lands, and we can potentially dump and re-import
>> programs generated by the converter.
>>
>> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
>> CC: Alexei Starovoitov <ast@kernel.org>
>> CC: Daniel Borkmann <daniel@iogearbox.net>
>
> I think the description at the beginning could have been a bit clearer. I.e.
> it's safe to zero insn->src_reg for BPF_SRC(fp->code) that is not X, because
> we can either have X or K for classic jump compares, and in case of K we're
> only interested in the immediate value, but not a possible src_reg, of course,
> so eBPF converter should have zeroed it.
>
> Yeah, it was never really the aim of the converter to cover something like
> that requirement you seem to have:
>
>    load classic BPF
>      -> transform into eBPF
>        -> dump that eBPF result to uspace
>          -> load that eBPF via bpf(2)

[off topic for this patch]

... this requirement also breaks down for cases where you have a single
classic BPF instruction that maps into 2 or more eBPF instructions, hitting
BPF_MAXINSNS early at the time when you try to call into bpf(2) again with
the dumped result. If I recall correctly, Chrome seems to use up quite a lot
of insns space on (classic) seccomp-BPF, so this could potentially be a real
issue, next to artificially crafted examples that would fail.

With regards to the latter, also classic programs that could have holes of
dead code where you jump over them (see lib/test_bpf.c for examples) are
unfortunately allowed on the ABI side, so while the classic -> eBPF converter
may translate this dead region 1:1, it will be rejected by the verifier when
you dump and try to reload that. ;) Anyway, it's perhaps a silly example, but
it shows that this use-case can only work on a 'best-effort' basis, and not
cover everything of the classic BPF ABI, as opposed to having an interface
that can dump/restore classic BPF instructions directly.

Do you need this for checkpoint/restore? Wouldn't this therefore be better
done as dump/restore classic<->classic and eBPF<->eBPF directly? In socket
filters we do this by keeping orig_prog around, I guess it's better to bite
the bullet of additional memory overhead in case of classic seccomp-BPF, too.

Cheers,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Sept. 11, 2015, 3:40 p.m. UTC | #3
On Fri, Sep 11, 2015 at 11:28:24AM +0200, Daniel Borkmann wrote:
> [off topic for this patch]
> 
> ... this requirement also breaks down for cases where you have a single
> classic BPF instruction that maps into 2 or more eBPF instructions, hitting
> BPF_MAXINSNS early at the time when you try to call into bpf(2) again with
> the dumped result. If I recall correctly, Chrome seems to use up quite a lot
> of insns space on (classic) seccomp-BPF, so this could potentially be a real
> issue, next to artificially crafted examples that would fail.
> 
> With regards to the latter, also classic programs that could have holes of
> dead code where you jump over them (see lib/test_bpf.c for examples) are
> unfortunately allowed on the ABI side, so while the classic -> eBPF converter
> may translate this dead region 1:1, it will be rejected by the verifier when
> you dump and try to reload that. ;) Anyway, it's perhaps a silly example, but
> it shows that this use-case can only work on a 'best-effort' basis, and not
> cover everything of the classic BPF ABI, as opposed to having an interface
> that can dump/restore classic BPF instructions directly.
> 
> Do you need this for checkpoint/restore? Wouldn't this therefore be better
> done as dump/restore classic<->classic and eBPF<->eBPF directly? In socket
> filters we do this by keeping orig_prog around, I guess it's better to bite
> the bullet of additional memory overhead in case of classic seccomp-BPF, too.

I don't think so.
When I played with libseccomp and chrome I saw that browser installed
several bpf programs for every new tab. The longest program was 275
classic insns which translated to slightly more eBPF insns
(because in eBPF we don't have < and <= instructions, so converter
needs to emit extra jump insns)
Also they don't produce unreachable code.
So getting over 4k limit and unreachable are rather hypothetical
problems. I wouldn't want to have two interfaces to criu seccomp.
eBPF is going to be used by seccomp as well, so having two is extra
burden for user space criu.
If we start hitting 4k limit for eBPF, we can easily bump it
and/or add < and <= insns to eBPF (which was on my todo list anyway).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tycho Andersen Sept. 11, 2015, 3:50 p.m. UTC | #4
On Fri, Sep 11, 2015 at 08:40:43AM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 11, 2015 at 11:28:24AM +0200, Daniel Borkmann wrote:
> > [off topic for this patch]
> > 
> > ... this requirement also breaks down for cases where you have a single
> > classic BPF instruction that maps into 2 or more eBPF instructions, hitting
> > BPF_MAXINSNS early at the time when you try to call into bpf(2) again with
> > the dumped result. If I recall correctly, Chrome seems to use up quite a lot
> > of insns space on (classic) seccomp-BPF, so this could potentially be a real
> > issue, next to artificially crafted examples that would fail.
> > 
> > With regards to the latter, also classic programs that could have holes of
> > dead code where you jump over them (see lib/test_bpf.c for examples) are
> > unfortunately allowed on the ABI side, so while the classic -> eBPF converter
> > may translate this dead region 1:1, it will be rejected by the verifier when
> > you dump and try to reload that. ;) Anyway, it's perhaps a silly example, but
> > it shows that this use-case can only work on a 'best-effort' basis, and not
> > cover everything of the classic BPF ABI, as opposed to having an interface
> > that can dump/restore classic BPF instructions directly.
> > 
> > Do you need this for checkpoint/restore? Wouldn't this therefore be better
> > done as dump/restore classic<->classic and eBPF<->eBPF directly? In socket
> > filters we do this by keeping orig_prog around, I guess it's better to bite
> > the bullet of additional memory overhead in case of classic seccomp-BPF, too.
> 
> I don't think so.
> When I played with libseccomp and chrome I saw that browser installed
> several bpf programs for every new tab. The longest program was 275
> classic insns which translated to slightly more eBPF insns
> (because in eBPF we don't have < and <= instructions, so converter
> needs to emit extra jump insns)
> Also they don't produce unreachable code.
> So getting over 4k limit and unreachable are rather hypothetical
> problems. I wouldn't want to have two interfaces to criu seccomp.
> eBPF is going to be used by seccomp as well, so having two is extra
> burden for user space criu.

I think the burden is not so huge once we have eBPF c/r in place (we
could just check the classic flag first, then dump the eBPF version or
something). However, it doesn't seem ideal to have to support two
interfaces.

> If we start hitting 4k limit for eBPF, we can easily bump it
> and/or add < and <= insns to eBPF (which was on my todo list anyway).

I was hoping you might say this (assuming you mean add < BPF_MAXINSNS
to the converter).

The dead code is certainly a problem, but perhaps we can wait on this
until there become some critical application that has this issue. I
was hoping we could get away without extra memory usage on the kernel
side (indeed, this patchset is mostly to try and avoid that).

Tycho
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann Sept. 11, 2015, 4:53 p.m. UTC | #5
On 09/11/2015 05:50 PM, Tycho Andersen wrote:
> On Fri, Sep 11, 2015 at 08:40:43AM -0700, Alexei Starovoitov wrote:
>> On Fri, Sep 11, 2015 at 11:28:24AM +0200, Daniel Borkmann wrote:
>>> [off topic for this patch]
>>>
>>> ... this requirement also breaks down for cases where you have a single
>>> classic BPF instruction that maps into 2 or more eBPF instructions, hitting
>>> BPF_MAXINSNS early at the time when you try to call into bpf(2) again with
>>> the dumped result. If I recall correctly, Chrome seems to use up quite a lot
>>> of insns space on (classic) seccomp-BPF, so this could potentially be a real
>>> issue, next to artificially crafted examples that would fail.
>>>
>>> With regards to the latter, also classic programs that could have holes of
>>> dead code where you jump over them (see lib/test_bpf.c for examples) are
>>> unfortunately allowed on the ABI side, so while the classic -> eBPF converter
>>> may translate this dead region 1:1, it will be rejected by the verifier when
>>> you dump and try to reload that. ;) Anyway, it's perhaps a silly example, but
>>> it shows that this use-case can only work on a 'best-effort' basis, and not
>>> cover everything of the classic BPF ABI, as opposed to having an interface
>>> that can dump/restore classic BPF instructions directly.
>>>
>>> Do you need this for checkpoint/restore? Wouldn't this therefore be better
>>> done as dump/restore classic<->classic and eBPF<->eBPF directly? In socket
>>> filters we do this by keeping orig_prog around, I guess it's better to bite
>>> the bullet of additional memory overhead in case of classic seccomp-BPF, too.
>>
>> I don't think so.
>> When I played with libseccomp and chrome I saw that browser installed
>> several bpf programs for every new tab. The longest program was 275
>> classic insns which translated to slightly more eBPF insns
>> (because in eBPF we don't have < and <= instructions, so converter
>> needs to emit extra jump insns)

Yes, these cases, and also exit code translates into two and you have a
preamble moving ctx to arg1 for every classic program. So it should be
slightly more overall, depending on the program structure.

>> Also they don't produce unreachable code.
>> So getting over 4k limit and unreachable are rather hypothetical
>> problems. I wouldn't want to have two interfaces to criu seccomp.

Thinking out loud, is there such a use-case where you checkpoint your
application on kernel X (that allows, say, to dump /and/ inject classic
seccomp insns) and restore it elsewhere on kernel Y, where Y is older
than X (f.e. it can easily inject classic insns on seccomp as it's
present for some time, but /not/ dump). I guess that could be ignored
as you couldn't move it away from there w/o dumping insns then? Also,
if the *whole* environment is even to a little degree non-homogeneous,
then your own seccomp rules can already kill you. ;)

>> eBPF is going to be used by seccomp as well, so having two is extra
>> burden for user space criu.

I don't know how much burden it actually is for criu, it for sure already
would need to do exactly this for eBPF socket filters. If I could choose
between adding extra complexity on user space or kernel space, I would
choose user space when possible.

> I think the burden is not so huge once we have eBPF c/r in place (we
> could just check the classic flag first, then dump the eBPF version or
> something). However, it doesn't seem ideal to have to support two
> interfaces.
>
>> If we start hitting 4k limit for eBPF, we can easily bump it
>> and/or add < and <= insns to eBPF (which was on my todo list anyway).

I think bumping BPF_MAXINSNS seems fragile wrt user space breakage, it's
at least unclear to me whether applications interacting with the kernel
depend on this in some [even weird] way. I'd probably go for the <, <=.

> I was hoping you might say this (assuming you mean add < BPF_MAXINSNS
> to the converter).
>
> The dead code is certainly a problem, but perhaps we can wait on this
> until there become some critical application that has this issue. I
> was hoping we could get away without extra memory usage on the kernel
> side (indeed, this patchset is mostly to try and avoid that).
>
> Tycho
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 11, 2015, 9:53 p.m. UTC | #6
From: Tycho Andersen <tycho.andersen@canonical.com>
Date: Thu, 10 Sep 2015 18:25:07 -0600

> Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the
> source actually is BPF_X. This causes programs generated by the classic
> converter to not be importable via bpf(), as the eBPF verifier checks that
> the src_reg is correct or 0. While not a problem yet, this will be a
> problem when BPF_PROG_DUMP lands, and we can potentially dump and re-import
> programs generated by the converter.
> 
> Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 13079f0..05a04ea 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -478,9 +478,9 @@  do_pass:
 				bpf_src = BPF_X;
 			} else {
 				insn->dst_reg = BPF_REG_A;
-				insn->src_reg = BPF_REG_X;
 				insn->imm = fp->k;
 				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
 			}
 
 			/* Common case where 'jump_false' is next insn. */