diff mbox

[RFCv2,07/16] bpf: enable non-core use of the verfier

Message ID 1472234775-29453-8-git-send-email-jakub.kicinski@netronome.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jakub Kicinski Aug. 26, 2016, 6:06 p.m. UTC
Advanced JIT compilers and translators may want to use
eBPF verifier as a base for parsers or to perform custom
checks and validations.

Add ability for external users to invoke the verifier
and provide callbacks to be invoked for every intruction
checked.  For now only add most basic callback for
per-instruction pre-interpretation checks is added.  More
advanced users may also like to have per-instruction post
callback and state comparison callback.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/bpf_parser.h |  84 +++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c      | 122 +++++++++++++++++++++++----------------------
 2 files changed, 146 insertions(+), 60 deletions(-)
 create mode 100644 include/linux/bpf_parser.h

Comments

Alexei Starovoitov Aug. 26, 2016, 11:29 p.m. UTC | #1
On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
> Advanced JIT compilers and translators may want to use
> eBPF verifier as a base for parsers or to perform custom
> checks and validations.
> 
> Add ability for external users to invoke the verifier
> and provide callbacks to be invoked for every intruction
> checked.  For now only add most basic callback for
> per-instruction pre-interpretation checks is added.  More
> advanced users may also like to have per-instruction post
> callback and state comparison callback.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

I like the apporach. Making verifier into 'bytecode parser'
that JITs can reuse is a good design choice.
The only thing I would suggest is to tweak the verifier to
avoid in-place state recording. Then I think patch 8 for
clone/unclone of the program won't be needed, since verifier
will be read-only from bytecode point of view and patch 9
also will be slightly cleaner.
I think there are very few places in verifier that do this
state keeping inside insn. It was bugging me for some time.
Good time to clean that up.
Unless I misunderstand the patches 7,8,9...

There is also small concern for patches 5 and 6 that add
more register state information. Potentially that extra
state can prevent states_equal() to recognize equivalent
states. Only patch 9 uses that info, right?
Another question is do you need all state walking that
verifier does or single linear pass through insns
would have worked?
Looks like you're only using CONST_IMM and PTR_TO_CTX
state, right?

The rest looks very good. Thanks a lot!
Jakub Kicinski Aug. 27, 2016, 11:40 a.m. UTC | #2
On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
> > Advanced JIT compilers and translators may want to use
> > eBPF verifier as a base for parsers or to perform custom
> > checks and validations.
> > 
> > Add ability for external users to invoke the verifier
> > and provide callbacks to be invoked for every intruction
> > checked.  For now only add most basic callback for
> > per-instruction pre-interpretation checks is added.  More
> > advanced users may also like to have per-instruction post
> > callback and state comparison callback.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> 
> I like the apporach. Making verifier into 'bytecode parser'
> that JITs can reuse is a good design choice.
> The only thing I would suggest is to tweak the verifier to
> avoid in-place state recording. Then I think patch 8 for
> clone/unclone of the program won't be needed, since verifier
> will be read-only from bytecode point of view and patch 9
> also will be slightly cleaner.
> I think there are very few places in verifier that do this
> state keeping inside insn. It was bugging me for some time.
> Good time to clean that up.
> Unless I misunderstand the patches 7,8,9...

Agreed, I think the verifier only modifies the program to
store pointer types in imm field.  I will try to come up
a way around this, any suggestions?  Perhaps state_equal()
logic could be modified to downgrade pointers to UNKONWNs
when it detects other state had incompatible pointer type.

> There is also small concern for patches 5 and 6 that add
> more register state information. Potentially that extra
> state can prevent states_equal() to recognize equivalent
> states. Only patch 9 uses that info, right?

5 and 6 recognize more constant loads, those can only
upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
verifier hits the CONST first and then tries with UNKNOWN
it will have to reverify the path.  

> Another question is do you need all state walking that
> verifier does or single linear pass through insns
> would have worked?
> Looks like you're only using CONST_IMM and PTR_TO_CTX
> state, right?

I think I need all the parsing.  Right now I mostly need
the verification to check that exit codes are specific
CONST_IMMs.  Clang quite happily does this:

r0 <- 0
if (...)
	r0 <- 1
exit

> The rest looks very good. Thanks a lot!

Thanks for the review!  FWIW my use of parsing is isolated
to the nfp_bpf_verifier.c file, at the very end of patch 9.
Alexei Starovoitov Aug. 27, 2016, 5:32 p.m. UTC | #3
On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:
> On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
> > > Advanced JIT compilers and translators may want to use
> > > eBPF verifier as a base for parsers or to perform custom
> > > checks and validations.
> > > 
> > > Add ability for external users to invoke the verifier
> > > and provide callbacks to be invoked for every intruction
> > > checked.  For now only add most basic callback for
> > > per-instruction pre-interpretation checks is added.  More
> > > advanced users may also like to have per-instruction post
> > > callback and state comparison callback.
> > > 
> > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> > 
> > I like the apporach. Making verifier into 'bytecode parser'
> > that JITs can reuse is a good design choice.
> > The only thing I would suggest is to tweak the verifier to
> > avoid in-place state recording. Then I think patch 8 for
> > clone/unclone of the program won't be needed, since verifier
> > will be read-only from bytecode point of view and patch 9
> > also will be slightly cleaner.
> > I think there are very few places in verifier that do this
> > state keeping inside insn. It was bugging me for some time.
> > Good time to clean that up.
> > Unless I misunderstand the patches 7,8,9...
> 
> Agreed, I think the verifier only modifies the program to
> store pointer types in imm field.  I will try to come up
> a way around this, any suggestions?  Perhaps state_equal()

probably array_of_insn_aux_data[num_insns] should do it.
Unlike reg_state that is forked on branches, this array
is only one.

> logic could be modified to downgrade pointers to UNKONWNs
> when it detects other state had incompatible pointer type.
> 
> > There is also small concern for patches 5 and 6 that add
> > more register state information. Potentially that extra
> > state can prevent states_equal() to recognize equivalent
> > states. Only patch 9 uses that info, right?
> 
> 5 and 6 recognize more constant loads, those can only
> upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
> verifier hits the CONST first and then tries with UNKNOWN
> it will have to reverify the path.  
> 
> > Another question is do you need all state walking that
> > verifier does or single linear pass through insns
> > would have worked?
> > Looks like you're only using CONST_IMM and PTR_TO_CTX
> > state, right?
> 
> I think I need all the parsing.  Right now I mostly need
> the verification to check that exit codes are specific
> CONST_IMMs.  Clang quite happily does this:
> 
> r0 <- 0
> if (...)
> 	r0 <- 1
> exit

I see. Indeed then you'd need the verifier to walk all paths
to make sure constant return values.
If you only need yes/no check then such info can probably be
collected unconditionally during initial program load.
Like prog->cb_access flag.
Daniel Borkmann Aug. 29, 2016, 8:13 p.m. UTC | #4
On 08/27/2016 07:32 PM, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:
>> On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
>>>> Advanced JIT compilers and translators may want to use
>>>> eBPF verifier as a base for parsers or to perform custom
>>>> checks and validations.
>>>>
>>>> Add ability for external users to invoke the verifier
>>>> and provide callbacks to be invoked for every intruction
>>>> checked.  For now only add most basic callback for
>>>> per-instruction pre-interpretation checks is added.  More
>>>> advanced users may also like to have per-instruction post
>>>> callback and state comparison callback.
>>>>
>>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>
>>> I like the apporach. Making verifier into 'bytecode parser'
>>> that JITs can reuse is a good design choice.

+1

>>> The only thing I would suggest is to tweak the verifier to
>>> avoid in-place state recording. Then I think patch 8 for
>>> clone/unclone of the program won't be needed, since verifier
>>> will be read-only from bytecode point of view and patch 9
>>> also will be slightly cleaner.
>>> I think there are very few places in verifier that do this
>>> state keeping inside insn. It was bugging me for some time.
>>> Good time to clean that up.
>>> Unless I misunderstand the patches 7,8,9...
>>
>> Agreed, I think the verifier only modifies the program to
>> store pointer types in imm field.  I will try to come up
>> a way around this, any suggestions?  Perhaps state_equal()
>
> probably array_of_insn_aux_data[num_insns] should do it.
> Unlike reg_state that is forked on branches, this array
> is only one.

This would be for struct nfp_insn_meta, right? So, struct
bpf_ext_parser_ops could become:

static const struct bpf_ext_parser_ops nfp_bpf_pops = {
	.insn_hook = nfp_verify_insn,
	.insn_size = sizeof(struct nfp_insn_meta),
};

... where bpf_parse() would prealloc that f.e. in env->insn_meta[].

>> logic could be modified to downgrade pointers to UNKONWNs
>> when it detects other state had incompatible pointer type.
>>
>>> There is also small concern for patches 5 and 6 that add
>>> more register state information. Potentially that extra
>>> state can prevent states_equal() to recognize equivalent
>>> states. Only patch 9 uses that info, right?
>>
>> 5 and 6 recognize more constant loads, those can only
>> upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
>> verifier hits the CONST first and then tries with UNKNOWN
>> it will have to reverify the path.

Agree, was also my concern when I read patch 5 and 6. It would
not only be related to types, but also different imm values,
where the memcmp() could fail on. Potentially the latter can be
avoided by only checking types which should be sufficient. Hmm,
maybe only bpf_parse() should go through this stricter mode since
only relevant for drivers (otoh downside would be that bugs
would end up less likely to be found).

>>> Another question is do you need all state walking that
>>> verifier does or single linear pass through insns
>>> would have worked?
>>> Looks like you're only using CONST_IMM and PTR_TO_CTX
>>> state, right?
>>
>> I think I need all the parsing.  Right now I mostly need
>> the verification to check that exit codes are specific
>> CONST_IMMs.  Clang quite happily does this:
>>
>> r0 <- 0
>> if (...)
>> 	r0 <- 1
>> exit
>
> I see. Indeed then you'd need the verifier to walk all paths
> to make sure constant return values.

I think this would still not cover the cases where you'd fetch
a return value/verdict from a map, but this should be ignored/
rejected for now, also since majority of programs are not written
in such a way.

> If you only need yes/no check then such info can probably be
> collected unconditionally during initial program load.
> Like prog->cb_access flag.

One other comment wrt the header, when you move these things
there, would be good to prefix with bpf_* so that this doesn't
clash in future with other header files.
Daniel Borkmann Aug. 29, 2016, 8:17 p.m. UTC | #5
On 08/29/2016 10:13 PM, Daniel Borkmann wrote:
> On 08/27/2016 07:32 PM, Alexei Starovoitov wrote:
>> On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:
>>> On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
>>>> On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
>>>>> Advanced JIT compilers and translators may want to use
>>>>> eBPF verifier as a base for parsers or to perform custom
>>>>> checks and validations.
>>>>>
>>>>> Add ability for external users to invoke the verifier
>>>>> and provide callbacks to be invoked for every intruction
>>>>> checked.  For now only add most basic callback for
>>>>> per-instruction pre-interpretation checks is added.  More
>>>>> advanced users may also like to have per-instruction post
>>>>> callback and state comparison callback.
>>>>>
>>>>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>>>
>>>> I like the apporach. Making verifier into 'bytecode parser'
>>>> that JITs can reuse is a good design choice.
>
> +1
>
>>>> The only thing I would suggest is to tweak the verifier to
>>>> avoid in-place state recording. Then I think patch 8 for
>>>> clone/unclone of the program won't be needed, since verifier
>>>> will be read-only from bytecode point of view and patch 9
>>>> also will be slightly cleaner.
>>>> I think there are very few places in verifier that do this
>>>> state keeping inside insn. It was bugging me for some time.
>>>> Good time to clean that up.
>>>> Unless I misunderstand the patches 7,8,9...
>>>
>>> Agreed, I think the verifier only modifies the program to
>>> store pointer types in imm field.  I will try to come up
>>> a way around this, any suggestions?  Perhaps state_equal()
>>
>> probably array_of_insn_aux_data[num_insns] should do it.
>> Unlike reg_state that is forked on branches, this array
>> is only one.
>
> This would be for struct nfp_insn_meta, right? So, struct
> bpf_ext_parser_ops could become:
>
> static const struct bpf_ext_parser_ops nfp_bpf_pops = {
>      .insn_hook = nfp_verify_insn,
>      .insn_size = sizeof(struct nfp_insn_meta),
> };
>
> ... where bpf_parse() would prealloc that f.e. in env->insn_meta[].

(Well, actually everything can live in env->private_data.)

>>> logic could be modified to downgrade pointers to UNKONWNs
>>> when it detects other state had incompatible pointer type.
>>>
>>>> There is also small concern for patches 5 and 6 that add
>>>> more register state information. Potentially that extra
>>>> state can prevent states_equal() to recognize equivalent
>>>> states. Only patch 9 uses that info, right?
>>>
>>> 5 and 6 recognize more constant loads, those can only
>>> upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
>>> verifier hits the CONST first and then tries with UNKNOWN
>>> it will have to reverify the path.
>
> Agree, was also my concern when I read patch 5 and 6. It would
> not only be related to types, but also different imm values,
> where the memcmp() could fail on. Potentially the latter can be
> avoided by only checking types which should be sufficient. Hmm,
> maybe only bpf_parse() should go through this stricter mode since
> only relevant for drivers (otoh downside would be that bugs
> would end up less likely to be found).
>
>>>> Another question is do you need all state walking that
>>>> verifier does or single linear pass through insns
>>>> would have worked?
>>>> Looks like you're only using CONST_IMM and PTR_TO_CTX
>>>> state, right?
>>>
>>> I think I need all the parsing.  Right now I mostly need
>>> the verification to check that exit codes are specific
>>> CONST_IMMs.  Clang quite happily does this:
>>>
>>> r0 <- 0
>>> if (...)
>>>     r0 <- 1
>>> exit
>>
>> I see. Indeed then you'd need the verifier to walk all paths
>> to make sure constant return values.
>
> I think this would still not cover the cases where you'd fetch
> a return value/verdict from a map, but this should be ignored/
> rejected for now, also since majority of programs are not written
> in such a way.
>
>> If you only need yes/no check then such info can probably be
>> collected unconditionally during initial program load.
>> Like prog->cb_access flag.
>
> One other comment wrt the header, when you move these things
> there, would be good to prefix with bpf_* so that this doesn't
> clash in future with other header files.
Jakub Kicinski Aug. 30, 2016, 10:48 a.m. UTC | #6
On Mon, 29 Aug 2016 22:17:10 +0200, Daniel Borkmann wrote:
> On 08/29/2016 10:13 PM, Daniel Borkmann wrote:
> > On 08/27/2016 07:32 PM, Alexei Starovoitov wrote:  
> >> On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:  
> >> probably array_of_insn_aux_data[num_insns] should do it.
> >> Unlike reg_state that is forked on branches, this array
> >> is only one.  
> >
> > This would be for struct nfp_insn_meta, right? So, struct
> > bpf_ext_parser_ops could become:
> >
> > static const struct bpf_ext_parser_ops nfp_bpf_pops = {
> >      .insn_hook = nfp_verify_insn,
> >      .insn_size = sizeof(struct nfp_insn_meta),
> > };
> >
> > ... where bpf_parse() would prealloc that f.e. in env->insn_meta[].  

Hm.. this is tempting, I will have to store the pointer type in
nfp_insn_meta soon, anyway.

> (Well, actually everything can live in env->private_data.)

We are discussing changing the place verifier keep its pointer type
annotation, I don't think we could put that in the private_data.

> > Agree, was also my concern when I read patch 5 and 6. It would
> > not only be related to types, but also different imm values,
> > where the memcmp() could fail on. Potentially the latter can be
> > avoided by only checking types which should be sufficient. Hmm,
> > maybe only bpf_parse() should go through this stricter mode since
> > only relevant for drivers (otoh downside would be that bugs
> > would end up less likely to be found).

I don't want only checking types because it would defeat my exit code
validation :)  I was thinking about doing a lazy evaluation -
registering branches to explored_states with UNKNOWN and only upgrading
to CONST when someone actually needed the imm value.  I'm not sure the
complexity would be justified, though.

Having two modes seems more straight forward and I think we would only
need to pay attention in the LD_IMM64 case, I don't think I've seen
LLVM generating XORs, it's just the cBPF -> eBPF conversion.

> >> I see. Indeed then you'd need the verifier to walk all paths
> >> to make sure constant return values.  
> >
> > I think this would still not cover the cases where you'd fetch
> > a return value/verdict from a map, but this should be ignored/
> > rejected for now, also since majority of programs are not written
> > in such a way.
> >  
> >> If you only need yes/no check then such info can probably be
> >> collected unconditionally during initial program load.
> >> Like prog->cb_access flag.  
> >
> > One other comment wrt the header, when you move these things
> > there, would be good to prefix with bpf_* so that this doesn't
> > clash in future with other header files.  

Good point!
Daniel Borkmann Aug. 30, 2016, 7:07 p.m. UTC | #7
On 08/30/2016 12:48 PM, Jakub Kicinski wrote:
> On Mon, 29 Aug 2016 22:17:10 +0200, Daniel Borkmann wrote:
>> On 08/29/2016 10:13 PM, Daniel Borkmann wrote:
>>> On 08/27/2016 07:32 PM, Alexei Starovoitov wrote:
>>>> On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:
>>>> probably array_of_insn_aux_data[num_insns] should do it.
>>>> Unlike reg_state that is forked on branches, this array
>>>> is only one.
>>>
>>> This would be for struct nfp_insn_meta, right? So, struct
>>> bpf_ext_parser_ops could become:
>>>
>>> static const struct bpf_ext_parser_ops nfp_bpf_pops = {
>>>       .insn_hook = nfp_verify_insn,
>>>       .insn_size = sizeof(struct nfp_insn_meta),
>>> };
>>>
>>> ... where bpf_parse() would prealloc that f.e. in env->insn_meta[].
>
> Hm.. this is tempting, I will have to store the pointer type in
> nfp_insn_meta soon, anyway.
>
>> (Well, actually everything can live in env->private_data.)
>
> We are discussing changing the place verifier keep its pointer type
> annotation, I don't think we could put that in the private_data.
>
>>> Agree, was also my concern when I read patch 5 and 6. It would
>>> not only be related to types, but also different imm values,
>>> where the memcmp() could fail on. Potentially the latter can be
>>> avoided by only checking types which should be sufficient. Hmm,
>>> maybe only bpf_parse() should go through this stricter mode since
>>> only relevant for drivers (otoh downside would be that bugs
>>> would end up less likely to be found).
>
> I don't want only checking types because it would defeat my exit code
> validation :)  I was thinking about doing a lazy evaluation -
> registering branches to explored_states with UNKNOWN and only upgrading
> to CONST when someone actually needed the imm value.  I'm not sure the
> complexity would be justified, though.
>
> Having two modes seems more straight forward and I think we would only
> need to pay attention in the LD_IMM64 case, I don't think I've seen
> LLVM generating XORs, it's just the cBPF -> eBPF conversion.

Okay, though, I think that the cBPF to eBPF migration wouldn't even
pass through the bpf_parse() handling, since verifier is not aware on
some of their aspects such as emitting calls directly (w/o *proto) or
arg mappings. Probably make sense to reject these (bpf_prog_was_classic())
if they cannot be handled anyway?

>>>> I see. Indeed then you'd need the verifier to walk all paths
>>>> to make sure constant return values.
>>>
>>> I think this would still not cover the cases where you'd fetch
>>> a return value/verdict from a map, but this should be ignored/
>>> rejected for now, also since majority of programs are not written
>>> in such a way.
>>>
>>>> If you only need yes/no check then such info can probably be
>>>> collected unconditionally during initial program load.
>>>> Like prog->cb_access flag.
>>>
>>> One other comment wrt the header, when you move these things
>>> there, would be good to prefix with bpf_* so that this doesn't
>>> clash in future with other header files.
>
> Good point!
Jakub Kicinski Aug. 30, 2016, 8:22 p.m. UTC | #8
On Tue, 30 Aug 2016 21:07:50 +0200, Daniel Borkmann wrote:
> > Having two modes seems more straight forward and I think we would only
> > need to pay attention in the LD_IMM64 case, I don't think I've seen
> > LLVM generating XORs, it's just the cBPF -> eBPF conversion.  
> 
> Okay, though, I think that the cBPF to eBPF migration wouldn't even
> pass through the bpf_parse() handling, since verifier is not aware on
> some of their aspects such as emitting calls directly (w/o *proto) or
> arg mappings. Probably make sense to reject these (bpf_prog_was_classic())
> if they cannot be handled anyway?

TBH again I only use cBPF for testing.  It's a convenient way of
generating certain instruction sequences.  I can probably just drop
it completely but the XOR patch is just 3 lines of code so not a huge
cost either...  I'll keep patch 6 in my tree for now.  

Alternatively - is there any eBPF assembler out there?  Something
converting verifier output back into ELF would be quite cool.
Alexei Starovoitov Aug. 30, 2016, 8:48 p.m. UTC | #9
On Tue, Aug 30, 2016 at 10:22:46PM +0200, Jakub Kicinski wrote:
> On Tue, 30 Aug 2016 21:07:50 +0200, Daniel Borkmann wrote:
> > > Having two modes seems more straight forward and I think we would only
> > > need to pay attention in the LD_IMM64 case, I don't think I've seen
> > > LLVM generating XORs, it's just the cBPF -> eBPF conversion.  
> > 
> > Okay, though, I think that the cBPF to eBPF migration wouldn't even
> > pass through the bpf_parse() handling, since verifier is not aware on
> > some of their aspects such as emitting calls directly (w/o *proto) or
> > arg mappings. Probably make sense to reject these (bpf_prog_was_classic())
> > if they cannot be handled anyway?
> 
> TBH again I only use cBPF for testing.  It's a convenient way of
> generating certain instruction sequences.  I can probably just drop
> it completely but the XOR patch is just 3 lines of code so not a huge
> cost either...  I'll keep patch 6 in my tree for now.  

if xor matching is only need for classic, I would drop that patch
just to avoid unnecessary state collection. The number of lines
is not a concern, but extra state for state prunning is.

> Alternatively - is there any eBPF assembler out there?  Something
> converting verifier output back into ELF would be quite cool.

would certainly be nice. I don't think there is anything standalone.
btw llvm can be made to work as assembler only, but simple flex/bison
is probably better.
Daniel Borkmann Aug. 30, 2016, 9 p.m. UTC | #10
On 08/30/2016 10:48 PM, Alexei Starovoitov wrote:
> On Tue, Aug 30, 2016 at 10:22:46PM +0200, Jakub Kicinski wrote:
>> On Tue, 30 Aug 2016 21:07:50 +0200, Daniel Borkmann wrote:
>>>> Having two modes seems more straight forward and I think we would only
>>>> need to pay attention in the LD_IMM64 case, I don't think I've seen
>>>> LLVM generating XORs, it's just the cBPF -> eBPF conversion.
>>>
>>> Okay, though, I think that the cBPF to eBPF migration wouldn't even
>>> pass through the bpf_parse() handling, since verifier is not aware on
>>> some of their aspects such as emitting calls directly (w/o *proto) or
>>> arg mappings. Probably make sense to reject these (bpf_prog_was_classic())
>>> if they cannot be handled anyway?
>>
>> TBH again I only use cBPF for testing.  It's a convenient way of
>> generating certain instruction sequences.  I can probably just drop
>> it completely but the XOR patch is just 3 lines of code so not a huge
>> cost either...  I'll keep patch 6 in my tree for now.
>
> if xor matching is only need for classic, I would drop that patch
> just to avoid unnecessary state collection. The number of lines
> is not a concern, but extra state for state prunning is.
>
>> Alternatively - is there any eBPF assembler out there?  Something
>> converting verifier output back into ELF would be quite cool.
>
> would certainly be nice. I don't think there is anything standalone.
> btw llvm can be made to work as assembler only, but simple flex/bison
> is probably better.

Never tried it out, but seems llvm backend doesn't have asm parser
implemented?

   $ clang -target bpf -O2 -c foo.c -S -o foo.S
   $ llvm-mc -arch bpf foo.S -filetype=obj -o foo.o
   llvm-mc: error: this target does not support assembly parsing.

LLVM IR might work, but maybe too high level(?); alternatively, we could
make bpf_asm from tools/net/ eBPF aware for debugging purposes. If you
have a toolchain supporting libbfd et al, you could probably make use
of bpf_jit_dump() (like JITs do) and then bpf_jit_disasm tool (from
same dir as bpf_asm).
Alexei Starovoitov Aug. 31, 2016, 1:18 a.m. UTC | #11
On Tue, Aug 30, 2016 at 11:00:38PM +0200, Daniel Borkmann wrote:
> On 08/30/2016 10:48 PM, Alexei Starovoitov wrote:
> >On Tue, Aug 30, 2016 at 10:22:46PM +0200, Jakub Kicinski wrote:
> >>On Tue, 30 Aug 2016 21:07:50 +0200, Daniel Borkmann wrote:
> >>>>Having two modes seems more straight forward and I think we would only
> >>>>need to pay attention in the LD_IMM64 case, I don't think I've seen
> >>>>LLVM generating XORs, it's just the cBPF -> eBPF conversion.
> >>>
> >>>Okay, though, I think that the cBPF to eBPF migration wouldn't even
> >>>pass through the bpf_parse() handling, since verifier is not aware on
> >>>some of their aspects such as emitting calls directly (w/o *proto) or
> >>>arg mappings. Probably make sense to reject these (bpf_prog_was_classic())
> >>>if they cannot be handled anyway?
> >>
> >>TBH again I only use cBPF for testing.  It's a convenient way of
> >>generating certain instruction sequences.  I can probably just drop
> >>it completely but the XOR patch is just 3 lines of code so not a huge
> >>cost either...  I'll keep patch 6 in my tree for now.
> >
> >if xor matching is only need for classic, I would drop that patch
> >just to avoid unnecessary state collection. The number of lines
> >is not a concern, but extra state for state prunning is.
> >
> >>Alternatively - is there any eBPF assembler out there?  Something
> >>converting verifier output back into ELF would be quite cool.
> >
> >would certainly be nice. I don't think there is anything standalone.
> >btw llvm can be made to work as assembler only, but simple flex/bison
> >is probably better.
> 
> Never tried it out, but seems llvm backend doesn't have asm parser
> implemented?
> 
>   $ clang -target bpf -O2 -c foo.c -S -o foo.S
>   $ llvm-mc -arch bpf foo.S -filetype=obj -o foo.o
>   llvm-mc: error: this target does not support assembly parsing.
> 
> LLVM IR might work, but maybe too high level(?); alternatively, we could
> make bpf_asm from tools/net/ eBPF aware for debugging purposes. If you
> have a toolchain supporting libbfd et al, you could probably make use
> of bpf_jit_dump() (like JITs do) and then bpf_jit_disasm tool (from
> same dir as bpf_asm).

yes. llvm-based bpf asm is not complete. It's straightforward to add though.
It won't be going through IR. Only 'mc' (machine instruciton) layer.
diff mbox

Patch

diff --git a/include/linux/bpf_parser.h b/include/linux/bpf_parser.h
new file mode 100644
index 000000000000..1b73cc464914
--- /dev/null
+++ b/include/linux/bpf_parser.h
@@ -0,0 +1,84 @@ 
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _LINUX_BPF_PARSER_H
+#define _LINUX_BPF_PARSER_H 1
+
+#include <linux/bpf.h> /* for enum bpf_reg_type */
+#include <linux/filter.h> /* for MAX_BPF_STACK */
+
+struct reg_state {
+	enum bpf_reg_type type;
+	union {
+		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
+		s64 imm;
+
+		/* valid when type == PTR_TO_PACKET* */
+		struct {
+			u32 id;
+			u16 off;
+			u16 range;
+		};
+
+		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
+		 *   PTR_TO_MAP_VALUE_OR_NULL
+		 */
+		struct bpf_map *map_ptr;
+	};
+};
+
+enum bpf_stack_slot_type {
+	STACK_INVALID,    /* nothing was stored in this stack slot */
+	STACK_SPILL,      /* register spilled into stack */
+	STACK_MISC	  /* BPF program wrote some data into this slot */
+};
+
+#define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
+
+/* state of the program:
+ * type of all registers and stack info
+ */
+struct verifier_state {
+	struct reg_state regs[MAX_BPF_REG];
+	u8 stack_slot_type[MAX_BPF_STACK];
+	struct reg_state spilled_regs[MAX_BPF_STACK / BPF_REG_SIZE];
+};
+
+/* linked list of verifier states used to prune search */
+struct verifier_state_list {
+	struct verifier_state state;
+	struct verifier_state_list *next;
+};
+
+#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
+
+struct verifier_env;
+struct bpf_ext_parser_ops {
+	int (*insn_hook)(struct verifier_env *env,
+			 int insn_idx, int prev_insn_idx);
+};
+
+/* single container for all structs
+ * one verifier_env per bpf_check() call
+ */
+struct verifier_env {
+	struct bpf_prog *prog;		/* eBPF program being verified */
+	struct verifier_stack_elem *head; /* stack of verifier states to be processed */
+	int stack_size;			/* number of states to be processed */
+	struct verifier_state cur_state; /* current verifier state */
+	struct verifier_state_list **explored_states; /* search pruning optimization */
+	const struct bpf_ext_parser_ops *pops; /* external parser ops */
+	void *ppriv; /* pointer to external parser's private data */
+	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
+	u32 used_map_cnt;		/* number of used maps */
+	u32 id_gen;			/* used to generate unique reg IDs */
+	bool allow_ptr_leaks;
+};
+
+int bpf_parse(struct bpf_prog *prog, const struct bpf_ext_parser_ops *pops,
+	      void *ppriv);
+
+#endif /* _LINUX_BPF_PARSER_H */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0f4494c194f9..e91faad7d2b2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14,6 +14,7 @@ 
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/bpf.h>
+#include <linux/bpf_parser.h>
 #include <linux/filter.h>
 #include <net/netlink.h>
 #include <linux/file.h>
@@ -126,49 +127,6 @@ 
  * are set to NOT_INIT to indicate that they are no longer readable.
  */
 
-struct reg_state {
-	enum bpf_reg_type type;
-	union {
-		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
-		s64 imm;
-
-		/* valid when type == PTR_TO_PACKET* */
-		struct {
-			u32 id;
-			u16 off;
-			u16 range;
-		};
-
-		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
-		 *   PTR_TO_MAP_VALUE_OR_NULL
-		 */
-		struct bpf_map *map_ptr;
-	};
-};
-
-enum bpf_stack_slot_type {
-	STACK_INVALID,    /* nothing was stored in this stack slot */
-	STACK_SPILL,      /* register spilled into stack */
-	STACK_MISC	  /* BPF program wrote some data into this slot */
-};
-
-#define BPF_REG_SIZE 8	/* size of eBPF register in bytes */
-
-/* state of the program:
- * type of all registers and stack info
- */
-struct verifier_state {
-	struct reg_state regs[MAX_BPF_REG];
-	u8 stack_slot_type[MAX_BPF_STACK];
-	struct reg_state spilled_regs[MAX_BPF_STACK / BPF_REG_SIZE];
-};
-
-/* linked list of verifier states used to prune search */
-struct verifier_state_list {
-	struct verifier_state state;
-	struct verifier_state_list *next;
-};
-
 /* verifier_state + insn_idx are pushed to stack when branch is encountered */
 struct verifier_stack_elem {
 	/* verifer state is 'st'
@@ -181,23 +139,6 @@  struct verifier_stack_elem {
 	struct verifier_stack_elem *next;
 };
 
-#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
-
-/* single container for all structs
- * one verifier_env per bpf_check() call
- */
-struct verifier_env {
-	struct bpf_prog *prog;		/* eBPF program being verified */
-	struct verifier_stack_elem *head; /* stack of verifier states to be processed */
-	int stack_size;			/* number of states to be processed */
-	struct verifier_state cur_state; /* current verifier state */
-	struct verifier_state_list **explored_states; /* search pruning optimization */
-	struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
-	u32 used_map_cnt;		/* number of used maps */
-	u32 id_gen;			/* used to generate unique reg IDs */
-	bool allow_ptr_leaks;
-};
-
 #define BPF_COMPLEXITY_LIMIT_INSNS	65536
 #define BPF_COMPLEXITY_LIMIT_STACK	1024
 
@@ -683,6 +624,10 @@  static int check_packet_access(struct verifier_env *env, u32 regno, int off,
 static int check_ctx_access(struct verifier_env *env, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
 {
+	/* for parser ctx accesses are already validated and converted */
+	if (env->pops)
+		return 0;
+
 	if (env->prog->aux->ops->is_valid_access &&
 	    env->prog->aux->ops->is_valid_access(off, size, t, reg_type)) {
 		/* remember the offset of last byte accessed in ctx */
@@ -2256,6 +2201,15 @@  static int is_state_visited(struct verifier_env *env, int insn_idx)
 	return 0;
 }
 
+static int ext_parser_hook(struct verifier_env *env,
+			   int insn_idx, int prev_insn_idx)
+{
+	if (!env->pops || !env->pops->insn_hook)
+		return 0;
+
+	return env->pops->insn_hook(env, insn_idx, prev_insn_idx);
+}
+
 static int do_check(struct verifier_env *env)
 {
 	struct verifier_state *state = &env->cur_state;
@@ -2314,6 +2268,10 @@  static int do_check(struct verifier_env *env)
 			print_bpf_insn(insn);
 		}
 
+		err = ext_parser_hook(env, insn_idx, prev_insn_idx);
+		if (err)
+			return err;
+
 		if (class == BPF_ALU || class == BPF_ALU64) {
 			err = check_alu_op(env, insn);
 			if (err)
@@ -2832,3 +2790,47 @@  free_env:
 	mutex_unlock(&bpf_verifier_lock);
 	return ret;
 }
+
+int bpf_parse(struct bpf_prog *prog, const struct bpf_ext_parser_ops *pops,
+	      void *ppriv)
+{
+	struct verifier_env *env;
+	int ret;
+
+	env = kzalloc(sizeof(struct verifier_env), GFP_KERNEL);
+	if (!env)
+		return -ENOMEM;
+
+	env->prog = prog;
+	env->pops = pops;
+	env->ppriv = ppriv;
+
+	/* grab the mutex to protect few globals used by verifier */
+	mutex_lock(&bpf_verifier_lock);
+
+	log_level = 0;
+
+	env->explored_states = kcalloc(env->prog->len,
+				       sizeof(struct verifier_state_list *),
+				       GFP_KERNEL);
+	ret = -ENOMEM;
+	if (!env->explored_states)
+		goto skip_full_check;
+
+	ret = check_cfg(env);
+	if (ret < 0)
+		goto skip_full_check;
+
+	env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
+
+	ret = do_check(env);
+
+skip_full_check:
+	while (pop_stack(env, NULL) >= 0);
+	free_states(env);
+
+	kfree(env);
+	mutex_unlock(&bpf_verifier_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bpf_parse);