diff mbox

[net-next,5/5] net: filter: optimize BPF migration for ARG1/CTX handling

Message ID 1398286621-3591-6-git-send-email-dborkman@redhat.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Daniel Borkmann April 23, 2014, 8:57 p.m. UTC
Currently, at initial setup in __sk_run_filter() we initialize the
BPF stack's frame-pointer and CTX register. However, instead of the
CTX register, we initialize context to ARG1, and during user filter
migration we emit *always* an instruction that copies the content
from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
ctx, A, X for call emission. However, we nevertheless copy CTX over
to ARG1 in these cases. So all in all, we always emit one instruction
at BPF program beginning that should have actually been avoided to
spare this overhead.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 net/core/filter.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

Comments

Alexei Starovoitov April 23, 2014, 9:59 p.m. UTC | #1
On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> Currently, at initial setup in __sk_run_filter() we initialize the
> BPF stack's frame-pointer and CTX register. However, instead of the
> CTX register, we initialize context to ARG1, and during user filter
> migration we emit *always* an instruction that copies the content
> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
> ctx, A, X for call emission. However, we nevertheless copy CTX over
> to ARG1 in these cases. So all in all, we always emit one instruction
> at BPF program beginning that should have actually been avoided to
> spare this overhead.
>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>

First 4 patches look great, but this one I have to disagree.
See below.

> ---
>  net/core/filter.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index eada3d5..6fed231 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -62,7 +62,6 @@
>  #define A      regs[insn->a_reg]
>  #define X      regs[insn->x_reg]
>  #define FP     regs[BPF_REG_FP]
> -#define ARG1   regs[BPF_REG_ARG1]
>  #define CTX    regs[BPF_REG_CTX]
>  #define K      insn->imm
>
> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
>  #define CONT_JMP ({ insn++; goto select_insn; })
>
>         FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
> -       ARG1 = (u64) (unsigned long) ctx;
> +       CTX = (u64) (unsigned long) ctx;

R1 (ARG1) is the register that used to pass first argument to the function.
For seamless kernel->bpf->kernel transition we have to follow calling
convention, so 'void *ctx' has to go into R1 by design.
Storing it into R6 (CTX) will only work for classic filters converted
to extended.
all native ebpf filters will be broken.
In documentation we say:
    * R1 - R5   - arguments from BPF program to in-kernel function
so llvm/gcc are following this ebpf ABI.
Calling convention is the same whether to call kernel function from ebpf
or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.

By convention ld_abs/ld_ind insns are using implicit input register 'ctx' (R6),
that's why we do a copy from R1 into R6 as a first insn of the
_converted_ filter.
The cost of one insn is not zero, yes, but cost of imbalanced ABI is huge.
For example, I don't know how to teach llvm/gcc to have different ABIs
in such cases.

This dual ABI makes JITs ugly too, since now they must remember to
emit R6=R1 copy. I would like JITs to worry about instructions only
and not to deal with such discrepancies.

The whole thing becomes unclean when kernel to ebpf passes
1st argument in R6, but ebpf to kernel passes 1st argument in R1.

So I would suggest to drop this one for now.

I also want to avoid extra generated R6=R1 for converted filters.
Let's think it through first.

>         /* Register for user BPF programs need to be reset first. */
>         regs[BPF_REG_A] = 0;
> @@ -896,13 +895,6 @@ do_pass:
>         new_insn = new_prog;
>         fp = prog;
>
> -       if (new_insn) {
> -               new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
> -               new_insn->a_reg = BPF_REG_CTX;
> -               new_insn->x_reg = BPF_REG_ARG1;
> -       }
> -       new_insn++;
> -
>         for (i = 0; i < len; fp++, i++) {
>                 struct sock_filter_int tmp_insns[6] = { };
>                 struct sock_filter_int *insn = tmp_insns;
> --
> 1.7.11.7
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 10:21 p.m. UTC | #2
On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> Currently, at initial setup in __sk_run_filter() we initialize the
>> BPF stack's frame-pointer and CTX register. However, instead of the
>> CTX register, we initialize context to ARG1, and during user filter
>> migration we emit *always* an instruction that copies the content
>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>> to ARG1 in these cases. So all in all, we always emit one instruction
>> at BPF program beginning that should have actually been avoided to
>> spare this overhead.
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>
> First 4 patches look great, but this one I have to disagree.
> See below.
>
>> ---
>>   net/core/filter.c | 10 +---------
>>   1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index eada3d5..6fed231 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -62,7 +62,6 @@
>>   #define A      regs[insn->a_reg]
>>   #define X      regs[insn->x_reg]
>>   #define FP     regs[BPF_REG_FP]
>> -#define ARG1   regs[BPF_REG_ARG1]
>>   #define CTX    regs[BPF_REG_CTX]
>>   #define K      insn->imm
>>
>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
>>   #define CONT_JMP ({ insn++; goto select_insn; })
>>
>>          FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>> -       ARG1 = (u64) (unsigned long) ctx;
>> +       CTX = (u64) (unsigned long) ctx;
>
> R1 (ARG1) is the register that used to pass first argument to the function.

Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
copy ctx over to arg1 for calls, i.e.:

   /* arg1 = ctx */
   insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
   insn->a_reg = ARG1_REG;
   insn->x_reg = CTX_REG;
   insn++;

> For seamless kernel->bpf->kernel transition we have to follow calling
> convention, so 'void *ctx' has to go into R1 by design.
> Storing it into R6 (CTX) will only work for classic filters converted
> to extended.
> all native ebpf filters will be broken.

My objection was that currently, we do _not_ have _any_ users or even kernel
API for _native_ filters, at least not in mainline tree. The _main_ users we
have are currently _all_ being converted, hence this patch. Given that these
calls have likely just a minority of use cases triggered by tcpdump et al,
the majority of users still need to do this overhead/additional work.

> In documentation we say:
>      * R1 - R5   - arguments from BPF program to in-kernel function
> so llvm/gcc are following this ebpf ABI.
> Calling convention is the same whether to call kernel function from ebpf
> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.

Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
not using llvm/gcc backend here and have the internal instruction set not
exposed to user space, but even there you would need to prepare R1 - R5 to
hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
at that time just as we do with convert_bpf_extensions()?

> By convention ld_abs/ld_ind insns are using implicit input register 'ctx' (R6),
> that's why we do a copy from R1 into R6 as a first insn of the
> _converted_ filter.

Yep, and R6 stays as is here. So ld_abs/ld_ind insns are correctly using 'ctx'.

Best,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 10:37 p.m. UTC | #3
On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>
>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>> CTX register, we initialize context to ARG1, and during user filter
>>> migration we emit *always* an instruction that copies the content
>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>> at BPF program beginning that should have actually been avoided to
>>> spare this overhead.
>>>
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>
>>
>> First 4 patches look great, but this one I have to disagree.
>> See below.
>>
>>> ---
>>>   net/core/filter.c | 10 +---------
>>>   1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index eada3d5..6fed231 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -62,7 +62,6 @@
>>>   #define A      regs[insn->a_reg]
>>>   #define X      regs[insn->x_reg]
>>>   #define FP     regs[BPF_REG_FP]
>>> -#define ARG1   regs[BPF_REG_ARG1]
>>>   #define CTX    regs[BPF_REG_CTX]
>>>   #define K      insn->imm
>>>
>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct
>>> sock_filter_int *insn)
>>>   #define CONT_JMP ({ insn++; goto select_insn; })
>>>
>>>          FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>> -       ARG1 = (u64) (unsigned long) ctx;
>>> +       CTX = (u64) (unsigned long) ctx;
>>
>>
>> R1 (ARG1) is the register that used to pass first argument to the
>> function.
>
>
> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
> copy ctx over to arg1 for calls, i.e.:
>
>   /* arg1 = ctx */
>
>   insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>   insn->a_reg = ARG1_REG;
>   insn->x_reg = CTX_REG;
>   insn++;
>
>
>> For seamless kernel->bpf->kernel transition we have to follow calling
>> convention, so 'void *ctx' has to go into R1 by design.
>> Storing it into R6 (CTX) will only work for classic filters converted
>> to extended.
>> all native ebpf filters will be broken.
>
>
> My objection was that currently, we do _not_ have _any_ users or even kernel
> API for _native_ filters, at least not in mainline tree. The _main_ users we
> have are currently _all_ being converted, hence this patch. Given that these
> calls have likely just a minority of use cases triggered by tcpdump et al,
> the majority of users still need to do this overhead/additional work.
>
>
>> In documentation we say:
>>      * R1 - R5   - arguments from BPF program to in-kernel function
>> so llvm/gcc are following this ebpf ABI.
>> Calling convention is the same whether to call kernel function from ebpf
>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>
>
> Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
> not using llvm/gcc backend here and have the internal instruction set not
> exposed to user space, but even there you would need to prepare R1 - R5 to
> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
> at that time just as we do with convert_bpf_extensions()?

How about then removing extra generated R6=R1 from converted and do
both in __sk_run_filter() ?
        regs[ARG1_REG] = (u64) (unsigned long) ctx;
        regs[CTX_REG] = (u64) (unsigned long) ctx;

Overhead problem will be solved and ABI is still clean.

>> By convention ld_abs/ld_ind insns are using implicit input register 'ctx'
>> (R6),
>> that's why we do a copy from R1 into R6 as a first insn of the
>> _converted_ filter.
>
>
> Yep, and R6 stays as is here. So ld_abs/ld_ind insns are correctly using
> 'ctx'.
>
> Best,
>
> Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 23, 2014, 10:57 p.m. UTC | #4
On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>>> wrote:
>>>>
>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>> CTX register, we initialize context to ARG1, and during user filter
>>>> migration we emit *always* an instruction that copies the content
>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>> at BPF program beginning that should have actually been avoided to
>>>> spare this overhead.
>>>>
>>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>>
>>> First 4 patches look great, but this one I have to disagree.
>>> See below.
>>>
>>>> ---
>>>>    net/core/filter.c | 10 +---------
>>>>    1 file changed, 1 insertion(+), 9 deletions(-)
>>>>
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index eada3d5..6fed231 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>>> @@ -62,7 +62,6 @@
>>>>    #define A      regs[insn->a_reg]
>>>>    #define X      regs[insn->x_reg]
>>>>    #define FP     regs[BPF_REG_FP]
>>>> -#define ARG1   regs[BPF_REG_ARG1]
>>>>    #define CTX    regs[BPF_REG_CTX]
>>>>    #define K      insn->imm
>>>>
>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const struct
>>>> sock_filter_int *insn)
>>>>    #define CONT_JMP ({ insn++; goto select_insn; })
>>>>
>>>>           FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>> -       ARG1 = (u64) (unsigned long) ctx;
>>>> +       CTX = (u64) (unsigned long) ctx;
>>>
>>>
>>> R1 (ARG1) is the register that used to pass first argument to the
>>> function.
>>
>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we currently
>> copy ctx over to arg1 for calls, i.e.:
>>
>>    /* arg1 = ctx */
>>
>>    insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>    insn->a_reg = ARG1_REG;
>>    insn->x_reg = CTX_REG;
>>    insn++;
>>
>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>> convention, so 'void *ctx' has to go into R1 by design.
>>> Storing it into R6 (CTX) will only work for classic filters converted
>>> to extended.
>>> all native ebpf filters will be broken.
>>
>> My objection was that currently, we do _not_ have _any_ users or even kernel
>> API for _native_ filters, at least not in mainline tree. The _main_ users we
>> have are currently _all_ being converted, hence this patch. Given that these
>> calls have likely just a minority of use cases triggered by tcpdump et al,
>> the majority of users still need to do this overhead/additional work.
>>
>>> In documentation we say:
>>>       * R1 - R5   - arguments from BPF program to in-kernel function
>>> so llvm/gcc are following this ebpf ABI.
>>> Calling convention is the same whether to call kernel function from ebpf
>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>
>> Yes, that's clear and convert_bpf_extensions() is doing that. So far we're
>> not using llvm/gcc backend here and have the internal instruction set not
>> exposed to user space, but even there you would need to prepare R1 - R5 to
>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into R1
>> at that time just as we do with convert_bpf_extensions()?
>
> How about then removing extra generated R6=R1 from converted and do
> both in __sk_run_filter() ?
>          regs[ARG1_REG] = (u64) (unsigned long) ctx;
>          regs[CTX_REG] = (u64) (unsigned long) ctx;
>
> Overhead problem will be solved and ABI is still clean.

Well, I just don't understand the concerns of ABI here. Given that we do not
have any native BPF code to maintain in the kernel tree and given that we
currently do not expose the instruction set to user space, we're free to do
what we want, no? Eventually what's in mainline kernel is that dictates an
ABI, so far we have it only in kernel space and the only current user of the
ABI is the conversion function from user BPF programs. Given that helper
function calls may happen less often than executing instructions from the
rest of the code, why can't your llvm/gcc backend emit the load of ctx into
arg1? JITs don't need to treat that differently in any way, imho. Simply
the one who is generating the BPF insns needs to emit ctx into arg1 just as
he prepares the other argX registers. Btw, since we know a priori that
__skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
prepared registers, we could even go that far and avoid preparing loads for
the two.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov April 23, 2014, 11:14 p.m. UTC | #5
On Wed, Apr 23, 2014 at 3:57 PM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 04/24/2014 12:37 AM, Alexei Starovoitov wrote:
>>
>> On Wed, Apr 23, 2014 at 3:21 PM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 04/23/2014 11:59 PM, Alexei Starovoitov wrote:
>>>>
>>>> On Wed, Apr 23, 2014 at 1:57 PM, Daniel Borkmann <dborkman@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Currently, at initial setup in __sk_run_filter() we initialize the
>>>>> BPF stack's frame-pointer and CTX register. However, instead of the
>>>>> CTX register, we initialize context to ARG1, and during user filter
>>>>> migration we emit *always* an instruction that copies the content
>>>>> from ARG1 to CTX. ARG1 is needed in BPF_CALL instructions to setup
>>>>> ctx, A, X for call emission. However, we nevertheless copy CTX over
>>>>> to ARG1 in these cases. So all in all, we always emit one instruction
>>>>> at BPF program beginning that should have actually been avoided to
>>>>> spare this overhead.
>>>>>
>>>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>>>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>>>>
>>>>
>>>> First 4 patches look great, but this one I have to disagree.
>>>> See below.
>>>>
>>>>> ---
>>>>>    net/core/filter.c | 10 +---------
>>>>>    1 file changed, 1 insertion(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>> index eada3d5..6fed231 100644
>>>>> --- a/net/core/filter.c
>>>>> +++ b/net/core/filter.c
>>>>> @@ -62,7 +62,6 @@
>>>>>    #define A      regs[insn->a_reg]
>>>>>    #define X      regs[insn->x_reg]
>>>>>    #define FP     regs[BPF_REG_FP]
>>>>> -#define ARG1   regs[BPF_REG_ARG1]
>>>>>    #define CTX    regs[BPF_REG_CTX]
>>>>>    #define K      insn->imm
>>>>>
>>>>> @@ -257,7 +256,7 @@ unsigned int __sk_run_filter(void *ctx, const
>>>>> struct
>>>>> sock_filter_int *insn)
>>>>>    #define CONT_JMP ({ insn++; goto select_insn; })
>>>>>
>>>>>           FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
>>>>> -       ARG1 = (u64) (unsigned long) ctx;
>>>>> +       CTX = (u64) (unsigned long) ctx;
>>>>
>>>>
>>>>
>>>> R1 (ARG1) is the register that used to pass first argument to the
>>>> function.
>>>
>>>
>>> Yes that's clear. Which is why f.e. in convert_bpf_extensions() we
>>> currently
>>> copy ctx over to arg1 for calls, i.e.:
>>>
>>>    /* arg1 = ctx */
>>>
>>>    insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
>>>    insn->a_reg = ARG1_REG;
>>>    insn->x_reg = CTX_REG;
>>>    insn++;
>>>
>>>> For seamless kernel->bpf->kernel transition we have to follow calling
>>>> convention, so 'void *ctx' has to go into R1 by design.
>>>> Storing it into R6 (CTX) will only work for classic filters converted
>>>> to extended.
>>>> all native ebpf filters will be broken.
>>>
>>>
>>> My objection was that currently, we do _not_ have _any_ users or even
>>> kernel
>>> API for _native_ filters, at least not in mainline tree. The _main_ users
>>> we
>>> have are currently _all_ being converted, hence this patch. Given that
>>> these
>>> calls have likely just a minority of use cases triggered by tcpdump et
>>> al,
>>> the majority of users still need to do this overhead/additional work.
>>>
>>>> In documentation we say:
>>>>       * R1 - R5   - arguments from BPF program to in-kernel function
>>>> so llvm/gcc are following this ebpf ABI.
>>>> Calling convention is the same whether to call kernel function from ebpf
>>>> or ebpf from kernel. So 1st argument (void *ctx) has to go into R1.
>>>
>>>
>>> Yes, that's clear and convert_bpf_extensions() is doing that. So far
>>> we're
>>> not using llvm/gcc backend here and have the internal instruction set not
>>> exposed to user space, but even there you would need to prepare R1 - R5
>>> to
>>> hand-over arguments for the BPF_CALL insns, so why can't we load CTX into
>>> R1
>>> at that time just as we do with convert_bpf_extensions()?
>>
>>
>> How about then removing extra generated R6=R1 from converted and do
>> both in __sk_run_filter() ?
>>          regs[ARG1_REG] = (u64) (unsigned long) ctx;
>>          regs[CTX_REG] = (u64) (unsigned long) ctx;
>>
>> Overhead problem will be solved and ABI is still clean.
>
>
> Well, I just don't understand the concerns of ABI here. Given that we do not
> have any native BPF code to maintain in the kernel tree and given that we
> currently do not expose the instruction set to user space, we're free to do
> what we want, no? Eventually what's in mainline kernel is that dictates an
> ABI, so far we have it only in kernel space and the only current user of the
> ABI is the conversion function from user BPF programs. Given that helper

well, all the patches for llvm and the rest were posted.
Obviously they cannot go in at once, but breaking this ABI right now
makes it impossible to continue working on them.
I want to make ebpf engine to be useful for tracing filters and so on.
Are you saying forget it, this is classic bpf engine only?

> function calls may happen less often than executing instructions from the
> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
> arg1? JITs don't need to treat that differently in any way, imho. Simply

I don't think we're on the same page here. compiler at the end is just
a piece of sw and can do everything, but imbalanced ebpf ABI is really
out of normal compiler work flow.
Pretty much compiler somehow need to generate one way of passing
arguments into a function, but inside the function it needs to expect
them in different registers?! That is practically impossible to do
in normal gcc/llvm.
Even without any compiler insight.
If you insist on ctx to be in R6 only, then please kill the whole
concept of ABI in the doc. It doesn't make sense to pass a value
into a function in R1, but the function will see in R6?!

> the one who is generating the BPF insns needs to emit ctx into arg1 just as
> he prepares the other argX registers. Btw, since we know a priori that
> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
> prepared registers, we could even go that far and avoid preparing loads for
> the two.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann April 24, 2014, 5:56 a.m. UTC | #6
On 04/24/2014 01:14 AM, Alexei Starovoitov wrote:
...
>> Well, I just don't understand the concerns of ABI here. Given that we do not
>> have any native BPF code to maintain in the kernel tree and given that we
>> currently do not expose the instruction set to user space, we're free to do
>> what we want, no? Eventually what's in mainline kernel is that dictates an
>> ABI, so far we have it only in kernel space and the only current user of the
>> ABI is the conversion function from user BPF programs. Given that helper
>
> well, all the patches for llvm and the rest were posted.
> Obviously they cannot go in at once, but breaking this ABI right now
> makes it impossible to continue working on them.
> I want to make ebpf engine to be useful for tracing filters and so on.
> Are you saying forget it, this is classic bpf engine only?

Nope, I'm not saying that at all! As tracing has something similar, yes,
I already stated that it would probably be a good idea, i.e. when we could
JIT compile it.

>> function calls may happen less often than executing instructions from the
>> rest of the code, why can't your llvm/gcc backend emit the load of ctx into
>> arg1? JITs don't need to treat that differently in any way, imho. Simply
>
> I don't think we're on the same page here. compiler at the end is just
> a piece of sw and can do everything, but imbalanced ebpf ABI is really
> out of normal compiler work flow.
> Pretty much compiler somehow need to generate one way of passing
> arguments into a function, but inside the function it needs to expect
> them in different registers?! That is practically impossible to do
> in normal gcc/llvm.
> Even without any compiler insight.
> If you insist on ctx to be in R6 only, then please kill the whole
> concept of ABI in the doc. It doesn't make sense to pass a value
> into a function in R1, but the function will see in R6?!

Sorry, I don't think that this is what I claimed. I don't have a strong
opinion on this particular patch, just wanted to have clarified what we
have with ABI here as there's no particular freeze from the code we have
in the kernel. Anyway, I'll change it into the other variant and respin
the set, as that will already spare us a bit unnecessary of work in
fast-path.

>> the one who is generating the BPF insns needs to emit ctx into arg1 just as
>> he prepares the other argX registers. Btw, since we know a priori that
>> __skb_get_pay_offset() and __get_raw_cpu_id() are not making use of all
>> prepared registers, we could even go that far and avoid preparing loads for
>> the two.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index eada3d5..6fed231 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -62,7 +62,6 @@ 
 #define A	regs[insn->a_reg]
 #define X	regs[insn->x_reg]
 #define FP	regs[BPF_REG_FP]
-#define ARG1	regs[BPF_REG_ARG1]
 #define CTX	regs[BPF_REG_CTX]
 #define K	insn->imm
 
@@ -257,7 +256,7 @@  unsigned int __sk_run_filter(void *ctx, const struct sock_filter_int *insn)
 #define CONT_JMP ({ insn++; goto select_insn; })
 
 	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)];
-	ARG1 = (u64) (unsigned long) ctx;
+	CTX = (u64) (unsigned long) ctx;
 
 	/* Register for user BPF programs need to be reset first. */
 	regs[BPF_REG_A] = 0;
@@ -896,13 +895,6 @@  do_pass:
 	new_insn = new_prog;
 	fp = prog;
 
-	if (new_insn) {
-		new_insn->code = BPF_ALU64 | BPF_MOV | BPF_X;
-		new_insn->a_reg = BPF_REG_CTX;
-		new_insn->x_reg = BPF_REG_ARG1;
-	}
-	new_insn++;
-
 	for (i = 0; i < len; fp++, i++) {
 		struct sock_filter_int tmp_insns[6] = { };
 		struct sock_filter_int *insn = tmp_insns;