Message ID   | 20180126223348.11250-2-daniel@iogearbox.net
State        | Accepted, archived
Delegated to | BPF Maintainers
Series       | BPF improvements and fixes
Daniel Borkmann wrote:
> Very minor optimization; saves 1 byte per program in x86_64
> JIT in cBPF prologue.

... but increases program size by 4 bytes on ppc64 :(

In general, this is an area I've been wanting to spend some time on.
Powerpc doesn't have 32-bit sub-registers, so we need to emit an
additional instruction to clear the higher 32 bits for all 32-bit
operations. I need to look at the performance impact.

- Naveen

> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  net/core/filter.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 18da42a..cba2f73 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -401,8 +401,8 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
> 	/* Classic BPF expects A and X to be reset first. These need
> 	 * to be guaranteed to be the first two instructions.
> 	 */
> -	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
> -	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
> +	*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
> +	*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
>
> 	/* All programs must keep CTX in callee saved BPF_REG_CTX.
> 	 * In eBPF case it's done by the compiler, here we need to
> --
> 2.9.5
On 01/28/2018 07:58 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> Very minor optimization; saves 1 byte per program in x86_64
>> JIT in cBPF prologue.
>
> ... but increases program size by 4 bytes on ppc64 :(
>
> In general, this is an area I've been wanting to spend some time on.
> Powerpc doesn't have 32-bit sub-registers, so we need to emit an
> additional instruction to clear the higher 32 bits for all 32-bit
> operations. I need to look at the performance impact.

Right, I think one way to optimize this could be at the JIT level for
the case where the CPU doesn't have sub-registers. The
bpf_prog_was_classic() helper can be used there to skip some of the
bpf_alu32_trunc goto cases, e.g. for some of the bit ops, since we know
the upper 32 bits must be zero in cBPF anyway. This should definitely be
low-hanging fruit, given we use alu32 in a lot of places in the cBPF to
eBPF conversion.

Thanks,
Daniel
diff --git a/net/core/filter.c b/net/core/filter.c
index 18da42a..cba2f73 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -401,8 +401,8 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
 	/* Classic BPF expects A and X to be reset first. These need
 	 * to be guaranteed to be the first two instructions.
 	 */
-	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
-	*new_insn++ = BPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+	*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+	*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
 
 	/* All programs must keep CTX in callee saved BPF_REG_CTX.
 	 * In eBPF case it's done by the compiler, here we need to