[bpf-next,1/2] bpf, x86: Small optimization in comparing against imm0

Message ID 20191002234512.25902-1-daniel@iogearbox.net
State Accepted
Delegated to: BPF Maintainers

Commit Message

Daniel Borkmann Oct. 2, 2019, 11:45 p.m. UTC
Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against
zero. This saves 1 byte of instruction encoding per occurrence. The
flag results of 'test reg, reg' are identical to those of 'cmp reg, 0'
in all cases except for AF, which we don't use/care about. In terms of
macro-fusibility in combination with a subsequent conditional jump
instruction, both have the same properties for the jumps used in
the JIT translation. For example, the same JITed Cilium program can
shrink a bit, e.g. from 12,455 to 12,317 bytes, as tests against 0
are used quite frequently.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/x86/net/bpf_jit_comp.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
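
The flag argument in the commit message can be checked with a small model. The following is only a sketch, not kernel code: `struct flags`, `flags_cmp()`, `flags_test()` and the other names are made up for illustration, the model covers the 64-bit operand size only, and AF is left out on purpose because TEST leaves it undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the status flags the JIT's jumps depend on. */
struct flags {
	bool zf, sf, cf, of, pf;
};

/* PF is set when the low 8 bits of the result have even parity. */
static bool even_parity_low8(uint64_t v)
{
	uint8_t b = (uint8_t)v;

	b ^= b >> 4;
	b ^= b >> 2;
	b ^= b >> 1;
	return (b & 1) == 0;
}

/* cmp a, b: flags of the subtraction a - b, result discarded. */
static struct flags flags_cmp(uint64_t a, uint64_t b)
{
	uint64_t res = a - b;
	struct flags f = {
		.zf = res == 0,
		.sf = (res >> 63) & 1,
		.cf = a < b,				/* unsigned borrow */
		.of = (((a ^ b) & (a ^ res)) >> 63) & 1,	/* signed overflow */
		.pf = even_parity_low8(res),
	};

	return f;
}

/* test a, b: flags of the bitwise AND a & b; CF and OF are cleared. */
static struct flags flags_test(uint64_t a, uint64_t b)
{
	uint64_t res = a & b;
	struct flags f = {
		.zf = res == 0,
		.sf = (res >> 63) & 1,
		.cf = false,
		.of = false,
		.pf = even_parity_low8(res),
	};

	return f;
}

static bool same_flags(struct flags x, struct flags y)
{
	return x.zf == y.zf && x.sf == y.sf && x.cf == y.cf &&
	       x.of == y.of && x.pf == y.pf;
}

/* Compare 'cmp x, 0' against 'test x, x' on a few boundary values. */
static void check_zero_compare_samples(void)
{
	static const uint64_t samples[] = {
		0, 1, 0x7f, 0x80, 0xff,
		UINT64_C(0x7fffffffffffffff),
		UINT64_C(0x8000000000000000),
		UINT64_MAX,
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(same_flags(flags_cmp(samples[i], 0),
				  flags_test(samples[i], samples[i])));
}
```

Both forms leave the value x itself as the result (x - 0 and x & x), and subtracting zero can neither borrow nor overflow, so CF and OF are 0 in both cases. Calling check_zero_compare_samples() exercises this on the listed sample values.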

Comments

Song Liu Oct. 3, 2019, 8:52 p.m. UTC | #1
On Wed, Oct 2, 2019 at 5:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against
> zero. Saves 1 byte of instruction encoding per occurrence. The flag
> results of test 'reg, reg' are identical to 'cmp reg, 0' in all
> cases except for AF which we don't use/care about. In terms of
> macro-fusibility in combination with a subsequent conditional jump
> instruction, both have the same properties for the jumps used in
> the JIT translation. For example, same JITed Cilium program can
> shrink a bit from e.g. 12,455 to 12,317 bytes as tests with 0 are
> used quite frequently.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Song Liu <songliubraving@fb.com>
John Fastabend Oct. 3, 2019, 9:08 p.m. UTC | #2
Song Liu wrote:
> On Wed, Oct 2, 2019 at 5:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against
> > zero. Saves 1 byte of instruction encoding per occurrence. The flag
> > results of test 'reg, reg' are identical to 'cmp reg, 0' in all
> > cases except for AF which we don't use/care about. In terms of
> > macro-fusibility in combination with a subsequent conditional jump
> > instruction, both have the same properties for the jumps used in
> > the JIT translation. For example, same JITed Cilium program can
> > shrink a bit from e.g. 12,455 to 12,317 bytes as tests with 0 are
> > used quite frequently.
> >
> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> 
> Acked-by: Song Liu <songliubraving@fb.com>

Bonus points for causing me to spend the morning remembering the
differences between cmp, and, or, and test.

Also wonder if at some point we should clean up the jit a bit and
add some defines/helpers for all the open coded opcodes and such.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Alexei Starovoitov Oct. 4, 2019, 7:36 p.m. UTC | #3
On Thu, Oct 3, 2019 at 2:08 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Song Liu wrote:
> > On Wed, Oct 2, 2019 at 5:30 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > Replace 'cmp reg, 0' with 'test reg, reg' for comparisons against
> > > zero. Saves 1 byte of instruction encoding per occurrence. The flag
> > > results of test 'reg, reg' are identical to 'cmp reg, 0' in all
> > > cases except for AF which we don't use/care about. In terms of
> > > macro-fusibility in combination with a subsequent conditional jump
> > > instruction, both have the same properties for the jumps used in
> > > the JIT translation. For example, same JITed Cilium program can
> > > shrink a bit from e.g. 12,455 to 12,317 bytes as tests with 0 are
> > > used quite frequently.
> > >
> > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >
> > Acked-by: Song Liu <songliubraving@fb.com>
>
> Bonus points for causing me to spend the morning remembering the
> differences between cmp, and, or, and test.
>
> Also wonder if at some point we should clean up the jit a bit and
> add some defines/helpers for all the open coded opcodes and such.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>

Applied both. Thanks

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 991549a1c5f3..3ad2ba1ad855 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -909,6 +909,16 @@  xadd:			if (is_imm8(insn->off))
 		case BPF_JMP32 | BPF_JSLT | BPF_K:
 		case BPF_JMP32 | BPF_JSGE | BPF_K:
 		case BPF_JMP32 | BPF_JSLE | BPF_K:
+			/* test dst_reg, dst_reg to save one extra byte */
+			if (imm32 == 0) {
+				if (BPF_CLASS(insn->code) == BPF_JMP)
+					EMIT1(add_2mod(0x48, dst_reg, dst_reg));
+				else if (is_ereg(dst_reg))
+					EMIT1(add_2mod(0x40, dst_reg, dst_reg));
+				EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg));
+				goto emit_cond_jmp;
+			}
+
 			/* cmp dst_reg, imm8/32 */
 			if (BPF_CLASS(insn->code) == BPF_JMP)
 				EMIT1(add_1mod(0x48, dst_reg));
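
The one-byte saving can be seen by building both encodings by hand. What follows is a standalone sketch under stated assumptions, not the JIT's code: the macro names and the helpers `emit_test_self()` and `emit_cmp_imm8()` are hypothetical, and they take raw x86 register numbers (0 = rax ... 15 = r15) instead of BPF registers. The byte values follow the x86-64 encodings of TEST r/m, r (opcode 0x85) and CMP r/m, imm8 (opcode 0x83 /7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the open-coded bytes; not kernel defines. */
#define REX_PREFIX	0x40	/* base REX prefix */
#define REX_W		0x08	/* 64-bit operand size */
#define REX_R		0x04	/* extends the ModRM reg field */
#define REX_B		0x01	/* extends the ModRM r/m field */
#define OP_TEST_RM_R	0x85	/* test r/m, r */
#define OP_GRP1_RM_IMM8	0x83	/* group 1: op r/m, imm8 */
#define GRP1_CMP	7	/* /7 selects cmp within group 1 */
#define MODRM_DIRECT	0xC0	/* mod = 11: register operand */

/* Emit 'test reg, reg'; returns the number of bytes written. */
static size_t emit_test_self(uint8_t *buf, unsigned int reg, int is64)
{
	uint8_t rex = REX_PREFIX;
	size_t n = 0;

	if (is64)
		rex |= REX_W;
	if (reg >= 8)
		rex |= REX_R | REX_B;	/* same register in both fields */
	if (rex != REX_PREFIX)
		buf[n++] = rex;		/* plain 32-bit low regs need no REX */
	buf[n++] = OP_TEST_RM_R;
	buf[n++] = MODRM_DIRECT | ((reg & 7) << 3) | (reg & 7);
	return n;
}

/* Emit 'cmp reg, imm8'; returns the number of bytes written. */
static size_t emit_cmp_imm8(uint8_t *buf, unsigned int reg, int is64,
			    int8_t imm)
{
	uint8_t rex = REX_PREFIX;
	size_t n = 0;

	if (is64)
		rex |= REX_W;
	if (reg >= 8)
		rex |= REX_B;
	if (rex != REX_PREFIX)
		buf[n++] = rex;
	buf[n++] = OP_GRP1_RM_IMM8;
	buf[n++] = MODRM_DIRECT | (GRP1_CMP << 3) | (reg & 7);
	buf[n++] = (uint8_t)imm;	/* the byte that 'test' avoids */
	return n;
}

/* Sanity check for rax: 48 85 C0 versus 48 83 F8 00. */
static void check_rax_encodings(void)
{
	static const uint8_t want_test[] = { 0x48, 0x85, 0xC0 };
	static const uint8_t want_cmp[] = { 0x48, 0x83, 0xF8, 0x00 };
	uint8_t buf[8];

	assert(emit_test_self(buf, 0, 1) == sizeof(want_test));
	assert(memcmp(buf, want_test, sizeof(want_test)) == 0);
	assert(emit_cmp_imm8(buf, 0, 1, 0) == sizeof(want_cmp));
	assert(memcmp(buf, want_cmp, sizeof(want_cmp)) == 0);
}
```

In this sketch the 'test' form is always exactly one byte shorter than the 'cmp' form for the same register and operand size, because it has no immediate byte. Naming the opcode bytes as above is one possible shape for the defines/helpers cleanup raised in the review, but the names here are purely illustrative.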