diff mbox series

[RFC,bpf] bpf, x64: fix JIT emission for dead code

Message ID 20180425054216.48961-1-g.borello@gmail.com
State Accepted, archived
Delegated to: BPF Maintainers
Headers show
Series [RFC,bpf] bpf, x64: fix JIT emission for dead code | expand

Commit Message

Gianluca Borello April 25, 2018, 5:42 a.m. UTC
Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
code with a series of ja-1 instructions, for safety. That made JIT
compilation much more complex for some BPF programs. One instance of such
programs is, for example:

bool flag = false
...
/* A bunch of other code */
...
if (flag)
        do_something()

In some cases llvm is not able to remove at compile time the code for
do_something(), so the generated BPF program ends up with a large amount
of dead instructions. In one specific real life example, there are two
series of ~500 and ~1000 dead instructions in the program. When the
verifier replaces them with a series of ja-1 instructions, it causes an
interesting behavior at JIT time.

During the first pass, since all the instructions are estimated at 64
bytes, the ja-1 instructions end up being translated as 5 bytes JMP
instructions (0xE9), since the jump offsets become increasingly large (>
127) as each instruction gets discovered to be 5 bytes instead of the
estimated 64.

Starting from the second pass, the first N instructions of the ja-1
sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets
become <= 127 this time. In particular, N is defined as roughly 127 / (5
- 2) ~= 42. So, each further pass will make the subsequent N JMP
instructions shrink from 5 to 2 bytes, making the image shrink every time.
This means that in order to have the entire program converge, there need
to be, in the real example above, at least ~1000 / 42 ~= 24 passes just
for translating the dead code. If we add this number to the passes needed
to translate the other non dead code, it brings such program to 40+
passes, and JIT doesn't complete. Ultimately the userspace loader fails
because such BPF program was supposed to be part of a prog array owner
being JITed.

While it is certainly possible to try to refactor such programs to help
the compiler remove dead code, the behavior is not really intuitive and it
puts further burden on the BPF developer who is not expecting such
behavior. To make things worse, such programs are working just fine in all
the kernel releases prior to the ja-1 fix.

A possible approach to mitigate this behavior consists into noticing that
for ja-1 instructions we don't really need to rely on the estimated size
of the previous and current instructions, we know that a -1 BPF jump
offset can be safely translated into a 0xEB instruction with a jump offset
of -2.

Such fix brings the BPF program in the previous example to complete again
in ~9 passes.

Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
---
Hi

Posting this as RFC since I just wanted to report this potential bug and
propose a possible fix, although I'm not sure if:

1) There might be a better fix on the JIT side
2) We might want to replace the ja-1 instructions with something else
3) We might want to ignore this issue

If we choose option 3, I'd just like to point out that this caused a real
regression on a couple BPF programs that are part of a larger collection
of programs that used to work fine on 4.15 and I recently found broken 
in 4.16, so there would be some value in somehow addressing this.

Thanks

 arch/x86/net/bpf_jit_comp.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Daniel Borkmann April 25, 2018, 7:48 a.m. UTC | #1
On 04/25/2018 07:42 AM, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
[...]
> A possible approach to mitigate this behavior consists into noticing that
> for ja-1 instructions we don't really need to rely on the estimated size
> of the previous and current instructions, we know that a -1 BPF jump
> offset can be safely translated into a 0xEB instruction with a jump offset
> of -2.
> 
> Such fix brings the BPF program in the previous example to complete again
> in ~9 passes.
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@gmail.com>

Thanks for reporting, Gianluca. The approach your fix takes looks good to me!
Alexei Starovoitov April 25, 2018, 2:12 p.m. UTC | #2
On Wed, Apr 25, 2018 at 05:42:16AM +0000, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
> 
> bool flag = false
> ...
> /* A bunch of other code */
> ...
> if (flag)
>         do_something()
> 
> In some cases llvm is not able to remove at compile time the code for
> do_something(), so the generated BPF program ends up with a large amount
> of dead instructions. In one specific real life example, there are two
> series of ~500 and ~1000 dead instructions in the program. When the
> verifier replaces them with a series of ja-1 instructions, it causes an
> interesting behavior at JIT time.
> 
> During the first pass, since all the instructions are estimated at 64
> bytes, the ja-1 instructions end up being translated as 5 bytes JMP
> instructions (0xE9), since the jump offsets become increasingly large (>
> 127) as each instruction gets discovered to be 5 bytes instead of the
> estimated 64.
> 
> Starting from the second pass, the first N instructions of the ja-1
> sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets
> become <= 127 this time. In particular, N is defined as roughly 127 / (5
> - 2) ~= 42. So, each further pass will make the subsequent N JMP
> instructions shrink from 5 to 2 bytes, making the image shrink every time.
> This means that in order to have the entire program converge, there need
> to be, in the real example above, at least ~1000 / 42 ~= 24 passes just
> for translating the dead code. If we add this number to the passes needed
> to translate the other non dead code, it brings such program to 40+
> passes, and JIT doesn't complete. Ultimately the userspace loader fails
> because such BPF program was supposed to be part of a prog array owner
> being JITed.
> 
> While it is certainly possible to try to refactor such programs to help
> the compiler remove dead code, the behavior is not really intuitive and it
> puts further burden on the BPF developer who is not expecting such
> behavior. To make things worse, such programs are working just fine in all
> the kernel releases prior to the ja-1 fix.
> 
> A possible approach to mitigate this behavior consists into noticing that
> for ja-1 instructions we don't really need to rely on the estimated size
> of the previous and current instructions, we know that a -1 BPF jump
> offset can be safely translated into a 0xEB instruction with a jump offset
> of -2.
> 
> Such fix brings the BPF program in the previous example to complete again
> in ~9 passes.
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@gmail.com>
> ---
> Hi
> 
> Posting this as RFC since I just wanted to report this potential bug and
> propose a possible fix, although I'm not sure if:
> 
> 1) There might be a better fix on the JIT side
> 2) We might want to replace the ja-1 instructions with something else
> 3) We might want to ignore this issue
> 
> If we choose option 3, I'd just like to point out that this caused a real
> regression on a couple BPF programs that are part of a larger collection
> of programs that used to work fine on 4.15 and I recently found broken 
> in 4.16, so there would be some value in somehow addressing this.
> 
> Thanks
> 
>  arch/x86/net/bpf_jit_comp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index b725154182cc..abce27ceb411 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1027,7 +1027,17 @@ xadd:			if (is_imm8(insn->off))
>  			break;
>  
>  		case BPF_JMP | BPF_JA:
> -			jmp_offset = addrs[i + insn->off] - addrs[i];
> +			if (insn->off == -1)
> +				/* -1 jmp instructions will always jump
> +				 * backwards two bytes. Explicitly handling
> +				 * this case avoids wasting too many passes
> +				 * when there are long sequences of replaced
> +				 * dead code.
> +				 */
> +				jmp_offset = -2;
> +			else
> +				jmp_offset = addrs[i + insn->off] - addrs[i];
> +

Impressive analysis and fix. Thank you Gianluca.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann April 25, 2018, 3:34 p.m. UTC | #3
On 04/25/2018 07:42 AM, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
> 
> bool flag = false
> ...
> /* A bunch of other code */
> ...
> if (flag)
>         do_something()
> 
> In some cases llvm is not able to remove at compile time the code for
> do_something(), so the generated BPF program ends up with a large amount
> of dead instructions. In one specific real life example, there are two
> series of ~500 and ~1000 dead instructions in the program. When the
> verifier replaces them with a series of ja-1 instructions, it causes an
> interesting behavior at JIT time.
[...]
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@gmail.com>

I've applied this fix to bpf tree, thanks Gianluca!
Gianluca Borello April 26, 2018, 4:52 a.m. UTC | #4
On Wed, Apr 25, 2018 at 8:34 AM Daniel Borkmann <daniel@iogearbox.net>
wrote:

> I've applied this fix to bpf tree, thanks Gianluca!

Thank you all for the quick review, really appreciated!
diff mbox series

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index b725154182cc..abce27ceb411 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1027,7 +1027,17 @@  xadd:			if (is_imm8(insn->off))
 			break;
 
 		case BPF_JMP | BPF_JA:
-			jmp_offset = addrs[i + insn->off] - addrs[i];
+			if (insn->off == -1)
+				/* -1 jmp instructions will always jump
+				 * backwards two bytes. Explicitly handling
+				 * this case avoids wasting too many passes
+				 * when there are long sequences of replaced
+				 * dead code.
+				 */
+				jmp_offset = -2;
+			else
+				jmp_offset = addrs[i + insn->off] - addrs[i];
+
 			if (!jmp_offset)
 				/* optimize out nop jumps */
 				break;