ppc: Fix BPF JIT for ABIv2

Message ID 20160621085807.GE8886@naverao1-tp.localdomain (mailing list archive)
State Superseded

Commit Message

Naveen N. Rao June 21, 2016, 8:58 a.m. UTC
On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > > Hi, Michael and Naveen.
> > > 
> > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > changes in ppc tree for a better way to check for ABI versions.
> > > 
> > > However, since the issue described below affects mainline and stable kernels,
> > > would you consider applying it before merging your two patchsets, so that we can
> > > more easily backport the fix?
> > 
> > Hi Cascardo,
> > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > fixing it. But, I can see why this would be a good thing to have for 
> > -stable and existing distros. However, while your patch below may fix 
> > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > changes in bpf_jit_asm.S as well.
> 
> Hi, Naveen.
> 
> Any tips on how to exercise possible issues there? Or what changes you think
> would be sufficient?

The calling convention is different with ABIv2 and so we'll need changes 
in bpf_slow_path_common() and sk_negative_common().

However, rather than enabling classic JIT for ppc64le, are we better off 
just disabling it?



Michael,
Let me know your thoughts on whether you intend to take this patch or 
Cascardo's patch for -stable before the eBPF patches. I can redo my 
patches accordingly.


- Naveen

Comments

Michael Ellerman June 21, 2016, 11:15 a.m. UTC | #1
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > 
> > > > Hi, Michael and Naveen.
> > > > 
> > > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > > changes in ppc tree for a better way to check for ABI versions.
> > > > 
> > > > However, since the issue described below affects mainline and stable kernels,
> > > > would you consider applying it before merging your two patchsets, so that we can
> > > > more easily backport the fix?
> > > 
> > > Hi Cascardo,
> > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > fixing it. But, I can see why this would be a good thing to have for 
> > > -stable and existing distros. However, while your patch below may fix 
> > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > changes in bpf_jit_asm.S as well.
> > 
> > Hi, Naveen.
> > 
> > Any tips on how to exercise possible issues there? Or what changes you think
> > would be sufficient?
> 
> The calling convention is different with ABIv2 and so we'll need changes 
> in bpf_slow_path_common() and sk_negative_common().

How big would those changes be? Do we know?

How come no one reported this was broken previously? This is the first I've
heard of it being broken.

> However, rather than enabling classic JIT for ppc64le, are we better off 
> just disabling it?
> 
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>         select IRQ_FORCED_THREADING
>         select HAVE_RCU_TABLE_FREE if SMP
>         select HAVE_SYSCALL_TRACEPOINTS
> -       select HAVE_CBPF_JIT
> +       select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>         select HAVE_ARCH_JUMP_LABEL
>         select ARCH_HAVE_NMI_SAFE_CMPXCHG
>         select ARCH_HAS_GCOV_PROFILE_ALL
> 
> 
> Michael,
> Let me know your thoughts on whether you intend to take this patch or 
> Cascardo's patch for -stable before the eBPF patches. I can redo my 
> patches accordingly.

This patch sounds like the best option at the moment for something we can
backport. Unless the changes to fix it are minimal.

cheers
Thadeu Lima de Souza Cascardo June 21, 2016, 2:47 p.m. UTC | #2
On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote:
> On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > > 
> > > > > Hi, Michael and Naveen.
> > > > > 
> > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > > > changes in ppc tree for a better way to check for ABI versions.
> > > > > 
> > > > > However, since the issue described below affects mainline and stable kernels,
> > > > > would you consider applying it before merging your two patchsets, so that we can
> > > > > more easily backport the fix?
> > > > 
> > > > Hi Cascardo,
> > > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > > fixing it. But, I can see why this would be a good thing to have for 
> > > > -stable and existing distros. However, while your patch below may fix 
> > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > > changes in bpf_jit_asm.S as well.
> > > 
> > > Hi, Naveen.
> > > 
> > > Any tips on how to exercise possible issues there? Or what changes you think
> > > would be sufficient?
> > 
> > The calling convention is different with ABIv2 and so we'll need changes 
> > in bpf_slow_path_common() and sk_negative_common().
> 
> How big would those changes be? Do we know?
> 
> How come no one reported this was broken previously? This is the first I've
> heard of it being broken.
> 

I just heard of it less than two weeks ago, and only could investigate it last
week, when I realized mainline was also affected.

It looks like the little-endian support for classic JIT was done before the
conversion to ABIv2. And as JIT is disabled by default, no one seems to have
exercised it.

> > However, rather than enabling classic JIT for ppc64le, are we better off 
> > just disabling it?
> > 
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -128,7 +128,7 @@ config PPC
> >         select IRQ_FORCED_THREADING
> >         select HAVE_RCU_TABLE_FREE if SMP
> >         select HAVE_SYSCALL_TRACEPOINTS
> > -       select HAVE_CBPF_JIT
> > +       select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
> >         select HAVE_ARCH_JUMP_LABEL
> >         select ARCH_HAVE_NMI_SAFE_CMPXCHG
> >         select ARCH_HAS_GCOV_PROFILE_ALL
> > 
> > 
> > Michael,
> > Let me know your thoughts on whether you intend to take this patch or 
> > Cascardo's patch for -stable before the eBPF patches. I can redo my 
> > patches accordingly.
> 
> This patch sounds like the best option at the moment for something we can
> backport. Unless the changes to fix it are minimal.
> 
> cheers
> 

With only my patch applied, I can run a minimal 'tcpdump tcp port 22'
successfully, and it filters packets correctly. But as pointed out, the slow
paths may not be taken in that test.

I don't have strong opinions on what to apply to stable, just that it would be
nice to have something for the crash before applying all the nice changes by
Naveen.

Cascardo.
Alexei Starovoitov June 21, 2016, 3:45 p.m. UTC | #3
On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote:
>>>
>>> The calling convention is different with ABIv2 and so we'll need changes
>>> in bpf_slow_path_common() and sk_negative_common().
>>
>> How big would those changes be? Do we know?
>>
>> How come no one reported this was broken previously? This is the first I've
>> heard of it being broken.
>>
>
> I just heard of it less than two weeks ago, and only could investigate it last
> week, when I realized mainline was also affected.
>
> It looks like the little-endian support for classic JIT was done before the
> conversion to ABIv2. And as JIT is disabled by default, no one seems to have
> exercised it.

It's not a surprise, unfortunately. The JITs that were written before
test_bpf.ko was developed were missing corner cases. Typical tcpdump
would be fine, but fragmented packets, negative offsets and
out-of-bounds accesses wouldn't be handled correctly.
I'd suggest validating the stable backport with test_bpf as well.
Michael Ellerman June 22, 2016, 4:06 a.m. UTC | #4
On Tue, 2016-06-21 at 08:45 -0700, Alexei Starovoitov wrote:
> On 6/21/16 7:47 AM, Thadeu Lima de Souza Cascardo wrote:
> > > > 
> > > > The calling convention is different with ABIv2 and so we'll need changes
> > > > in bpf_slow_path_common() and sk_negative_common().
> > > 
> > > How big would those changes be? Do we know?
> > > 
> > > How come no one reported this was broken previously? This is the first I've
> > > heard of it being broken.
> > > 
> > 
> > I just heard of it less than two weeks ago, and only could investigate it last
> > week, when I realized mainline was also affected.
> > 
> > It looks like the little-endian support for classic JIT was done before the
> > conversion to ABIv2. And as JIT is disabled by default, no one seems to have
> > exercised it.
> 
> it's not a surprise unfortunately. The JITs that were written before
> test_bpf.ko was developed were missing corner cases. Typical tcpdump
> would be fine, but fragmented packets, negative offsets and
> out-of-bounds accesses wouldn't be handled correctly.
> I'd suggest to validate the stable backport with test_bpf as well.
 
OK thanks.

I have been running selftests/net/test_bpf, but I realise now it doesn't enable
the JIT.
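
For anyone reproducing this, a quick check that the JIT is actually switched on
before running the tests might look like the sketch below. It assumes the
standard net.core.bpf_jit_enable sysctl (0 = interpreter only, 1 = JIT enabled,
2 = JIT enabled with the generated code dumped to the kernel log). The helper
names are made up for illustration and are not part of any kernel tooling.

```python
from pathlib import Path

# Standard procfs location of the net.core.bpf_jit_enable sysctl.
BPF_JIT_SYSCTL = "/proc/sys/net/core/bpf_jit_enable"

# Meaning of the sysctl values discussed in this thread's time frame.
JIT_MODES = {
    0: "JIT disabled, filters run in the interpreter",
    1: "JIT enabled",
    2: "JIT enabled, generated code dumped to the kernel log",
}


def read_jit_mode(path=BPF_JIT_SYSCTL):
    """Return the integer value stored in the sysctl file at `path`."""
    return int(Path(path).read_text().strip())


def describe_jit_mode(mode):
    """Map a sysctl value to a human-readable description."""
    return JIT_MODES.get(mode, "unknown mode %d" % mode)


if __name__ == "__main__":
    try:
        mode = read_jit_mode()
    except OSError as err:
        # The file can be missing, e.g. on a kernel built without a BPF JIT.
        print("cannot read %s: %s" % (BPF_JIT_SYSCTL, err))
    else:
        print(describe_jit_mode(mode))
```

If this reports that the JIT is disabled, a test run only exercises the
interpreter and says nothing about the JIT code paths discussed here.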

cheers
Michael Ellerman June 22, 2016, 5:20 a.m. UTC | #5
On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > 
> > > > Hi, Michael and Naveen.
> > > > 
> > > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > > changes in ppc tree for a better way to check for ABI versions.
> > > > 
> > > > However, since the issue described below affects mainline and stable kernels,
> > > > would you consider applying it before merging your two patchsets, so that we can
> > > > more easily backport the fix?
> > > 
> > > Hi Cascardo,
> > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > fixing it. But, I can see why this would be a good thing to have for 
> > > -stable and existing distros. However, while your patch below may fix 
> > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > changes in bpf_jit_asm.S as well.
> > 
> > Hi, Naveen.
> > 
> > Any tips on how to exercise possible issues there? Or what changes you think
> > would be sufficient?
> 
> The calling convention is different with ABIv2 and so we'll need changes 
> in bpf_slow_path_common() and sk_negative_common().
> 
> However, rather than enabling classic JIT for ppc64le, are we better off 
> just disabling it?
> 
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -128,7 +128,7 @@ config PPC
>         select IRQ_FORCED_THREADING
>         select HAVE_RCU_TABLE_FREE if SMP
>         select HAVE_SYSCALL_TRACEPOINTS
> -       select HAVE_CBPF_JIT
> +       select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
>         select HAVE_ARCH_JUMP_LABEL
>         select ARCH_HAVE_NMI_SAFE_CMPXCHG
>         select ARCH_HAS_GCOV_PROFILE_ALL
> 
> 
> Michael,
> Let me know your thoughts on whether you intend to take this patch or 
> Cascardo's patch for -stable before the eBPF patches. I can redo my 
> patches accordingly.

Can one of you send me a proper version of this patch, with a change log,
sign-off, etc.?

cheers
Naveen N. Rao June 22, 2016, 7:12 a.m. UTC | #6
On 2016/06/21 11:47AM, Thadeu Lima de Souza Cascardo wrote:
> On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote:
> > On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> > > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > > > 
> > > > > > Hi, Michael and Naveen.
> > > > > > 
> > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > > > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > > > > changes in ppc tree for a better way to check for ABI versions.
> > > > > > 
> > > > > > However, since the issue described below affects mainline and stable kernels,
> > > > > > would you consider applying it before merging your two patchsets, so that we can
> > > > > > more easily backport the fix?
> > > > > 
> > > > > Hi Cascardo,
> > > > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > > > fixing it. But, I can see why this would be a good thing to have for 
> > > > > -stable and existing distros. However, while your patch below may fix 
> > > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > > > changes in bpf_jit_asm.S as well.
> > > > 
> > > > Hi, Naveen.
> > > > 
> > > > Any tips on how to exercise possible issues there? Or what changes you think
> > > > would be sufficient?
> > > 
> > > The calling convention is different with ABIv2 and so we'll need changes 
> > > in bpf_slow_path_common() and sk_negative_common().
> > 
> > How big would those changes be? Do we know?

I don't think it'd be that much -- I will take a stab at this today.

> > 
> > How come no one reported this was broken previously? This is the first I've
> > heard of it being broken.
> > 
> 
> I just heard of it less than two weeks ago, and only could investigate it last
> week, when I realized mainline was also affected.
> 
> It looks like the little-endian support for classic JIT was done before the
> conversion to ABIv2. And as JIT is disabled by default, no one seems to have
> exercised it.

Yes, my thoughts too. I didn't previously think much about this as JIT 
wouldn't be enabled by default. It's interesting though that no one else 
reported this as an issue before.

> 
> > > However, rather than enabling classic JIT for ppc64le, are we better off 
> > > just disabling it?
> > > 
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -128,7 +128,7 @@ config PPC
> > >         select IRQ_FORCED_THREADING
> > >         select HAVE_RCU_TABLE_FREE if SMP
> > >         select HAVE_SYSCALL_TRACEPOINTS
> > > -       select HAVE_CBPF_JIT
> > > +       select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
> > >         select HAVE_ARCH_JUMP_LABEL
> > >         select ARCH_HAVE_NMI_SAFE_CMPXCHG
> > >         select ARCH_HAS_GCOV_PROFILE_ALL
> > > 
> > > 
> > > Michael,
> > > Let me know your thoughts on whether you intend to take this patch or 
> > > Cascardo's patch for -stable before the eBPF patches. I can redo my 
> > > patches accordingly.
> > 
> > This patch sounds like the best option at the moment for something we can
> > backport. Unless the changes to fix it are minimal.

Right -- I will take a look today to see what changes would be needed.

- Naveen
Naveen N. Rao June 22, 2016, 2:57 p.m. UTC | #7
On 2016/06/22 12:42PM, Naveen N Rao wrote:
> On 2016/06/21 11:47AM, Thadeu Lima de Souza Cascardo wrote:
> > On Tue, Jun 21, 2016 at 09:15:48PM +1000, Michael Ellerman wrote:
> > > On Tue, 2016-06-21 at 14:28 +0530, Naveen N. Rao wrote:
> > > > On 2016/06/20 03:56PM, Thadeu Lima de Souza Cascardo wrote:
> > > > > On Sun, Jun 19, 2016 at 11:19:14PM +0530, Naveen N. Rao wrote:
> > > > > > On 2016/06/17 10:00AM, Thadeu Lima de Souza Cascardo wrote:
> > > > > > > 
> > > > > > > Hi, Michael and Naveen.
> > > > > > > 
> > > > > > > I noticed independently that there is a problem with BPF JIT and ABIv2, and
> > > > > > > worked out the patch below before I noticed Naveen's patchset and the latest
> > > > > > > changes in ppc tree for a better way to check for ABI versions.
> > > > > > > 
> > > > > > > However, since the issue described below affects mainline and stable kernels,
> > > > > > > would you consider applying it before merging your two patchsets, so that we can
> > > > > > > more easily backport the fix?
> > > > > > 
> > > > > > Hi Cascardo,
> > > > > > Given that this has been broken on ABIv2 since forever, I didn't bother 
> > > > > > fixing it. But, I can see why this would be a good thing to have for 
> > > > > > -stable and existing distros. However, while your patch below may fix 
> > > > > > the crash you're seeing on ppc64le, it is not sufficient -- you'll need 
> > > > > > changes in bpf_jit_asm.S as well.
> > > > > 
> > > > > Hi, Naveen.
> > > > > 
> > > > > Any tips on how to exercise possible issues there? Or what changes you think
> > > > > would be sufficient?
> > > > 
> > > > The calling convention is different with ABIv2 and so we'll need changes 
> > > > in bpf_slow_path_common() and sk_negative_common().
> > > 
> > > How big would those changes be? Do we know?

So, this does need quite a few changes:
- the skb helpers need to emit code to set up the TOC, and the JIT code
  needs to be updated to set up r12.
- the slow path code needs to be changed to store r3 elsewhere on ABIv2
- the above also means we need to change the stack macros to use the
  proper ABIv2 values
- the little-endian support isn't complete either -- some of the skb
  helpers are not using byte swap instructions.

As such, I think we should just disable classic JIT on ppc64le.
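
To illustrate the last item in the list: packet fields are in network
(big-endian) byte order, so on a little-endian kernel a helper that does a plain
load sees the bytes reversed unless it uses a byte-reversed load or swaps the
value afterwards. The following is only a user-space sketch of that effect,
written in Python rather than PowerPC assembly, using a made-up two-byte field
that holds port 22:

```python
import struct

# Two bytes as they appear on the wire: TCP port 22 in network byte order.
wire_bytes = bytes([0x00, 0x16])

# What a plain halfword load yields on a little-endian CPU.
(naive_le,) = struct.unpack("<H", wire_bytes)

# What the filter expects: the same bytes interpreted as big-endian.
(network_order,) = struct.unpack(">H", wire_bytes)

# Swapping the two bytes of the naive load recovers the expected value.
swapped = ((naive_le & 0xFF) << 8) | (naive_le >> 8)

print(hex(naive_le), network_order, swapped)
```

This prints "0x1600 22 22": a filter comparing the unswapped value against 22
would never match, which is the kind of breakage a missing byte swap causes.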


- Naveen

Patch

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -128,7 +128,7 @@  config PPC
        select IRQ_FORCED_THREADING
        select HAVE_RCU_TABLE_FREE if SMP
        select HAVE_SYSCALL_TRACEPOINTS
-       select HAVE_CBPF_JIT
+       select HAVE_CBPF_JIT if CPU_BIG_ENDIAN
        select HAVE_ARCH_JUMP_LABEL
        select ARCH_HAVE_NMI_SAFE_CMPXCHG
        select ARCH_HAS_GCOV_PROFILE_ALL