mbox series

[E/F/Unstable,0/1] crypto: fix regression/use-after-free in af_alg_accept()

Message ID 20200630230942.1576682-1-mfo@canonical.com
Headers show
Series crypto: fix regression/use-after-free in af_alg_accept() | expand

Message

Mauricio Faria de Oliveira June 30, 2020, 11:09 p.m. UTC
Regarding the submission policies for this kernel SRU cycle:
this patch does not necessarily have to be applied for now;
just review/ack for B/E would be useful if at all possible.

It has only been merged on Linus' tree yesterday.

The patch applies cleanly on Unstable/Focal/Eoan,
and is a trivial backport on Disco/Bionic/Xenial.
(it's on all series because it's a fix to stable.)

[Impact]

 * Users of the Linux kernel's crypto userspace API
   reported BUG() / kernel NULL pointer dereference
   errors after kernel upgrades.

 * The stack trace signature is an accept() syscall
   going through af_alg_accept() and hitting errors
   usually in one of:
   - apparmor_sk_clone_security()
   - apparmor_sock_graft()
   - release_sock()

[Fix]
      
 * This is a regression introduced by upstream commit
   37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
   in sk_destruct") which made its way through stable.
   
 * The offending patch allows the critical regions
   of af_alg_accept() and af_alg_release_parent() to
   run concurrently; now with the "right" events on 2
   CPUs it might drop the non-atomic reference counter
   of the alg_sock then the sock, thus release a sock
   that is still in use.

 * The fix is upstream commit 34c86f4c4a7b ("crypto:
   af_alg - fix use-after-free in af_alg_accept() due
   to bh_lock_sock()") [1]. It changes alg_sock's ref
   counter to atomic, which addresses the root cause.
   
[Test Case]

 * There is a synthetic test case available, which
   uses a kprobes kernel module to synchronize the
   concurrent CPUs on the instructions responsible
   for the problem; and a userspace part to run it.

 * The organic reproducer is the Varnish Cache Plus
   software with the Crypto vmod (which uses kernel
   crypto userspace API) under long, very high load.
   
 * The patch has been verified on both reproducers
   with the 4.15 and 5.7 kernels.
   
 * More tests performed with 'stress-ng --af-alg'
   with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
   (all on same version of stress-ng, V0.11.14)
   
   No regressions observed from original kernel.
   (the af-alg stressor can exercise almost all
   kernel crypto modules shipped with the kernel;
   so it checks more paths/crypto alg interfaces.)
   
[Regression Potential]

 * The fix patch does a fundamental change in how
   alg_sock reference counters work, plus another
   change to the 'nokey' counting. This of course
   *has* a risk of regression.

 * Regressions theoretically could manifest as use
   after free errors (in case of undercounting) in
   the af_alg functions or silent memory leaks (in
   case of overcounting), but also other behaviors
   since reference counting is key to many things.
   
 * FWIW, this patch has been written by the crypto
   subsystem maintainer, who certainly knows a lot
   of the normal and corner cases, thus giving the
   patch more credit.
   
 * Testing with the organic reproducer ran as long
   as 5 days, without issues, so it does look good.

[Other Info]

 * Not sending for Groovy (should get via Unstable).
 
 * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
 
[Stack Trace Examples]

Examples:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    RIP: 0010:apparmor_sk_clone_security+0x26/0x70
    ...
    Call Trace:
     security_sk_clone+0x33/0x50
     af_alg_accept+0x81/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    general protection fault: 0000 [#1] SMP PTI
    ...
    RIP: 0010:__release_sock+0x54/0xe0
    ...
    Call Trace:
     release_sock+0x30/0xa0
     af_alg_accept+0x122/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Herbert Xu (1):
  crypto: af_alg - fix use-after-free in af_alg_accept() due to
    bh_lock_sock()

 crypto/af_alg.c         | 26 +++++++++++---------------
 crypto/algif_aead.c     |  9 +++------
 crypto/algif_hash.c     |  9 +++------
 crypto/algif_skcipher.c |  9 +++------
 include/crypto/if_alg.h |  4 ++--
 5 files changed, 22 insertions(+), 35 deletions(-)

Comments

Stefan Bader July 1, 2020, 7:35 a.m. UTC | #1
On 01.07.20 01:09, Mauricio Faria de Oliveira wrote:
> Regarding the submission policies for this kernel SRU cycle:
> this patch does not necessarily have to be applied for now;
> just review/ack for B/E would be useful if at all possible.
> 
> It has only been merged on Linus' tree yesterday.
> 
> The patch applies cleanly on Unstable/Focal/Eoan,
> and is a trivial backport on Disco/Bionic/Xenial.
> (it's on all series because it's a fix to stable.)
> 
> [Impact]
> 
>  * Users of the Linux kernel's crypto userspace API
>    reported BUG() / kernel NULL pointer dereference
>    errors after kernel upgrades.
> 
>  * The stack trace signature is an accept() syscall
>    going through af_alg_accept() and hitting errors
>    usually in one of:
>    - apparmor_sk_clone_security()
>    - apparmor_sock_graft()
>    - release_sock()
> 
> [Fix]
>       
>  * This is a regression introduced by upstream commit
>    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
>    in sk_destruct") which made its way through stable.

Just for my understanding, what is the motivation to submit this separately and
not via stable? I guess it is to get things in more quickly.

Was submitting e/f with cover and b/d/x seperatly without one intentional? It
would be much better to group all the series patches under one cover as that
keeps related things close together in the thread view.

-Stefan

>    
>  * The offending patch allows the critical regions
>    of af_alg_accept() and af_alg_release_parent() to
>    run concurrently; now with the "right" events on 2
>    CPUs it might drop the non-atomic reference counter
>    of the alg_sock then the sock, thus release a sock
>    that is still in use.
> 
>  * The fix is upstream commit 34c86f4c4a7b ("crypto:
>    af_alg - fix use-after-free in af_alg_accept() due
>    to bh_lock_sock()") [1]. It changes alg_sock's ref
>    counter to atomic, which addresses the root cause.
>    
> [Test Case]
> 
>  * There is a synthetic test case available, which
>    uses a kprobes kernel module to synchronize the
>    concurrent CPUs on the instructions responsible
>    for the problem; and a userspace part to run it.
> 
>  * The organic reproducer is the Varnish Cache Plus
>    software with the Crypto vmod (which uses kernel
>    crypto userspace API) under long, very high load.
>    
>  * The patch has been verified on both reproducers
>    with the 4.15 and 5.7 kernels.
>    
>  * More tests performed with 'stress-ng --af-alg'
>    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
>    (all on same version of stress-ng, V0.11.14)
>    
>    No regressions observed from original kernel.
>    (the af-alg stressor can exercise almost all
>    kernel crypto modules shipped with the kernel;
>    so it checks more paths/crypto alg interfaces.)
>    
> [Regression Potential]
> 
>  * The fix patch does a fundamental change in how
>    alg_sock reference counters work, plus another
>    change to the 'nokey' counting. This of course
>    *has* a risk of regression.
> 
>  * Regressions theoretically could manifest as use
>    after free errors (in case of undercounting) in
>    the af_alg functions or silent memory leaks (in
>    case of overcounting), but also other behaviors
>    since reference counting is key to many things.
>    
>  * FWIW, this patch has been written by the crypto
>    subsystem maintainer, who certainly knows a lot
>    of the normal and corner cases, thus giving the
>    patch more credit.
>    
>  * Testing with the organic reproducer ran as long
>    as 5 days, without issues, so it does look good.
> 
> [Other Info]
> 
>  * Not sending for Groovy (should get via Unstable).
>  
>  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
>  
> [Stack Trace Examples]
> 
> Examples:
> 
>     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>     ...
>     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
>     ...
>     Call Trace:
>      security_sk_clone+0x33/0x50
>      af_alg_accept+0x81/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
>     general protection fault: 0000 [#1] SMP PTI
>     ...
>     RIP: 0010:__release_sock+0x54/0xe0
>     ...
>     Call Trace:
>      release_sock+0x30/0xa0
>      af_alg_accept+0x122/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Herbert Xu (1):
>   crypto: af_alg - fix use-after-free in af_alg_accept() due to
>     bh_lock_sock()
> 
>  crypto/af_alg.c         | 26 +++++++++++---------------
>  crypto/algif_aead.c     |  9 +++------
>  crypto/algif_hash.c     |  9 +++------
>  crypto/algif_skcipher.c |  9 +++------
>  include/crypto/if_alg.h |  4 ++--
>  5 files changed, 22 insertions(+), 35 deletions(-)
>
Mauricio Faria de Oliveira July 1, 2020, 12:28 p.m. UTC | #2
Hi Stefan,

On Wed, Jul 1, 2020 at 4:35 AM Stefan Bader <stefan.bader@canonical.com> wrote:
>
> On 01.07.20 01:09, Mauricio Faria de Oliveira wrote:
> > Regarding the submission policies for this kernel SRU cycle:
> > this patch does not necessarily have to be applied for now;
> > just review/ack for B/E would be useful if at all possible.
> >
> > It has only been merged on Linus' tree yesterday.
> >
> > The patch applies cleanly on Unstable/Focal/Eoan,
> > and is a trivial backport on Disco/Bionic/Xenial.
> > (it's on all series because it's a fix to stable.)
> >
> > [Impact]
> >
> >  * Users of the Linux kernel's crypto userspace API
> >    reported BUG() / kernel NULL pointer dereference
> >    errors after kernel upgrades.
> >
> >  * The stack trace signature is an accept() syscall
> >    going through af_alg_accept() and hitting errors
> >    usually in one of:
> >    - apparmor_sk_clone_security()
> >    - apparmor_sock_graft()
> >    - release_sock()
> >
> > [Fix]
> >
> >  * This is a regression introduced by upstream commit
> >    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
> >    in sk_destruct") which made its way through stable.
>
> Just for my understanding, what is the motivation to submit this separately and
> not via stable? I guess it is to get things in more quickly.
>

You're right -- particularly the review/ack, as it's required on UA
support cases
in order to provide temporary, supported kernel packages (hotfix packages)
until the official packages land.

> Was submitting e/f with cover and b/d/x seperatly without one intentional? It
> would be much better to group all the series patches under one cover as that
> keeps related things close together in the thread view.
>

No, I just didn't know it could be done for different series (say,
from a process perspective.)
I actually wondered a bit about it after the fact. Thanks for
mentioning; I will do it in the future.

cheers,
Mauricio

> -Stefan
>
> >
> >  * The offending patch allows the critical regions
> >    of af_alg_accept() and af_alg_release_parent() to
> >    run concurrently; now with the "right" events on 2
> >    CPUs it might drop the non-atomic reference counter
> >    of the alg_sock then the sock, thus release a sock
> >    that is still in use.
> >
> >  * The fix is upstream commit 34c86f4c4a7b ("crypto:
> >    af_alg - fix use-after-free in af_alg_accept() due
> >    to bh_lock_sock()") [1]. It changes alg_sock's ref
> >    counter to atomic, which addresses the root cause.
> >
> > [Test Case]
> >
> >  * There is a synthetic test case available, which
> >    uses a kprobes kernel module to synchronize the
> >    concurrent CPUs on the instructions responsible
> >    for the problem; and a userspace part to run it.
> >
> >  * The organic reproducer is the Varnish Cache Plus
> >    software with the Crypto vmod (which uses kernel
> >    crypto userspace API) under long, very high load.
> >
> >  * The patch has been verified on both reproducers
> >    with the 4.15 and 5.7 kernels.
> >
> >  * More tests performed with 'stress-ng --af-alg'
> >    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
> >    (all on same version of stress-ng, V0.11.14)
> >
> >    No regressions observed from original kernel.
> >    (the af-alg stressor can exercise almost all
> >    kernel crypto modules shipped with the kernel;
> >    so it checks more paths/crypto alg interfaces.)
> >
> > [Regression Potential]
> >
> >  * The fix patch does a fundamental change in how
> >    alg_sock reference counters work, plus another
> >    change to the 'nokey' counting. This of course
> >    *has* a risk of regression.
> >
> >  * Regressions theoretically could manifest as use
> >    after free errors (in case of undercounting) in
> >    the af_alg functions or silent memory leaks (in
> >    case of overcounting), but also other behaviors
> >    since reference counting is key to many things.
> >
> >  * FWIW, this patch has been written by the crypto
> >    subsystem maintainer, who certainly knows a lot
> >    of the normal and corner cases, thus giving the
> >    patch more credit.
> >
> >  * Testing with the organic reproducer ran as long
> >    as 5 days, without issues, so it does look good.
> >
> > [Other Info]
> >
> >  * Not sending for Groovy (should get via Unstable).
> >
> >  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
> >
> > [Stack Trace Examples]
> >
> > Examples:
> >
> >     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> >     ...
> >     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
> >     ...
> >     Call Trace:
> >      security_sk_clone+0x33/0x50
> >      af_alg_accept+0x81/0x1c0 [af_alg]
> >      alg_accept+0x15/0x20 [af_alg]
> >      SYSC_accept4+0xff/0x210
> >      SyS_accept+0x10/0x20
> >      do_syscall_64+0x73/0x130
> >      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >
> >     general protection fault: 0000 [#1] SMP PTI
> >     ...
> >     RIP: 0010:__release_sock+0x54/0xe0
> >     ...
> >     Call Trace:
> >      release_sock+0x30/0xa0
> >      af_alg_accept+0x122/0x1c0 [af_alg]
> >      alg_accept+0x15/0x20 [af_alg]
> >      SYSC_accept4+0xff/0x210
> >      SyS_accept+0x10/0x20
> >      do_syscall_64+0x73/0x130
> >      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >
> > Herbert Xu (1):
> >   crypto: af_alg - fix use-after-free in af_alg_accept() due to
> >     bh_lock_sock()
> >
> >  crypto/af_alg.c         | 26 +++++++++++---------------
> >  crypto/algif_aead.c     |  9 +++------
> >  crypto/algif_hash.c     |  9 +++------
> >  crypto/algif_skcipher.c |  9 +++------
> >  include/crypto/if_alg.h |  4 ++--
> >  5 files changed, 22 insertions(+), 35 deletions(-)
> >
>
>
Seth Forshee July 1, 2020, 9:06 p.m. UTC | #3
On Tue, Jun 30, 2020 at 08:09:41PM -0300, Mauricio Faria de Oliveira wrote:
> Regarding the submission policies for this kernel SRU cycle:
> this patch does not necessarily have to be applied for now;
> just review/ack for B/E would be useful if at all possible.
> 
> It has only been merged on Linus' tree yesterday.
> 
> The patch applies cleanly on Unstable/Focal/Eoan,
> and is a trivial backport on Disco/Bionic/Xenial.
> (it's on all series because it's a fix to stable.)
> 
> [Impact]
> 
>  * Users of the Linux kernel's crypto userspace API
>    reported BUG() / kernel NULL pointer dereference
>    errors after kernel upgrades.
> 
>  * The stack trace signature is an accept() syscall
>    going through af_alg_accept() and hitting errors
>    usually in one of:
>    - apparmor_sk_clone_security()
>    - apparmor_sock_graft()
>    - release_sock()
> 
> [Fix]
>       
>  * This is a regression introduced by upstream commit
>    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
>    in sk_destruct") which made its way through stable.
>    
>  * The offending patch allows the critical regions
>    of af_alg_accept() and af_alg_release_parent() to
>    run concurrently; now with the "right" events on 2
>    CPUs it might drop the non-atomic reference counter
>    of the alg_sock then the sock, thus release a sock
>    that is still in use.
> 
>  * The fix is upstream commit 34c86f4c4a7b ("crypto:
>    af_alg - fix use-after-free in af_alg_accept() due
>    to bh_lock_sock()") [1]. It changes alg_sock's ref
>    counter to atomic, which addresses the root cause.
>    
> [Test Case]
> 
>  * There is a synthetic test case available, which
>    uses a kprobes kernel module to synchronize the
>    concurrent CPUs on the instructions responsible
>    for the problem; and a userspace part to run it.
> 
>  * The organic reproducer is the Varnish Cache Plus
>    software with the Crypto vmod (which uses kernel
>    crypto userspace API) under long, very high load.
>    
>  * The patch has been verified on both reproducers
>    with the 4.15 and 5.7 kernels.
>    
>  * More tests performed with 'stress-ng --af-alg'
>    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
>    (all on same version of stress-ng, V0.11.14)
>    
>    No regressions observed from original kernel.
>    (the af-alg stressor can exercise almost all
>    kernel crypto modules shipped with the kernel;
>    so it checks more paths/crypto alg interfaces.)
>    
> [Regression Potential]
> 
>  * The fix patch does a fundamental change in how
>    alg_sock reference counters work, plus another
>    change to the 'nokey' counting. This of course
>    *has* a risk of regression.
> 
>  * Regressions theoretically could manifest as use
>    after free errors (in case of undercounting) in
>    the af_alg functions or silent memory leaks (in
>    case of overcounting), but also other behaviors
>    since reference counting is key to many things.
>    
>  * FWIW, this patch has been written by the crypto
>    subsystem maintainer, who certainly knows a lot
>    of the normal and corner cases, thus giving the
>    patch more credit.
>    
>  * Testing with the organic reproducer ran as long
>    as 5 days, without issues, so it does look good.
> 
> [Other Info]
> 
>  * Not sending for Groovy (should get via Unstable).
>  
>  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
>  
> [Stack Trace Examples]
> 
> Examples:
> 
>     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>     ...
>     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
>     ...
>     Call Trace:
>      security_sk_clone+0x33/0x50
>      af_alg_accept+0x81/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
>     general protection fault: 0000 [#1] SMP PTI
>     ...
>     RIP: 0010:__release_sock+0x54/0xe0
>     ...
>     Call Trace:
>      release_sock+0x30/0xa0
>      af_alg_accept+0x122/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Applied to unstable/master-next, thanks!
Kelsey Skunberg July 22, 2020, 7:19 a.m. UTC | #4
This patch got applied to Eoan/master-next via an upstream patch set.
Will update again once applied to Focal too. Thank you! 

-Kelsey

On 2020-06-30 20:09:41 , Mauricio Faria de Oliveira wrote:
> Regarding the submission policies for this kernel SRU cycle:
> this patch does not necessarily have to be applied for now;
> just review/ack for B/E would be useful if at all possible.
> 
> It has only been merged on Linus' tree yesterday.
> 
> The patch applies cleanly on Unstable/Focal/Eoan,
> and is a trivial backport on Disco/Bionic/Xenial.
> (it's on all series because it's a fix to stable.)
> 
> [Impact]
> 
>  * Users of the Linux kernel's crypto userspace API
>    reported BUG() / kernel NULL pointer dereference
>    errors after kernel upgrades.
> 
>  * The stack trace signature is an accept() syscall
>    going through af_alg_accept() and hitting errors
>    usually in one of:
>    - apparmor_sk_clone_security()
>    - apparmor_sock_graft()
>    - release_sock()
> 
> [Fix]
>       
>  * This is a regression introduced by upstream commit
>    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
>    in sk_destruct") which made its way through stable.
>    
>  * The offending patch allows the critical regions
>    of af_alg_accept() and af_alg_release_parent() to
>    run concurrently; now with the "right" events on 2
>    CPUs it might drop the non-atomic reference counter
>    of the alg_sock then the sock, thus release a sock
>    that is still in use.
> 
>  * The fix is upstream commit 34c86f4c4a7b ("crypto:
>    af_alg - fix use-after-free in af_alg_accept() due
>    to bh_lock_sock()") [1]. It changes alg_sock's ref
>    counter to atomic, which addresses the root cause.
>    
> [Test Case]
> 
>  * There is a synthetic test case available, which
>    uses a kprobes kernel module to synchronize the
>    concurrent CPUs on the instructions responsible
>    for the problem; and a userspace part to run it.
> 
>  * The organic reproducer is the Varnish Cache Plus
>    software with the Crypto vmod (which uses kernel
>    crypto userspace API) under long, very high load.
>    
>  * The patch has been verified on both reproducers
>    with the 4.15 and 5.7 kernels.
>    
>  * More tests performed with 'stress-ng --af-alg'
>    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
>    (all on same version of stress-ng, V0.11.14)
>    
>    No regressions observed from original kernel.
>    (the af-alg stressor can exercise almost all
>    kernel crypto modules shipped with the kernel;
>    so it checks more paths/crypto alg interfaces.)
>    
> [Regression Potential]
> 
>  * The fix patch does a fundamental change in how
>    alg_sock reference counters work, plus another
>    change to the 'nokey' counting. This of course
>    *has* a risk of regression.
> 
>  * Regressions theoretically could manifest as use
>    after free errors (in case of undercounting) in
>    the af_alg functions or silent memory leaks (in
>    case of overcounting), but also other behaviors
>    since reference counting is key to many things.
>    
>  * FWIW, this patch has been written by the crypto
>    subsystem maintainer, who certainly knows a lot
>    of the normal and corner cases, thus giving the
>    patch more credit.
>    
>  * Testing with the organic reproducer ran as long
>    as 5 days, without issues, so it does look good.
> 
> [Other Info]
> 
>  * Not sending for Groovy (should get via Unstable).
>  
>  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
>  
> [Stack Trace Examples]
> 
> Examples:
> 
>     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>     ...
>     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
>     ...
>     Call Trace:
>      security_sk_clone+0x33/0x50
>      af_alg_accept+0x81/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
>     general protection fault: 0000 [#1] SMP PTI
>     ...
>     RIP: 0010:__release_sock+0x54/0xe0
>     ...
>     Call Trace:
>      release_sock+0x30/0xa0
>      af_alg_accept+0x122/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Herbert Xu (1):
>   crypto: af_alg - fix use-after-free in af_alg_accept() due to
>     bh_lock_sock()
> 
>  crypto/af_alg.c         | 26 +++++++++++---------------
>  crypto/algif_aead.c     |  9 +++------
>  crypto/algif_hash.c     |  9 +++------
>  crypto/algif_skcipher.c |  9 +++------
>  include/crypto/if_alg.h |  4 ++--
>  5 files changed, 22 insertions(+), 35 deletions(-)
> 
> -- 
> 2.25.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Kelsey Skunberg Aug. 4, 2020, 1:18 a.m. UTC | #5
Applied to Focal/master-next. Thank you!

-Kelsey

On 2020-06-30 20:09:41 , Mauricio Faria de Oliveira wrote:
> Regarding the submission policies for this kernel SRU cycle:
> this patch does not necessarily have to be applied for now;
> just review/ack for B/E would be useful if at all possible.
> 
> It has only been merged on Linus' tree yesterday.
> 
> The patch applies cleanly on Unstable/Focal/Eoan,
> and is a trivial backport on Disco/Bionic/Xenial.
> (it's on all series because it's a fix to stable.)
> 
> [Impact]
> 
>  * Users of the Linux kernel's crypto userspace API
>    reported BUG() / kernel NULL pointer dereference
>    errors after kernel upgrades.
> 
>  * The stack trace signature is an accept() syscall
>    going through af_alg_accept() and hitting errors
>    usually in one of:
>    - apparmor_sk_clone_security()
>    - apparmor_sock_graft()
>    - release_sock()
> 
> [Fix]
>       
>  * This is a regression introduced by upstream commit
>    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
>    in sk_destruct") which made its way through stable.
>    
>  * The offending patch allows the critical regions
>    of af_alg_accept() and af_alg_release_parent() to
>    run concurrently; now with the "right" events on 2
>    CPUs it might drop the non-atomic reference counter
>    of the alg_sock then the sock, thus release a sock
>    that is still in use.
> 
>  * The fix is upstream commit 34c86f4c4a7b ("crypto:
>    af_alg - fix use-after-free in af_alg_accept() due
>    to bh_lock_sock()") [1]. It changes alg_sock's ref
>    counter to atomic, which addresses the root cause.
>    
> [Test Case]
> 
>  * There is a synthetic test case available, which
>    uses a kprobes kernel module to synchronize the
>    concurrent CPUs on the instructions responsible
>    for the problem; and a userspace part to run it.
> 
>  * The organic reproducer is the Varnish Cache Plus
>    software with the Crypto vmod (which uses kernel
>    crypto userspace API) under long, very high load.
>    
>  * The patch has been verified on both reproducers
>    with the 4.15 and 5.7 kernels.
>    
>  * More tests performed with 'stress-ng --af-alg'
>    with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
>    (all on same version of stress-ng, V0.11.14)
>    
>    No regressions observed from original kernel.
>    (the af-alg stressor can exercise almost all
>    kernel crypto modules shipped with the kernel;
>    so it checks more paths/crypto alg interfaces.)
>    
> [Regression Potential]
> 
>  * The fix patch does a fundamental change in how
>    alg_sock reference counters work, plus another
>    change to the 'nokey' counting. This of course
>    *has* a risk of regression.
> 
>  * Regressions theoretically could manifest as use
>    after free errors (in case of undercounting) in
>    the af_alg functions or silent memory leaks (in
>    case of overcounting), but also other behaviors
>    since reference counting is key to many things.
>    
>  * FWIW, this patch has been written by the crypto
>    subsystem maintainer, who certainly knows a lot
>    of the normal and corner cases, thus giving the
>    patch more credit.
>    
>  * Testing with the organic reproducer ran as long
>    as 5 days, without issues, so it does look good.
> 
> [Other Info]
> 
>  * Not sending for Groovy (should get via Unstable).
>  
>  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
>  
> [Stack Trace Examples]
> 
> Examples:
> 
>     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>     ...
>     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
>     ...
>     Call Trace:
>      security_sk_clone+0x33/0x50
>      af_alg_accept+0x81/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
>     general protection fault: 0000 [#1] SMP PTI
>     ...
>     RIP: 0010:__release_sock+0x54/0xe0
>     ...
>     Call Trace:
>      release_sock+0x30/0xa0
>      af_alg_accept+0x122/0x1c0 [af_alg]
>      alg_accept+0x15/0x20 [af_alg]
>      SYSC_accept4+0xff/0x210
>      SyS_accept+0x10/0x20
>      do_syscall_64+0x73/0x130
>      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Herbert Xu (1):
>   crypto: af_alg - fix use-after-free in af_alg_accept() due to
>     bh_lock_sock()
> 
>  crypto/af_alg.c         | 26 +++++++++++---------------
>  crypto/algif_aead.c     |  9 +++------
>  crypto/algif_hash.c     |  9 +++------
>  crypto/algif_skcipher.c |  9 +++------
>  include/crypto/if_alg.h |  4 ++--
>  5 files changed, 22 insertions(+), 35 deletions(-)
> 
> -- 
> 2.25.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team