use-after-free in sock_wake_async

On Wed, 2015-11-25 at 16:43 +0000, Rainer Weikusat wrote:
> Eric Dumazet <edumazet@google.com> writes:
> > On Tue, Nov 24, 2015 at 5:10 PM, Rainer Weikusat
> > <rweikusat@mobileactivedefense.com> wrote:
> 
> [...]
> 
> >> It's also easy to verify: Swap the unix_state_lock and
> >> other->sk_data_ready and see if the issue still occurs. Right now (this
> >> may change after I had some sleep as it's pretty late for me), I don't
> >> think there's another local fix: The ->sk_data_ready accesses a
> >> pointer after the lock taken by the code which will clear and
> >> then later free it was released.
> >
> > It seems that :
> >
> > int sock_wake_async(struct socket *sock, int how, int band)
> >
> > should really be changed to
> >
> > int sock_wake_async(struct socket_wq *wq, int how, int band)
> >
> > So that RCU rules (already present) apply safely.
> >
> > sk->sk_socket is inherently racy (that is : racy without using
> > sk_callback_lock rwlock )
> 
> The comment above sock_wait_async states that
> 
> /* This function may be called only under socket lock or callback_lock or rcu_lock */
> 
> In this case, it's called via sk_wake_async (include/net/sock.h) which
> is - in turn - called via sock_def_readable (the 'default' data ready
> routine/ net/core/sock.c) which looks like this:
> 
> static void sock_def_readable(struct sock *sk)
> {
> 	struct socket_wq *wq;
> 
> 	rcu_read_lock();
> 	wq = rcu_dereference(sk->sk_wq);
> 	if (wq_has_sleeper(wq))
> 		wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
> 						POLLRDNORM | POLLRDBAND);
> 	sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
> 	rcu_read_unlock();
> }
> 
> and should thus satisfy the constraint documented by the comment (I
> didn't verify if the comment is actually correct, though).
> 
> Further - sorry about that - I think changing code in "half of the
> network stack" in order to avoid calling a certain routine which will
> only ever do something in case someone's using signal-driven I/O with an
> already acquired lock held is a terrifying idea. Because of this, I
> propose the following alternate patch which should also solve the
> problem by ensuring that the ->sk_data_ready activity happens before
> unix_release_sock/ sock_release get a chance to clear or free anything
> which will be needed.
> 
> In case this demonstrably causes other issues, a more complicated
> alternate idea (still restricting itself to changes to the af_unix code)
> would be to move the socket_wq structure to a dummy struct socket
> allocated by unix_release_sock and freed by the destructor.
> 
> ---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 4e95bdf..5c87ea6 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1754,8 +1754,8 @@ restart_locked:
>         skb_queue_tail(&other->sk_receive_queue, skb);
>         if (max_level > unix_sk(other)->recursion_level)
>                 unix_sk(other)->recursion_level = max_level;
> -       unix_state_unlock(other);
>         other->sk_data_ready(other);
> +       unix_state_unlock(other);
>         sock_put(other);
>         scm_destroy(&scm);
>         return len;
> @@ -1860,8 +1860,8 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
>                 skb_queue_tail(&other->sk_receive_queue, skb);
>                 if (max_level > unix_sk(other)->recursion_level)
>                         unix_sk(other)->recursion_level = max_level;
> -               unix_state_unlock(other);
>                 other->sk_data_ready(other);
> +               unix_state_unlock(other);
>                 sent += size;
>         }
>  

The issue is way more complex than that.

We cannot prevent inode from disappearing.
We can not safely dereference "(struct socket *)->flags"

locking the 'struct sock' wont help at all.

Here is my current work/patch :

It ran for ~2 hours under stress without warning, but I want it to run
24 hours before official submission.

Note that moving flags into sk_wq will actually avoid one cache line
miss in fast path, so might give performance improvement.

This minimal patch only moves SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
but we can move other flags later.

sock_wake_async() must not even attempt to deref a struct socket.

-> sock_wake_async(struct socket_wq *wq, int how, int band);

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Message ID	1448471494.24696.18.camel@edumazet-glaptop2.roam.corp.google.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id CA1C2140306 for <patchwork-incoming@ozlabs.org>; Thu, 26 Nov 2015 04:11:57 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=NeM2Sn7f; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751014AbbKYRLj (ORCPT <rfc822;patchwork-incoming@ozlabs.org>); Wed, 25 Nov 2015 12:11:39 -0500 Received: from mail-pa0-f43.google.com ([209.85.220.43]:33312 "EHLO mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750726AbbKYRLg (ORCPT <rfc822;netdev@vger.kernel.org>); Wed, 25 Nov 2015 12:11:36 -0500 Received: by pabfh17 with SMTP id fh17so64840440pab.0; Wed, 25 Nov 2015 09:11:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:subject:from:to:cc:date:in-reply-to:references :content-type:mime-version:content-transfer-encoding; bh=Y34lkd8toFyYQecUZW0MOpbjee0kPoqVa5Ts6qxoWro=; b=NeM2Sn7fbb6LA/rhp97KCz0k4n4m512cgqTy++c8jPpDmrVSjPc177XBYXgEORYli+ avWekjH24ChHCbpWZXs8n2gIaRIxPA2mlyHBkcJb7apH1zxc7ZQZmepJwczBCyGM7Xil 0r1McmZyXDhjyCiXUqUL4LOkSaAJxUR3f8GvPrkuqLByrYoRp4PPkR3Ns6rLhm8+aN7Q jlZGLEAcxQrhHuGJLwAm7MnQiRii72uDJpsoJmyk/ruCyVia8AEubcwVUez8Kt+M72ru ddfd4Gb+i0VZDJwYDMQza5MDID9AhSlF0+9uhoh3kTge6jvnopMklTd2sNnxkPyKkSKX V2nQ== X-Received: by 10.66.140.79 with SMTP id re15mr51708218pab.127.1448471496048; Wed, 25 Nov 2015 09:11:36 -0800 (PST) Received: from [172.26.54.208] ([172.26.54.208]) by smtp.gmail.com with ESMTPSA id ux3sm22552810pac.18.2015.11.25.09.11.34 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 25 Nov 2015 09:11:35 -0800 (PST) Message-ID: <1448471494.24696.18.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: use-after-free in sock_wake_async From: Eric Dumazet <eric.dumazet@gmail.com> To: Rainer Weikusat <rweikusat@mobileactivedefense.com> Cc: Eric Dumazet <edumazet@google.com>, Dmitry Vyukov <dvyukov@google.com>, Benjamin LaHaise <bcrl@kvack.org>, "David S. Miller" <davem@davemloft.net>, Hannes Frederic Sowa <hannes@stressinduktion.org>, Al Viro <viro@zeniv.linux.org.uk>, David Howells <dhowells@redhat.com>, Ying Xue <ying.xue@windriver.com>, "Eric W. Biederman" <ebiederm@xmission.com>, netdev <netdev@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, syzkaller <syzkaller@googlegroups.com>, Kostya Serebryany <kcc@google.com>, Alexander Potapenko <glider@google.com>, Sasha Levin <sasha.levin@oracle.com> Date: Wed, 25 Nov 2015 09:11:34 -0800 In-Reply-To: <87io4q3u8u.fsf@doppelsaurus.mobileactivedefense.com> References: <CACT4Y+ZZQ0ooT9LekV35u_jUAtAHJfg5O5SD=xATfHUF5UdNBQ@mail.gmail.com> <CANn89i+Uw+YzhRbQVSyf=FAQBO06JfFoxQpSHxfmDpS5_iZBQw@mail.gmail.com> <87poyzj7j2.fsf@doppelsaurus.mobileactivedefense.com> <CANn89iKg0Lc7OVLhxUOzPRr4OhEGxA8L7V1jyaqFAS6+7kEXTA@mail.gmail.com> <87io4qevdp.fsf@doppelsaurus.mobileactivedefense.com> <CANn89iJgW+6giMUicU1m831+mkaNy9z1hQB-ixvRHBqZo-7Yig@mail.gmail.com> <87io4q3u8u.fsf@doppelsaurus.mobileactivedefense.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4-0ubuntu2 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: <netdev.vger.kernel.org> X-Mailing-List: netdev@vger.kernel.org

use-after-free in sock_wake_async

Commit Message

Comments

Patch