[net] socket: close race condition between sock_close() and sockfs_setattr()

Message ID 20180607203949.16945-1-xiyou.wangcong@gmail.com
State Accepted
Delegated to: David Miller
Headers show
Series
  • [net] socket: close race condition between sock_close() and sockfs_setattr()
Related show

Commit Message

Cong Wang June 7, 2018, 8:39 p.m.
fchownat() doesn't even hold refcnt of fd until it figures out
fd is really needed (otherwise is ignored) and releases it after
it resolves the path. This means sock_close() could race with
sockfs_setattr(), which leads to a NULL pointer dereference
since typically we set sock->sk to NULL in ->release().

As pointed out by Al, this is unique to sockfs. So we can fix this
in socket layer by acquiring inode_lock in sock_close() and
checking against NULL in sockfs_setattr().

sock_release() is called in many places, only the sock_close()
path matters here. And fortunately, this should not affect normal
sock_close() as it is only called when the last fd refcnt is gone.
It only affects sock_close() with a parallel sockfs_setattr() in
progress, which is not common.

Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Reported-by: shankarapailoor <shankarapailoor@gmail.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/socket.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Al Viro June 7, 2018, 9:26 p.m. | #1
On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
> fchownat() doesn't even hold refcnt of fd until it figures out
> fd is really needed (otherwise is ignored) and releases it after
> it resolves the path. This means sock_close() could race with
> sockfs_setattr(), which leads to a NULL pointer dereference
> since typically we set sock->sk to NULL in ->release().
> 
> As pointed out by Al, this is unique to sockfs. So we can fix this
> in socket layer by acquiring inode_lock in sock_close() and
> checking against NULL in sockfs_setattr().

That looks like a massive overkill - it's way heavier than it should be.
And it's very likely to trigger shitloads of deadlock warnings, some
possibly even true.
Cong Wang June 7, 2018, 9:45 p.m. | #2
On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
>> fchownat() doesn't even hold refcnt of fd until it figures out
>> fd is really needed (otherwise is ignored) and releases it after
>> it resolves the path. This means sock_close() could race with
>> sockfs_setattr(), which leads to a NULL pointer dereference
>> since typically we set sock->sk to NULL in ->release().
>>
>> As pointed out by Al, this is unique to sockfs. So we can fix this
>> in socket layer by acquiring inode_lock in sock_close() and
>> checking against NULL in sockfs_setattr().
>
> That looks like a massive overkill - it's way heavier than it should be.

I don't see any other quick way to fix this. My initial thought is
to keep that refcnt until path_put(), apparently you don't like it
either.


> And it's very likely to trigger shitloads of deadlock warnings, some
> possibly even true.

I do audit the inode_lock usage in networking, I don't see any
deadlock, of course, there could be some non-networking code
uses socket API that I missed. But _generally_, socket doesn't
have a pointer to retrieve this inode.
Al Viro June 7, 2018, 10:04 p.m. | #3
On Thu, Jun 07, 2018 at 02:45:58PM -0700, Cong Wang wrote:
> On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
> >> fchownat() doesn't even hold refcnt of fd until it figures out
> >> fd is really needed (otherwise is ignored) and releases it after
> >> it resolves the path. This means sock_close() could race with
> >> sockfs_setattr(), which leads to a NULL pointer dereference
> >> since typically we set sock->sk to NULL in ->release().
> >>
> >> As pointed out by Al, this is unique to sockfs. So we can fix this
> >> in socket layer by acquiring inode_lock in sock_close() and
> >> checking against NULL in sockfs_setattr().
> >
> > That looks like a massive overkill - it's way heavier than it should be.
> 
> I don't see any other quick way to fix this. My initial thought is
> to keep that refcnt until path_put(), apparently you don't like it
> either.

You do realize that the same ->setattr() can be called by way of
chown() on /proc/self/fd/<n>, right?  What would you do there -
bump refcount on that struct file when traversing that symlink and
hold it past the end of pathname resolution, until... what exactly?
Cong Wang June 7, 2018, 10:15 p.m. | #4
On Thu, Jun 7, 2018 at 3:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Jun 07, 2018 at 02:45:58PM -0700, Cong Wang wrote:
>> On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote:
>> >> fchownat() doesn't even hold refcnt of fd until it figures out
>> >> fd is really needed (otherwise is ignored) and releases it after
>> >> it resolves the path. This means sock_close() could race with
>> >> sockfs_setattr(), which leads to a NULL pointer dereference
>> >> since typically we set sock->sk to NULL in ->release().
>> >>
>> >> As pointed out by Al, this is unique to sockfs. So we can fix this
>> >> in socket layer by acquiring inode_lock in sock_close() and
>> >> checking against NULL in sockfs_setattr().
>> >
>> > That looks like a massive overkill - it's way heavier than it should be.
>>
>> I don't see any other quick way to fix this. My initial thought is
>> to keep that refcnt until path_put(), apparently you don't like it
>> either.
>
> You do realize that the same ->setattr() can be called by way of
> chown() on /proc/self/fd/<n>, right?  What would you do there -
> bump refcount on that struct file when traversing that symlink and
> hold it past the end of pathname resolution, until... what exactly?

I was thinking about this:

        error = user_path_at(dfd,....); // hold dfd when needed

        if (!error) {
                error = chown_common(&path, mode);
                path_put(&path);      // release dfd if held

With this, we can guarantee ->release() is only possibly called
after chown_common() which is after ->setattr() too.
Al Viro June 7, 2018, 10:32 p.m. | #5
On Thu, Jun 07, 2018 at 03:15:15PM -0700, Cong Wang wrote:

> > You do realize that the same ->setattr() can be called by way of
> > chown() on /proc/self/fd/<n>, right?  What would you do there -
> > bump refcount on that struct file when traversing that symlink and
> > hold it past the end of pathname resolution, until... what exactly?
> 
> I was thinking about this:
> 
>         error = user_path_at(dfd,....); // hold dfd when needed
> 
>         if (!error) {
>                 error = chown_common(&path, mode);
>                 path_put(&path);      // release dfd if held
> 
> With this, we can guarantee ->release() is only possibly called
> after chown_common() which is after ->setattr() too.

No, we can't.  You are assuming that there *is* dfd and that it points
to the opened socket we are going to operate upon.  That is not guaranteed.
At all.  If e.g. 42 is a file descriptor of an opened socket, plain chown(2)
on /proc/self/fd/42 will trigger that ->setattr().  No dfd in sight.
We do run across an opened file at some point, all right - when we traverse
the symlink in procfs.  You can't bump ->f_count there.  Even leaving aside
the "where would I stash the reference to that file?" and "how long would I
hold it?", you can't bump ->f_count on other process' files.  That would
bugger the expectations of close(2) callers.
David Miller June 10, 2018, 7:27 p.m. | #6
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu,  7 Jun 2018 13:39:49 -0700

> fchownat() doesn't even hold refcnt of fd until it figures out
> fd is really needed (otherwise is ignored) and releases it after
> it resolves the path. This means sock_close() could race with
> sockfs_setattr(), which leads to a NULL pointer dereference
> since typically we set sock->sk to NULL in ->release().
> 
> As pointed out by Al, this is unique to sockfs. So we can fix this
> in socket layer by acquiring inode_lock in sock_close() and
> checking against NULL in sockfs_setattr().
> 
> sock_release() is called in many places, only the sock_close()
> path matters here. And fortunately, this should not affect normal
> sock_close() as it is only called when the last fd refcnt is gone.
> It only affects sock_close() with a parallel sockfs_setattr() in
> progress, which is not common.
> 
> Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
> Reported-by: shankarapailoor <shankarapailoor@gmail.com>
> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

I'm applying this for now, it is at least a step towards fixing
all of these issues.

If it is really offensive, I can revert, just tell me.
Cong Wang June 10, 2018, 7:43 p.m. | #7
On Sun, Jun 10, 2018 at 12:27 PM, David Miller <davem@davemloft.net> wrote:
> I'm applying this for now, it is at least a step towards fixing
> all of these issues.
>
> If it is really offensive, I can revert, just tell me.

Thanks, David!

Unless there is something fundamentally broken, there is no
reason to revert it. The only risk here is some possible deadlock,
but I am ready to fix any deadlock report caused by this
patch. :)

Patch

diff --git a/net/socket.c b/net/socket.c
index af57d85bcb48..8a109012608a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -541,7 +541,10 @@  static int sockfs_setattr(struct dentry *dentry, struct iattr *iattr)
 	if (!err && (iattr->ia_valid & ATTR_UID)) {
 		struct socket *sock = SOCKET_I(d_inode(dentry));
 
-		sock->sk->sk_uid = iattr->ia_uid;
+		if (sock->sk)
+			sock->sk->sk_uid = iattr->ia_uid;
+		else
+			err = -ENOENT;
 	}
 
 	return err;
@@ -590,12 +593,16 @@  EXPORT_SYMBOL(sock_alloc);
  *	an inode not a file.
  */
 
-void sock_release(struct socket *sock)
+static void __sock_release(struct socket *sock, struct inode *inode)
 {
 	if (sock->ops) {
 		struct module *owner = sock->ops->owner;
 
+		if (inode)
+			inode_lock(inode);
 		sock->ops->release(sock);
+		if (inode)
+			inode_unlock(inode);
 		sock->ops = NULL;
 		module_put(owner);
 	}
@@ -609,6 +616,11 @@  void sock_release(struct socket *sock)
 	}
 	sock->file = NULL;
 }
+
+void sock_release(struct socket *sock)
+{
+	__sock_release(sock, NULL);
+}
 EXPORT_SYMBOL(sock_release);
 
 void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags)
@@ -1171,7 +1183,7 @@  static int sock_mmap(struct file *file, struct vm_area_struct *vma)
 
 static int sock_close(struct inode *inode, struct file *filp)
 {
-	sock_release(SOCKET_I(inode));
+	__sock_release(SOCKET_I(inode), inode);
 	return 0;
 }