Message ID | 20180607203949.16945-1-xiyou.wangcong@gmail.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net] socket: close race condition between sock_close() and sockfs_setattr() | expand |
On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote: > fchownat() doesn't even hold refcnt of fd until it figures out > fd is really needed (otherwise is ignored) and releases it after > it resolves the path. This means sock_close() could race with > sockfs_setattr(), which leads to a NULL pointer dereference > since typically we set sock->sk to NULL in ->release(). > > As pointed out by Al, this is unique to sockfs. So we can fix this > in socket layer by acquiring inode_lock in sock_close() and > checking against NULL in sockfs_setattr(). That looks like a massive overkill - it's way heavier than it should be. And it's very likely to trigger shitloads of deadlock warnings, some possibly even true.
On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote: >> fchownat() doesn't even hold refcnt of fd until it figures out >> fd is really needed (otherwise is ignored) and releases it after >> it resolves the path. This means sock_close() could race with >> sockfs_setattr(), which leads to a NULL pointer dereference >> since typically we set sock->sk to NULL in ->release(). >> >> As pointed out by Al, this is unique to sockfs. So we can fix this >> in socket layer by acquiring inode_lock in sock_close() and >> checking against NULL in sockfs_setattr(). > > That looks like a massive overkill - it's way heavier than it should be. I don't see any other quick way to fix this. My initial thought is to keep that refcnt until path_put(), apparently you don't like it either. > And it's very likely to trigger shitloads of deadlock warnings, some > possibly even true. I do audit the inode_lock usage in networking, I don't see any deadlock, of course, there could be some non-networking code uses socket API that I missed. But _generally_, socket doesn't have a pointer to retrieve this inode.
On Thu, Jun 07, 2018 at 02:45:58PM -0700, Cong Wang wrote: > On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote: > >> fchownat() doesn't even hold refcnt of fd until it figures out > >> fd is really needed (otherwise is ignored) and releases it after > >> it resolves the path. This means sock_close() could race with > >> sockfs_setattr(), which leads to a NULL pointer dereference > >> since typically we set sock->sk to NULL in ->release(). > >> > >> As pointed out by Al, this is unique to sockfs. So we can fix this > >> in socket layer by acquiring inode_lock in sock_close() and > >> checking against NULL in sockfs_setattr(). > > > > That looks like a massive overkill - it's way heavier than it should be. > > I don't see any other quick way to fix this. My initial thought is > to keep that refcnt until path_put(), apparently you don't like it > either. You do realize that the same ->setattr() can be called by way of chown() on /proc/self/fd/<n>, right? What would you do there - bump refcount on that struct file when traversing that symlink and hold it past the end of pathname resolution, until... what exactly?
On Thu, Jun 7, 2018 at 3:04 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Thu, Jun 07, 2018 at 02:45:58PM -0700, Cong Wang wrote: >> On Thu, Jun 7, 2018 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: >> > On Thu, Jun 07, 2018 at 01:39:49PM -0700, Cong Wang wrote: >> >> fchownat() doesn't even hold refcnt of fd until it figures out >> >> fd is really needed (otherwise is ignored) and releases it after >> >> it resolves the path. This means sock_close() could race with >> >> sockfs_setattr(), which leads to a NULL pointer dereference >> >> since typically we set sock->sk to NULL in ->release(). >> >> >> >> As pointed out by Al, this is unique to sockfs. So we can fix this >> >> in socket layer by acquiring inode_lock in sock_close() and >> >> checking against NULL in sockfs_setattr(). >> > >> > That looks like a massive overkill - it's way heavier than it should be. >> >> I don't see any other quick way to fix this. My initial thought is >> to keep that refcnt until path_put(), apparently you don't like it >> either. > > You do realize that the same ->setattr() can be called by way of > chown() on /proc/self/fd/<n>, right? What would you do there - > bump refcount on that struct file when traversing that symlink and > hold it past the end of pathname resolution, until... what exactly? I was thinking about this: error = user_path_at(dfd,....); // hold dfd when needed if (!error) { error = chown_common(&path, mode); path_put(&path); // release dfd if held With this, we can guarantee ->release() is only possibly called after chown_common() which is after ->setattr() too.
On Thu, Jun 07, 2018 at 03:15:15PM -0700, Cong Wang wrote: > > You do realize that the same ->setattr() can be called by way of > > chown() on /proc/self/fd/<n>, right? What would you do there - > > bump refcount on that struct file when traversing that symlink and > > hold it past the end of pathname resolution, until... what exactly? > > I was thinking about this: > > error = user_path_at(dfd,....); // hold dfd when needed > > if (!error) { > error = chown_common(&path, mode); > path_put(&path); // release dfd if held > > With this, we can guarantee ->release() is only possibly called > after chown_common() which is after ->setattr() too. No, we can't. You are assuming that there *is* dfd and that it points to the opened socket we are going to operate upon. That is not guaranteed. At all. If e.g. 42 is a file descriptor of an opened socket, plain chown(2) on /proc/self/fd/42 will trigger that ->setattr(). No dfd in sight. We do run across an opened file at some point, all right - when we traverse the symlink in procfs. You can't bump ->f_count there. Even leaving aside the "where would I stash the reference to that file?" and "how long would I hold it?", you can't bump ->f_count on other process' files. That would bugger the expectations of close(2) callers.
From: Cong Wang <xiyou.wangcong@gmail.com> Date: Thu, 7 Jun 2018 13:39:49 -0700 > fchownat() doesn't even hold refcnt of fd until it figures out > fd is really needed (otherwise is ignored) and releases it after > it resolves the path. This means sock_close() could race with > sockfs_setattr(), which leads to a NULL pointer dereference > since typically we set sock->sk to NULL in ->release(). > > As pointed out by Al, this is unique to sockfs. So we can fix this > in socket layer by acquiring inode_lock in sock_close() and > checking against NULL in sockfs_setattr(). > > sock_release() is called in many places, only the sock_close() > path matters here. And fortunately, this should not affect normal > sock_close() as it is only called when the last fd refcnt is gone. > It only affects sock_close() with a parallel sockfs_setattr() in > progress, which is not common. > > Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") > Reported-by: shankarapailoor <shankarapailoor@gmail.com> > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > Cc: Lorenzo Colitti <lorenzo@google.com> > Cc: Al Viro <viro@zeniv.linux.org.uk> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> I'm applying this for now, it is at least a step towards fixing all of these issues. If it is really offensive, I can revert, just tell me.
On Sun, Jun 10, 2018 at 12:27 PM, David Miller <davem@davemloft.net> wrote: > I'm applying this for now, it is at least a step towards fixing > all of these issues. > > If it is really offensive, I can revert, just tell me. Thanks, David! Unless there is something fundamentally broken, there is no reason to revert it. The only risk here is some possible deadlock, but I am ready to fix any deadlock report caused by this patch. :)
diff --git a/net/socket.c b/net/socket.c index af57d85bcb48..8a109012608a 100644 --- a/net/socket.c +++ b/net/socket.c @@ -541,7 +541,10 @@ static int sockfs_setattr(struct dentry *dentry, struct iattr *iattr) if (!err && (iattr->ia_valid & ATTR_UID)) { struct socket *sock = SOCKET_I(d_inode(dentry)); - sock->sk->sk_uid = iattr->ia_uid; + if (sock->sk) + sock->sk->sk_uid = iattr->ia_uid; + else + err = -ENOENT; } return err; @@ -590,12 +593,16 @@ EXPORT_SYMBOL(sock_alloc); * an inode not a file. */ -void sock_release(struct socket *sock) +static void __sock_release(struct socket *sock, struct inode *inode) { if (sock->ops) { struct module *owner = sock->ops->owner; + if (inode) + inode_lock(inode); sock->ops->release(sock); + if (inode) + inode_unlock(inode); sock->ops = NULL; module_put(owner); } @@ -609,6 +616,11 @@ void sock_release(struct socket *sock) } sock->file = NULL; } + +void sock_release(struct socket *sock) +{ + __sock_release(sock, NULL); +} EXPORT_SYMBOL(sock_release); void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags) @@ -1171,7 +1183,7 @@ static int sock_mmap(struct file *file, struct vm_area_struct *vma) static int sock_close(struct inode *inode, struct file *filp) { - sock_release(SOCKET_I(inode)); + __sock_release(SOCKET_I(inode), inode); return 0; }
fchownat() doesn't even hold refcnt of fd until it figures out fd is really needed (otherwise is ignored) and releases it after it resolves the path. This means sock_close() could race with sockfs_setattr(), which leads to a NULL pointer dereference since typically we set sock->sk to NULL in ->release(). As pointed out by Al, this is unique to sockfs. So we can fix this in socket layer by acquiring inode_lock in sock_close() and checking against NULL in sockfs_setattr(). sock_release() is called in many places, only the sock_close() path matters here. And fortunately, this should not affect normal sock_close() as it is only called when the last fd refcnt is gone. It only affects sock_close() with a parallel sockfs_setattr() in progress, which is not common. Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Reported-by: shankarapailoor <shankarapailoor@gmail.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> --- net/socket.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)