[unix] missing barriers in some of unix_sock ->addr and ->path accesses

Message ID 20190215200934.GM2217@ZenIV.linux.org.uk
State Accepted
Delegated to: David Miller

Commit Message

Al Viro Feb. 15, 2019, 8:09 p.m. UTC
Several u->addr and u->path users are not holding any locks in
common with unix_bind().  unix_state_lock() is useless for those
purposes.

u->addr is assign-once and *(u->addr) is fully set up by the time
we set u->addr (all under unix_table_lock).  u->path is also
set in the same critical area, also before setting u->addr, and
any unix_sock with ->path filled will have non-NULL ->addr.

So setting ->addr with smp_store_release() is all we need for those
"lockless" users - just have them fetch ->addr with smp_load_acquire()
and don't even bother looking at ->path if they see NULL ->addr.
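
The publish/consume rule above can be sketched in userspace C. This is only a
model, not kernel code: C11 release/acquire atomics stand in for
smp_store_release()/smp_load_acquire(), and every demo_* name is hypothetical.
The calls are made sequentially here; in the kernel the writer and the readers
run on different CPUs, which is where the ordering matters.

```c
/* Userspace model only; all demo_* names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct demo_address {
	int len;
	char name[16];
};

struct demo_sock {
	const char *path;			/* plain field, set before addr */
	_Atomic(struct demo_address *) addr;	/* assign-once, NULL until bound */
};

static struct demo_address demo_addr;
static struct demo_sock demo_sk;

/* Writer: set up *addr and ->path first, then publish ->addr with release. */
static void demo_bind(void)
{
	demo_addr.len = 5;
	memcpy(demo_addr.name, "sock", 5);
	demo_sk.path = "/tmp/sock";
	atomic_store_explicit(&demo_sk.addr, &demo_addr, memory_order_release);
}

/*
 * Lockless reader: fetch ->addr with acquire and look at *addr and ->path
 * only if it is non-NULL.  Returns the name length, or -1 if unbound.
 */
static int demo_getname(char *buf, const char **path)
{
	struct demo_address *addr =
		atomic_load_explicit(&demo_sk.addr, memory_order_acquire);

	if (!addr)
		return -1;
	assert(addr->len <= (int)sizeof(addr->name));
	memcpy(buf, addr->name, addr->len);
	*path = demo_sk.path;
	return addr->len;
}
```

A reader that gets -1 never touches ->path, which mirrors the "don't even
bother looking at ->path" rule above.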

Users of ->addr and ->path fall into several classes now:
    1) ones that do smp_load_acquire(&u->addr) and access *(u->addr)
and u->path only if smp_load_acquire() has returned non-NULL.
    2) places holding unix_table_lock.  These are guaranteed that
*(u->addr) is seen fully initialized.  If unix_sock is in one of the
"bound" chains, so's ->path.
    3) unix_sock_destructor() using ->addr is safe.  All places
that set u->addr are guaranteed to have seen all stores to *(u->addr)
while holding a reference to u and unix_sock_destructor() is called
when (atomic) refcount hits zero.
    4) unix_release_sock() using ->path is safe.  unix_bind()
is serialized wrt unix_release() (normally - by struct file
refcount), and for the instances that had ->path set by unix_bind()
unix_release_sock() comes from unix_release(), so they are fine.
Instances that had it set in unix_stream_connect() either end up
attached to a socket (in unix_accept()), in which case the call
chain to unix_release_sock() and serialization are the same as in
the previous case, or they never get accept'ed and unix_release_sock()
is called when the listener is shut down and its queue gets purged.
In that case the listener's queue lock provides the barriers needed -
unix_stream_connect() shoves our unix_sock into listener's queue
under that lock right after having set ->path and eventual
unix_release_sock() caller picks them from that queue under the
same lock right before calling unix_release_sock().
    5) unix_find_other() use of ->path is pointless, but safe -
it happens with successful lookup by (abstract) name, so ->path.dentry
is guaranteed to be NULL there.
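
The queue-lock handoff described in class 4 can be modelled roughly as follows.
This is a userspace sketch under stated assumptions, not the kernel code: a
pthread mutex stands in for the listener's queue lock, a singly linked list
stands in for the receive queue, and all demo_* names are hypothetical.

```c
/* Userspace model only; all demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_embryo {
	const char *path;		/* plain field, no atomics */
	struct demo_embryo *next;
};

struct demo_listener {
	pthread_mutex_t queue_lock;
	struct demo_embryo *head;
};

/* Connect side: set ->path first, then enqueue under the queue lock. */
static void demo_connect(struct demo_listener *l, struct demo_embryo *e,
			 const char *path)
{
	assert(l && e);
	e->path = path;
	pthread_mutex_lock(&l->queue_lock);
	e->next = l->head;
	l->head = e;
	pthread_mutex_unlock(&l->queue_lock);
}

/*
 * Purge side: dequeue under the same lock, then use ->path outside it.
 * The unlock in demo_connect() and the lock here order the ->path store
 * before this read.  Returns NULL when the queue is empty.
 */
static const char *demo_purge_one(struct demo_listener *l)
{
	struct demo_embryo *e;

	pthread_mutex_lock(&l->queue_lock);
	e = l->head;
	if (e)
		l->head = e->next;
	pthread_mutex_unlock(&l->queue_lock);
	return e ? e->path : NULL;
}
```

No extra barrier is needed on ->path in this model because it is only ever
handed over through the locked queue.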

Cc: stable@kernel.org
earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

---

Comments

Sasha Levin Feb. 18, 2019, 9:14 p.m. UTC | #1
Hi,

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: .+

The bot has tested the following trees: v4.20.10, v4.19.23, v4.14.101, v4.9.158, v4.4.174, v3.18.134.

v4.20.10: Build OK!
v4.19.23: Build OK!
v4.14.101: Failed to apply! Possible dependencies:
    27204aaa9dc6 ("tcp: uniform the set up of sockets after successful connection")
    3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
    64ac5f5977df ("tipc: refactor function filter_rcv()")
    75da2163dbb6 ("tipc: introduce communication groups")
    9b2c45d479d0 ("net: make getname() functions return length rather than use int* parameter")
    9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder")
    a3f8683bf7d5 ("->poll() methods should return __poll_t")
    a50ff19d0658 ("of/platform: Generalize /reserved-memory handling")
    a80ae5306a73 ("tipc: improve destination linked list")
    ade994f4f6c8 ("net: annotate ->poll() instances")
    d1de6d6c639b ("soc: qcom: Remote filesystem memory driver")

v4.9.158: Failed to apply! Possible dependencies:
    13d3b1ebe287 ("bpf: Support for setting initial receive window")
    27204aaa9dc6 ("tcp: uniform the set up of sockets after successful connection")
    3007098494be ("cgroup: add support for eBPF programs")
    3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
    40304b2a1567 ("bpf: BPF support for sock_ops")
    526735ddc0ae ("net: fix AF_SMC related typo")
    610236587600 ("bpf: Add new cgroup attach type to enable sock modifications")
    8550f328f45d ("bpf: Support for per connection SYN/SYN-ACK RTOs")
    8c9814b97002 ("net: convert unix_address.refcnt from atomic_t to refcount_t")
    9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
    9b2c45d479d0 ("net: make getname() functions return length rather than use int* parameter")
    9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder")
    a3f8683bf7d5 ("->poll() methods should return __poll_t")
    a50ff19d0658 ("of/platform: Generalize /reserved-memory handling")
    ac7138746e14 ("smc: establish new socket family")
    ade994f4f6c8 ("net: annotate ->poll() instances")
    b2cd12574aa3 ("bpf: Refactor cgroups code in prep for new type")
    cb5da847af44 ("tipc: set kern=0 in sk_alloc() during tipc_accept()")
    cdfbabfb2f0c ("net: Work around lockdep limitation in sockets that use sockets")
    d1de6d6c639b ("soc: qcom: Remote filesystem memory driver")

v4.4.174: Failed to apply! Possible dependencies:
    35da60941e44 ("pstore/ram: add Device Tree bindings")
    3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
    4420cfd3f51c ("staging: lustre: format properly all comment blocks for LNet core")
    44a7185c2ae6 ("of/platform: Add common method to populate default bus")
    4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
    529182e204db ("ramoops: use DT reserved-memory bindings")
    53692b1de419 ("sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC")
    5fd88337d209 ("staging: lustre: fix all conditional comparison to zero in LNet layer")
    8c9814b97002 ("net: convert unix_address.refcnt from atomic_t to refcount_t")
    9b2c45d479d0 ("net: make getname() functions return length rather than use int* parameter")
    9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder")
    a188222b6ed2 ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
    a50ff19d0658 ("of/platform: Generalize /reserved-memory handling")
    d1de6d6c639b ("soc: qcom: Remote filesystem memory driver")
    fc520f8b4fa3 ("of/platform: disable the of_platform_default_populate_init() for all the ppc boards")

v3.18.134: Failed to apply! Possible dependencies:
    35da60941e44 ("pstore/ram: add Device Tree bindings")
    3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
    43443ad692cf ("of/platform: add function to populate default bus")
    44a7185c2ae6 ("of/platform: Add common method to populate default bus")
    4b638df4c9d5 ("soc: qcom: Add Shared Memory Manager driver")
    529182e204db ("ramoops: use DT reserved-memory bindings")
    7ce75bb2c05e ("ARM: qcom: Add Subsystem Power Manager (SPM) driver")
    7d0c8beea6b8 ("soc: qcom: Reorder SMEM/SMD configs")
    8c9814b97002 ("net: convert unix_address.refcnt from atomic_t to refcount_t")
    936f14cf4e67 ("soc: qcom: Driver for the Qualcomm RPM over SMD")
    9b2c45d479d0 ("net: make getname() functions return length rather than use int* parameter")
    9b8a11e82615 ("soc: qcom: Introduce QMI encoder/decoder")
    a50ff19d0658 ("of/platform: Generalize /reserved-memory handling")
    d1de6d6c639b ("soc: qcom: Remote filesystem memory driver")
    f2ab3298fb49 ("soc: qcom: Add Shared Memory Driver")
    fc520f8b4fa3 ("of/platform: disable the of_platform_default_populate_init() for all the ppc boards")


How should we proceed with this patch?

--
Thanks,
Sasha
Al Viro Feb. 18, 2019, 9:32 p.m. UTC | #2
On Mon, Feb 18, 2019 at 09:14:33PM +0000, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a -stable tag.

Ugh...  Should've removed Cc: stable from netdev posting; my apologies.

> How should we proceed with this patch?

Wait for it to get into davem's tree, for starters?

Sorry about that, again...

FWIW, further adventures in net/unix land:

unix_dgram_poll() contains
                /* connection hasn't started yet? */
                if (sk->sk_state == TCP_SYN_SENT)
                        return mask;
and nothing in there sets TCP_SYN_SENT state (not that it would've made
any sense for AF_UNIX).

unix_poll() contains
        /* Connection-based need to check for termination and startup */
        if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
            sk->sk_state == TCP_CLOSE)
while it can only be called as ->poll of unix_stream_ops, which means
that sk->sk_type can't be anything other than SOCK_STREAM in there.

static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
                          struct sk_buff_head *hitlist)
{
        if (x->sk_state != TCP_LISTEN) {
                scan_inflight(x, func, hitlist);
        } else {
...
has no exclusion or barriers to deal with the store of TCP_LISTEN into
->sk_state inside unix_listen().  That one's potentially nasty - we won't
find SCM_RIGHTS already queued to embryos in x's queue until we notice
that x->sk_state == TCP_LISTEN, which can happen between two calls of
scan_children() in the same unix_gc() run.  The race is narrow, but not
impossible, AFAICS.  Reasonably easy to fix - lift locking the queue
out of scan_inflight(), grab the queue lock before checking if it's
a listener and have unix_listen() either grab the queue lock around the
assignment to ->sk_state, or pump it up and down before dropping
unix_state_lock() (at which point connect() might be able to find it, etc.)
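
A rough userspace model of the first variant of that fix (queue lock taken
around the state store and before the listener check) might look like the
sketch below.  It is not a patch: a pthread mutex stands in for the queue
lock, two counters stand in for the real queue walks, and all demo_* names
are hypothetical.

```c
/* Userspace model only; all demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>

enum demo_state { DEMO_CLOSE, DEMO_LISTEN };

struct demo_sock {
	pthread_mutex_t queue_lock;
	enum demo_state state;
	int own_inflight;	/* fds queued directly to this socket */
	int embryo_inflight;	/* fds queued to not-yet-accepted embryos */
};

/* listen side: store the new state while holding the queue lock */
static void demo_listen(struct demo_sock *x)
{
	pthread_mutex_lock(&x->queue_lock);
	x->state = DEMO_LISTEN;
	pthread_mutex_unlock(&x->queue_lock);
}

/*
 * gc side: take the queue lock before checking whether x is a listener,
 * so the check and the scan see a consistent state.  Returns the number
 * of in-flight fds found.
 */
static int demo_scan_children(struct demo_sock *x)
{
	int found;

	assert(x);
	pthread_mutex_lock(&x->queue_lock);
	if (x->state != DEMO_LISTEN)
		found = x->own_inflight;
	else
		found = x->embryo_inflight;
	pthread_mutex_unlock(&x->queue_lock);
	return found;
}
```

Since embryos can only be queued after the state store, a scanner that takes
the same lock cannot see queued embryos together with a stale non-listener
state.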

Al, still digging through net/unix...
Sasha Levin Feb. 19, 2019, 1:30 a.m. UTC | #3
On Mon, Feb 18, 2019 at 09:32:46PM +0000, Al Viro wrote:
>On Mon, Feb 18, 2019 at 09:14:33PM +0000, Sasha Levin wrote:
>> Hi,
>>
>> [This is an automated email]
>>
>> This commit has been processed because it contains a -stable tag.
>
>Ugh...  Should've removed Cc: stable from netdev posting; my apologies.
>
>> How should we proceed with this patch?
>
>Wait for it to get into davem's tree, for starters?

It won't go in stable until it's upstream, the only reason for this
early mail is that it's easier to get responses from people while the
patch is still fresh in their brains (as you might have noticed ;) )
rather than in a few weeks.

--
Thanks,
Sasha
David Miller Feb. 21, 2019, 4:07 a.m. UTC | #4
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 15 Feb 2019 20:09:35 +0000

> Several u->addr and u->path users are not holding any locks in
> common with unix_bind().  unix_state_lock() is useless for those
> purposes.
> 
> u->addr is assign-once and *(u->addr) is fully set up by the time
> we set u->addr (all under unix_table_lock).  u->path is also
> set in the same critical area, also before setting u->addr, and
> any unix_sock with ->path filled will have non-NULL ->addr.
> 
> So setting ->addr with smp_store_release() is all we need for those
> "lockless" users - just have them fetch ->addr with smp_load_acquire()
> and don't even bother looking at ->path if they see NULL ->addr.
> 
> Users of ->addr and ->path fall into several classes now:
>     1) ones that do smp_load_acquire(&u->addr) and access *(u->addr)
> and u->path only if smp_load_acquire() has returned non-NULL.
>     2) places holding unix_table_lock.  These are guaranteed that
> *(u->addr) is seen fully initialized.  If unix_sock is in one of the
> "bound" chains, so's ->path.
>     3) unix_sock_destructor() using ->addr is safe.  All places
> that set u->addr are guaranteed to have seen all stores to *(u->addr)
> while holding a reference to u and unix_sock_destructor() is called
> when (atomic) refcount hits zero.
>     4) unix_release_sock() using ->path is safe.  unix_bind()
> is serialized wrt unix_release() (normally - by struct file
> refcount), and for the instances that had ->path set by unix_bind()
> unix_release_sock() comes from unix_release(), so they are fine.
> Instances that had it set in unix_stream_connect() either end up
> attached to a socket (in unix_accept()), in which case the call
> chain to unix_release_sock() and serialization are the same as in
> the previous case, or they never get accept'ed and unix_release_sock()
> is called when the listener is shut down and its queue gets purged.
> In that case the listener's queue lock provides the barriers needed -
> unix_stream_connect() shoves our unix_sock into listener's queue
> under that lock right after having set ->path and eventual
> unix_release_sock() caller picks them from that queue under the
> same lock right before calling unix_release_sock().
>     5) unix_find_other() use of ->path is pointless, but safe -
> it happens with successful lookup by (abstract) name, so ->path.dentry
> is guaranteed to be NULL there.
> 
> earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Applied and queued up for -stable, thanks Al.
Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 74d1eed7cbd4..a95d479caeea 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -890,7 +890,7 @@  static int unix_autobind(struct socket *sock)
 	addr->hash ^= sk->sk_type;
 
 	__unix_remove_socket(sk);
-	u->addr = addr;
+	smp_store_release(&u->addr, addr);
 	__unix_insert_socket(&unix_socket_table[addr->hash], sk);
 	spin_unlock(&unix_table_lock);
 	err = 0;
@@ -1060,7 +1060,7 @@  static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	err = 0;
 	__unix_remove_socket(sk);
-	u->addr = addr;
+	smp_store_release(&u->addr, addr);
 	__unix_insert_socket(list, sk);
 
 out_unlock:
@@ -1331,15 +1331,29 @@  static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq);
 	otheru = unix_sk(other);
 
-	/* copy address information from listening to new sock*/
-	if (otheru->addr) {
-		refcount_inc(&otheru->addr->refcnt);
-		newu->addr = otheru->addr;
-	}
+	/* copy address information from listening to new sock
+	 *
+	 * The contents of *(otheru->addr) and otheru->path
+	 * are seen fully set up here, since we have found
+	 * otheru in hash under unix_table_lock.  Insertion
+	 * into the hash chain we'd found it in had been done
+	 * in an earlier critical area protected by unix_table_lock,
+	 * the same one where we'd set *(otheru->addr) contents,
+	 * as well as otheru->path and otheru->addr itself.
+	 *
+	 * Using smp_store_release() here to set newu->addr
+	 * is enough to make those stores, as well as stores
+	 * to newu->path visible to anyone who gets newu->addr
+	 * by smp_load_acquire().  IOW, the same warranties
+	 * as for unix_sock instances bound in unix_bind() or
+	 * in unix_autobind().
+	 */
 	if (otheru->path.dentry) {
 		path_get(&otheru->path);
 		newu->path = otheru->path;
 	}
+	refcount_inc(&otheru->addr->refcnt);
+	smp_store_release(&newu->addr, otheru->addr);
 
 	/* Set credentials */
 	copy_peercred(sk, other);
@@ -1453,7 +1467,7 @@  static int unix_accept(struct socket *sock, struct socket *newsock, int flags,
 static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
 {
 	struct sock *sk = sock->sk;
-	struct unix_sock *u;
+	struct unix_address *addr;
 	DECLARE_SOCKADDR(struct sockaddr_un *, sunaddr, uaddr);
 	int err = 0;
 
@@ -1468,19 +1482,15 @@  static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
 		sock_hold(sk);
 	}
 
-	u = unix_sk(sk);
-	unix_state_lock(sk);
-	if (!u->addr) {
+	addr = smp_load_acquire(&unix_sk(sk)->addr);
+	if (!addr) {
 		sunaddr->sun_family = AF_UNIX;
 		sunaddr->sun_path[0] = 0;
 		err = sizeof(short);
 	} else {
-		struct unix_address *addr = u->addr;
-
 		err = addr->len;
 		memcpy(sunaddr, addr->name, addr->len);
 	}
-	unix_state_unlock(sk);
 	sock_put(sk);
 out:
 	return err;
@@ -2073,11 +2083,11 @@  static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
 
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 {
-	struct unix_sock *u = unix_sk(sk);
+	struct unix_address *addr = smp_load_acquire(&unix_sk(sk)->addr);
 
-	if (u->addr) {
-		msg->msg_namelen = u->addr->len;
-		memcpy(msg->msg_name, u->addr->name, u->addr->len);
+	if (addr) {
+		msg->msg_namelen = addr->len;
+		memcpy(msg->msg_name, addr->name, addr->len);
 	}
 }
 
@@ -2581,15 +2591,14 @@  static int unix_open_file(struct sock *sk)
 	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
-	unix_state_lock(sk);
+	if (!smp_load_acquire(&unix_sk(sk)->addr))
+		return -ENOENT;
+
 	path = unix_sk(sk)->path;
-	if (!path.dentry) {
-		unix_state_unlock(sk);
+	if (!path.dentry)
 		return -ENOENT;
-	}
 
 	path_get(&path);
-	unix_state_unlock(sk);
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0)
@@ -2830,7 +2839,7 @@  static int unix_seq_show(struct seq_file *seq, void *v)
 			(s->sk_state == TCP_ESTABLISHED ? SS_CONNECTING : SS_DISCONNECTING),
 			sock_i_ino(s));
 
-		if (u->addr) {
+		if (u->addr) {	// under unix_table_lock here
 			int i, len;
 			seq_putc(seq, ' ');
 
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 384c84e83462..3183d9b8ab33 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -10,7 +10,8 @@ 
 
 static int sk_diag_dump_name(struct sock *sk, struct sk_buff *nlskb)
 {
-	struct unix_address *addr = unix_sk(sk)->addr;
+	/* might or might not have unix_table_lock */
+	struct unix_address *addr = smp_load_acquire(&unix_sk(sk)->addr);
 
 	if (!addr)
 		return 0;
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index f84001019356..33028c098ef3 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -321,6 +321,7 @@  static void dump_common_audit_data(struct audit_buffer *ab,
 		if (a->u.net->sk) {
 			struct sock *sk = a->u.net->sk;
 			struct unix_sock *u;
+			struct unix_address *addr;
 			int len = 0;
 			char *p = NULL;
 
@@ -351,14 +352,15 @@  static void dump_common_audit_data(struct audit_buffer *ab,
 #endif
 			case AF_UNIX:
 				u = unix_sk(sk);
+				addr = smp_load_acquire(&u->addr);
+				if (!addr)
+					break;
 				if (u->path.dentry) {
 					audit_log_d_path(ab, " path=", &u->path);
 					break;
 				}
-				if (!u->addr)
-					break;
-				len = u->addr->len-sizeof(short);
-				p = &u->addr->name->sun_path[0];
+				len = addr->len-sizeof(short);
+				p = &addr->name->sun_path[0];
 				audit_log_format(ab, " path=");
 				if (*p)
 					audit_log_untrustedstring(ab, p);