diff mbox

netlink: Add netlink_bound helper and use it in netlink_getname

Message ID 20150926131621.GA16724@gondor.apana.org.au
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu Sept. 26, 2015, 1:16 p.m. UTC
On Fri, Sep 25, 2015 at 11:01:13AM -0400, Tejun Heo wrote:
> 
> I'm not even sure we guarantee memory barrier on kernel/user
> crossings.  In practice, we probably have enough barriers (e.g. some
> syscall traps imply barrier) but I can't think of a reason why we'd
> guarantee the existence of barrier there.  As an extreme example,
> imagine UML on an architecture with relaxed memory model.

You misunderstood what I wrote.  I was not basing this on whether
user-space transitions contained a barrier, but on the fact that
the next syscall must recheck nlk->bound before using nlk->portid.

In fact thanks to your email I now realise that my fix to the
getsockname problem is wrong.  Instead of adding a barrier to
netlink_connect I should be adding a nlk->bound check to getname.

---8<---
netlink_getname must check nlk->bound before using nlk->portid
as otherwise nlk->portid may contain garbage.

This patch also adds a netlink_bound helper that encapsulate the
barrier, as was suggested by Tejun.  Also as suggested by Linus,
the lockless read of nlk->bound in netlink_bound is protected with
READ_ONCE to ensure that the compiler doesn't do double reads that
may screw up our use of the barrier.

Fixes: da314c9923fe ("netlink: Replace rhash_portid with bound")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Tejun Heo Sept. 26, 2015, 6:09 p.m. UTC | #1
Hello, Herbert.

On Sat, Sep 26, 2015 at 09:16:21PM +0800, Herbert Xu wrote:
> You misunderstood what I wrote.  I was not basing this on whether
> user-space transitions contained a barrier, but on the fact that
> the next syscall must recheck nlk->bound before using nlk->portid.

But that isn't what you wrote in the comment.

	/* No need for barriers here as we return to user-space without
	 * using any of the bound attributes.
	 */

> In fact thanks to your email I now realise that my fix to the
> getsockname problem is wrong.  Instead of adding a barrier to
> netlink_connect I should be adding a nlk->bound check to getname.

I don't know, man.  This thread almost feels surreal at this point.

> @@ -1628,7 +1632,7 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr,
>  		nladdr->nl_pid = nlk->dst_portid;
>  		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
>  	} else {
> -		nladdr->nl_pid = nlk->portid;
> +		nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0;
>  		nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0;
>  	}
>  	return 0;

So, this is really weird because netlink_getname() doens't participate
in the autobind race and thus it's perfectly fine for it to not worry
about whether ->bound is set or the memory barrier - whoever its
caller may be, the caller is of course responsible for ensuring that
the port is bound and visible if it expects to read back the number -
ie. if the caller doesn't know (in memory ordering sense) that
bind/connect/sendmsg succeeded, it of course can't expect to reliably
read back the port number.  getname never needed the barrier.  The
above is shifting synchronization from the source to its users.  This
is a bad thing to do.

Thanks.
Herbert Xu Sept. 26, 2015, 7:41 p.m. UTC | #2
On Sat, Sep 26, 2015 at 02:09:03PM -0400, Tejun Heo wrote:
>
> > @@ -1628,7 +1632,7 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr,
> >  		nladdr->nl_pid = nlk->dst_portid;
> >  		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
> >  	} else {
> > -		nladdr->nl_pid = nlk->portid;
> > +		nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0;
> >  		nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0;
> >  	}
> >  	return 0;
> 
> So, this is really weird because netlink_getname() doens't participate
> in the autobind race and thus it's perfectly fine for it to not worry
> about whether ->bound is set or the memory barrier - whoever its
> caller may be, the caller is of course responsible for ensuring that
> the port is bound and visible if it expects to read back the number -
> ie. if the caller doesn't know (in memory ordering sense) that
> bind/connect/sendmsg succeeded, it of course can't expect to reliably
> read back the port number.  getname never needed the barrier.  The
> above is shifting synchronization from the source to its users.  This
> is a bad thing to do.

Thread 1			Thread 2
sendmsg				getsockname
	netlink_autobind		netlink_getname

Thread 2 should not have to do anything special to guarantee that
getsockname does not return garbage.  It must either be the bound
portid if the autobind completed in thread 1 and is visible or it
should return zero.

As it stands thread 2 may see a portid belonging to somebody else
if it catches the autobind in thread 1 trying different portids
while roving.

Cheers,
Tejun Heo Sept. 26, 2015, 7:45 p.m. UTC | #3
Hello, Herbert.

On Sun, Sep 27, 2015 at 03:41:10AM +0800, Herbert Xu wrote:
> Thread 1			Thread 2
> sendmsg				getsockname
> 	netlink_autobind		netlink_getname
> 
> Thread 2 should not have to do anything special to guarantee that
> getsockname does not return garbage.  It must either be the bound
> portid if the autobind completed in thread 1 and is visible or it
> should return zero.
> 
> As it stands thread 2 may see a portid belonging to somebody else
> if it catches the autobind in thread 1 trying different portids
> while roving.

If the fact that thread 1 finished autobind isn't visible to thread 2,
it's valid for getsockname to return zero.  No ordering between the
two operations is defined.  If the fact that thread 1 finished
autobind is visible to thread 2, ordering is defined and because
ordering is transitive, by that very ordering, the port number is
visible to thread 2 too as long as thread 1 does proper barriering.

Thanks.
Herbert Xu Sept. 26, 2015, 7:49 p.m. UTC | #4
On Sat, Sep 26, 2015 at 03:45:54PM -0400, Tejun Heo wrote:
> Hello, Herbert.
> 
> On Sun, Sep 27, 2015 at 03:41:10AM +0800, Herbert Xu wrote:
> > Thread 1			Thread 2
> > sendmsg				getsockname
> > 	netlink_autobind		netlink_getname
> > 
> > Thread 2 should not have to do anything special to guarantee that
> > getsockname does not return garbage.  It must either be the bound
> > portid if the autobind completed in thread 1 and is visible or it
> > should return zero.
> > 
> > As it stands thread 2 may see a portid belonging to somebody else
> > if it catches the autobind in thread 1 trying different portids
> > while roving.
> 
> If the fact that thread 1 finished autobind isn't visible to thread 2,
> it's valid for getsockname to return zero.  No ordering between the
> two operations is defined.  If the fact that thread 1 finished
> autobind is visible to thread 2, ordering is defined and because
> ordering is transitive, by that very ordering, the port number is
> visible to thread 2 too as long as thread 1 does proper barriering.

If the autobind is not complete then netlink_getname must return
zero rather than some garbage portid that belongs to somebody
else's socket.  That's what we did before any of this lockless
code was introduced.

If you don't check nlk->bound then you may return garbage.

Cheers,
Tejun Heo Sept. 26, 2015, 7:52 p.m. UTC | #5
Hello,

On Sun, Sep 27, 2015 at 03:49:16AM +0800, Herbert Xu wrote:
> If the autobind is not complete then netlink_getname must return
> zero rather than some garbage portid that belongs to somebody
> else's socket.  That's what we did before any of this lockless
> code was introduced.
> 
> If you don't check nlk->bound then you may return garbage.

Ah, yeah, you're right.  We need to check that there because it may
contain a garbage value.  I still think it'd better to use
netlink_bound() test in connect() too tho.

Thanks.
Herbert Xu Sept. 26, 2015, 7:55 p.m. UTC | #6
On Sat, Sep 26, 2015 at 03:52:45PM -0400, Tejun Heo wrote:
> 
> Ah, yeah, you're right.  We need to check that there because it may
> contain a garbage value.  I still think it'd better to use
> netlink_bound() test in connect() too tho.

Well I disagree.  When I say that it returns to user-space I really
mean that the next time we use portid via the same call path that
triggered the connect we must be checking nlk->bound anyway.

Good luck finding more bugs in this code :)

Cheers,
Tejun Heo Sept. 26, 2015, 8:05 p.m. UTC | #7
Hello,

On Sun, Sep 27, 2015 at 03:55:48AM +0800, Herbert Xu wrote:
> Well I disagree.  When I say that it returns to user-space I really
> mean that the next time we use portid via the same call path that
> triggered the connect we must be checking nlk->bound anyway.
> 
> Good luck finding more bugs in this code :)

Frankly, I don't understand what you've been trying to achieve.
You're actively disregarding best practices (like terminating
synchronization where it starts) and reach the target state by doing a
browian motion in the solution space.  Sure, if you do enough of that,
eventually you can arrive somewhere where it's not broken but it leads
to a lot more overhead for everyone involved - the author, reviewers
and later readers of the code and if you spread barrier usages like
this across the kernel, we'll end up with code base which is a lot
harder to verify and maintain.  I hope you stop doing things this way
but suppose that you're ignoring any conceptual arguments in this
thread.

Thanks.
Herbert Xu Sept. 26, 2015, 8:10 p.m. UTC | #8
On Sat, Sep 26, 2015 at 04:05:18PM -0400, Tejun Heo wrote:
> 
> Frankly, I don't understand what you've been trying to achieve.
> You're actively disregarding best practices (like terminating
> synchronization where it starts) and reach the target state by doing a
> browian motion in the solution space.  Sure, if you do enough of that,
> eventually you can arrive somewhere where it's not broken but it leads
> to a lot more overhead for everyone involved - the author, reviewers
> and later readers of the code and if you spread barrier usages like
> this across the kernel, we'll end up with code base which is a lot
> harder to verify and maintain.  I hope you stop doing things this way
> but suppose that you're ignoring any conceptual arguments in this
> thread.

Your point so far has been that if you add barriers or use primitives
everywhere then races magically disappear.

Well guess what the bug that you have discovered supposedly due to
a missing barrier in netlink_connect has nothing to do with the
barrier.  Instead it is caused by a logical error elsewhere that
would have gone unnoticed otherwise.

So I retain my position that blindly adding barriers do not make
bugs go away.  Instead you need to have real understanding of what
the code is doing and every spot where a barrier may be needed must
be audited manually.

Cheers,
Tejun Heo Sept. 26, 2015, 8:17 p.m. UTC | #9
Hello,

On Sun, Sep 27, 2015 at 04:10:41AM +0800, Herbert Xu wrote:
> Well guess what the bug that you have discovered supposedly due to
> a missing barrier in netlink_connect has nothing to do with the
> barrier.  Instead it is caused by a logical error elsewhere that
> would have gone unnoticed otherwise.

It's a combination of two problems.  The garbage port number is a
logical error but there still is an ordering problem there between
->bound and ->portid.  We need to test ->bind there again because of
the garbage port problem.

> So I retain my position that blindly adding barriers do not make
> bugs go away.  Instead you need to have real understanding of what

That's a dishonest summary of what I've been saying.

> the code is doing and every spot where a barrier may be needed must
> be audited manually.

What I've been saying is that we do need to be careful and audit each
barrier usages but at the same time there are established patterns
that we can use to make the process significantly easier and more
reliable.

Thanks.
diff mbox

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2c15fae..c860464 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -125,6 +125,17 @@  static inline u32 netlink_group_mask(u32 group)
 	return group ? 1 << (group - 1) : 0;
 }
 
+static inline bool netlink_bound(struct netlink_sock *nlk)
+{
+	bool bound = READ_ONCE(nlk->bound);
+
+	/* Ensure nlk is hashed and visible. */
+	if (bound)
+		smp_rmb();
+
+	return bound;
+}
+
 int netlink_add_tap(struct netlink_tap *nt)
 {
 	if (unlikely(nt->dev->type != ARPHRD_NETLINK))
@@ -1524,14 +1535,10 @@  static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 			return err;
 	}
 
-	bound = nlk->bound;
-	if (bound) {
-		/* Ensure nlk->portid is up-to-date. */
-		smp_rmb();
-
+	bound = netlink_bound(nlk);
+	if (bound)
 		if (nladdr->nl_pid != nlk->portid)
 			return -EINVAL;
-	}
 
 	if (nlk->netlink_bind && groups) {
 		int group;
@@ -1547,9 +1554,6 @@  static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 		}
 	}
 
-	/* No need for barriers here as we return to user-space without
-	 * using any of the bound attributes.
-	 */
 	if (!bound) {
 		err = nladdr->nl_pid ?
 			netlink_insert(sk, nladdr->nl_pid) :
@@ -1628,7 +1632,7 @@  static int netlink_getname(struct socket *sock, struct sockaddr *addr,
 		nladdr->nl_pid = nlk->dst_portid;
 		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
 	} else {
-		nladdr->nl_pid = nlk->portid;
+		nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0;
 		nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0;
 	}
 	return 0;
@@ -2442,13 +2446,10 @@  static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 		dst_group = nlk->dst_group;
 	}
 
-	if (!nlk->bound) {
+	if (!netlink_bound(nlk)) {
 		err = netlink_autobind(sock);
 		if (err)
 			goto out;
-	} else {
-		/* Ensure nlk is hashed and visible. */
-		smp_rmb();
 	}
 
 	/* It's a really convoluted way for userland to ask for mmaped