Message ID | 20150926131621.GA16724@gondor.apana.org.au |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Hello, Herbert. On Sat, Sep 26, 2015 at 09:16:21PM +0800, Herbert Xu wrote: > You misunderstood what I wrote. I was not basing this on whether > user-space transitions contained a barrier, but on the fact that > the next syscall must recheck nlk->bound before using nlk->portid. But that isn't what you wrote in the comment. /* No need for barriers here as we return to user-space without * using any of the bound attributes. */ > In fact thanks to your email I now realise that my fix to the > getsockname problem is wrong. Instead of adding a barrier to > netlink_connect I should be adding a nlk->bound check to getname. I don't know, man. This thread almost feels surreal at this point. > @@ -1628,7 +1632,7 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr, > nladdr->nl_pid = nlk->dst_portid; > nladdr->nl_groups = netlink_group_mask(nlk->dst_group); > } else { > - nladdr->nl_pid = nlk->portid; > + nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0; > nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0; > } > return 0; So, this is really weird because netlink_getname() doens't participate in the autobind race and thus it's perfectly fine for it to not worry about whether ->bound is set or the memory barrier - whoever its caller may be, the caller is of course responsible for ensuring that the port is bound and visible if it expects to read back the number - ie. if the caller doesn't know (in memory ordering sense) that bind/connect/sendmsg succeeded, it of course can't expect to reliably read back the port number. getname never needed the barrier. The above is shifting synchronization from the source to its users. This is a bad thing to do. Thanks.
On Sat, Sep 26, 2015 at 02:09:03PM -0400, Tejun Heo wrote: > > > @@ -1628,7 +1632,7 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr, > > nladdr->nl_pid = nlk->dst_portid; > > nladdr->nl_groups = netlink_group_mask(nlk->dst_group); > > } else { > > - nladdr->nl_pid = nlk->portid; > > + nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0; > > nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0; > > } > > return 0; > > So, this is really weird because netlink_getname() doens't participate > in the autobind race and thus it's perfectly fine for it to not worry > about whether ->bound is set or the memory barrier - whoever its > caller may be, the caller is of course responsible for ensuring that > the port is bound and visible if it expects to read back the number - > ie. if the caller doesn't know (in memory ordering sense) that > bind/connect/sendmsg succeeded, it of course can't expect to reliably > read back the port number. getname never needed the barrier. The > above is shifting synchronization from the source to its users. This > is a bad thing to do. Thread 1 Thread 2 sendmsg getsockname netlink_autobind netlink_getname Thread 2 should not have to do anything special to guarantee that getsockname does not return garbage. It must either be the bound portid if the autobind completed in thread 1 and is visible or it should return zero. As it stands thread 2 may see a portid belonging to somebody else if it catches the autobind in thread 1 trying different portids while roving. Cheers,
Hello, Herbert. On Sun, Sep 27, 2015 at 03:41:10AM +0800, Herbert Xu wrote: > Thread 1 Thread 2 > sendmsg getsockname > netlink_autobind netlink_getname > > Thread 2 should not have to do anything special to guarantee that > getsockname does not return garbage. It must either be the bound > portid if the autobind completed in thread 1 and is visible or it > should return zero. > > As it stands thread 2 may see a portid belonging to somebody else > if it catches the autobind in thread 1 trying different portids > while roving. If the fact that thread 1 finished autobind isn't visible to thread 2, it's valid for getsockname to return zero. No ordering between the two operations is defined. If the fact that thread 1 finished autobind is visible to thread 2, ordering is defined and because ordering is transitive, by that very ordering, the port number is visible to thread 2 too as long as thread 1 does proper barriering. Thanks.
On Sat, Sep 26, 2015 at 03:45:54PM -0400, Tejun Heo wrote: > Hello, Herbert. > > On Sun, Sep 27, 2015 at 03:41:10AM +0800, Herbert Xu wrote: > > Thread 1 Thread 2 > > sendmsg getsockname > > netlink_autobind netlink_getname > > > > Thread 2 should not have to do anything special to guarantee that > > getsockname does not return garbage. It must either be the bound > > portid if the autobind completed in thread 1 and is visible or it > > should return zero. > > > > As it stands thread 2 may see a portid belonging to somebody else > > if it catches the autobind in thread 1 trying different portids > > while roving. > > If the fact that thread 1 finished autobind isn't visible to thread 2, > it's valid for getsockname to return zero. No ordering between the > two operations is defined. If the fact that thread 1 finished > autobind is visible to thread 2, ordering is defined and because > ordering is transitive, by that very ordering, the port number is > visible to thread 2 too as long as thread 1 does proper barriering. If the autobind is not complete then netlink_getname must return zero rather than some garbage portid that belongs to somebody else's socket. That's what we did before any of this lockless code was introduced. If you don't check nlk->bound then you may return garbage. Cheers,
Hello, On Sun, Sep 27, 2015 at 03:49:16AM +0800, Herbert Xu wrote: > If the autobind is not complete then netlink_getname must return > zero rather than some garbage portid that belongs to somebody > else's socket. That's what we did before any of this lockless > code was introduced. > > If you don't check nlk->bound then you may return garbage. Ah, yeah, you're right. We need to check that there because it may contain a garbage value. I still think it'd better to use netlink_bound() test in connect() too tho. Thanks.
On Sat, Sep 26, 2015 at 03:52:45PM -0400, Tejun Heo wrote: > > Ah, yeah, you're right. We need to check that there because it may > contain a garbage value. I still think it'd better to use > netlink_bound() test in connect() too tho. Well I disagree. When I say that it returns to user-space I really mean that the next time we use portid via the same call path that triggered the connect we must be checking nlk->bound anyway. Good luck finding more bugs in this code :) Cheers,
Hello, On Sun, Sep 27, 2015 at 03:55:48AM +0800, Herbert Xu wrote: > Well I disagree. When I say that it returns to user-space I really > mean that the next time we use portid via the same call path that > triggered the connect we must be checking nlk->bound anyway. > > Good luck finding more bugs in this code :) Frankly, I don't understand what you've been trying to achieve. You're actively disregarding best practices (like terminating synchronization where it starts) and reach the target state by doing a browian motion in the solution space. Sure, if you do enough of that, eventually you can arrive somewhere where it's not broken but it leads to a lot more overhead for everyone involved - the author, reviewers and later readers of the code and if you spread barrier usages like this across the kernel, we'll end up with code base which is a lot harder to verify and maintain. I hope you stop doing things this way but suppose that you're ignoring any conceptual arguments in this thread. Thanks.
On Sat, Sep 26, 2015 at 04:05:18PM -0400, Tejun Heo wrote: > > Frankly, I don't understand what you've been trying to achieve. > You're actively disregarding best practices (like terminating > synchronization where it starts) and reach the target state by doing a > browian motion in the solution space. Sure, if you do enough of that, > eventually you can arrive somewhere where it's not broken but it leads > to a lot more overhead for everyone involved - the author, reviewers > and later readers of the code and if you spread barrier usages like > this across the kernel, we'll end up with code base which is a lot > harder to verify and maintain. I hope you stop doing things this way > but suppose that you're ignoring any conceptual arguments in this > thread. Your point so far has been that if you add barriers or use primitives everywhere then races magically disappear. Well guess what the bug that you have discovered supposedly due to a missing barrier in netlink_connect has nothing to do with the barrier. Instead it is caused by a logical error elsewhere that would have gone unnoticed otherwise. So I retain my position that blindly adding barriers do not make bugs go away. Instead you need to have real understanding of what the code is doing and every spot where a barrier may be needed must be audited manually. Cheers,
Hello, On Sun, Sep 27, 2015 at 04:10:41AM +0800, Herbert Xu wrote: > Well guess what the bug that you have discovered supposedly due to > a missing barrier in netlink_connect has nothing to do with the > barrier. Instead it is caused by a logical error elsewhere that > would have gone unnoticed otherwise. It's a combination of two problems. The garbage port number is a logical error but there still is an ordering problem there between ->bound and ->portid. We need to test ->bind there again because of the garbage port problem. > So I retain my position that blindly adding barriers do not make > bugs go away. Instead you need to have real understanding of what That's a dishonest summary of what I've been saying. > the code is doing and every spot where a barrier may be needed must > be audited manually. What I've been saying is that we do need to be careful and audit each barrier usages but at the same time there are established patterns that we can use to make the process significantly easier and more reliable. Thanks.
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 2c15fae..c860464 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -125,6 +125,17 @@ static inline u32 netlink_group_mask(u32 group) return group ? 1 << (group - 1) : 0; } +static inline bool netlink_bound(struct netlink_sock *nlk) +{ + bool bound = READ_ONCE(nlk->bound); + + /* Ensure nlk is hashed and visible. */ + if (bound) + smp_rmb(); + + return bound; +} + int netlink_add_tap(struct netlink_tap *nt) { if (unlikely(nt->dev->type != ARPHRD_NETLINK)) @@ -1524,14 +1535,10 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, return err; } - bound = nlk->bound; - if (bound) { - /* Ensure nlk->portid is up-to-date. */ - smp_rmb(); - + bound = netlink_bound(nlk); + if (bound) if (nladdr->nl_pid != nlk->portid) return -EINVAL; - } if (nlk->netlink_bind && groups) { int group; @@ -1547,9 +1554,6 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, } } - /* No need for barriers here as we return to user-space without - * using any of the bound attributes. - */ if (!bound) { err = nladdr->nl_pid ? netlink_insert(sk, nladdr->nl_pid) : @@ -1628,7 +1632,7 @@ static int netlink_getname(struct socket *sock, struct sockaddr *addr, nladdr->nl_pid = nlk->dst_portid; nladdr->nl_groups = netlink_group_mask(nlk->dst_group); } else { - nladdr->nl_pid = nlk->portid; + nladdr->nl_pid = netlink_bound(nlk) ? nlk->portid : 0; nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0; } return 0; @@ -2442,13 +2446,10 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) dst_group = nlk->dst_group; } - if (!nlk->bound) { + if (!netlink_bound(nlk)) { err = netlink_autobind(sock); if (err) goto out; - } else { - /* Ensure nlk is hashed and visible. */ - smp_rmb(); } /* It's a really convoluted way for userland to ask for mmaped