Message ID | 55C25CFB.2060103@iogearbox.net
---|---
State | RFC, archived
Delegated to | David Miller
On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
>
> Here's a theory and patch below. Herbert, Thomas, does this make any
> sense to you resp. sound plausible? ;)

It's certainly possible. Whether it's plausible I'm not so sure.
The netlink hashtable is unlimited in size. So it should always
be expanding, not rehashing. The bug you found should only affect
rehashing.

> I'm not quite sure what's best to return from here, i.e. whether we
> propagate -ENOMEM or instead retry over and over again hoping that the
> rehashing completed (and no new rehashing started in the mean time) ...

Please use something other than ENOMEM as it is already heavily
used in this context. Perhaps EOVERFLOW?

We should probably add a WARN_ON_ONCE in rhashtable_insert_rehash
since two concurrent rehashings indicates something is going
seriously wrong.

Thanks,
On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
>
> Here's a theory and patch below. Herbert, Thomas, does this make any
> sense to you resp. sound plausible? ;)

Another possibility is the following bug:

https://patchwork.ozlabs.org/patch/503374/

It can cause a use-after-free which may lead to corruption of skb
state, including the cb buffer.

Of course it's a long shot.

Cheers,
On 08/06/2015 02:30 AM, Herbert Xu wrote:
> On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
>>
>> Here's a theory and patch below. Herbert, Thomas, does this make any
>> sense to you resp. sound plausible? ;)
>
> It's certainly possible. Whether it's plausible I'm not so sure.
> The netlink hashtable is unlimited in size. So it should always
> be expanding, not rehashing. The bug you found should only affect
> rehashing.
>
>> I'm not quite sure what's best to return from here, i.e. whether we
>> propagate -ENOMEM or instead retry over and over again hoping that the
>> rehashing completed (and no new rehashing started in the mean time) ...
>
> Please use something other than ENOMEM as it is already heavily
> used in this context. Perhaps EOVERFLOW?

Okay, I'll do that.

> We should probably add a WARN_ON_ONCE in rhashtable_insert_rehash
> since two concurrent rehashings indicates something is going
> seriously wrong.

So, if I didn't miss anything, it looks like the following could have
happened: the worker thread, that is rht_deferred_worker(), itself could
trigger the first rehashing, e.g. after shrinking or expanding (or also
in case none of both happen).

Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm
really unlucky and exceed the ht->elasticity limit of 16. I would then
end up in rhashtable_insert_rehash() to find out there's already one
ongoing and thus, I'm getting -EBUSY via __netlink_insert().

Perhaps that is what could have happened? Seems rare though, but it was
also only seen rarely so far ...

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
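The failure path Daniel walks through above can be modeled as a toy userspace sketch. Only the elasticity limit of 16 and the -EBUSY-on-concurrent-rehash behaviour come from the thread; the `toy_` names, the struct, and the simplified logic are illustrative, not the kernel's rhashtable implementation:

```c
/* Toy model of the insert path described in the mail: if walking a
 * bucket chain exceeds the elasticity limit while a rehash is already
 * in flight, the insert fails with -EBUSY instead of scheduling a
 * second rehash. Illustrative only; not the kernel code.
 */
#include <errno.h>
#include <stdbool.h>

#define TOY_ELASTICITY 16

struct toy_ht {
	bool rehash_in_progress;	/* set by the deferred worker */
};

/* Returns 0 on success, -EBUSY when the chain is too long and a rehash
 * is already running (the error that leaked out via __netlink_insert()).
 */
static int toy_insert(struct toy_ht *ht, int chain_len)
{
	if (chain_len <= TOY_ELASTICITY)
		return 0;			/* chain short enough, done */
	if (ht->rehash_in_progress)
		return -EBUSY;			/* refuse a second rehash */
	ht->rehash_in_progress = true;		/* schedule a rehash ourselves */
	return 0;
}
```

In this model the -EBUSY is only reachable when a long chain is hit *while* the deferred worker already holds a rehash in flight, which matches how rare the report was.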
On 08/06/2015 04:50 PM, Daniel Borkmann wrote:
> On 08/06/2015 02:30 AM, Herbert Xu wrote:
>> On Wed, Aug 05, 2015 at 08:59:07PM +0200, Daniel Borkmann wrote:
>>>
>>> Here's a theory and patch below. Herbert, Thomas, does this make any
>>> sense to you resp. sound plausible? ;)
>>
>> It's certainly possible. Whether it's plausible I'm not so sure.
>> The netlink hashtable is unlimited in size. So it should always
>> be expanding, not rehashing. The bug you found should only affect
>> rehashing.
>>
>>> I'm not quite sure what's best to return from here, i.e. whether we
>>> propagate -ENOMEM or instead retry over and over again hoping that the
>>> rehashing completed (and no new rehashing started in the mean time) ...
>>
>> Please use something other than ENOMEM as it is already heavily
>> used in this context. Perhaps EOVERFLOW?
>
> Okay, I'll do that.
>
>> We should probably add a WARN_ON_ONCE in rhashtable_insert_rehash
>> since two concurrent rehashings indicates something is going
>> seriously wrong.
>
> So, if I didn't miss anything, it looks like the following could have
> happened: the worker thread, that is rht_deferred_worker(), itself could
> trigger the first rehashing, e.g. after shrinking or expanding (or also
> in case none of both happen).
>
> Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm
> really unlucky and exceed the ht->elasticity limit of 16. I would then
> end up in rhashtable_insert_rehash() to find out there's already one
> ongoing and thus, I'm getting -EBUSY via __netlink_insert().
>
> Perhaps that is what could have happened? Seems rare though, but it was
> also only seen rarely so far ...

Experimenting a bit more, letting __netlink_insert() return -EBUSY so far,
I only managed when either artificially reducing ht->elasticity limit a
bit or biasing the hash function, that means, it would require some
specific knowledge at what slot we end up to overcome the elasticity
limit and thus trigger rehashing. Pretty unlikely though if you ask me.

The other thing I could observe, when I used the bind stress test from
Thomas' repo and reduced the amount of bind()'s, so that we very
frequently fluctuate in the ranges of 4 to 256 of the hashtable size, I
could observe that we from time to time enter rhashtable_insert_rehash()
on insertions, but probably the window was too small to trigger an error.
I think in any case, remapping seems okay.
On Thu, Aug 06, 2015 at 04:50:39PM +0200, Daniel Borkmann wrote:
>
> Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm
> really unlucky and exceed the ht->elasticity limit of 16. I would then
> end up in rhashtable_insert_rehash() to find out there's already one
> ongoing and thus, I'm getting -EBUSY via __netlink_insert().

Right, so the only way you can trigger this is if you hit a chain
longer than 16 and the number of entries in the table is less than
75% the size of the table, as well as there being an existing resize
or rehash operation.

This should be pretty much impossible.

But if we had a WARN_ON_ONCE there then we'll know for sure.

Cheers,
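Herbert's three preconditions can be written out as a single predicate. The chain limit of 16 and the 75% growth watermark are from his mail; the function name, signature, and arithmetic below are an illustrative sketch, not kernel code:

```c
/* Sketch of the conditions under which the -EBUSY path is reachable,
 * per the mail above: long chain, table below its growth watermark
 * (so no expansion is pending), and a resize/rehash already running.
 * Illustrative names and arithmetic only.
 */
#include <stdbool.h>

static bool ebusy_reachable(int chain_len, unsigned int nelems,
			    unsigned int table_size, bool rehash_running)
{
	bool chain_too_long = chain_len > 16;
	/* nelems/table_size < 75%, i.e. no grow is scheduled */
	bool below_grow_watermark = nelems * 4 < table_size * 3;

	return chain_too_long && below_grow_watermark && rehash_running;
}
```

All three must hold at once, which is why Herbert calls the case "pretty much impossible" in practice.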
On Fri, Aug 07, 2015 at 12:39:47AM +0200, Daniel Borkmann wrote:
>
> window was too small to trigger an error. I think in any case, remapping
> seems okay.

Oh there is no doubt that we need your EBUSY remapping patch. It's
just that it's very unlikely for this to be responsible for the
dead-lock that Linus saw.

Cheers,
On 08/07/2015 01:41 AM, Herbert Xu wrote:
> On Thu, Aug 06, 2015 at 04:50:39PM +0200, Daniel Borkmann wrote:
>>
>> Then, in __rhashtable_insert_fast(), I could trigger an -EBUSY when I'm
>> really unlucky and exceed the ht->elasticity limit of 16. I would then
>> end up in rhashtable_insert_rehash() to find out there's already one
>> ongoing and thus, I'm getting -EBUSY via __netlink_insert().
>
> Right, so the only way you can trigger this is if you hit a chain
> longer than 16 and the number of entries in the table is less than
> 75% the size of the table, as well as there being an existing resize
> or rehash operation.
>
> This should be pretty much impossible.
>
> But if we had a WARN_ON_ONCE there then we'll know for sure.

Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but
was removed in a87b9ebf1709 ("rhashtable: Do not schedule more than one
rehash if we can't grow further"). Do you want to re-add a WARN_ON_ONCE()?

Thanks,
Daniel
On Fri, Aug 07, 2015 at 01:58:15AM +0200, Daniel Borkmann wrote:
>
> Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but
> was removed in a87b9ebf1709 ("rhashtable: Do not schedule more than one
> rehash if we can't grow further"). Do you want to re-add a WARN_ON_ONCE()?

I think so. Thomas?

Cheers,
On 08/07/15 at 08:00am, Herbert Xu wrote:
> On Fri, Aug 07, 2015 at 01:58:15AM +0200, Daniel Borkmann wrote:
>>
>> Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but
>> was removed in a87b9ebf1709 ("rhashtable: Do not schedule more than one
>> rehash if we can't grow further"). Do you want to re-add a WARN_ON_ONCE()?
>
> I think so. Thomas?

Makes sense. I removed it because I thought it was not possible to reach.
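For readers unfamiliar with the macro being re-added: WARN_ON_ONCE fires a warning the first time its condition is true and stays silent afterwards. A minimal userspace approximation (toy names; the real kernel macro also dumps a stack trace and tracks the "once" state per call site, neither of which is modeled here):

```c
/* Userspace sketch of the WARN_ON_ONCE pattern: warn on the first hit,
 * then suppress further warnings. toy_warn_count exists only so the
 * once-only behaviour is observable. Illustrative, not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

static int toy_warn_count;

static bool toy_warn_on_once(bool cond, const char *msg)
{
	static bool warned;	/* one flag for this toy; kernel is per call site */

	if (cond && !warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;		/* caller can still branch on the condition */
}
```

Dropped into rhashtable_insert_rehash(), such a warning would make a second concurrent rehash visible exactly once in the logs instead of silently returning -EBUSY.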
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index d8e2e39..1cfd4af 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1096,6 +1096,11 @@ static int netlink_insert(struct sock *sk, u32 portid)
 	err = __netlink_insert(table, sk);
 	if (err) {
+		/* Currently, a rehashing of rhashtable might be in progress,
+		 * we however must not allow -EBUSY to escape from here.
+		 */
+		if (err == -EBUSY)
+			err = -ENOMEM;
 		if (err == -EEXIST)
 			err = -EADDRINUSE;
 		nlk_sk(sk)->portid = 0;
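The hunk's error remapping, pulled out as a standalone function to show the intended mapping (a sketch only; in the patch the checks live inline in netlink_insert(), and `netlink_insert_errmap` is a hypothetical name for illustration):

```c
/* Sketch of the error translation from the patch: -EBUSY signals a
 * transient rhashtable rehash and must not escape to userspace, so it
 * is mapped to -ENOMEM; -EEXIST becomes -EADDRINUSE as before. Any
 * other value, including 0, passes through unchanged.
 */
#include <errno.h>

static int netlink_insert_errmap(int err)
{
	if (err == -EBUSY)
		err = -ENOMEM;
	if (err == -EEXIST)
		err = -EADDRINUSE;
	return err;
}
```

Note the two checks can stay as separate `if`s rather than `else if`: after the first remap `err` is -ENOMEM, which can never match -EEXIST, so the second check is unaffected.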