Message ID | 1551363086-29652-1-git-send-email-igor.druzhinin@citrix.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | xen-netback: don't populate the hash cache on XenBus disconnect |
> -----Original Message-----
> From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> Sent: 28 February 2019 14:11
> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Wei Liu <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; davem@davemloft.net; Igor
> Druzhinin <igor.druzhinin@citrix.com>
> Subject: [PATCH] xen-netback: don't populate the hash cache on XenBus disconnect
>
> Occasionally, during the disconnection procedure on XenBus which
> includes hash cache deinitialization there might be some packets
> still in-flight on other processors. Handling of these packets includes
> hashing and hash cache population that finally results in hash cache
> data structure corruption.
>
> In order to avoid this we prevent hashing of those packets if there
> are no queues initialized. In that case RCU protection of queues guards
> the hash cache as well.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> ---
>
> Found this while applying the previous patch to our patchqueue. Seems it
> never went to the mailing list and, to my knowledge, the problem is still
> present. From my recollection, it only happened on stress frontend on/off
> test with Windows guests (since only those detach the frontend completely).
> So better late than never.
>
> ---
>  drivers/net/xen-netback/hash.c      | 2 ++
>  drivers/net/xen-netback/interface.c | 7 +++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/drivers/net/xen-netback/hash.c b/drivers/net/xen-netback/hash.c
> index 0ccb021..10d580c 100644
> --- a/drivers/net/xen-netback/hash.c
> +++ b/drivers/net/xen-netback/hash.c
> @@ -454,6 +454,8 @@ void xenvif_init_hash(struct xenvif *vif)
>  	if (xenvif_hash_cache_size == 0)
>  		return;
>
> +	BUG_ON(vif->hash.cache.count);
> +
>  	spin_lock_init(&vif->hash.cache.lock);
>  	INIT_LIST_HEAD(&vif->hash.cache.list);
>  }
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 182d677..6da1251 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -153,6 +153,13 @@ static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
>  {
>  	struct xenvif *vif = netdev_priv(dev);
>  	unsigned int size = vif->hash.size;
> +	unsigned int num_queues;
> +
> +	/* If queues are not set up internally - always return 0
> +	 * as the packet going to be dropped anyway */
> +	num_queues = READ_ONCE(vif->num_queues);
> +	if (num_queues < 1)
> +		return 0;
>
>  	if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE)
>  		return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
> --
> 2.7.4
From: Igor Druzhinin <igor.druzhinin@citrix.com>
Date: Thu, 28 Feb 2019 14:11:26 +0000

> Occasionally, during the disconnection procedure on XenBus which
> includes hash cache deinitialization there might be some packets
> still in-flight on other processors. Handling of these packets includes
> hashing and hash cache population that finally results in hash cache
> data structure corruption.
>
> In order to avoid this we prevent hashing of those packets if there
> are no queues initialized. In that case RCU protection of queues guards
> the hash cache as well.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Applied and queued up for -stable, thanks.
diff --git a/drivers/net/xen-netback/hash.c b/drivers/net/xen-netback/hash.c
index 0ccb021..10d580c 100644
--- a/drivers/net/xen-netback/hash.c
+++ b/drivers/net/xen-netback/hash.c
@@ -454,6 +454,8 @@ void xenvif_init_hash(struct xenvif *vif)
 	if (xenvif_hash_cache_size == 0)
 		return;
 
+	BUG_ON(vif->hash.cache.count);
+
 	spin_lock_init(&vif->hash.cache.lock);
 	INIT_LIST_HEAD(&vif->hash.cache.list);
 }
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 182d677..6da1251 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -153,6 +153,13 @@ static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
 {
 	struct xenvif *vif = netdev_priv(dev);
 	unsigned int size = vif->hash.size;
+	unsigned int num_queues;
+
+	/* If queues are not set up internally - always return 0
+	 * as the packet going to be dropped anyway */
+	num_queues = READ_ONCE(vif->num_queues);
+	if (num_queues < 1)
+		return 0;
 
 	if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE)
 		return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
Occasionally, during the disconnection procedure on XenBus which
includes hash cache deinitialization there might be some packets
still in-flight on other processors. Handling of these packets includes
hashing and hash cache population that finally results in hash cache
data structure corruption.

In order to avoid this we prevent hashing of those packets if there
are no queues initialized. In that case RCU protection of queues guards
the hash cache as well.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---

Found this while applying the previous patch to our patchqueue. Seems it
never went to the mailing list and, to my knowledge, the problem is still
present. From my recollection, it only happened on stress frontend on/off
test with Windows guests (since only those detach the frontend completely).
So better late than never.

---
 drivers/net/xen-netback/hash.c      | 2 ++
 drivers/net/xen-netback/interface.c | 7 +++++++
 2 files changed, 9 insertions(+)
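
For readers who want to see the guard in isolation, here is a minimal
userspace sketch of the idea described in the commit message: read the
queue count once and return queue 0 before any hashing happens, so the
hash cache is never touched while no queues exist. This is not the
kernel code. The names demo_vif, demo_hash_and_cache and
demo_select_queue are hypothetical, a relaxed C11 atomic load stands in
for READ_ONCE(), the result is reduced modulo the queue count rather
than dev->real_num_tx_queues, and the RCU protection mentioned above is
not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xenvif; only the fields needed here. */
struct demo_vif {
	atomic_uint num_queues;   /* 0 while the backend is disconnected */
	unsigned int cache_count; /* entries added to the demo hash cache */
};

/* Hypothetical hashing step that populates the cache as a side effect. */
static uint32_t demo_hash_and_cache(struct demo_vif *vif, uint32_t flow)
{
	vif->cache_count++;
	return flow * 2654435761u; /* arbitrary multiplicative hash */
}

/* Select a queue, skipping hashing entirely when no queues are set up. */
static uint16_t demo_select_queue(struct demo_vif *vif, uint32_t flow)
{
	/* Single read of the queue count (assumed analogue of READ_ONCE). */
	unsigned int n = atomic_load_explicit(&vif->num_queues,
					      memory_order_relaxed);

	/* No queues: the packet would be dropped anyway, so do not hash. */
	if (n < 1)
		return 0;

	return (uint16_t)(demo_hash_and_cache(vif, flow) % n);
}
```

With num_queues at 0, demo_select_queue() returns 0 and leaves
cache_count unchanged; once num_queues is non-zero the hash path runs
and the cache is populated as before.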