net: Fix xps_needed inc/dec mismatch

Message ID 20181207101621.24316-1-ross.lagerwall@citrix.com
State Not Applicable
Delegated to: David Miller
Series
  • net: Fix xps_needed inc/dec mismatch

Commit Message

Ross Lagerwall Dec. 7, 2018, 10:16 a.m.
xps_needed is incremented only when a new dev map is allocated (in
__netif_set_xps_queue). Therefore it should be decremented only when we
actually have a dev map to destroy. Without this, it may be decremented
too many times which causes netif_reset_xps_queues to return early and
not actually clean up the old dev maps. This results in a crash in
__netif_set_xps_queue when it is called later.
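
The mismatch described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: a plain int stands in for the xps_needed static key, a bool stands in for "this device has a dev map", and a reset is assumed to destroy the whole map. All names (model_dev, model_set_xps_queue, model_reset_xps_queues, scenario_leaves_stale_maps) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a net_device; only tracks whether a dev map exists. */
struct model_dev {
	bool has_maps;
};

/* Stands in for the xps_needed static key reference count. */
static int xps_needed_model;

/* Models __netif_set_xps_queue: increment only when a new map is allocated. */
static void model_set_xps_queue(struct model_dev *dev)
{
	if (!dev->has_maps) {
		dev->has_maps = true;
		xps_needed_model++;
	}
}

/*
 * Models netif_reset_xps_queues. With fixed == false the counter is
 * decremented even when the device has no map (the out_no_maps path
 * before this patch). With fixed == true it is decremented only when
 * a map is actually destroyed.
 */
static void model_reset_xps_queues(struct model_dev *dev, bool fixed)
{
	if (xps_needed_model <= 0)
		return;	/* early return: any existing map is left behind */

	if (dev->has_maps) {
		dev->has_maps = false;
		xps_needed_model--;
	} else if (!fixed) {
		xps_needed_model--;	/* unbalanced decrement */
	}

	assert(xps_needed_model >= 0);
}

/* Two devices: A gets a map, B is reset without ever having one. */
static bool scenario_leaves_stale_maps(bool fixed)
{
	struct model_dev a = { false }, b = { false };

	xps_needed_model = 0;
	model_set_xps_queue(&a);		/* count: 1 */
	model_reset_xps_queues(&b, fixed);	/* unfixed: count drops to 0 */
	model_reset_xps_queues(&a, fixed);	/* unfixed: returns early */

	return a.has_maps;
}
```

In this model, scenario_leaves_stale_maps(false) reports that device A still holds its map after being reset, because the reset of device B already drove the counter to zero. With fixed set to true, the counter stays balanced and A's map is cleaned up.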

The crash occurred when having multiple ixgbe devices in a host. lldpad
would reconfigure them to be FCoE-capable causing reset_xps_queues /
set_xps_queue to be called several times. The xps_needed count would get
out of sync and eventually the above-mentioned crash would occur.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 net/core/dev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Sabrina Dubroca Dec. 7, 2018, 11:01 a.m. | #1
Hi Ross,

2018-12-07, 10:16:21 +0000, Ross Lagerwall wrote:
> xps_needed is incremented only when a new dev map is allocated (in
> __netif_set_xps_queue). Therefore it should be decremented only when we
> actually have a dev map to destroy. Without this, it may be decremented
> too many times which causes netif_reset_xps_queues to return early and
> not actually clean up the old dev maps. This results in a crash in
> __netif_set_xps_queue when it is called later.
> 
> The crash occurred when having multiple ixgbe devices in a host. lldpad
> would reconfigure them to be FCoE-capable causing reset_xps_queues /
> set_xps_queue to be called several times. The xps_needed count would get
> out of sync and eventually the above-mentioned crash would occur.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

I posted another patchset recently (commits f28c020fb488 and
867d0ad476db in the "net" tree) for issues in XPS, including broken
xps_needed accounting, so your patch won't apply to David's "net"
tree. Could you try it with your use case, and if you still see
issues, fix them on top? You can grab the latest net tree here:

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git


Thanks,

Ross Lagerwall Dec. 7, 2018, 3:15 p.m. | #2
On 12/7/18 11:01 AM, Sabrina Dubroca wrote:
> Hi Ross,
> 
> 2018-12-07, 10:16:21 +0000, Ross Lagerwall wrote:
>> xps_needed is incremented only when a new dev map is allocated (in
>> __netif_set_xps_queue). Therefore it should be decremented only when we
>> actually have a dev map to destroy. Without this, it may be decremented
>> too many times which causes netif_reset_xps_queues to return early and
>> not actually clean up the old dev maps. This results in a crash in
>> __netif_set_xps_queue when it is called later.
>>
>> The crash occurred when having multiple ixgbe devices in a host. lldpad
>> would reconfigure them to be FCoE-capable causing reset_xps_queues /
>> set_xps_queue to be called several times. The xps_needed count would get
>> out of sync and eventually the above-mentioned crash would occur.
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> I posted another patchset recently (commits f28c020fb488 and
> 867d0ad476db in the "net" tree) for issues in XPS, including broken
> xps_needed accounting, so your patch won't apply to David's "net"
> tree. Could you try it with your use case, and if you still see
> issues, fix them on top? You can grab the latest net tree here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
> 

Your two commits fix the issue I was seeing. Thanks!

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index ddc551f24ba2..8aa72e93af9f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2233,11 +2233,12 @@ static void netif_reset_xps_queues(struct net_device *dev, u16 offset,
 	clean_xps_maps(dev, possible_mask, dev_maps, nr_ids, offset, count,
 		       false);
 
-out_no_maps:
 	if (static_key_enabled(&xps_rxqs_needed))
 		static_key_slow_dec_cpuslocked(&xps_rxqs_needed);
 
 	static_key_slow_dec_cpuslocked(&xps_needed);
+
+out_no_maps:
 	mutex_unlock(&xps_map_mutex);
 	cpus_read_unlock();
 }