[net] net: systemport: Fix 64-bit stats deadlock

Message ID 1505247266-42195-1-git-send-email-f.fainelli@gmail.com
State Accepted
Delegated to: David Miller
Headers show
Series
  • [net] net: systemport: Fix 64-bit stats deadlock
Related show

Commit Message

Florian Fainelli Sept. 12, 2017, 8:14 p.m.
We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.

[    5.780350] ================================
[    5.784679] WARNING: inconsistent lock state
[    5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[    5.794561] --------------------------------
[    5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[    5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[    5.810175]  (&syncp->seq#2){+.?...}, at: [<c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[    5.818327] {SOFTIRQ-ON-W} state was registered at:
[    5.823278]   bcm_sysport_get_stats64+0x17c/0x258
[    5.828053]   dev_get_stats+0x38/0xac
[    5.831776]   rtnl_fill_stats+0x30/0x118
[    5.835761]   rtnl_fill_ifinfo+0x538/0xe24
[    5.839921]   rtmsg_ifinfo_build_skb+0x6c/0xd8
[    5.844430]   rtmsg_ifinfo_event.part.5+0x14/0x44
[    5.849201]   rtmsg_ifinfo+0x20/0x28
[    5.852837]   register_netdevice+0x628/0x6b8
[    5.857171]   register_netdev+0x14/0x24
[    5.861051]   bcm_sysport_probe+0x30c/0x438
[    5.865280]   platform_drv_probe+0x50/0xb0
[    5.869418]   driver_probe_device+0x2e8/0x450
[    5.873817]   __driver_attach+0x104/0x120
[    5.877871]   bus_for_each_dev+0x7c/0xc0
[    5.881834]   bus_add_driver+0x1b0/0x270
[    5.885797]   driver_register+0x78/0xf4
[    5.889675]   do_one_initcall+0x54/0x190
[    5.893646]   kernel_init_freeable+0x144/0x1d0
[    5.898135]   kernel_init+0x8/0x110
[    5.901665]   ret_from_fork+0x14/0x2c
[    5.905363] irq event stamp: 24263
[    5.908804] hardirqs last  enabled at (24262): [<c08eecf0>] net_rx_action+0xc4/0x4e4
[    5.916624] hardirqs last disabled at (24263): [<c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[    5.925143] softirqs last  enabled at (24258): [<c022a7fc>] irq_enter+0x84/0x98
[    5.932524] softirqs last disabled at (24259): [<c022a918>] irq_exit+0x108/0x16c
[    5.939985]
[    5.939985] other info that might help us debug this:
[    5.946576]  Possible unsafe locking scenario:
[    5.946576]
[    5.952556]        CPU0
[    5.955031]        ----
[    5.957506]   lock(&syncp->seq#2);
[    5.960955]   <Interrupt>
[    5.963604]     lock(&syncp->seq#2);
[    5.967227]
[    5.967227]  *** DEADLOCK ***
[    5.967227]
[    5.973222] 1 lock held by swapper/0/0:
[    5.977092]  #0:  (&(&ring->lock)->rlock){..-...}, at: [<c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54

So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.

Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Eric Dumazet Sept. 12, 2017, 9:38 p.m. | #1
On Tue, 2017-09-12 at 13:14 -0700, Florian Fainelli wrote:
> We can enter a deadlock situation because there is no sufficient protection
> when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
> contexts running in softirq, this can lead to the following lockdep splat and
> actual deadlock was experienced as well with an iperf session in the background
> and a while loop doing ifconfig + ethtool.

> So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
> since it does not appear to be useful for anything. No inconsistency was
> observed with either ifconfig or ethtool, global TX counts equal the sum of
> per-queue TX counts on a 32-bit architecture.
> 
> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
> index a6572b51435a..c3c53f6cd9e6 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -1735,11 +1735,8 @@ static void bcm_sysport_get_stats64(struct net_device *dev,
>  		stats->tx_packets += tx_packets;
>  	}
>  
> -	/* lockless update tx_bytes and tx_packets */
> -	u64_stats_update_begin(&priv->syncp);

Yes, this u64_stats_update_begin()/u64_stats_update_end() is bogus

But why do we even write on tx_bytes/tx_packets here ??? 

Seems very wrong anyway.

(ethtool -S does not call bcm_sysport_get_stats64() to refresh them )

>  	stats64->tx_bytes = stats->tx_bytes;
>  	stats64->tx_packets = stats->tx_packets;
> -	u64_stats_update_end(&priv->syncp);
>  
>  	do {
>  		start = u64_stats_fetch_begin_irq(&priv->syncp);
Florian Fainelli Sept. 12, 2017, 9:48 p.m. | #2
On 09/12/2017 02:38 PM, Eric Dumazet wrote:
> On Tue, 2017-09-12 at 13:14 -0700, Florian Fainelli wrote:
>> We can enter a deadlock situation because there is no sufficient protection
>> when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
>> contexts running in softirq, this can lead to the following lockdep splat and
>> actual deadlock was experienced as well with an iperf session in the background
>> and a while loop doing ifconfig + ethtool.
> 
>> So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
>> since it does not appear to be useful for anything. No inconsistency was
>> observed with either ifconfig or ethtool, global TX counts equal the sum of
>> per-queue TX counts on a 32-bit architecture.
>>
>> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  drivers/net/ethernet/broadcom/bcmsysport.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
>> index a6572b51435a..c3c53f6cd9e6 100644
>> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
>> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
>> @@ -1735,11 +1735,8 @@ static void bcm_sysport_get_stats64(struct net_device *dev,
>>  		stats->tx_packets += tx_packets;
>>  	}
>>  
>> -	/* lockless update tx_bytes and tx_packets */
>> -	u64_stats_update_begin(&priv->syncp);
> 
> Yes, this u64_stats_update_begin()/u64_stats_update_end() is bogus
> 
> But why do we even write on tx_bytes/tx_packets here ??? 

That's for the ethtool -S netdev stats copy (that's on me, I added that
in the driver initial version), so yes, not very robust...

> 
> Seems very wrong anyway.
> 
> (ethtool -S does not call bcm_sysport_get_stats64() to refresh them )

Yes that might actually be the simplest way to get this fixed.

> 
>>  	stats64->tx_bytes = stats->tx_bytes;
>>  	stats64->tx_packets = stats->tx_packets;
>> -	u64_stats_update_end(&priv->syncp);
>>  
>>  	do {
>>  		start = u64_stats_fetch_begin_irq(&priv->syncp);
> 
>
David Miller Sept. 15, 2017, 9:25 p.m. | #3
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 12 Sep 2017 13:14:26 -0700

> We can enter a deadlock situation because there is no sufficient protection
> when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
> contexts running in softirq, this can lead to the following lockdep splat and
> actual deadlock was experienced as well with an iperf session in the background
> and a while loop doing ifconfig + ethtool.
 ...
> So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
> since it does not appear to be useful for anything. No inconsistency was
> observed with either ifconfig or ethtool, global TX counts equal the sum of
> per-queue TX counts on a 32-bit architecture.
> 
> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.
Florian Fainelli Sept. 15, 2017, 9:28 p.m. | #4
On September 15, 2017 2:25:11 PM PDT, David Miller <davem@davemloft.net> wrote:
>From: Florian Fainelli <f.fainelli@gmail.com>
>Date: Tue, 12 Sep 2017 13:14:26 -0700
>
>> We can enter a deadlock situation because there is no sufficient
>protection
>> when ndo_get_stats64() runs in process context to guard against RX or
>TX NAPI
>> contexts running in softirq, this can lead to the following lockdep
>splat and
>> actual deadlock was experienced as well with an iperf session in the
>background
>> and a while loop doing ifconfig + ethtool.
> ...
>> So just remove the u64_stats_update_begin()/end() pair in
>ndo_get_stats64()
>> since it does not appear to be useful for anything. No inconsistency
>was
>> observed with either ifconfig or ethtool, global TX counts equal the
>sum of
>> per-queue TX counts on a 32-bit architecture.
>> 
>> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>
>Applied.

FYI, there is another patch needed to ensure consistency between ethtool reported stats and netdevice stats, will submit that after some more testing. Thanks!
David Miller Sept. 15, 2017, 9:53 p.m. | #5
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 15 Sep 2017 14:28:40 -0700

> FYI, there is another patch needed to ensure consistency between
> ethtool reported stats and netdevice stats, will submit that after
> some more testing. Thanks!

Yes, I saw that, th anks.

Patch

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index a6572b51435a..c3c53f6cd9e6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -1735,11 +1735,8 @@  static void bcm_sysport_get_stats64(struct net_device *dev,
 		stats->tx_packets += tx_packets;
 	}
 
-	/* lockless update tx_bytes and tx_packets */
-	u64_stats_update_begin(&priv->syncp);
 	stats64->tx_bytes = stats->tx_bytes;
 	stats64->tx_packets = stats->tx_packets;
-	u64_stats_update_end(&priv->syncp);
 
 	do {
 		start = u64_stats_fetch_begin_irq(&priv->syncp);