Message ID | 1590161383-8141-1-git-send-email-leoyu@nvidia.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | net: stmmac: don't attach interface until resume finishes | expand |
From: Leon Yu <leoyu@nvidia.com> Date: Fri, 22 May 2020 23:29:43 +0800 > Commit 14b41a2959fb ("net: stmmac: Delete txtimer in suspend") was the > first attempt to fix a race between mod_timer() and setup_timer() > during stmmac_resume(). However the issue still exists as the commit > only addressed half of the issue. > > Same race can still happen as stmmac_resume() re-attaches interface > way too early - even before hardware is fully initialized. Worse, > doing so allows network traffic to restart and stmmac_tx_timer_arm() > being called in the middle of stmmac_resume(), which re-init tx timers > in stmmac_init_coalesce(). timer_list will be corrupted and system > crashes as a result of race between mod_timer() and setup_timer(). > > systemd--1995 2.... 552950018us : stmmac_suspend: 4994 > ksoftirq-9 0..s2 553123133us : stmmac_tx_timer_arm: 2276 > systemd--1995 0.... 553127896us : stmmac_resume: 5101 > systemd--320 7...2 553132752us : stmmac_tx_timer_arm: 2276 > (sd-exec-1999 5...2 553135204us : stmmac_tx_timer_arm: 2276 > --------------------------------- > pc : run_timer_softirq+0x468/0x5e0 > lr : run_timer_softirq+0x570/0x5e0 > Call trace: > run_timer_softirq+0x468/0x5e0 > __do_softirq+0x124/0x398 > irq_exit+0xd8/0xe0 > __handle_domain_irq+0x6c/0xc0 > gic_handle_irq+0x60/0xb0 > el1_irq+0xb8/0x180 > arch_cpu_idle+0x38/0x230 > default_idle_call+0x24/0x3c > do_idle+0x1e0/0x2b8 > cpu_startup_entry+0x28/0x48 > secondary_start_kernel+0x1b4/0x208 > > Fix this by deferring netif_device_attach() to the end of > stmmac_resume(). > > Signed-off-by: Leon Yu <leoyu@nvidia.com> Applied, thank you.
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index a999d6b33a64..1f319c9cee46 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -5190,8 +5190,6 @@ int stmmac_resume(struct device *dev) return ret; } - netif_device_attach(ndev); - mutex_lock(&priv->lock); stmmac_reset_queues_param(priv); @@ -5218,6 +5216,8 @@ int stmmac_resume(struct device *dev) phylink_mac_change(priv->phylink, true); + netif_device_attach(ndev); + return 0; } EXPORT_SYMBOL_GPL(stmmac_resume);
Commit 14b41a2959fb ("net: stmmac: Delete txtimer in suspend") was the first attempt to fix a race between mod_timer() and setup_timer() during stmmac_resume(). However the issue still exists as the commit only addressed half of the issue. Same race can still happen as stmmac_resume() re-attaches interface way too early - even before hardware is fully initialized. Worse, doing so allows network traffic to restart and stmmac_tx_timer_arm() being called in the middle of stmmac_resume(), which re-init tx timers in stmmac_init_coalesce(). timer_list will be corrupted and system crashes as a result of race between mod_timer() and setup_timer(). systemd--1995 2.... 552950018us : stmmac_suspend: 4994 ksoftirq-9 0..s2 553123133us : stmmac_tx_timer_arm: 2276 systemd--1995 0.... 553127896us : stmmac_resume: 5101 systemd--320 7...2 553132752us : stmmac_tx_timer_arm: 2276 (sd-exec-1999 5...2 553135204us : stmmac_tx_timer_arm: 2276 --------------------------------- pc : run_timer_softirq+0x468/0x5e0 lr : run_timer_softirq+0x570/0x5e0 Call trace: run_timer_softirq+0x468/0x5e0 __do_softirq+0x124/0x398 irq_exit+0xd8/0xe0 __handle_domain_irq+0x6c/0xc0 gic_handle_irq+0x60/0xb0 el1_irq+0xb8/0x180 arch_cpu_idle+0x38/0x230 default_idle_call+0x24/0x3c do_idle+0x1e0/0x2b8 cpu_startup_entry+0x28/0x48 secondary_start_kernel+0x1b4/0x208 Fix this by deferring netif_device_attach() to the end of stmmac_resume(). Signed-off-by: Leon Yu <leoyu@nvidia.com> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)