diff mbox

atl1c:Do not call cancel_work_sync from the work itself

Message ID 12493523461127-git-send-email-jie.yang@atheros.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

jie.yang@atheros.com Aug. 4, 2009, 2:19 a.m. UTC
Do not call cancel_work_sync from the work itself, for it my cause 
recursive locking.
detail info:
events/1/10 is trying to acquire lock:
(&adapter->reset_task){+.+...}, at: [<c043e384>] __cancel_work_timer+0x80/0x187

but task is already holding lock:
(&adapter->reset_task){+.+...}, at: [<c043ed6f>] worker_thread+0x127/0x234
other info that might help us debug this:

2 locks held by events/1/10:
#0:  (events){+.+.+.}, at: [<c043ed6f>] worker_thread+0x127/0x234
#1:  (&adapter->reset_task){+.+...}, at: [<c043ed6f>] worker_thread+0x127/0x234

stack backtrace:
Pid: 10, comm: events/1 Not tainted 2.6.31-rc2 #12
Call Trace:
[<c04519c9>] validate_chain+0x4ae/0xb26
[<c0451e8d>] ? validate_chain+0x972/0xb26
[<c0437f4b>] ? lock_timer_base+0x1f/0x3e
[<c04526f8>] __lock_acquire+0x6b7/0x745
[<c0452816>] lock_acquire+0x90/0xad
[<c043e384>] ? __cancel_work_timer+0x80/0x187
[<c043e3b1>] __cancel_work_timer+0xad/0x187
[<c043e384>] ? __cancel_work_timer+0x80/0x187
[<c044f6b3>] ? mark_held_locks+0x3d/0x58
[<c0690855>] ? _spin_unlock_irqrestore+0x36/0x3c
[<c044f7d5>] ? trace_hardirqs_on_caller+0x107/0x12f
[<c044f808>] ? trace_hardirqs_on+0xb/0xd
[<c0437fb2>] ? try_to_del_timer_sync+0x48/0x4f
[<c043e4a2>] cancel_work_sync+0xa/0xc
[<f8955f95>] atl1c_down+0x1f/0xde [atl1c]
[<f8956955>] atl1c_reset_task+0x1f/0x31 [atl1c]
[<c043edae>] worker_thread+0x166/0x234
[<c043ed6f>] ? worker_thread+0x127/0x234
[<f8956936>] ? atl1c_reset_task+0x0/0x31 [atl1c]
[<c0441ac5>] ? autoremove_wake_function+0x0/0x33
[<c043ec48>] ? worker_thread+0x0/0x234
[<c0441a25>] kthread+0x69/0x70
[<c04419bc>] ? kthread+0x0/0x70
[<c04034b7>] kernel_thread_helper+0x7/0x10

So when atl1c_reset_task be scheduled just set a flag in ctrl_flag, 
to demonstrate it is in reset_task, when atl1c_down is call it will not call 
cancel_work_sync(&adapter->reset_task) if it sees the flag.

Signed-off-by: jie yang <jie.yang@atheros.com>
---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Aug. 4, 2009, 2:32 a.m. UTC | #1
From: <jie.yang@atheros.com>
Date: Tue, 4 Aug 2009 10:19:06 +0800

> Do not call cancel_work_sync from the work itself, for it my cause 
> recursive locking.

Sorry, this won't work.

Your patches, all of them, are corrupted by your email client.  It
turns tab characters into spaces as well as making other textual
modifications to the patch, such that it will not apply.

Please fix this up and resubmit all of your patches when the problem
is corrected.

If you have to, email the patches to yourself and try to apply them
just to make sure.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton Aug. 4, 2009, 7:52 a.m. UTC | #2
On Tue, 4 Aug 2009 10:19:06 +0800 <jie.yang@atheros.com> wrote:

> Do not call cancel_work_sync from the work itself, for it my cause 
> recursive locking.
> detail info:
> events/1/10 is trying to acquire lock:
> (&adapter->reset_task){+.+...}, at: [<c043e384>] __cancel_work_timer+0x80/0x187
> 
> but task is already holding lock:
> (&adapter->reset_task){+.+...}, at: [<c043ed6f>] worker_thread+0x127/0x234
> other info that might help us debug this:
> 
> 2 locks held by events/1/10:
> #0:  (events){+.+.+.}, at: [<c043ed6f>] worker_thread+0x127/0x234
> #1:  (&adapter->reset_task){+.+...}, at: [<c043ed6f>] worker_thread+0x127/0x234
> 
> stack backtrace:
> Pid: 10, comm: events/1 Not tainted 2.6.31-rc2 #12
> Call Trace:
> [<c04519c9>] validate_chain+0x4ae/0xb26
> [<c0451e8d>] ? validate_chain+0x972/0xb26
> [<c0437f4b>] ? lock_timer_base+0x1f/0x3e
> [<c04526f8>] __lock_acquire+0x6b7/0x745
> [<c0452816>] lock_acquire+0x90/0xad
> [<c043e384>] ? __cancel_work_timer+0x80/0x187
> [<c043e3b1>] __cancel_work_timer+0xad/0x187
> [<c043e384>] ? __cancel_work_timer+0x80/0x187
> [<c044f6b3>] ? mark_held_locks+0x3d/0x58
> [<c0690855>] ? _spin_unlock_irqrestore+0x36/0x3c
> [<c044f7d5>] ? trace_hardirqs_on_caller+0x107/0x12f
> [<c044f808>] ? trace_hardirqs_on+0xb/0xd
> [<c0437fb2>] ? try_to_del_timer_sync+0x48/0x4f
> [<c043e4a2>] cancel_work_sync+0xa/0xc
> [<f8955f95>] atl1c_down+0x1f/0xde [atl1c]
> [<f8956955>] atl1c_reset_task+0x1f/0x31 [atl1c]
> [<c043edae>] worker_thread+0x166/0x234
> [<c043ed6f>] ? worker_thread+0x127/0x234
> [<f8956936>] ? atl1c_reset_task+0x0/0x31 [atl1c]
> [<c0441ac5>] ? autoremove_wake_function+0x0/0x33
> [<c043ec48>] ? worker_thread+0x0/0x234
> [<c0441a25>] kthread+0x69/0x70
> [<c04419bc>] ? kthread+0x0/0x70
> [<c04034b7>] kernel_thread_helper+0x7/0x10
> 
> So when atl1c_reset_task be scheduled just set a flag in ctrl_flag, 
> to demonstrate it is in reset_task, when atl1c_down is call it will not call 
> cancel_work_sync(&adapter->reset_task) if it sees the flag.
> 
> Signed-off-by: jie yang <jie.yang@atheros.com>
> ---
> diff --git a/drivers/net/atl1c/atl1c.h b/drivers/net/atl1c/atl1c.h
> index 2a1120a..53242dc 100644
> --- a/drivers/net/atl1c/atl1c.h
> +++ b/drivers/net/atl1c/atl1c.h
> @@ -427,6 +427,7 @@ struct atl1c_hw {
>  #define ATL1C_ASPM_CTRL_MON        0x0200
>  #define ATL1C_HIB_DISABLE      0x0400
>  #define ATL1C_LINK_CAP_1000M       0x0800
> +#define ATL1C_RESET_IN_WORK        0x1000
>  #define ATL1C_FPGA_VERSION     0x8000
>     u16 cmb_tpd;
>     u16 cmb_rrd;
> diff --git a/drivers/net/atl1c/atl1c_main.c b/drivers/net/atl1c/atl1c_main.c
> index 1d601ce..dec88fa 100644
> --- a/drivers/net/atl1c/atl1c_main.c
> +++ b/drivers/net/atl1c/atl1c_main.c
> @@ -321,7 +321,10 @@ static void atl1c_del_timer(struct atl1c_adapter *adapter)
>  
>  static void atl1c_cancel_work(struct atl1c_adapter *adapter)
>  {
> -   cancel_work_sync(&adapter->reset_task);
> +   if (adapter->hw.ctrl_flags & ATL1C_RESET_IN_WORK)
> +       adapter->hw.ctrl_flags &= ~ATL1C_RESET_IN_WORK;/* clear the flag */
> +   else
> +       cancel_work_sync(&adapter->reset_task);
>     cancel_work_sync(&adapter->link_chg_task);
>  }
>  
> @@ -1544,6 +1547,7 @@ static irqreturn_t atl1c_intr(int irq, void *data)
>             /* reset MAC */
>             hw->intr_mask &= ~ISR_ERROR;
>             AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
> +           adapter->hw.ctrl_flags |= ATL1C_RESET_IN_WORK;
>             schedule_work(&adapter->reset_task);
>             break;
>         }

The use of non-atomic bit operations upon ctrl_flags looks unsafe.  If
a CPU is running atl1c_cancel_work() and then an interrupt happens (on
this CPU or a different one), then the interrupt will set
ATL1C_RESET_IN_WORK.  But the non-atomic operation in
atl1c_cancel_work() can just overwrite that change.  


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/atl1c/atl1c.h b/drivers/net/atl1c/atl1c.h
index 2a1120a..53242dc 100644
--- a/drivers/net/atl1c/atl1c.h
+++ b/drivers/net/atl1c/atl1c.h
@@ -427,6 +427,7 @@  struct atl1c_hw {
 #define ATL1C_ASPM_CTRL_MON        0x0200
 #define ATL1C_HIB_DISABLE      0x0400
 #define ATL1C_LINK_CAP_1000M       0x0800
+#define ATL1C_RESET_IN_WORK        0x1000
 #define ATL1C_FPGA_VERSION     0x8000
    u16 cmb_tpd;
    u16 cmb_rrd;
diff --git a/drivers/net/atl1c/atl1c_main.c b/drivers/net/atl1c/atl1c_main.c
index 1d601ce..dec88fa 100644
--- a/drivers/net/atl1c/atl1c_main.c
+++ b/drivers/net/atl1c/atl1c_main.c
@@ -321,7 +321,10 @@  static void atl1c_del_timer(struct atl1c_adapter *adapter)
 
 static void atl1c_cancel_work(struct atl1c_adapter *adapter)
 {
-   cancel_work_sync(&adapter->reset_task);
+   if (adapter->hw.ctrl_flags & ATL1C_RESET_IN_WORK)
+       adapter->hw.ctrl_flags &= ~ATL1C_RESET_IN_WORK;/* clear the flag */
+   else
+       cancel_work_sync(&adapter->reset_task);
    cancel_work_sync(&adapter->link_chg_task);
 }
 
@@ -1544,6 +1547,7 @@  static irqreturn_t atl1c_intr(int irq, void *data)
            /* reset MAC */
            hw->intr_mask &= ~ISR_ERROR;
            AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
+           adapter->hw.ctrl_flags |= ATL1C_RESET_IN_WORK;
            schedule_work(&adapter->reset_task);
            break;
        }