Patchwork [for,2.6.31] cxgb3: Fix crash caused by stashing wrong netdev_queue

Submitter Roland Dreier
Date July 9, 2009, 7:30 p.m.
Message ID <adabpnty7qm.fsf_-_@cisco.com>
Permalink /patch/29640/
State Accepted
Delegated to: David Miller

Comments

Roland Dreier - July 9, 2009, 7:30 p.m.
Commit c3a8c5b6 ("cxgb3: move away from LLTX") exposed a bug in how
cxgb3 looks up the netdev_queue it stashes away in a qset during
initialization.  For multiport devices, the TX queue index it uses is
offset by the first_qset index of each port.  This leads to a crash
once LLTX is removed, since hard_start_xmit is called with one TX
queue lock held, while the TX reclaim timer task grabs a different
(wrong) TX queue lock when it frees skbs.

Fix this by removing the first_qset offset used to look up the TX
queue passed into t3_sge_alloc_qset() from setup_sge_qsets().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
OK, found the bug that was causing the crash I saw.  With this patch
everything looks solid again.  Please apply.

 drivers/net/cxgb3/cxgb3_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Divy Le Ray - July 9, 2009, 9:15 p.m.
Roland Dreier wrote:
> Commit c3a8c5b6 ("cxgb3: move away from LLTX") exposed a bug in how
> cxgb3 looks up the netdev_queue it stashes away in a qset during
> initialization.  For multiport devices, the TX queue index it uses is
> offset by the first_qset index of each port.  This leads to a crash
> once LLTX is removed, since hard_start_xmit is called with one TX
> queue lock held, while the TX reclaim timer task grabs a different
> (wrong) TX queue lock when it frees skbs.
>
> Fix this by removing the first_qset offset used to look up the TX
> queue passed into t3_sge_alloc_qset() from setup_sge_qsets().
>   

Thanks Roland!
You were very fast to fix it, you beat us.

Acked-by: Divy Le Ray <divy@chelsio.com>

> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> ---
> OK, found the bug that was causing the crash I saw.  With this patch
> everything looks solid again.  Please apply.
>
>  drivers/net/cxgb3/cxgb3_main.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
> index 538dda4..fb5df5c 100644
> --- a/drivers/net/cxgb3/cxgb3_main.c
> +++ b/drivers/net/cxgb3/cxgb3_main.c
> @@ -642,8 +642,7 @@ static int setup_sge_qsets(struct adapter *adap)
>  		struct port_info *pi = netdev_priv(dev);
>  
>  		pi->qs = &adap->sge.qs[pi->first_qset];
> -		for (j = pi->first_qset; j < pi->first_qset + pi->nqsets;
> -		     ++j, ++qset_idx) {
> +		for (j = 0; j < pi->nqsets; ++j, ++qset_idx) {
>  			set_qset_lro(dev, qset_idx, pi->rx_offload & T3_LRO);
>  			err = t3_sge_alloc_qset(adap, qset_idx, 1,
>  				(adap->flags & USING_MSIX) ? qset_idx + 1 :
>   

Roland Dreier - July 9, 2009, 11:16 p.m.
> You were very fast to fix it, you beat us.

motivated -- I wanted to be able to use the system where I saw the crash :)
David Miller - July 10, 2009, 12:16 a.m.
From: Divy Le Ray <divy@chelsio.com>
Date: Thu, 09 Jul 2009 14:15:21 -0700

> Roland Dreier wrote:
>> Commit c3a8c5b6 ("cxgb3: move away from LLTX") exposed a bug in how
>> cxgb3 looks up the netdev_queue it stashes away in a qset during
>> initialization.  For multiport devices, the TX queue index it uses is
>> offset by the first_qset index of each port.  This leads to a crash
>> once LLTX is removed, since hard_start_xmit is called with one TX
>> queue lock held, while the TX reclaim timer task grabs a different
>> (wrong) TX queue lock when it frees skbs.
>>
>> Fix this by removing the first_qset offset used to look up the TX
>> queue passed into t3_sge_alloc_qset() from setup_sge_qsets().
>>   
> 
> Thanks Roland!
> You were very fast to fix it, you beat us.
> 
> Acked-by: Divy Le Ray <divy@chelsio.com>
> 
>> Signed-off-by: Roland Dreier <rolandd@cisco.com>

Applied.

Patch

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 538dda4..fb5df5c 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -642,8 +642,7 @@  static int setup_sge_qsets(struct adapter *adap)
 		struct port_info *pi = netdev_priv(dev);
 
 		pi->qs = &adap->sge.qs[pi->first_qset];
-		for (j = pi->first_qset; j < pi->first_qset + pi->nqsets;
-		     ++j, ++qset_idx) {
+		for (j = 0; j < pi->nqsets; ++j, ++qset_idx) {
 			set_qset_lro(dev, qset_idx, pi->rx_offload & T3_LRO);
 			err = t3_sge_alloc_qset(adap, qset_idx, 1,
 				(adap->flags & USING_MSIX) ? qset_idx + 1 :
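
The index mix-up described in the commit message can be illustrated with a small standalone model. This is only a sketch, not driver code: struct port, stashed_txq_index() and count_wrong_lookups() are hypothetical names invented for the illustration, and the model assumes that the transmit path addresses the k-th TX queue of a port's net device by the per-device index k.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's per-port state. */
struct port {
    int first_qset; /* index of this port's first qset in the adapter-wide array */
    int nqsets;     /* number of qsets, and of TX queues, this port owns */
};

/*
 * TX queue index that would be stashed for the k-th qset of a port.
 * offset_by_first_qset != 0 models the old loop (j starts at first_qset);
 * offset_by_first_qset == 0 models the patched loop (j starts at 0).
 */
int stashed_txq_index(const struct port *p, int k, int offset_by_first_qset)
{
    assert(k >= 0 && k < p->nqsets);
    return offset_by_first_qset ? p->first_qset + k : k;
}

/*
 * Count qsets whose stashed index differs from k, the per-device index
 * this model assumes the transmit path uses for the port's k-th queue.
 */
int count_wrong_lookups(const struct port *ports, size_t nports,
                        int offset_by_first_qset)
{
    int wrong = 0;

    for (size_t i = 0; i < nports; i++)
        for (int k = 0; k < ports[i].nqsets; k++)
            if (stashed_txq_index(&ports[i], k, offset_by_first_qset) != k)
                wrong++;
    return wrong;
}
```

For an assumed two-port layout with two qsets per port (first_qset 0 and 2), the old indexing yields mismatches only on the second port, which is consistent with the bug showing up on multiport devices; with the patched indexing every qset maps to its own device-relative queue.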