Patchwork [net,bna] Fix call trace when interrupts are disabled while sleeping function kzalloc is called

login
register
mail settings
Submitter Shyam Iyer
Date June 27, 2011, 8:21 p.m.
Message ID <1309206092-23064-1-git-send-email-shyam_iyer@dell.com>
Download mbox | patch
Permalink /patch/102270/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Shyam Iyer - June 27, 2011, 8:21 p.m.
The kzalloc sleeps and disabling interrupts(spin_lock_irqsave) causes oops like the one.

Jun 27 08:15:24 home-t710 kernel: [11735.634550] Brocade 10G Ethernet driver
Jun 27 08:15:24 home-t710 kernel: [11735.634590] bnad_pci_probe : (0xffff880427f3d000, 0xffffffffa020f3e0) PCI Func : (2)
Jun 27 08:15:24 home-t710 kernel: [11735.637677] bna 0000:82:00.2: PCI INT A -> GSI 66 (level, low) -> IRQ 66
Jun 27 08:15:24 home-t710 kernel: [11735.638290] bar0 mapped to ffffc90014980000, len 262144
Jun 27 08:15:24 home-t710 kernel: [11735.638732] BUG: sleeping function called from invalid context at mm/slub.c:847
Jun 27 08:15:24 home-t710 kernel: [11735.638736] in_atomic(): 0, irqs_disabled(): 1, pid: 11243, name: insmod
Jun 27 08:15:24 home-t710 kernel: [11735.638740] Pid: 11243, comm: insmod Not tainted 3.0.0-rc4+ #6
Jun 27 08:15:24 home-t710 kernel: [11735.638743] Call Trace:
Jun 27 08:15:24 home-t710 kernel: [11735.638755]  [<ffffffff81046427>] __might_sleep+0xeb/0xf0
Jun 27 08:15:24 home-t710 kernel: [11735.638766]  [<ffffffffa01fe469>] ? netif_wake_queue+0x3d/0x3d [bna]
Jun 27 08:15:24 home-t710 kernel: [11735.638773]  [<ffffffff8111201c>] kmem_cache_alloc_trace+0x43/0xd8
Jun 27 08:15:24 home-t710 kernel: [11735.638782]  [<ffffffffa01fe469>] ? netif_wake_queue+0x3d/0x3d [bna]
Jun 27 08:15:24 home-t710 kernel: [11735.638787]  [<ffffffff810ab791>] request_threaded_irq+0xa1/0x113
Jun 27 08:15:24 home-t710 kernel: [11735.638798]  [<ffffffffa020f0c0>] bnad_pci_probe+0x612/0x8e5 [bna]
Jun 27 08:15:24 home-t710 kernel: [11735.638807]  [<ffffffffa01fe469>] ? netif_wake_queue+0x3d/0x3d [bna]
Jun 27 08:15:24 home-t710 kernel: [11735.638816]  [<ffffffff81482ef4>] ? _raw_spin_unlock_irqrestore+0x17/0x19
Jun 27 08:15:24 home-t710 kernel: [11735.638822]  [<ffffffff8124d17a>] local_pci_probe+0x44/0x75
Jun 27 08:15:24 home-t710 kernel: [11735.638826]  [<ffffffff8124dc06>] pci_device_probe+0xd0/0xff
Jun 27 08:15:24 home-t710 kernel: [11735.638832]  [<ffffffff812ef8ab>] driver_probe_device+0x131/0x213
Jun 27 08:15:24 home-t710 kernel: [11735.638836]  [<ffffffff812ef9e7>] __driver_attach+0x5a/0x7e
Jun 27 08:15:24 home-t710 kernel: [11735.638840]  [<ffffffff812ef98d>] ? driver_probe_device+0x213/0x213
Jun 27 08:15:24 home-t710 kernel: [11735.638844]  [<ffffffff812ee933>] bus_for_each_dev+0x53/0x89
Jun 27 08:15:24 home-t710 kernel: [11735.638848]  [<ffffffff812ef48a>] driver_attach+0x1e/0x20
Jun 27 08:15:24 home-t710 kernel: [11735.638852]  [<ffffffff812ef0ae>] bus_add_driver+0xd1/0x224
Jun 27 08:15:24 home-t710 kernel: [11735.638858]  [<ffffffffa01b8000>] ? 0xffffffffa01b7fff
Jun 27 08:15:24 home-t710 kernel: [11735.638862]  [<ffffffff812efe57>] driver_register+0x98/0x105
Jun 27 08:15:24 home-t710 kernel: [11735.638866]  [<ffffffffa01b8000>] ? 0xffffffffa01b7fff
Jun 27 08:15:24 home-t710 kernel: [11735.638871]  [<ffffffff8124e4c9>] __pci_register_driver+0x56/0xc1
Jun 27 08:15:24 home-t710 kernel: [11735.638875]  [<ffffffffa01b8000>] ? 0xffffffffa01b7fff
Jun 27 08:15:24 home-t710 kernel: [11735.638884]  [<ffffffffa01b8040>] bnad_module_init+0x40/0x60 [bna]
Jun 27 08:15:24 home-t710 kernel: [11735.638892]  [<ffffffff81002099>] do_one_initcall+0x7f/0x136
Jun 27 08:15:24 home-t710 kernel: [11735.638899]  [<ffffffff8108608b>] sys_init_module+0x88/0x1d0
Jun 27 08:15:24 home-t710 kernel: [11735.638906]  [<ffffffff81489682>] system_call_fastpath+0x16/0x1b
Jun 27 08:15:24 home-t710 kernel: [11735.639642] bnad_pci_probe : (0xffff880427f3e000, 0xffffffffa020f3e0) PCI Func : (3)
Jun 27 08:15:24 home-t710 kernel: [11735.639665] bna 0000:82:00.3: PCI INT A -> GSI 66 (level, low) -> IRQ 66
Jun 27 08:15:24 home-t710 kernel: [11735.639735] bar0 mapped to ffffc90014400000, len 262144


Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
---
 drivers/net/bna/bnad.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)
David Miller - June 27, 2011, 11:07 p.m.
From: Shyam Iyer <shyam.iyer.t@gmail.com>
Date: Mon, 27 Jun 2011 16:21:32 -0400

> The kzalloc sleeps and disabling interrupts(spin_lock_irqsave)
> causes oops like the one.

What if ->cfg_flags changes while you have dropped the lock?

If the lock doesn't protect those flags, what was it being
taken for in the first place?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rasesh Mody - June 28, 2011, 12:51 a.m.
>From: Shyam Iyer [mailto:shyam.iyer.t@gmail.com]
>Sent: Monday, June 27, 2011 1:22 PM
>Subject: [PATCH] [net][bna] Fix call trace when interrupts are disabled
>while sleeping function kzalloc is called
>
>The kzalloc sleeps and disabling interrupts(spin_lock_irqsave) causes
>oops like the one.

Hi Shyam,

We are not calling any sleeping function while holding the lock in bnad_mbox_irq_alloc(). How would your patch fix the call trace? Also can you tell which conditions led to this trace.

Thanks,
--Rasesh
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shyam_Iyer@Dell.com - June 28, 2011, 4:29 p.m.
(My replies bounced of netdev for some reason..resending)

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, June 27, 2011 7:07 PM
> To: shyam.iyer.t@gmail.com
> Cc: netdev@vger.kernel.org; rmody@brocade.com; ddutt@brocadel.com;
> Iyer, Shyam
> Subject: Re: [PATCH] [net][bna] Fix call trace when interrupts are
> disabled while sleeping function kzalloc is called
> 
> From: Shyam Iyer <shyam.iyer.t@gmail.com>
> Date: Mon, 27 Jun 2011 16:21:32 -0400
> 
> > The kzalloc sleeps and disabling interrupts(spin_lock_irqsave)
> > causes oops like the one.
> 
> What if ->cfg_flags changes while you have dropped the lock?
> 
> If the lock doesn't protect those flags, what was it being
> taken for in the first place?

Oops yes!

Here is try 2.

I found that I had to initialize the flags variable outside the spin_lock_irqsave in the following way to fix the problem. The sleeping kzalloc function is in fact in the request_threaded_irq call which allocates the irq handler function and its flags arguments.

Attached patch fixes the issue.
Shyam_Iyer@Dell.com - June 28, 2011, 4:32 p.m.
(Resending the reply for this as well.)

> -----Original Message-----
> From: Rasesh Mody [mailto:rmody@brocade.com]
> Sent: Monday, June 27, 2011 8:51 PM
> To: Shyam Iyer; netdev@vger.kernel.org
> Cc: ddutt@brocadel.com; Iyer, Shyam; Jing Huang
> Subject: RE: [PATCH] [net][bna] Fix call trace when interrupts are
> disabled while sleeping function kzalloc is called
> 
> 
> >From: Shyam Iyer [mailto:shyam.iyer.t@gmail.com]
> >Sent: Monday, June 27, 2011 1:22 PM
> >Subject: [PATCH] [net][bna] Fix call trace when interrupts are
> disabled
> >while sleeping function kzalloc is called
> >
> >The kzalloc sleeps and disabling interrupts(spin_lock_irqsave) causes
> >oops like the one.
> 
> Hi Shyam,
> 
> We are not calling any sleeping function while holding the lock in
> bnad_mbox_irq_alloc(). How would your patch fix the call trace? Also
> can you tell which conditions led to this trace.
> 
> Thanks,
> --Rasesh

Hi Rasesh,

Please review my reply/patch2 to Dave. I guess the condition to reproduce is to simply insmod/modprobe the driver on the Brocade 1020 CNA . I don't have cables attached to the card if that helps.

Thanks,
Shyam
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rasesh Mody - June 28, 2011, 5:27 p.m.
>From: Shyam_Iyer@Dell.com [mailto:Shyam_Iyer@Dell.com]
>Sent: Tuesday, June 28, 2011 9:30 AM
>
>Here is try 2.
>
>I found that I had to initialize the flags variable outside the
>spin_lock_irqsave in the following way to fix the problem. The sleeping
>kzalloc function is in fact in the request_threaded_irq call which
>allocates the irq handler function and its flags arguments.
>
>Attached patch fixes the issue.

The attached patch may fix the issue in MSIX mode, however we'll continue to see the issue in INTX mode. Using a separate irq_flags may be a good idea.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shyam_Iyer@Dell.com - June 28, 2011, 6:48 p.m.
> -----Original Message-----
> From: Rasesh Mody [mailto:rmody@brocade.com]
> Sent: Tuesday, June 28, 2011 1:28 PM
> To: Iyer, Shyam; davem@davemloft.net; shyam.iyer.t@gmail.com
> Cc: netdev@vger.kernel.org; Debashis Dutt; Jing Huang
> Subject: RE: [PATCH] [net][bna] Fix call trace when interrupts are
> disabled while sleeping function kzalloc is called
> 
> >From: Shyam_Iyer@Dell.com [mailto:Shyam_Iyer@Dell.com]
> >Sent: Tuesday, June 28, 2011 9:30 AM
> >
> >Here is try 2.
> >
> >I found that I had to initialize the flags variable outside the
> >spin_lock_irqsave in the following way to fix the problem. The
> sleeping
> >kzalloc function is in fact in the request_threaded_irq call which
> >allocates the irq handler function and its flags arguments.
> >
> >Attached patch fixes the issue.
> 
> The attached patch may fix the issue in MSIX mode, however we'll
> continue to see the issue in INTX mode. Using a separate irq_flags may
> be a good idea.

You are right.. Resending another version.
-Shyam
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/bna/bnad.c b/drivers/net/bna/bnad.c
index 7d25a97..04da3c8 100644
--- a/drivers/net/bna/bnad.c
+++ b/drivers/net/bna/bnad.c
@@ -1114,6 +1114,11 @@  bnad_mbox_irq_alloc(struct bnad *bnad,
 	unsigned long 	flags;
 	u32	irq;
 	irq_handler_t 	irq_handler;
+	u32	cfg_flags;
+
+	spin_lock_irqsave(&bnad->bna_lock, flags);
+	cfg_flags = bnad->cfg_flags;
+	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
 	/* Mbox should use only 1 vector */
 
@@ -1121,8 +1126,7 @@  bnad_mbox_irq_alloc(struct bnad *bnad,
 	if (!intr_info->idl)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&bnad->bna_lock, flags);
-	if (bnad->cfg_flags & BNAD_CF_MSIX) {
+	if (cfg_flags & BNAD_CF_MSIX) {
 		irq_handler = (irq_handler_t)bnad_msix_mbox_handler;
 		irq = bnad->msix_table[bnad->msix_num - 1].vector;
 		flags = 0;
@@ -1135,7 +1139,6 @@  bnad_mbox_irq_alloc(struct bnad *bnad,
 		intr_info->intr_type = BNA_INTR_T_INTX;
 		/* intr_info->idl.vector = 0 ? */
 	}
-	spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
 	sprintf(bnad->mbox_irq_name, "%s", BNAD_NAME);