diff mbox

[net-next] be2net: fix INTx ISR for interrupt behaviour on BE2

Message ID d856540b-bacc-45e3-a7cc-49e7febafb95@CMEXHTCAS1.ad.emulex.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Sathya Perla Nov. 28, 2012, 5:50 a.m. UTC
On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
As a result be_intx()::events_get() and be_poll:events_get() can race and
notify an EQ wrongly.

Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
the same issue in the MSI-x path.

But, on Lancer, INTx can be de-asserted only by notifying num evts. This
is not an issue as the above BE2 behavior doesn't exist/has never been
seen on Lancer.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |   54 +++++++++++---------------
 1 files changed, 23 insertions(+), 31 deletions(-)

Comments

David Miller Nov. 28, 2012, 4:35 p.m. UTC | #1
From: Sathya Perla <sathya.perla@emulex.com>
Date: Wed, 28 Nov 2012 11:20:02 +0530

> On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
> As a result be_intx()::events_get() and be_poll:events_get() can race and
> notify an EQ wrongly.
> 
> Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
> the same issue in the MSI-x path.
> 
> But, on Lancer, INTx can be de-asserted only by notifying num evts. This
> is not an issue as the above BE2 behavior doesn't exist/has never been
> seen on Lancer.
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 28, 2012, 8:20 p.m. UTC | #2
On Wed, 2012-11-28 at 11:20 +0530, Sathya Perla wrote:
> On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
> As a result be_intx()::events_get() and be_poll:events_get() can race and
> notify an EQ wrongly.
> 
> Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
> the same issue in the MSI-x path.
> 
> But, on Lancer, INTx can be de-asserted only by notifying num evts. This
> is not an issue as the above BE2 behavior doesn't exist/has never been
> seen on Lancer.
[...]
> @@ -2014,15 +1996,23 @@ static int be_rx_cqs_create(struct be_adapter *adapter)
>  
>  static irqreturn_t be_intx(int irq, void *dev)
>  {
> -	struct be_adapter *adapter = dev;
> -	int num_evts;
> +	struct be_eq_obj *eqo = dev;
> +	struct be_adapter *adapter = eqo->adapter;
> +	int num_evts = 0;
>  
> -	/* With INTx only one EQ is used */
> -	num_evts = event_handle(&adapter->eq_obj[0]);
> -	if (num_evts)
> -		return IRQ_HANDLED;
> -	else
> -		return IRQ_NONE;
> +	/* On Lancer, clear-intr bit of the EQ DB does not work.
> +	 * INTx is de-asserted only on notifying num evts.
> +	 */
> +	if (lancer_chip(adapter))
> +		num_evts = events_get(eqo);
> +
> +	/* The EQ-notify may not de-assert INTx rightaway, causing
> +	 * the ISR to be invoked again. So, return HANDLED even when
> +	 * num_evts is zero.
> +	 */
> +	be_eq_notify(adapter, eqo->q.id, false, true, num_evts);
> +	napi_schedule(&eqo->napi);
> +	return IRQ_HANDLED;
>  }
[...]

You shouldn't unconditionally return IRQ_HANDLED.  This prevents
interrupt storm detection from working, not just for your device but for
anything else sharing its IRQ.

I understand there is a real problem to be fixed (PCIe write completions
overtaking INTx deassertion, and maybe a specific hardware bug).  The
way we dealt with such problems in sfc is to count the number of times
in a row that we don't see any events, and only return IRQ_HANDLED the
first time.

Ben.
Ben Hutchings Nov. 28, 2012, 8:25 p.m. UTC | #3
On Wed, 2012-11-28 at 20:20 +0000, Ben Hutchings wrote:
> On Wed, 2012-11-28 at 11:20 +0530, Sathya Perla wrote:
> > On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
> > As a result be_intx()::events_get() and be_poll:events_get() can race and
> > notify an EQ wrongly.
> > 
> > Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
> > the same issue in the MSI-x path.
> > 
> > But, on Lancer, INTx can be de-asserted only by notifying num evts. This
> > is not an issue as the above BE2 behavior doesn't exist/has never been
> > seen on Lancer.
> [...]
> > @@ -2014,15 +1996,23 @@ static int be_rx_cqs_create(struct be_adapter *adapter)
> >  
> >  static irqreturn_t be_intx(int irq, void *dev)
> >  {
> > -	struct be_adapter *adapter = dev;
> > -	int num_evts;
> > +	struct be_eq_obj *eqo = dev;
> > +	struct be_adapter *adapter = eqo->adapter;
> > +	int num_evts = 0;
> >  
> > -	/* With INTx only one EQ is used */
> > -	num_evts = event_handle(&adapter->eq_obj[0]);
> > -	if (num_evts)
> > -		return IRQ_HANDLED;
> > -	else
> > -		return IRQ_NONE;
> > +	/* On Lancer, clear-intr bit of the EQ DB does not work.
> > +	 * INTx is de-asserted only on notifying num evts.
> > +	 */
> > +	if (lancer_chip(adapter))
> > +		num_evts = events_get(eqo);
> > +
> > +	/* The EQ-notify may not de-assert INTx rightaway, causing
> > +	 * the ISR to be invoked again. So, return HANDLED even when
> > +	 * num_evts is zero.
> > +	 */
> > +	be_eq_notify(adapter, eqo->q.id, false, true, num_evts);
> > +	napi_schedule(&eqo->napi);
> > +	return IRQ_HANDLED;
> >  }
> [...]
> 
> You shouldn't unconditionally return IRQ_HANDLED.  This prevents
> interrupt storm detection from working, not just for your device but for
> anything else sharing its IRQ.
> 
> I understand there is a real problem to be fixed (PCIe write completions
> overtaking INTx deassertion, and maybe a specific hardware bug).
[...]

I was thinking of read completions; there are no write completions to
wait for so you're pretty much guaranteed to get called a second time.
Maybe you should add an MMIO read after calling be_eq_notify().

Ben.
Ben Hutchings Dec. 20, 2012, 9:20 p.m. UTC | #4
On Tue, 2012-12-18 at 12:57 +0000, Perla, Sathya wrote:
> >-----Original Message-----
> >From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> 
> >On Wed, 2012-11-28 at 20:20 +0000, Ben Hutchings wrote:
> >> On Wed, 2012-11-28 at 11:20 +0530, Sathya Perla wrote:
> >> > On BE2 chip, an interrupt may be raised even when EQ is in un-armed state.
> >> > As a result be_intx()::events_get() and be_poll:events_get() can race and
> >> > notify an EQ wrongly.
> >> >
> >> > Fix this by counting events only in be_poll(). Commit 0b545a629 fixes
> >> > the same issue in the MSI-x path.
> >> >
> >> > But, on Lancer, INTx can be de-asserted only by notifying num evts. This
> >> > is not an issue as the above BE2 behavior doesn't exist/has never been
> >> > seen on Lancer.
> >> [...]
> >> > @@ -2014,15 +1996,23 @@ static int be_rx_cqs_create(struct be_adapter
> >*adapter)
> >> >
> >> >  static irqreturn_t be_intx(int irq, void *dev)
> >> >  {
> >> > -	struct be_adapter *adapter = dev;
> >> > -	int num_evts;
> >> > +	struct be_eq_obj *eqo = dev;
> >> > +	struct be_adapter *adapter = eqo->adapter;
> >> > +	int num_evts = 0;
> >> >
> >> > -	/* With INTx only one EQ is used */
> >> > -	num_evts = event_handle(&adapter->eq_obj[0]);
> >> > -	if (num_evts)
> >> > -		return IRQ_HANDLED;
> >> > -	else
> >> > -		return IRQ_NONE;
> >> > +	/* On Lancer, clear-intr bit of the EQ DB does not work.
> >> > +	 * INTx is de-asserted only on notifying num evts.
> >> > +	 */
> >> > +	if (lancer_chip(adapter))
> >> > +		num_evts = events_get(eqo);
> >> > +
> >> > +	/* The EQ-notify may not de-assert INTx rightaway, causing
> >> > +	 * the ISR to be invoked again. So, return HANDLED even when
> >> > +	 * num_evts is zero.
> >> > +	 */
> >> > +	be_eq_notify(adapter, eqo->q.id, false, true, num_evts);
> >> > +	napi_schedule(&eqo->napi);
> >> > +	return IRQ_HANDLED;
> >> >  }
> >> [...]
> >>
> >> You shouldn't unconditionally return IRQ_HANDLED.  This prevents
> >> interrupt storm detection from working, not just for your device but for
> >> anything else sharing its IRQ.
> >>
> >> I understand there is a real problem to be fixed (PCIe write completions
> >> overtaking INTx deassertion, and maybe a specific hardware bug).
> >[...]
> >
> >I was thinking of read completions; there are no write completions to
> >wait for so you're pretty much guaranteed to get called a second time.
> >Maybe you should add an MMIO read after calling be_eq_notify().
> 
> Ben, I'm very sorry for not replying to this thread earlier. I needed time to play with
> your suggested solution and better understand the HW behavior in this case.
> 
> Adding an extra PCI memory read after the EQ-notify helps in syncing the PCI de-assert
> message *only* if it was already issued.
> But, there are cases when the HW block takes some time (longer) to initiate the INTx de-assert.
> The PCI memory read would complete but the de-assert wouldn't have happened yet.
> The PCI read sync will work only if the de-assert was issued *before* the PCI-read request was seen by the HW.

Yes, I've seen the exact same problem with our controllers.  At the PCIe
transaction level, legacy interrupt messages and read completions are
queued and flow-controlled separately.  If the chip's PCIe core doesn't
provide a way to restrict reordering of the two TLPs, this seems to be
inevitable.

What we do in sfc is to count the number of times in a row we've seen no
reason for the interrupt (irq_zero_count), and return IRQ_HANDLED only
if this is the first time.  This might work for you too.

Ben.
diff mbox

Patch

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index adef536..0661e93 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1675,24 +1675,6 @@  static inline int events_get(struct be_eq_obj *eqo)
 	return num;
 }
 
-static int event_handle(struct be_eq_obj *eqo)
-{
-	bool rearm = false;
-	int num = events_get(eqo);
-
-	/* Deal with any spurious interrupts that come without events */
-	if (!num)
-		rearm = true;
-
-	if (num || msix_enabled(eqo->adapter))
-		be_eq_notify(eqo->adapter, eqo->q.id, rearm, true, num);
-
-	if (num)
-		napi_schedule(&eqo->napi);
-
-	return num;
-}
-
 /* Leaves the EQ is disarmed state */
 static void be_eq_clean(struct be_eq_obj *eqo)
 {
@@ -2014,15 +1996,23 @@  static int be_rx_cqs_create(struct be_adapter *adapter)
 
 static irqreturn_t be_intx(int irq, void *dev)
 {
-	struct be_adapter *adapter = dev;
-	int num_evts;
+	struct be_eq_obj *eqo = dev;
+	struct be_adapter *adapter = eqo->adapter;
+	int num_evts = 0;
 
-	/* With INTx only one EQ is used */
-	num_evts = event_handle(&adapter->eq_obj[0]);
-	if (num_evts)
-		return IRQ_HANDLED;
-	else
-		return IRQ_NONE;
+	/* On Lancer, clear-intr bit of the EQ DB does not work.
+	 * INTx is de-asserted only on notifying num evts.
+	 */
+	if (lancer_chip(adapter))
+		num_evts = events_get(eqo);
+
+	/* The EQ-notify may not de-assert INTx rightaway, causing
+	 * the ISR to be invoked again. So, return HANDLED even when
+	 * num_evts is zero.
+	 */
+	be_eq_notify(adapter, eqo->q.id, false, true, num_evts);
+	napi_schedule(&eqo->napi);
+	return IRQ_HANDLED;
 }
 
 static irqreturn_t be_msix(int irq, void *dev)
@@ -2342,10 +2332,10 @@  static int be_irq_register(struct be_adapter *adapter)
 			return status;
 	}
 
-	/* INTx */
+	/* INTx: only the first EQ is used */
 	netdev->irq = adapter->pdev->irq;
 	status = request_irq(netdev->irq, be_intx, IRQF_SHARED, netdev->name,
-			adapter);
+			     &adapter->eq_obj[0]);
 	if (status) {
 		dev_err(&adapter->pdev->dev,
 			"INTx request IRQ failed - err %d\n", status);
@@ -2367,7 +2357,7 @@  static void be_irq_unregister(struct be_adapter *adapter)
 
 	/* INTx */
 	if (!msix_enabled(adapter)) {
-		free_irq(netdev->irq, adapter);
+		free_irq(netdev->irq, &adapter->eq_obj[0]);
 		goto done;
 	}
 
@@ -3023,8 +3013,10 @@  static void be_netpoll(struct net_device *netdev)
 	struct be_eq_obj *eqo;
 	int i;
 
-	for_all_evt_queues(adapter, eqo, i)
-		event_handle(eqo);
+	for_all_evt_queues(adapter, eqo, i) {
+		be_eq_notify(eqo->adapter, eqo->q.id, false, true, 0);
+		napi_schedule(&eqo->napi);
+	}
 
 	return;
 }