
[RFC,net-next,00/11] net: remove disable_irq() from ->ndo_poll_controller

Message ID 20141211214537.GA4159@kria
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Sabrina Dubroca Dec. 11, 2014, 9:45 p.m. UTC
2014-12-09, 21:44:33 -0500, David Miller wrote:
> 
> Adding a new spinlock to every interrupt service routine is
> simply a non-starter.
> 
> You will certainly have to find a way to fix this in a way
> that doesn't involve adding any new overhead to the normal
> operational paths of these drivers.
> 
> Thanks.

Okay. Here is another idea.

Since the issue is with the wait_event() part of synchronize_irq(),
and it only takes care of threaded handlers, maybe we could try not
waiting for threaded handlers.

Introduce disable_irq_nosleep() that returns true if it successfully
synchronized against all handlers (there was no threaded handler
running), false if it left some threads running.  And in
->ndo_poll_controller, only call the interrupt handler if
synchronization was successful.

Both users of the poll controllers retry their action (alloc/xmit an
skb) several times, with calls to the device's poll controller between
attempts.  And hopefully, if the first attempt fails, we will still
manage to get through?

Comments

David Miller Dec. 12, 2014, 2:14 a.m. UTC | #1
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Thu, 11 Dec 2014 22:45:37 +0100

> Introduce disable_irq_nosleep() that returns true if it successfully
> synchronized against all handlers (there was no threaded handler
> running), false if it left some threads running.  And in
> ->ndo_poll_controller, only call the interrupt handler if
> synchronization was successful.
> 
> Both users of the poll controllers retry their action (alloc/xmit an
> skb) several times, with calls to the device's poll controller between
> attempts.  And hopefully, if the first attempt fails, we will still
> manage to get through?

This looks more palatable to me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Dec. 12, 2014, 10:01 p.m. UTC | #2
On Thu, 11 Dec 2014, Sabrina Dubroca wrote:
> 2014-12-09, 21:44:33 -0500, David Miller wrote:
> > 
> > Adding a new spinlock to every interrupt service routine is
> > simply a non-starter.
> > 
> > You will certainly have to find a way to fix this in a way
> > that doesn't involve adding any new overhead to the normal
> > operational paths of these drivers.
> 
> Okay. Here is another idea.
> 
> Since the issue is with the wait_event() part of synchronize_irq(),
> and it only takes care of threaded handlers, maybe we could try not
> waiting for threaded handlers.
> 
> Introduce disable_irq_nosleep() that returns true if it successfully
> synchronized against all handlers (there was no threaded handler
> running), false if it left some threads running.  And in
> ->ndo_poll_controller, only call the interrupt handler if
> synchronization was successful.
> 
> Both users of the poll controllers retry their action (alloc/xmit an
> skb) several times, with calls to the device's poll controller between
> attempts.  And hopefully, if the first attempt fails, we will still
> manage to get through?

"Hopefully" is not a good starting point. Is the poll controller
definitely retrying? Otherwise you might end up with the following:

Interrupt line is shared between your network device and a
device which requested a threaded interrupt handler.

  CPU0	       		   	    CPU1
  interrupt()
    your_device_handler()
      return NONE;
    shared_device_handler()
      return WAKE_THREAD;
      --> atomic_inc(threads_active);
				    poll()
				      disable_irq_nosleep()
					sync_hardirq()
					return atomic_read(threads_active);

So if you do not have a reliable retry then you might just go into a
stale state. And this can happen if the interrupt type is edge because
we do not disable the interrupt when we wakeup the thread for obvious
reasons.

Aside of that I think that something like this is a reasonable
approach to the problem.

The only other nitpicks I have are:

    - The name of the function sucks, though my tired brain can't
      come up with something reasonable right now

    - The lack of extensive documentation on how this interface is
      supposed to be used and the pitfalls of abuse, both in the
      function documentation and the changelog.

      Merely copying the existing documentation of the other
      interface is not sufficient.

Thanks,

	tglx

Sabrina Dubroca Jan. 5, 2015, 2:31 p.m. UTC | #3
2014-12-12, 23:01:28 +0100, Thomas Gleixner wrote:
> On Thu, 11 Dec 2014, Sabrina Dubroca wrote:
> > 2014-12-09, 21:44:33 -0500, David Miller wrote:
> > > 
> > > Adding a new spinlock to every interrupt service routine is
> > > simply a non-starter.
> > > 
> > > You will certainly have to find a way to fix this in a way
> > > that doesn't involve adding any new overhead to the normal
> > > operational paths of these drivers.
> > 
> > Okay. Here is another idea.
> > 
> > Since the issue is with the wait_event() part of synchronize_irq(),
> > and it only takes care of threaded handlers, maybe we could try not
> > waiting for threaded handlers.
> > 
> > Introduce disable_irq_nosleep() that returns true if it successfully
> > synchronized against all handlers (there was no threaded handler
> > running), false if it left some threads running.  And in
> > ->ndo_poll_controller, only call the interrupt handler if
> > synchronization was successful.
> > 
> > Both users of the poll controllers retry their action (alloc/xmit an
> > skb) several times, with calls to the device's poll controller between
> > attempts.  And hopefully, if the first attempt fails, we will still
> > manage to get through?
> 
> Hopefully is not a good starting point. Is the poll controller
> definitely retrying? Otherwise you might end up with the following:
> 
> Interrupt line is shared between your network device and a
> device which requested a threaded interrupt handler.
> 
>   CPU0	       		   	    CPU1
>   interrupt()
>     your_device_handler()
>       return NONE;
>     shared_device_handler()
>       return WAKE_THREAD;
>       --> atomic_inc(threads_active);
> 				    poll()
> 				      disable_irq_nosleep()
> 					sync_hardirq()
> 					return atomic_read(threads_active);
> 
> So if you do not have a reliable retry then you might just go into a
> stale state. And this can happen if the interrupt type is edge because
> we do not disable the interrupt when we wakeup the thread for obvious
> reasons.

We do have loops that retry the netpoll controller and attempt the
work even if the controller doesn't help.  And by "hopefully" I
mean: even if we fail, we tried our best; netpoll isn't 100%
reliable anyway.


static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve)
{
	...
repeat:

	skb = alloc_skb(len, GFP_ATOMIC);
	if (!skb)
		skb = skb_dequeue(&skb_pool);

	if (!skb) {
		if (++count < 10) {
			netpoll_poll_dev(np->dev);
			goto repeat;
		}
		return NULL;
	}

	...
}

void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
			     struct net_device *dev)
{
	...

		/* try until next clock tick */
		for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;
		     tries > 0; --tries) {
			if (HARD_TX_TRYLOCK(dev, txq)) {
				if (!netif_xmit_stopped(txq))
					status = netpoll_start_xmit(skb, dev, txq);

				HARD_TX_UNLOCK(dev, txq);

				if (status == NETDEV_TX_OK)
					break;

			}

			/* tickle device maybe there is some cleanup */
			netpoll_poll_dev(np->dev);

			udelay(USEC_PER_POLL);
		}

	...
}



> Aside of that I think that something like this is a reasonable
> approach to the problem.
> 
> The only other nitpicks I have are:
> 
>     - The name of the function sucks, though my tired brain can't
>       come up with something reasonable right now

I couldn't think of anything better.  Maybe 'disable_irq_trysync' or
'disable_irq_hardsync'?

Or maybe you prefer something that works like spin_trylock, and
reenables the irq before returning if we can't sync?  Maybe the risk
of abuse would be a bit lower this way?

I made synchronize_irq_nosleep static, but maybe it should be
EXPORT_SYMBOL'ed as well.  I didn't need that for e1000, but that
would be more consistent.


>     - The lack of extensive documentation on how this interface is
>       supposed to be used and the pitfalls of abuse, both in the
>       function documentation and the changelog.
> 
>       Merely copying the existing documentation of the other
>       interface is not sufficient.


Yes, my email wasn't really a changelog, just a description and RFC.


Modified documentation:

-----
disable_irq_nosleep - disable an irq and wait for completion of hard IRQ handlers
@irq: Interrupt to disable

Disable the selected interrupt line.  Enables and Disables are
nested.
This function does not sleep, and is safe to call in atomic context.

This function waits for any pending hard IRQ handlers for this
interrupt to complete before returning. If you use this
function while holding a resource the IRQ handler may need, you
will deadlock.

This function does not wait for threaded IRQ handlers.
Returns true if synchronized, false if there are threaded
handlers pending.

If false is returned, the caller must assume that synchronization
didn't occur, and that it is NOT safe to proceed.
The caller MUST reenable the interrupt by calling enable_irq in all
cases.

This function may be called - with care - from IRQ context.
-----


Thanks.

--
Sabrina
Sabrina Dubroca Feb. 5, 2015, 12:20 a.m. UTC | #4
Thomas, ping?

The thread is over there if you need it:
https://marc.info/?l=linux-netdev&m=141833435000554&w=2

2015-01-05, 15:31:26 +0100, Sabrina Dubroca wrote:
> [snip -- full quote of the previous message, unchanged]

Patch

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 83140cbb5f01..d967937aca3c 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5216,8 +5216,8 @@  static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	disable_irq(adapter->pdev->irq);
-	e1000_intr(adapter->pdev->irq, netdev);
+	if (disable_irq_nosleep(adapter->pdev->irq))
+		e1000_intr(adapter->pdev->irq, netdev);
 	enable_irq(adapter->pdev->irq);
 }
 #endif
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 69517a24bc50..f2e4125ac963 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -184,6 +184,7 @@  extern void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id);
 #endif
 
 extern void disable_irq_nosync(unsigned int irq);
+extern bool disable_irq_nosleep(unsigned int irq);
 extern void disable_irq(unsigned int irq);
 extern void disable_percpu_irq(unsigned int irq);
 extern void enable_irq(unsigned int irq);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..58199b023845 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -106,6 +106,23 @@  void synchronize_irq(unsigned int irq)
 }
 EXPORT_SYMBOL(synchronize_irq);
 
+static bool synchronize_irq_nosleep(unsigned int irq)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+
+	if (desc) {
+		__synchronize_hardirq(desc);
+		/*
+		 * We made sure that no hardirq handler is
+		 * running. Now check if some threaded handlers are
+		 * active.
+		 */
+		return !atomic_read(&desc->threads_active);
+	}
+
+	return true;
+}
+
 #ifdef CONFIG_SMP
 cpumask_var_t irq_default_affinity;
 
@@ -418,6 +435,31 @@  void disable_irq_nosync(unsigned int irq)
 EXPORT_SYMBOL(disable_irq_nosync);
 
 /**
+ *	disable_irq_nosleep - disable an irq and wait for completion of hard IRQ handlers
+ *	@irq: Interrupt to disable
+ *
+ *	Disable the selected interrupt line.  Enables and Disables are
+ *	nested.
+ *	This function waits for any pending hard IRQ handlers for this
+ *	interrupt to complete before returning. If you use this
+ *	function while holding a resource the IRQ handler may need you
+ *	will deadlock.
+ *	This function does not wait for threaded IRQ handlers.
+ *
+ *      Returns true if synchronized, false if there are threaded
+ *      handlers pending.
+ *
+ *	This function may be called - with care - from IRQ context.
+ */
+bool disable_irq_nosleep(unsigned int irq)
+{
+	if (!__disable_irq_nosync(irq))
+		return synchronize_irq_nosleep(irq);
+	return true;
+}
+EXPORT_SYMBOL(disable_irq_nosleep);
+
+/**
  *	disable_irq - disable an irq and wait for completion
  *	@irq: Interrupt to disable
  *