netpoll: Drop budget parameter from NAPI polling call hierarchy

Message ID 20150922215049.3088.32475.stgit@ahduyck-vm-fedora22
State Rejected, archived
Delegated to: David Miller

Commit Message

Alexander Duyck Sept. 22, 2015, 9:56 p.m. UTC
For some reason we were carrying the budget value around between the
various calls to napi->poll.  If, for example, one of the drivers called
had a bug that made it return a non-zero value for work, the budget
value could become negative.

Rather than carry around a value of budget that is 0 or less we can instead
just loop through and pass 0 to each napi->poll call.  If any driver
returns a value for work done that is non-zero then we can report that
driver and continue rather than allowing a bad actor to make the budget
value negative and pass that negative value to napi->poll.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---

This patch is meant to be applied after Neil's patch:
	[PATCH v2] netpoll: Close race condition between poll_one_napi and napi_disable

 net/core/netpoll.c |   25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
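
The failure mode described in the commit message can be modelled outside
the kernel. The following is a minimal userspace sketch, not kernel code:
poll_fn, good_poll, buggy_poll, old_scheme_min_budget and
new_scheme_offenders are hypothetical names invented for illustration, and
the locking and NAPI state checks done in netpoll.c are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for napi->poll: returns the work it claims to have done. */
typedef int (*poll_fn)(int budget);

/* A well-behaved callback reports no work when it is given no budget. */
static int good_poll(int budget) { (void)budget; return 0; }

/* A buggy callback that reports one unit of work regardless of budget. */
static int buggy_poll(int budget) { (void)budget; return 1; }

/*
 * Old scheme: thread the remaining budget through every call.
 * Returns the lowest budget value that was ever handed to a callback.
 */
static int old_scheme_min_budget(const poll_fn *polls, size_t n)
{
	int budget = 0;		/* the old code also started from 0 */
	int min_seen = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (budget < min_seen)
			min_seen = budget;
		budget -= polls[i](budget);	/* mirrors "return budget - work" */
	}
	return min_seen;
}

/*
 * New scheme: always pass 0 and only note which callbacks misbehave.
 * Returns the number of callbacks that reported non-zero work.
 */
static int new_scheme_offenders(const poll_fn *polls, size_t n)
{
	int offenders = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (polls[i](0) != 0)
			offenders++;
	return offenders;
}
```

With the sequence good, buggy, good, the old scheme hands the third
callback a budget of -1, while the new scheme passes 0 every time and
simply counts one offender.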


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 27, 2015, 5:36 a.m. UTC | #1
From: Alexander Duyck <aduyck@mirantis.com>
Date: Tue, 22 Sep 2015 14:56:08 -0700

> Rather than carry around a value of budget that is 0 or less we can instead
> just loop through and pass 0 to each napi->poll call.  If any driver
> returns a value for work done that is non-zero then we can report that
> driver and continue rather than allowing a bad actor to make the budget
> value negative and pass that negative value to napi->poll.

Unfortunately we have drivers that won't do any TX work if the budget
is zero.

Using the budget for TX work is unfortunate and not the recommended
way for drivers to do things, but it's not explicitly disallowed
either.

So I'm not applying this because it definitely has the potential
to break something.

Sorry.

Alexander H Duyck Sept. 27, 2015, 10:58 p.m. UTC | #2
On 09/26/2015 10:36 PM, David Miller wrote:
> From: Alexander Duyck <aduyck@mirantis.com>
> Date: Tue, 22 Sep 2015 14:56:08 -0700
>
>> Rather than carry around a value of budget that is 0 or less we can instead
>> just loop through and pass 0 to each napi->poll call.  If any driver
>> returns a value for work done that is non-zero then we can report that
>> driver and continue rather than allowing a bad actor to make the budget
>> value negative and pass that negative value to napi->poll.
> Unfortunately we have drivers that won't do any TX work if the budget
> is zero.

Well, that is what we are doing right now.  The fact is the call starts
out with a budget of 0, and that is somewhat hidden from the call since
the budget is assigned a value of 0 in netpoll_poll_dev.  That is one of
the things I was wanting to address, because it is clear as mud from
looking at poll_one_napi.  Based on the code you would assume budget
starts out as a non-zero value, and it doesn't.

> Using the budget for TX work is unfortunate and not the recommended
> way for drivers to do things, but it's not explicitly disallowed
> either.
>
> So I'm not applying this because it definitely has the potential
> to break something.
>
> Sorry.

I don't see how this introduces a regression when all I am doing is
avoiding tracking a value that should be 0, assuming everything is
working correctly.  If napi->poll returns a non-zero work value with the
code as it currently is, then the WARN_ONCE is triggered and the value
of budget becomes negative.  I would consider a negative budget value
worse than a 0 budget value.

I'll go back through the patch and rebase it, since it looks like Neil
had to submit a v3 of his patch and it may have impacted mine.  However,
perhaps we need to revisit this code if you think it is risky, as the
only thing my changes do is remove the ability for the budget value to
go from 0 to negative and for that negative value to then be passed
into napi->poll.

- Alex

David Miller Sept. 29, 2015, 8:48 p.m. UTC | #3
From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sun, 27 Sep 2015 15:58:56 -0700

> On 09/26/2015 10:36 PM, David Miller wrote:
>> From: Alexander Duyck <aduyck@mirantis.com>
>> Date: Tue, 22 Sep 2015 14:56:08 -0700
>>
>>> Rather than carry around a value of budget that is 0 or less we can
>>> instead
>>> just loop through and pass 0 to each napi->poll call.  If any driver
>>> returns a value for work done that is non-zero then we can report that
>>> driver and continue rather than allowing a bad actor to make the
>>> budget
>>> value negative and pass that negative value to napi->poll.
>> Unfortunately we have drivers that won't do any TX work if the budget
>> is zero.
> 
> Well that is what we are doing right now.  The fact is the call starts
> out with a budget of 0, and it is somewhat hidden from the call since
> the budget is assigned a value of 0 in netpoll_poll_dev. That is one
> of the things I was wanting to address because that is clear as mud
> from looking at poll_one_napi.  Based on the code you would assume
> budget starts out as a non-zero value and it doesn't.

I see, thanks for explaining.
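
For reference, the point both sides converge on can be illustrated with a
hypothetical poll callback. The sketch below assumes a made-up demo_queue
structure and demo_poll function (neither comes from a real driver). It
shows the pattern in which Tx cleanup does not depend on the budget, so a
budget of 0 still clears the Tx path while no Rx work is done or reported.

```c
/* Hypothetical per-queue state; not taken from any real driver. */
struct demo_queue {
	int tx_pending;	/* completed Tx descriptors awaiting cleanup */
	int rx_pending;	/* received packets awaiting processing */
};

/* Sketch of a poll callback that treats Tx cleanup as budget-independent. */
static int demo_poll(struct demo_queue *q, int budget)
{
	int work_done = 0;

	/*
	 * Reclaim Tx completions regardless of budget, so that a call
	 * with budget 0 (as netpoll makes) still frees Tx resources.
	 */
	q->tx_pending = 0;

	/* Only Rx processing is counted against the budget. */
	while (work_done < budget && q->rx_pending > 0) {
		q->rx_pending--;
		work_done++;
	}

	return work_done;
}
```

A callback written this way returns 0 when called with a budget of 0, so
it would not trip the WARN_ONCE in the patch below. A driver that ties its
Tx cleanup to the budget would do nothing at all in the same call, which is
the case raised in the first reply.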

Patch

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9312b665ff73..df9b2fd5fee8 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -140,16 +140,16 @@  static void queue_process(struct work_struct *work)
  * case. Further, we test the poll_owner to avoid recursion on UP
  * systems where the lock doesn't exist.
  */
-static int poll_one_napi(struct napi_struct *napi, int budget)
+static void poll_one_napi(struct napi_struct *napi)
 {
-	int work = 0;
+	int work;
 
 	/* net_rx_action's ->poll() invocations and our's are
 	 * synchronized by this test which is only made while
 	 * holding the napi->poll_lock.
 	 */
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
-		return budget;
+		return;
 
 	/*
 	 * If we set this bit but see that it has already been set,
@@ -158,26 +158,26 @@  static int poll_one_napi(struct napi_struct *napi, int budget)
  	 */
 
 	if(test_and_set_bit(NAPI_STATE_NPSVC, &napi->state))
-		goto out;
+		return;
 
-	work = napi->poll(napi, budget);
-	WARN_ONCE(work > budget, "%pF exceeded budget in poll\n", napi->poll);
+	/* We explicitly pass the polling call a budget of 0 to
+	 * indicate that we are clearing the Tx path only.
+	 */
+	work = napi->poll(napi, 0);
+	WARN_ONCE(work, "%pF exceeded budget in poll\n", napi->poll);
 	trace_napi_poll(napi);
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
-
-out:
-	return budget - work;
 }
 
-static void poll_napi(struct net_device *dev, int budget)
+static void poll_napi(struct net_device *dev)
 {
 	struct napi_struct *napi;
 
 	list_for_each_entry(napi, &dev->napi_list, dev_list) {
 		if (napi->poll_owner != smp_processor_id() &&
 		    spin_trylock(&napi->poll_lock)) {
-			budget = poll_one_napi(napi, budget);
+			poll_one_napi(napi);
 			spin_unlock(&napi->poll_lock);
 		}
 	}
@@ -187,7 +187,6 @@  static void netpoll_poll_dev(struct net_device *dev)
 {
 	const struct net_device_ops *ops;
 	struct netpoll_info *ni = rcu_dereference_bh(dev->npinfo);
-	int budget = 0;
 
 	/* Don't do any rx activity if the dev_lock mutex is held
 	 * the dev_open/close paths use this to block netpoll activity
@@ -210,7 +209,7 @@  static void netpoll_poll_dev(struct net_device *dev)
 	/* Process pending work on NIC */
 	ops->ndo_poll_controller(dev);
 
-	poll_napi(dev, budget);
+	poll_napi(dev);
 
 	up(&ni->dev_lock);