
[net-next,03/10] ixgbe: Drop the TX work limit and instead just leave it to budget

Message ID: 1313911761-11709-4-git-send-email-jeffrey.t.kirsher@intel.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Kirsher, Jeffrey T Aug. 21, 2011, 7:29 a.m. UTC
From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes the TX work limit obsolete.  Instead of using it, we
can rely on the NAPI budget for the number of packets we should clean
per interrupt.  The advantage of this approach is that it results in a
much more balanced work flow since the same number of RX and TX packets
should be cleaned per interrupt.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    4 ----
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    7 -------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   16 +++++++---------
 3 files changed, 7 insertions(+), 20 deletions(-)

Comments

Ben Hutchings Aug. 21, 2011, 2:01 p.m. UTC | #1
On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change makes the TX work limit obsolete.  Instead of using it, we
> can rely on the NAPI budget for the number of packets we should clean
> per interrupt.  The advantage of this approach is that it results in a
> much more balanced work flow since the same number of RX and TX packets
> should be cleaned per interrupt.
[...]

This seems kind of sensible, but it's not how Dave has been recommending
people to account for TX work in NAPI.

Ben.
Duyck, Alexander H Aug. 22, 2011, 4:30 p.m. UTC | #2
On 08/21/2011 07:01 AM, Ben Hutchings wrote:
> On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>
>> This change makes the TX work limit obsolete.  Instead of using it,
>> we can rely on the NAPI budget for the number of packets we should
>> clean per interrupt.  The advantage of this approach is that it
>> results in a much more balanced work flow since the same number of RX
>> and TX packets should be cleaned per interrupt.
> [...]
>
> This seems kind of sensible, but it's not how Dave has been recommending
> people to account for TX work in NAPI.
>
> Ben.
>
I wasn't aware there was a recommended approach.  Could you tell me more 
about it?

As I stated in the patch description this approach works very well for 
me, especially in routing workloads since it typically keeps the TX 
clean-up in polling as long as the RX is in polling.

Thanks,

Alex
Ben Hutchings Aug. 22, 2011, 4:46 p.m. UTC | #3
On Mon, 2011-08-22 at 09:30 -0700, Alexander Duyck wrote:
> On 08/21/2011 07:01 AM, Ben Hutchings wrote:
> > On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
> >> From: Alexander Duyck<alexander.h.duyck@intel.com>
> >>
> >> This change makes the TX work limit obsolete.  Instead of using it,
> >> we can rely on the NAPI budget for the number of packets we should
> >> clean per interrupt.  The advantage of this approach is that it
> >> results in a much more balanced work flow since the same number of RX
> >> and TX packets should be cleaned per interrupt.
> > [...]
> >
> > This seems kind of sensible, but it's not how Dave has been recommending
> > people to account for TX work in NAPI.
> >
> > Ben.
> >
> I wasn't aware there was a recommended approach.  Could you tell me more 
> about it?

If a whole TX ring is cleaned then consider the budget spent; otherwise
don't count it.
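
In driver terms that convention looks roughly like the following sketch.
The types and helpers here (my_adapter, my_clean_tx_ring,
my_clean_rx_ring) are made up for illustration, not code from any
in-tree driver:

static int my_poll(struct napi_struct *napi, int budget)
{
	struct my_adapter *adap = container_of(napi, struct my_adapter, napi);
	unsigned int tx_cleaned;
	int rx_done;

	/* Reclaim TX completions with no per-packet quota. */
	tx_cleaned = my_clean_tx_ring(adap);

	/* RX stays bounded by the NAPI budget as usual. */
	rx_done = my_clean_rx_ring(adap, budget);

	/* Only a pass that drained a whole ring's worth of TX work
	 * counts as having spent the budget. */
	if (tx_cleaned >= adap->tx_ring_size)
		return budget;

	if (rx_done < budget)
		napi_complete(napi);

	return rx_done;
}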

Ben.

> As I stated in the patch description this approach works very well for 
> me, especially in routing workloads since it typically keeps the TX 
> clean-up in polling as long as the RX is in polling.
> 
> Thanks,
> 
> Alex
Duyck, Alexander H Aug. 22, 2011, 5:29 p.m. UTC | #4
On 08/22/2011 09:46 AM, Ben Hutchings wrote:
> On Mon, 2011-08-22 at 09:30 -0700, Alexander Duyck wrote:
>> On 08/21/2011 07:01 AM, Ben Hutchings wrote:
>>> On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
>>>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>>>
>>>> This change makes the TX work limit obsolete.  Instead of using it,
>>>> we can rely on the NAPI budget for the number of packets we should
>>>> clean per interrupt.  The advantage of this approach is that it
>>>> results in a much more balanced work flow since the same number of RX
>>>> and TX packets should be cleaned per interrupt.
>>> [...]
>>>
>>> This seems kind of sensible, but it's not how Dave has been recommending
>>> people to account for TX work in NAPI.
>>>
>>> Ben.
>>>
>> I wasn't aware there was a recommended approach.  Could you tell me more
>> about it?
> If a whole TX ring is cleaned then consider the budget spent; otherwise
> don't count it.
>
> Ben.

The only problem I was seeing with that was that in certain cases it 
seemed like the TX cleanup could consume enough CPU time to cause pretty 
significant delays in processing the RX cleanup.  This in turn was 
causing single-queue bi-directional routing tests to come out pretty 
unbalanced: one CPU's RX work would overwhelm the other CPU with TX 
processing, resulting in an unbalanced flow that was something like a 
60/40 split between the upstream and downstream throughput.

Thanks,

Alex
David Miller Aug. 22, 2011, 8:56 p.m. UTC | #5
From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 22 Aug 2011 10:29:51 -0700

> The only problem I was seeing with that was that in certain cases it
> seemed like the TX cleanup could consume enough CPU time to cause
> pretty significant delays in processing the RX cleanup.  This in turn
> was causing single-queue bi-directional routing tests to come out
> pretty unbalanced: one CPU's RX work would overwhelm the other CPU
> with TX processing, resulting in an unbalanced flow that was something
> like a 60/40 split between the upstream and downstream throughput.

But the problem is that now you're applying the budget to two operations
that have very different costs.  Freeing up a TX ring packet is probably
on the order of 1/10th the cost of processing an incoming RX ring frame.

I've advocated to not apply the budget at all to TX ring processing.

I can see your dilemma with respect to RX ring processing being delayed,
but if that's really happening you can consider whether the TX ring is
simply too large.

In any event can you try something like dampening the cost applied to
budget for TX work (1/2, 1/4, etc.)?  Because as far as I can tell, if
you are really hitting the budget limit on TX then you won't be doing
any RX work on that device until a future NAPI round that depletes the
TX ring work without going over the budget.
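
Something along these lines is what I have in mind; the helpers are
hypothetical and the 1/4 weighting is arbitrary, just to show the shape
of it:

static int my_poll(struct napi_struct *napi, int budget)
{
	struct my_qvector *q = container_of(napi, struct my_qvector, napi);
	int tx_cost, rx_budget, rx_done, work;

	/* Charge each TX reclaim at 1/4 the cost of an RX frame,
	 * rounding up so TX work is never entirely free. */
	tx_cost = (my_clean_tx_ring(q) + 3) / 4;

	/* Always leave at least one unit of budget for RX. */
	rx_budget = max(budget - tx_cost, 1);
	rx_done = my_clean_rx_ring(q, rx_budget);

	work = min(rx_done + tx_cost, budget);
	if (work < budget)
		napi_complete(napi);

	return work;
}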
Duyck, Alexander H Aug. 22, 2011, 10:57 p.m. UTC | #6
On 08/22/2011 01:56 PM, David Miller wrote:
> From: Alexander Duyck<alexander.h.duyck@intel.com>
> Date: Mon, 22 Aug 2011 10:29:51 -0700
>
>> The only problem I was seeing with that was that in certain cases it
>> seemed like the TX cleanup could consume enough CPU time to cause
>> pretty significant delays in processing the RX cleanup.  This in turn
>> was causing single-queue bi-directional routing tests to come out
>> pretty unbalanced: one CPU's RX work would overwhelm the other CPU
>> with TX processing, resulting in an unbalanced flow that was something
>> like a 60/40 split between the upstream and downstream throughput.
> But the problem is that now you're applying the budget to two operations
> that have very different costs.  Freeing up a TX ring packet is probably
> on the order of 1/10th the cost of processing an incoming RX ring frame.
>
> I've advocated to not apply the budget at all to TX ring processing.
I fully understand that the TX path is much cheaper than the RX path.
One step I have taken in all of this code is that the TX path only
counts SKBs cleaned; it doesn't count descriptors.  So a
single-descriptor 60-byte transmit will cost the same as a 64K,
18-descriptor TSO.  All I am really counting is the number of times I
have called dev_kfree_skb_any().
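
In other words the clean loop is bounded by skbs freed, not descriptors
walked.  Roughly (simplified, with made-up helper names rather than the
actual ixgbe functions):

static bool my_clean_tx(struct my_tx_ring *ring, int budget)
{
	/* Walk completed descriptors, charging one budget unit per skb
	 * freed, so an 18-descriptor TSO costs the same as a
	 * single-descriptor 60-byte send. */
	while (budget && my_tx_desc_done(ring)) {
		struct sk_buff *skb = my_next_completed_skb(ring);

		my_unmap_and_advance(ring, skb); /* may step many descriptors */
		dev_kfree_skb_any(skb);          /* the only thing counted */
		budget--;
	}

	/* false means the budget ran out: stay in polling mode. */
	return !!budget;
}
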
> I can see your dilemma with respect to RX ring processing being delayed,
> but if that's really happening you can consider whether the TX ring is
> simply too large.
The problem was occurring even without large rings.  I was seeing issues 
with rings just 256 descriptors in size.  The problem seemed to be that 
the TX cleanup being a multiple of budget was allowing one CPU to 
overwhelm the other, and the fact that the TX was essentially unbounded 
was just allowing the issue to feed back on itself.

In the routing test case I was actually seeing significant advantages to 
this approach, as we were essentially cleaning just the right number of 
buffers to make room for the next set of transmits when the RX cleanup 
came through.  In addition since the RX and TX workload was balanced it 
kept both locked into polling while the CPU was saturated instead of 
allowing the TX to become interrupt driven.  In addition since the TX 
was working on the same budget as the RX the number of SKBs freed up in 
the TX path would match the number consumed when being reallocated on 
the RX path.

> In any event can you try something like dampening the cost applied to
> budget for TX work (1/2, 1/4, etc.)?  Because as far as I can tell, if
> you are really hitting the budget limit on TX then you won't be doing
> any RX work on that device until a future NAPI round that depletes the
> TX ring work without going over the budget.
The problem seemed to be present as long as I allowed the TX budget to 
be a multiple of the RX budget.  The easiest way to keep things balanced 
and avoid allowing the TX from one CPU to overwhelm the RX on another 
was just to keep the budgets equal.

I'm a bit confused by this last comment.  The full budget is used for TX 
and RX; it isn't divided.  I do a budget's worth of TX cleanup and a 
budget's worth of RX cleanup within the ixgbe_poll routine, and if either 
of them consumes its full budget then I return the budget value as the 
work done.
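
So the flow is roughly the following; this is a sketch of the shape of
the patch, with simplified names, not the literal code:

static int my_ixgbe_poll(struct napi_struct *napi, int budget)
{
	struct my_qvector *q = container_of(napi, struct my_qvector, napi);
	bool tx_clean_complete;
	int rx_done = 0;

	/* TX and RX each see the full, undivided budget. */
	tx_clean_complete = my_clean_tx(q->tx_ring, budget);
	my_clean_rx(q->rx_ring, &rx_done, budget);

	/* If either side used up its budget, report the whole budget as
	 * spent so NAPI keeps us in polling mode. */
	if (!tx_clean_complete || rx_done >= budget)
		return budget;

	napi_complete(napi);
	return rx_done;
}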

If you are referring to the case where two devices are sharing the CPU 
then I would suspect this might lead to faster consumption of the 
netdev_budget, but other than that I don't see any starvation issues for 
RX or TX.

Thanks,

Alex
David Miller Aug. 22, 2011, 11:40 p.m. UTC | #7
From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 22 Aug 2011 15:57:51 -0700

> The problem was occurring even without large rings.
> I was seeing issues with rings just 256 descriptors in size.

And the default in the ixgbe driver is 512 entries, which I think is
itself quite excessive.  Something like 128 is more in line with
what I'd call a sane default.

So the only side effect of your change is to decrease the TX quota to
64 (the default NAPI quota) from its current value of 512
(IXGBE_DEFAULT_TXD).

Talking about the existing code, I can't even see how the current
driver private TX quota can trigger except in the most extreme cases.
This is because the quota is set the same as the size you're setting
the TX ring to.

> The problem seemed to be that the TX cleanup being a multiple of
> budget was allowing one CPU to overwhelm the other, and the fact that
> the TX was essentially unbounded was just allowing the issue to
> feed back on itself.

I still don't understand what issue you could even be running into.

On each CPU we round robin against all NAPI requestors for that CPU.

In your routing test setup, we should have one cpu doing the RX and
another different cpu doing TX.

Therefore if the TX cpu simply spins in a loop doing nothing but TX
reclaim work it should not really matter.

And if you hit the TX budget on the TX cpu, it's just going to come
right back into the ixgbe NAPI handler and thus the TX reclaim
processing not even a dozen cycles later.

The only effect is to have us go through the whole function call
sequence and data structure setup into local variables more often than
you would before.

> In addition since the RX and TX workload was balanced it kept both
> locked into polling while the CPU was saturated instead of allowing
> the TX to become interrupt driven.  In addition since the TX was
> working on the same budget as the RX the number of SKBs freed up in
> the TX path would match the number consumed when being reallocated
> on the RX path.

So the only conclusion I can come to is that we're now executing what
are essentially wasted cpu cycles, and this takes us over the threshold
such that we poll more and take interrupts less.  And this improves
performance.

That's pretty unwise if you ask me; we should do something useful with
cpu cycles instead of wasting them merely to make us poll more.

> The problem seemed to be present as long as I allowed the TX budget to
> be a multiple of the RX budget.  The easiest way to keep things
> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
> another was just to keep the budgets equal.

You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
that's the only effect of these changes.  That's not even long enough
for a cache line to transfer between two cpus.
Alexander H Duyck Aug. 23, 2011, 4:04 a.m. UTC | #8
On Mon, Aug 22, 2011 at 4:40 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Mon, 22 Aug 2011 15:57:51 -0700
>
>> The problem was occurring even without large rings.
>> I was seeing issues with rings just 256 descriptors in size.
>
> And the default in the ixgbe driver is 512 entries, which I think is
> itself quite excessive.  Something like 128 is more in line with
> what I'd call a sane default.

Are you suggesting I change the ring size, the TX quota, or both?

> So the only side effect of your change is to decrease the TX quota to
> 64 (the default NAPI quota) from its current value of 512
> (IXGBE_DEFAULT_TXD).

Yeah, that pretty much sums it up.  However, as I said in the earlier
email, I am counting SKBs freed instead of descriptors.  As such we
will probably end up cleaning more like 128 descriptors per TX clean
just due to our context descriptors that are also occupying space in
the ring.

> Talking about the existing code, I can't even see how the current
> driver private TX quota can trigger except in the most extreme cases.
> This is because the quota is set the same as the size you're setting
> the TX ring to.

I'm not sure if it ever met the quota either.  I suspect not, since
under the routing workloads I would see the RX interrupts get disabled
but the TX interrupts keep going.

>> The problem seemed to be that the TX cleanup being a multiple of
>> budget was allowing one CPU to overwhelm the other, and the fact that
>> the TX was essentially unbounded was just allowing the issue to
>> feed back on itself.
>
> I still don't understand what issue you could even be running into.
>
> On each CPU we round robin against all NAPI requestors for that CPU.
>
> In your routing test setup, we should have one cpu doing the RX and
> another cpu doing TX.
>
> Therefore if the TX cpu simply spins in a loop doing nothing but TX
> reclaim work it should not really matter.

Doing a unidirectional test was fine and everything worked as you
describe.  It was doing a bidirectional test with two ports and a
single queue for each port that was the issue.  Specifically, what
would happen is that one direction would tend to dominate over the
other, so I would end up with a 60/40 split with either upstream or
downstream dominating.  By using the budget as the quota I found the
results were generally much closer to 50/50 and the overall result of
the two flows combined was higher.  I suspect it had to do with TX
work getting backlogged on the ring and acting as a feedback mechanism
to prevent the RX work on that CPU from getting squashed.

> And if you hit the TX budget on the TX cpu, it's just going to come
> right back into the ixgbe NAPI handler and thus the TX reclaim
> processing not even a dozen cycles later.
>
> The only effect is to have us go through the whole function call
> sequence and data structure setup into local variables more often than
> you would before.

I suppose that is possible.  It all depends on the type of packets
being sent.  For single sends without any offloads we would only be
cleaning 64 descriptors per call to ixgbe_poll; with an offloaded
checksum or VLAN it would be 128 descriptors per call, and with TSO we
probably still wouldn't consume the budget since the sk_buff
consumption rate would be too low.

I can try testing the throughput with pktgen tomorrow to see if it
improves by increasing the TX budget.  I suppose there could be a few
factors affecting this since the budget value also determines the
number of buffers we clean before we call netif_wake_queue to
re-enable the transmit path.
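
For reference, the wake-up logic I mean looks roughly like this at the
end of the TX clean path (paraphrased; the unused-descriptor helper and
threshold name are illustrative, not the exact ixgbe identifiers):

	/* After reclaiming, restart the queue once enough descriptors
	 * are free for a worst-case transmit. */
	if (unlikely(total_packets && netif_carrier_ok(netdev) &&
		     my_desc_unused(tx_ring) >= MY_TX_WAKE_THRESHOLD)) {
		smp_mb();
		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index))
			netif_wake_subqueue(netdev, tx_ring->queue_index);
	}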

>> In addition since the RX and TX workload was balanced it kept both
>> locked into polling while the CPU was saturated instead of allowing
>> the TX to become interrupt driven.  In addition since the TX was
>> working on the same budget as the RX the number of SKBs freed up in
>> the TX path would match the number consumed when being reallocated
>> on the RX path.
>
> So the only conclusion I can come to is that we're now executing what
> are essentially wasted cpu cycles, and this takes us over the threshold
> such that we poll more and take interrupts less.  And this improves
> performance.
>
> That's pretty unwise if you ask me; we should do something useful with
> cpu cycles instead of wasting them merely to make us poll more.

The thing is we are probably going to be wasting those cycles anyway.
In the case of bidirectional routing I was always locked into RX
polling whether this change was in place or not.  The only difference
is that the TX will likely clean itself completely with each poll
versus possibly leaving a few buffers behind when it hits the 64 quota.

>> The problem seemed to be present as long as I allowed the TX budget to
>> be a multiple of the RX budget.  The easiest way to keep things
>> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
>> another was just to keep the budgets equal.
>
> You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
> that's the only effect of these changes.  That's not even long enough
> for a cache line to transfer between two cpus.

It sounds like I may not have been seeing this due to the type of
workload I was focusing on.  I'll try generating some data with pktgen
and netperf tomorrow to see how this holds up under small-packet,
transmit-only traffic, since those are the cases most likely to get
into the state you mention.

Also I would appreciate it if you had any suggestions on other
workloads I might need to focus on in order to determine the impact of
this change.

Thanks,

Alex
Duyck, Alexander H Aug. 23, 2011, 8:52 p.m. UTC | #9
On 08/22/2011 09:04 PM, Alexander Duyck wrote:
> On Mon, Aug 22, 2011 at 4:40 PM, David Miller<davem@davemloft.net>  wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date: Mon, 22 Aug 2011 15:57:51 -0700
>>> The problem seemed to be present as long as I allowed the TX budget to
>>> be a multiple of the RX budget.  The easiest way to keep things
>>> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
>>> another was just to keep the budgets equal.
>> You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
>> that's the only effect of these changes.  That's not even long enough
>> for a cache line to transfer between two cpus.
> It sounds like I may not have been seeing this due to the type of
> workload I was focusing on.  I'll try generating some data with pktgen
> and netperf tomorrow to see how this holds up under small-packet,
> transmit-only traffic, since those are the cases most likely to get
> into the state you mention.
>
> Also I would appreciate it if you had any suggestions on other
> workloads I might need to focus on in order to determine the impact of
> this change.
>
> Thanks,
>
> Alex

I found a reason to rewrite this.  Basically, this modification has a 
negative impact in the case of multiple ports on a single CPU all 
routing to the same port on the same CPU.  It ends up limiting the 
transmit throughput to (total CPU packets per second) / (number of 
ports receiving on the CPU).  So on a system that can receive at 
1.4 Mpps on a single core, we end up seeing only a little over 350 Kpps 
of transmit when 4 ports are all receiving packets on the system.

I'll look at rewriting this.  I'll probably leave the work limit 
controlling things but lower it to a more reasonable value such as 1/2 
to 1/4 of the ring size.
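
That is, keep the tx_work_limit plumbing this patch removes, but
initialize it to a fraction of the ring instead of the whole ring,
something like (illustrative only; the exact divisor is still to be
determined):

	/* default the TX work limit to a fraction of the ring size
	 * rather than the full ring */
	adapter->tx_work_limit = adapter->tx_ring_count / 4;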

Thanks,

Alex

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index dc3b12e..b85312f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -281,7 +281,6 @@  struct ixgbe_ring_container {
 	struct ixgbe_ring *ring;	/* pointer to linked list of rings */
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
-	u16 work_limit;			/* total work allowed per interrupt */
 	u8 count;			/* total number of rings in vector */
 	u8 itr;				/* current ITR setting for ring */
 };
@@ -416,9 +415,6 @@  struct ixgbe_adapter {
 	u16 eitr_low;
 	u16 eitr_high;
 
-	/* Work limits */
-	u16 tx_work_limit;
-
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 	int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 82d4244..16835cc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2005,8 +2005,6 @@  static int ixgbe_get_coalesce(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	ec->tx_max_coalesced_frames_irq = adapter->tx_work_limit;
-
 	/* only valid if in constant ITR mode */
 	switch (adapter->rx_itr_setting) {
 	case 0:
@@ -2093,9 +2091,6 @@  static int ixgbe_set_coalesce(struct net_device *netdev,
 	   && ec->tx_coalesce_usecs)
 		return -EINVAL;
 
-	if (ec->tx_max_coalesced_frames_irq)
-		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
-
 	if (ec->rx_coalesce_usecs > 1) {
 		/* check the limits */
 		if ((1000000/ec->rx_coalesce_usecs > IXGBE_MAX_INT_RATE) ||
@@ -2169,14 +2164,12 @@  static int ixgbe_set_coalesce(struct net_device *netdev,
 			else
 				/* rx only or mixed */
 				q_vector->eitr = adapter->rx_eitr_param;
-			q_vector->tx.work_limit = adapter->tx_work_limit;
 			ixgbe_write_eitr(q_vector);
 		}
 	/* Legacy Interrupt Mode */
 	} else {
 		q_vector = adapter->q_vector[0];
 		q_vector->eitr = adapter->rx_eitr_param;
-		q_vector->tx.work_limit = adapter->tx_work_limit;
 		ixgbe_write_eitr(q_vector);
 	}
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bb54d3d..2450279 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -796,15 +796,16 @@  static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
  * @q_vector: structure containing interrupt and ring information
  * @tx_ring: tx ring to clean
+ * @budget: amount of work driver is allowed to do this pass, in packets
  **/
 static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring)
+			       struct ixgbe_ring *tx_ring,
+			       int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
-	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
@@ -2082,7 +2083,7 @@  static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
 #endif
 
 	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
-		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
+		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring, budget);
 
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
@@ -2128,7 +2129,7 @@  static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring))
+	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring, budget))
 		return budget;
 
 	/* If all Tx work done, exit the polling mode */
@@ -2163,7 +2164,6 @@  static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
 	tx_ring->next = q_vector->tx.ring;
 	q_vector->tx.ring = tx_ring;
 	q_vector->tx.count++;
-	q_vector->tx.work_limit = a->tx_work_limit;
 }
 
 /**
@@ -4155,7 +4155,8 @@  static int ixgbe_poll(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0]);
+	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0],
+					       budget);
 	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);
 
 	if (!tx_clean_complete)
@@ -5090,9 +5091,6 @@  static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 	adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
 	adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
 
-	/* set default work limits */
-	adapter->tx_work_limit = adapter->tx_ring_count;
-
 	/* initialize eeprom parameters */
 	if (ixgbe_init_eeprom_params_generic(hw)) {
 		e_dev_err("EEPROM initialization failed\n");