
[v3,net-next,3/4] ixgbe: Add support for ndo_ll_poll

Message ID: 20130520101622.14133.21998.stgit@ladj378.jer.intel.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Eliezer Tamir May 20, 2013, 10:16 a.m. UTC
Add the ixgbe driver code implementing ndo_ll_poll.
It should be easy for other drivers to do something similar
in order to enable support for CONFIG_INET_LL_RX_POLL

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   96 +++++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   80 +++++++++++++++++++--
 2 files changed, 168 insertions(+), 8 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Or Gerlitz May 20, 2013, 8:20 p.m. UTC | #1
On Mon, May 20, 2013 at 1:16 PM, Eliezer Tamir
<eliezer.tamir@linux.intel.com> wrote:
> Add the ixgbe driver code implementing ndo_ll_poll.
> It should be easy for other drivers to do something similar
> in order to enable support for CONFIG_INET_LL_RX_POLL

I am not sure,


> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
[...]
> @@ -144,6 +145,14 @@ static int debug = -1;
>  module_param(debug, int, 0);
>  MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>
> +#ifdef CONFIG_INET_LL_RX_POLL
> +static int allow_unsafe_removal;
> +static int unsafe_to_remove;
> +module_param(allow_unsafe_removal, int, 0);
> +MODULE_PARM_DESC(allow_unsafe_removal,
> +       "Allow removal of module after low latency receive was used");
> +#endif

what?!

[...]

> +#ifdef CONFIG_INET_LL_RX_POLL
> +/* must be called with local_bh_disable()d */
> +static int ixgbe_low_latency_recv(struct napi_struct *napi)
> +{
> +       struct ixgbe_q_vector *q_vector =
> +                       container_of(napi, struct ixgbe_q_vector, napi);
> +       struct ixgbe_adapter *adapter = q_vector->adapter;
> +       struct ixgbe_ring  *ring;
> +       int found;
> +
> +       if (unlikely(!unsafe_to_remove)) {
> +               unsafe_to_remove = 1;
> +               if (!allow_unsafe_removal) {
> +                       pr_info("module may no longer be removed\n");
> +                       try_module_get(THIS_MODULE);
> +               }
> +       }

guys, so what is going on here, you were asking to put this series in
net-next, and you expect each other driver implementing this ndo to
follow this undocumented hack? or maybe this code was just left here
by mistake from previous implementations and just needed to be
removed?  please clarify.

Or.
Andi Kleen May 20, 2013, 8:33 p.m. UTC | #2
> guys, so what is going on here, you were asking to put this series in
> net-next, and you expect each other driver implementing this ndo to
> follow this undocumented hack? or maybe this code was just left here
> by mistake from previous implementations and just needed to be
> removed?  please clarify.

This is discussed in 0/x

-Andi
Andi Kleen May 20, 2013, 9:01 p.m. UTC | #3
On Mon, May 20, 2013 at 11:42:37PM +0300, Or Gerlitz wrote:
> On Mon, May 20, 2013 at 11:33 PM, Andi Kleen <andi@firstfloor.org> wrote:
> 
> >
> > This is discussed in 0/x
> >
> 
> I am not with you, V3's cover letter is empty, and in V2's cover letter I
> don't see that

It was here. Looks like Eliezer didn't reuse the full cover letter.

http://www.gossamer-threads.com/lists/linux/kernel/1715294?page=last
Or Gerlitz May 21, 2013, 6:23 a.m. UTC | #4
On Tue, May 21, 2013 at 12:01 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> On Mon, May 20, 2013 at 11:42:37PM +0300, Or Gerlitz wrote:
> > On Mon, May 20, 2013 at 11:33 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > >
> > > This is discussed in 0/x
> > >
> >
> > I am not with you, V3's cover letter is empty, and in V2's cover letter I
> > don't see that
>
> It was here. Looks like Eliezer didn't reuse the full cover letter.
> http://www.gossamer-threads.com/lists/linux/kernel/1715294?page=last
>


Grepping for "ixgbe" in the link you sent only brings the following hits:

Patch 3 shows how this method would be implemented for the ixgbe driver.
Patch 4 adds statistics to the ixgbe driver for ndo_ll_poll events.

Does "shows how this method would be implemented" mean patch #4 is
just RFC or proof-of-concept code and is not actually submitted for
acceptance?
Eliezer Tamir May 21, 2013, 6:54 a.m. UTC | #5
On 20/05/2013 23:20, Or Gerlitz wrote:
> On Mon, May 20, 2013 at 1:16 PM, Eliezer Tamir
> <eliezer.tamir@linux.intel.com> wrote:
>> Add the ixgbe driver code implementing ndo_ll_poll.
>> It should be easy for other drivers to do something similar
>> in order to enable support for CONFIG_INET_LL_RX_POLL
>
> I am not sure,

Willem ported this to <some undisclosed HW that they use at Google>, his 
feedback was that it was not a major effort.
Eilon Greenstein May 21, 2013, 7:06 a.m. UTC | #6
On Tue, 2013-05-21 at 09:54 +0300, Eliezer Tamir wrote:
> On 20/05/2013 23:20, Or Gerlitz wrote:
> > On Mon, May 20, 2013 at 1:16 PM, Eliezer Tamir
> > <eliezer.tamir@linux.intel.com> wrote:
> >> Add the ixgbe driver code implementing ndo_ll_poll.
> >> It should be easy for other drivers to do something similar
> >> in order to enable support for CONFIG_INET_LL_RX_POLL
> >
> > I am not sure,
> 
> Willem ported this to <some undisclosed HW that they use at Google>, his 
> feedback was that it was not a major effort.

We also played with applying a similar patch on the bnx2x and it looks
great :) - Thanks Eliezer!

Or - at least for the bnx2x, it is easy to add support for this new ndo.

Hopefully this series will be accepted so we can send follow up support
for the bnx2x as well.

Thanks,
Eilon



David Miller May 21, 2013, 7:14 a.m. UTC | #7
From: "Eilon Greenstein" <eilong@broadcom.com>
Date: Tue, 21 May 2013 10:06:43 +0300

> Hopefully this series will be accepted so we can send follow up support
> for the bnx2x as well.

I think in two or three more iterations it will be merged.

There are no objections on the fundamentals, it's just implementation
details and coding style at this point.
Or Gerlitz May 21, 2013, 8:21 a.m. UTC | #8
On Tue, May 21, 2013 at 10:06 AM, Eilon Greenstein <eilong@broadcom.com> wrote:
> Or - at least for the bnx2x, it is easy to add support for this new ndo.

Do you understand what's the equivalent of that mysterious module
param for your driver/HW - or you just copied and pasted that black
magic code?
Or Gerlitz May 21, 2013, 8:24 a.m. UTC | #9
On Tue, May 21, 2013 at 10:14 AM, David Miller <davem@davemloft.net> wrote:
> From: "Eilon Greenstein" <eilong@broadcom.com>
> Date: Tue, 21 May 2013 10:06:43 +0300
>
>> Hopefully this series will be accepted so we can send follow up support
>> for the bnx2x as well.
>
> I think in two or three more iterations it will be merged.
>
> There are no objections on the fundamentals, it's just implementation
> details and coding style at this point.

Dave, sorry, I might be a bit behind the rest of the reviewers, but I
can neither understand nor find any reference that explains the
module param of ixgbe, nor does it make sense to me to merge that piece
of the code upstream (it's not for staging, correct?), as I wrote here
http://marc.info/?l=linux-netdev&m=136908123432072&w=2 basically, I
know you're not a great fan of module params (to say the least) and
surely not something named "allow_unsafe_removal", thoughts?
Eilon Greenstein May 21, 2013, 8:28 a.m. UTC | #10
On Tue, 2013-05-21 at 11:21 +0300, Or Gerlitz wrote:
> On Tue, May 21, 2013 at 10:06 AM, Eilon Greenstein <eilong@broadcom.com> wrote:
> > Or - at least for the bnx2x, it is easy to add support for this new ndo.
> 
> Do you understand what's the equivalent of that mysterious module
> param for your driver/HW - or you just copied and pasted that black
> magic code?
> 

The module parameter is not the interesting part of this patch. It is
clear that unloading the module while this sort of traffic is running is
not safe and the alternative of adding a reference count or something
similar sounds too costly (after all, this patch is about performance).
I just played with it to get a feel of the latency improvement - I did
not try unloading the module during traffic so I do not care about the
module parameter part right now. I agree that the unload should be
looked into - but the general concept is great and it is a very nice
improvement.



Eliezer Tamir May 21, 2013, 8:31 a.m. UTC | #11
On 21/05/2013 11:24, Or Gerlitz wrote:
> On Tue, May 21, 2013 at 10:14 AM, David Miller <davem@davemloft.net> wrote:
>> From: "Eilon Greenstein" <eilong@broadcom.com>
>> Date: Tue, 21 May 2013 10:06:43 +0300
>>
>>> Hopefully this series will be accepted so we can send follow up support
>>> for the bnx2x as well.
>>
>> I think in two or three more iterations it will be merged.
>>
>> There are no objections on the fundamentals, it's just implementation
>> details and coding style at this point.
>
> Dave, sorry, I might be a bit behind the rest of the reviewers, but I
> just fail to understand nor find any reference that explains the
> module param of ixgbe nor it makes sense to me to merge that piece of
> the code upstream (its not for staging, correct?), as I wrote here
> http://marc.info/?l=linux-netdev&m=136908123432072&w=2 basically, I
> know you're not a great fan of module params (to say the least) and
> surely not something named  "allow_unsafe_removal", thoughts?

from v2 0/4

6. To avoid the overhead of reference counting napi structs by skbs
and sockets in the fastpath, and increasing the size of the skb struct,
we no longer allow unloading the module once this feature has been used.

It seems that for most of the people interested in busy-polling, giving
up the ability to blindly remove the module for a slight but measurable
performance gain is a good tradeoff.
(There is a module parameter to override this behavior and if you know
what you are doing and are careful to stop the processes you can safely
unload, but we don't enforce this.)
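
The pin-on-first-use behaviour described in item 6 can be modelled in plain
userspace C. This is only a sketch of the idea, not the driver code:
`struct fake_module`, `struct ll_pin_state`, `ll_pin_on_first_use()` and
`fake_module_can_unload()` are hypothetical names, and the counter stands in
for what `try_module_get(THIS_MODULE)` does in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module reference count; not a kernel API. */
struct fake_module {
	int refcnt;
};

struct ll_pin_state {
	bool allow_unsafe_removal;	/* mirrors the module parameter */
	bool unsafe_to_remove;		/* set on the first low-latency receive */
};

/* Called on every low-latency receive; returns true if this call pinned. */
static bool ll_pin_on_first_use(struct ll_pin_state *st,
				struct fake_module *mod)
{
	if (st->unsafe_to_remove)
		return false;		/* not the first use: nothing to do */

	st->unsafe_to_remove = true;
	if (st->allow_unsafe_removal)
		return false;		/* administrator opted out of pinning */

	mod->refcnt++;			/* models try_module_get(THIS_MODULE) */
	return true;
}

/* In this model, unloading is possible only with no references held. */
static bool fake_module_can_unload(const struct fake_module *mod)
{
	return mod->refcnt == 0;
}
```

The reference is taken once and never dropped, which is why the fast path
pays only for a single flag test after the first call.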

David Miller May 21, 2013, 8:39 a.m. UTC | #12
From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Tue, 21 May 2013 11:24:41 +0300

> On Tue, May 21, 2013 at 10:14 AM, David Miller <davem@davemloft.net> wrote:
>> From: "Eilon Greenstein" <eilong@broadcom.com>
>> Date: Tue, 21 May 2013 10:06:43 +0300
>>
>>> Hopefully this series will be accepted so we can send follow up support
>>> for the bnx2x as well.
>>
>> I think in two or three more iterations it will be merged.
>>
>> There are no objections on the fundamentals, it's just implementation
>> details and coding style at this point.
> 
> Dave, sorry, I might be a bit behind the rest of the reviewers, but I
> just fail to understand nor find any reference that explains the
> module param of ixgbe nor it makes sense to me to merge that piece of
> the code upstream (its not for staging, correct?), as I wrote here
> http://marc.info/?l=linux-netdev&m=136908123432072&w=2 basically, I
> know you're not a great fan of module params (to say the least) and
> surely not something named  "allow_unsafe_removal", thoughts?

It's one of those "implementation details", I hate it too.
Eliezer Tamir May 21, 2013, 8:42 a.m. UTC | #13
On 21/05/2013 11:39, David Miller wrote:
> From: Or Gerlitz <or.gerlitz@gmail.com>
> Date: Tue, 21 May 2013 11:24:41 +0300
>
>> On Tue, May 21, 2013 at 10:14 AM, David Miller <davem@davemloft.net> wrote:
>>> From: "Eilon Greenstein" <eilong@broadcom.com>
>>> Date: Tue, 21 May 2013 10:06:43 +0300
>>>
>>>> Hopefully this series will be accepted so we can send follow up support
>>>> for the bnx2x as well.
>>>
>>> I think in two or three more iterations it will be merged.
>>>
>>> There are no objections on the fundamentals, it's just implementation
>>> details and coding style at this point.
>>
>> Dave, sorry, I might be a bit behind the rest of the reviewers, but I
>> just fail to understand nor find any reference that explains the
>> module param of ixgbe nor it makes sense to me to merge that piece of
>> the code upstream (its not for staging, correct?), as I wrote here
>> http://marc.info/?l=linux-netdev&m=136908123432072&w=2 basically, I
>> know you're not a great fan of module params (to say the least) and
>> surely not something named  "allow_unsafe_removal", thoughts?
>
> It's one of those "implementation details", I hate it too.

I'm open to suggestions.
Or Gerlitz May 21, 2013, 8:43 a.m. UTC | #14
On Tue, May 21, 2013 at 11:39 AM, David Miller <davem@davemloft.net> wrote:

> It's one of those "implementation details", I hate it too.

Maybe if we bake it on this list little further we can see how to get
away from that, or what's the most non ugly way for that?
Eliezer Tamir May 21, 2013, 10:27 a.m. UTC | #15
On 21/05/2013 11:43, Or Gerlitz wrote:
> On Tue, May 21, 2013 at 11:39 AM, David Miller <davem@davemloft.net> wrote:
>
>> It's one of those "implementation details", I hate it too.
>
> Maybe if we bake it on this list little further we can see how to get
> away from that, or what's the most non ugly way for that?

I'm all for proper review and fixing any issues before forcing 
"ugliness" and "black magic" on unsuspecting users.

Having said that, you failed to mention that your company sells 
userspace stack replacements.

Informal testing I did convinces me that for a given HW, the latencies 
you get with this patchset and with userspace busy-polling are about the 
same. (This is my personal opinion, I'm not authorized to talk on behalf 
of my company or anyone else.)

Why don't you try it out and tell us what you find.

Or Gerlitz May 21, 2013, 10:41 a.m. UTC | #16
On Tue, May 21, 2013 at 1:27 PM, Eliezer Tamir
<eliezer.tamir@linux.intel.com> wrote:
> On 21/05/2013 11:43, Or Gerlitz wrote:

>> Maybe if we bake it on this list little further we can see how to get
>> away from that, or what's the most non ugly way for that?

> I'm all for proper review and fixing any issues before forcing "ugliness"
> and "black magic" on unsuspecting users.
>
> Having said that, you failed to mention that your company sells userspace
> stack replacements.

ditto for your firm, heard of DPDK? but your comment is irrelevant, I
only raised the idea that if we bake this a little further we might find
a solution which avoids this nasty corner, not more but not less.

[...]

> Why don't you try it out and tell us what you find.

sure, we are looking into that at the driver level, my comment was more general
Willem de Bruijn May 21, 2013, 2:19 p.m. UTC | #17
On Tue, May 21, 2013 at 2:54 AM, Eliezer Tamir
<eliezer.tamir@linux.intel.com> wrote:
> On 20/05/2013 23:20, Or Gerlitz wrote:
>>
>> On Mon, May 20, 2013 at 1:16 PM, Eliezer Tamir
>> <eliezer.tamir@linux.intel.com> wrote:
>>>
>>> Add the ixgbe driver code implementing ndo_ll_poll.
>>> It should be easy for other drivers to do something similar
>>> in order to enable support for CONFIG_INET_LL_RX_POLL
>>
>>
>> I am not sure,
>
>
> Willem ported this to <some undisclosed HW that they use at Google>, his
> feedback was that it was not a major effort.

The core ndo_ll_poll implementation is generally a subset of a device
driver's existing napi callback. It cleans the queues, but it skips
napi_complete and unmasking of the IRQ.

+       ixgbe_for_each_ring(ring, q_vector->rx) {
+               found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+               if (found)
+                       break;
+       }

A subtle difference in the above code vs ixgbe_poll is that the
callback returns as soon as some data arrives on a queue, as opposed
to iterating over all queues. The budget is lower, too. Since not all
arriving data is necessarily destined for the polling socket, this may
or may not be an improvement.
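
To make the difference concrete, here is a small userspace model of the two
loop shapes. It is a sketch under stated assumptions, not driver code:
`struct fake_ring`, `fake_clean_rx()`, `ll_style_poll()` and
`napi_style_poll()` are hypothetical names, and a ring is reduced to a count
of pending packets. The only things taken from the patch are the budget of 4
and the "cleaned fewer than the per-ring budget" test.

```c
#include <assert.h>
#include <stddef.h>

#define LL_POLL_BUDGET 4	/* same literal the patch passes to ixgbe_clean_rx_irq() */

/* Hypothetical ring model: only tracks how many packets are waiting. */
struct fake_ring {
	int pending;
};

/* Clean at most 'budget' packets; return how many were cleaned. */
static int fake_clean_rx(struct fake_ring *ring, int budget)
{
	int n = ring->pending < budget ? ring->pending : budget;

	ring->pending -= n;
	return n;
}

/* Busy-poll style: return as soon as one ring produced packets. */
static int ll_style_poll(struct fake_ring *rings, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int found = fake_clean_rx(&rings[i], LL_POLL_BUDGET);

		if (found)
			return found;
	}
	return 0;
}

/* NAPI style: visit every ring; return 1 if all stayed under budget. */
static int napi_style_poll(struct fake_ring *rings, size_t nr, int budget)
{
	int per_ring = nr > 1 ? budget / (int)nr : budget;
	int clean_complete = 1;
	size_t i;

	if (per_ring < 1)
		per_ring = 1;	/* never let the per-ring budget drop below 1 */
	for (i = 0; i < nr; i++)
		clean_complete &= (fake_clean_rx(&rings[i], per_ring) < per_ring);
	return clean_complete;
}
```

With rings holding 0, 10 and 10 packets, the busy-poll loop stops after
taking 4 packets from the second ring and never touches the third, while the
NAPI loop services both.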

Besides that, the driver has to mark the packet with
ll_mark_skb(&cq->napi, skb);

On devices where tx completion interrupts share the same IRQ as rx
interrupts, the driver may also have to clean the tx queue once in a
while (at obvious tail latency cost). LLS does not disable the IRQ,
but I think the suggestion was to set its moderation threshold very
high to avoid net_rx_action/LLS lock contention. If so, starvation may
occur.

The most difficult bit is handling mutual exclusion with the
interrupt-driven receive path. The ixgbe port has its own internal
locking mechanism in anticipation of future use cases that can be
lock-free. As a first approximation, I just took the napi->poll_lock,
similar to how netpoll handles mutual exclusion with net_rx_action.

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index ca93238..72be661 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -356,9 +356,105 @@  struct ixgbe_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[IFNAMSIZ + 9];
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	unsigned int state;
+#define IXGBE_QV_STATE_IDLE        0
+#define IXGBE_QV_STATE_NAPI	   1    /* NAPI owns this QV */
+#define IXGBE_QV_STATE_POLL	   2    /* poll owns this QV */
+#define IXGBE_QV_LOCKED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
+#define IXGBE_QV_STATE_NAPI_YIELD  4    /* NAPI yielded this QV */
+#define IXGBE_QV_STATE_POLL_YIELD  8    /* poll yielded this QV */
+#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
+#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
+	spinlock_t lock;
+#endif  /* CONFIG_INET_LL_RX_POLL */
+
 	/* for dynamic allocation of rings associated with this q_vector */
 	struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
 };
+#ifdef CONFIG_INET_LL_RX_POLL
+static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
+{
+
+	spin_lock_init(&q_vector->lock);
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+}
+
+/* called from the device poll routine to get ownership of a q_vector */
+static inline int ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock(&q_vector->lock);
+	if (q_vector->state & IXGBE_QV_LOCKED) {
+		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
+		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
+		rc = false;
+	} else
+		/* we don't care if someone yielded */
+		q_vector->state = IXGBE_QV_STATE_NAPI;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while napi had it */
+static inline int ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
+			       IXGBE_QV_STATE_NAPI_YIELD));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* called from ixgbe_low_latency_recv() */
+static inline int ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock_bh(&q_vector->lock);
+	if ((q_vector->state & IXGBE_QV_LOCKED)) {
+		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
+		rc = false;
+	} else
+		/* preserve yield marks */
+		q_vector->state |= IXGBE_QV_STATE_POLL;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while it was locked */
+static inline int ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock_bh(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* true if a socket is polling, even if it did not get the lock */
+static inline int ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+{
+	WARN_ON(!(q_vector->state & IXGBE_QV_LOCKED));
+	return q_vector->state & IXGBE_QV_USER_PEND;
+}
+#else
+#define ixgbe_qv_init_lock(qv) do {} while (0)
+#define ixgbe_qv_lock_napi(qv) 1
+#define ixgbe_qv_unlock_napi(qv) 0
+#define ixgbe_qv_lock_poll(qv) 0
+#define ixgbe_qv_unlock_poll(qv) 0
+#define ixgbe_qv_ll_polling(qv) 0
+#endif /* CONFIG_INET_LL_RX_POLL */
+
 #ifdef CONFIG_IXGBE_HWMON
 
 #define IXGBE_HWMON_TYPE_LOC		0
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d30fbdd..628b7b1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -47,6 +47,7 @@ 
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
+#include <net/ll_poll.h>
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -144,6 +145,14 @@  static int debug = -1;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+#ifdef CONFIG_INET_LL_RX_POLL
+static int allow_unsafe_removal;
+static int unsafe_to_remove;
+module_param(allow_unsafe_removal, int, 0);
+MODULE_PARM_DESC(allow_unsafe_removal,
+	"Allow removal of module after low latency receive was used");
+#endif
+
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
 MODULE_DESCRIPTION("Intel(R) 10 Gigabit PCI Express Network Driver");
 MODULE_LICENSE("GPL");
@@ -1504,7 +1513,9 @@  static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 
-	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
+	if (ixgbe_qv_ll_polling(q_vector))
+		netif_receive_skb(skb);
+	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 		napi_gro_receive(&q_vector->napi, skb);
 	else
 		netif_rx(skb);
@@ -1892,9 +1903,9 @@  dma_sync:
  * expensive overhead for IOMMU access this provides a means of avoiding
  * it by maintaining the mapping of the page to the syste.
  *
- * Returns true if all work is completed without reaching budget
+ * Returns amount of work completed
  **/
-static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       struct ixgbe_ring *rx_ring,
 			       const int budget)
 {
@@ -1976,6 +1987,7 @@  static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 #endif /* IXGBE_FCOE */
+		skb_mark_ll(skb, &q_vector->napi);
 		ixgbe_rx_skb(q_vector, skb);
 
 		/* update budget accounting */
@@ -1992,9 +2004,45 @@  static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (cleaned_count)
 		ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
 
-	return (total_rx_packets < budget);
+	return total_rx_packets;
 }
 
+#ifdef CONFIG_INET_LL_RX_POLL
+/* must be called with local_bh_disable()d */
+static int ixgbe_low_latency_recv(struct napi_struct *napi)
+{
+	struct ixgbe_q_vector *q_vector =
+			container_of(napi, struct ixgbe_q_vector, napi);
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring  *ring;
+	int found;
+
+	if (unlikely(!unsafe_to_remove)) {
+		unsafe_to_remove = 1;
+		if (!allow_unsafe_removal) {
+			pr_info("module may no longer be removed\n");
+			try_module_get(THIS_MODULE);
+		}
+	}
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return LL_FLUSH_FAILED;
+
+	if (!ixgbe_qv_lock_poll(q_vector))
+		return LL_FLUSH_BUSY;
+
+	ixgbe_for_each_ring(ring, q_vector->rx) {
+		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+		if (found)
+			break;
+	}
+
+	ixgbe_qv_unlock_poll(q_vector);
+
+	return LL_FLUSH_DONE;
+}
+#endif	/* CONFIG_INET_LL_RX_POLL */
+
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -2550,6 +2598,9 @@  int ixgbe_poll(struct napi_struct *napi, int budget)
 	ixgbe_for_each_ring(ring, q_vector->tx)
 		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
 
+	if (!ixgbe_qv_lock_napi(q_vector))
+		return budget;
+
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	if (q_vector->rx.count > 1)
@@ -2558,9 +2609,10 @@  int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx)
-		clean_complete &= ixgbe_clean_rx_irq(q_vector, ring,
-						     per_ring_budget);
+		clean_complete &= (ixgbe_clean_rx_irq(q_vector, ring,
+				   per_ring_budget) < per_ring_budget);
 
+	ixgbe_qv_unlock_napi(q_vector);
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
 		return budget;
@@ -3747,16 +3799,25 @@  static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
+		ixgbe_qv_init_lock(adapter->q_vector[q_idx]);
 		napi_enable(&adapter->q_vector[q_idx]->napi);
+	}
 }
 
 static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	local_bh_disable(); /* for ixgbe_qv_lock_napi() */
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
 		napi_disable(&adapter->q_vector[q_idx]->napi);
+		while (!ixgbe_qv_lock_napi(adapter->q_vector[q_idx])) {
+			pr_info("QV %d locked\n", q_idx);
+			mdelay(1);
+		}
+	}
+	local_bh_enable();
 }
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7177,6 +7238,9 @@  static const struct net_device_ops ixgbe_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
+#ifdef CONFIG_INET_LL_RX_POLL
+	.ndo_ll_poll		= ixgbe_low_latency_recv,
+#endif
 #ifdef IXGBE_FCOE
 	.ndo_fcoe_ddp_setup = ixgbe_fcoe_ddp_get,
 	.ndo_fcoe_ddp_target = ixgbe_fcoe_ddp_target,
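
The q_vector ownership protocol added to ixgbe.h in this patch can be
exercised outside the kernel with a simplified model. What follows is a
sketch under loud assumptions, not the patch's code: `struct qv_model` and
the `qv_*` functions are hypothetical names, the spinlock and the WARN_ON
checks are left out (so the model is single-threaded only), and the two
unlock helpers are folded into one because both report the same
POLL_YIELD bit.

```c
#include <assert.h>
#include <stdbool.h>

/* State bits, using the same values as the IXGBE_QV_* defines. */
enum {
	QV_IDLE       = 0,
	QV_NAPI       = 1,	/* NAPI owns the vector */
	QV_POLL       = 2,	/* busy-poll owns the vector */
	QV_NAPI_YIELD = 4,	/* NAPI was turned away */
	QV_POLL_YIELD = 8,	/* busy-poll was turned away */
	QV_LOCKED     = QV_NAPI | QV_POLL,
};

/* Hypothetical model of the per-vector state; no lock, single-threaded. */
struct qv_model {
	unsigned int state;
};

/* NAPI side: take ownership, or record that NAPI had to yield. */
static bool qv_lock_napi(struct qv_model *qv)
{
	if (qv->state & QV_LOCKED) {
		qv->state |= QV_NAPI_YIELD;
		return false;
	}
	qv->state = QV_NAPI;	/* earlier yield marks are discarded */
	return true;
}

/* Busy-poll side: take ownership, or record that the poller yielded. */
static bool qv_lock_poll(struct qv_model *qv)
{
	if (qv->state & QV_LOCKED) {
		qv->state |= QV_POLL_YIELD;
		return false;
	}
	qv->state |= QV_POLL;	/* earlier yield marks are preserved */
	return true;
}

/* Release ownership; report whether a poller was turned away meanwhile. */
static bool qv_unlock(struct qv_model *qv)
{
	bool poll_yielded = (qv->state & QV_POLL_YIELD) != 0;

	qv->state = QV_IDLE;
	return poll_yielded;
}
```

Neither side ever spins in this scheme: the loser sets its yield bit and
returns immediately, which matches ixgbe_poll() returning the full budget
and ixgbe_low_latency_recv() returning LL_FLUSH_BUSY in the patch.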