diff mbox

[net-next-2.6] net: convert bonding to use rx_handler

Message ID 20110218132524.GC2939@psychotron.redhat.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Pirko Feb. 18, 2011, 1:25 p.m. UTC
This patch converts bonding to use rx_handler. Results in cleaner
__netif_receive_skb() with much less exceptions needed. Also bond-specific
work is moved into bond code.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
 include/linux/skbuff.h          |    2 +
 net/core/dev.c                  |  144 +++++++++++---------------------------
 net/core/skbuff.c               |    1 +
 4 files changed, 119 insertions(+), 103 deletions(-)

Comments

Eric Dumazet Feb. 18, 2011, 1:29 p.m. UTC | #1
Le vendredi 18 février 2011 à 14:25 +0100, Jiri Pirko a écrit :
> This patch converts bonding to use rx_handler. Results in cleaner
> __netif_receive_skb() with much less exceptions needed. Also bond-specific
> work is moved into bond code.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
>  include/linux/skbuff.h          |    2 +
>  net/core/dev.c                  |  144 +++++++++++---------------------------
>  net/core/skbuff.c               |    1 +
>  4 files changed, 119 insertions(+), 103 deletions(-)
> 

> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 31f02d0..9f3af5d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -267,6 +267,7 @@ typedef unsigned char *sk_buff_data_t;
>   *	@sk: Socket we are owned by
>   *	@tstamp: Time we arrived
>   *	@dev: Device we arrived on/are leaving by
> + *	@input_dev: Original device on which we arrived
>   *	@transport_header: Transport layer header
>   *	@network_header: Network layer header
>   *	@mac_header: Link layer header
> @@ -325,6 +326,7 @@ struct sk_buff {
>  
>  	struct sock		*sk;
>  	struct net_device	*dev;
> +	struct net_device	*input_dev;
>  

Your patch looks fine, but adding 8 bytes to sk_buff for a "cleanup" is
really a show stopper for me.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 2:14 p.m. UTC | #2
Fri, Feb 18, 2011 at 02:29:51PM CET, eric.dumazet@gmail.com wrote:
>Le vendredi 18 février 2011 à 14:25 +0100, Jiri Pirko a écrit :
>> This patch converts bonding to use rx_handler. Results in cleaner
>> __netif_receive_skb() with much less exceptions needed. Also bond-specific
>> work is moved into bond code.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>> ---
>>  drivers/net/bonding/bond_main.c |   75 ++++++++++++++++++++-
>>  include/linux/skbuff.h          |    2 +
>>  net/core/dev.c                  |  144 +++++++++++---------------------------
>>  net/core/skbuff.c               |    1 +
>>  4 files changed, 119 insertions(+), 103 deletions(-)
>> 
>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 31f02d0..9f3af5d 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -267,6 +267,7 @@ typedef unsigned char *sk_buff_data_t;
>>   *	@sk: Socket we are owned by
>>   *	@tstamp: Time we arrived
>>   *	@dev: Device we arrived on/are leaving by
>> + *	@input_dev: Original device on which we arrived
>>   *	@transport_header: Transport layer header
>>   *	@network_header: Network layer header
>>   *	@mac_header: Link layer header
>> @@ -325,6 +326,7 @@ struct sk_buff {
>>  
>>  	struct sock		*sk;
>>  	struct net_device	*dev;
>> +	struct net_device	*input_dev;
>>  
>
>Your patch looks fine, but adding 8 bytes to sk_buff for a "cleanup" is
>really a show stopper for me.

Do not know how to do it better. As for percpu variable, not only
origdev would have to be remembered but also probably skb pointer to
know if it's the first run on the skb or not. Can't really figure out a
better solution. Can you?

Thanks.

Jirka
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 18, 2011, 2:27 p.m. UTC | #3
Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :

> Do not know how to do it better. As for percpu variable, not only
> origdev would have to be remembered but also probably skb pointer to
> know if it's the first run on the skb or not. Can't really figure out a
> better solution. Can you?

I'll try and let you know.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Feb. 18, 2011, 2:46 p.m. UTC | #4
Am 18.02.2011 15:27, schrieb Eric Dumazet:
> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
> 
>> Do not know how to do it better. As for percpu variable, not only
>> origdev would have to be remembered but also probably skb pointer to
>> know if it's the first run on the skb or not. Can't really figure out a
>> better solution. Can you?
> 
> I'll try and let you know.

Why not simply do a lookup on skb->iif?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 2:58 p.m. UTC | #5
Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>Am 18.02.2011 15:27, schrieb Eric Dumazet:
>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>> 
>>> Do not know how to do it better. As for percpu variable, not only
>>> origdev would have to be remembered but also probably skb pointer to
>>> know if it's the first run on the skb or not. Can't really figure out a
>>> better solution. Can you?
>> 
>> I'll try and let you know.
>
>Why not simply do a lookup on skb->iif?

Well I was trying to avoid iterating over list of devices for each
incoming frame.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Feb. 18, 2011, 3:50 p.m. UTC | #6
On 18.02.2011 15:58, Jiri Pirko wrote:
> Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>> Am 18.02.2011 15:27, schrieb Eric Dumazet:
>>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>>>
>>>> Do not know how to do it better. As for percpu variable, not only
>>>> origdev would have to be remembered but also probably skb pointer to
>>>> know if it's the first run on the skb or not. Can't really figure out a
>>>> better solution. Can you?
>>>
>>> I'll try and let you know.
>>
>> Why not simply do a lookup on skb->iif?
> 
> Well I was trying to avoid iterating over list of devices for each
> incoming frame.
> 

Well, there are a couple of holes on 64 bit, perhaps you can rearrange
things and eliminate either iif or input_dev without increasing size
since they appear to be redundant.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 18, 2011, 4:14 p.m. UTC | #7
Le vendredi 18 février 2011 à 16:50 +0100, Patrick McHardy a écrit :
> On 18.02.2011 15:58, Jiri Pirko wrote:
> > Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
> >> Am 18.02.2011 15:27, schrieb Eric Dumazet:
> >>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
> >>>
> >>>> Do not know how to do it better. As for percpu variable, not only
> >>>> origdev would have to be remembered but also probably skb pointer to
> >>>> know if it's the first run on the skb or not. Can't really figure out a
> >>>> better solution. Can you?
> >>>
> >>> I'll try and let you know.
> >>
> >> Why not simply do a lookup on skb->iif?
> > 
> > Well I was trying to avoid iterating over list of devices for each
> > incoming frame.
> > 
> 
> Well, there are a couple of holes on 64 bit, perhaps you can rearrange
> things and eliminate either iif or input_dev without increasing size
> since they appear to be redundant.

Jiri

I dont understand why netif_rx() is needed in your patch.

Can we stack 10 bond devices or so ???

If we avoid this stage and call the real thing (netif_receive_skb()),
then we dont need adding a field in each skb, since it can be carried by
a global variable (per cpu of course)

bond_handle_frame() being called from __netif_receive_skb() I believe it
can use netif_receive_skb() instead of netif_rx().

Same remark for vlan_on_bond_hook()



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 6:47 p.m. UTC | #8
Fri, Feb 18, 2011 at 05:14:30PM CET, eric.dumazet@gmail.com wrote:
>Le vendredi 18 février 2011 à 16:50 +0100, Patrick McHardy a écrit :
>> On 18.02.2011 15:58, Jiri Pirko wrote:
>> > Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>> >> Am 18.02.2011 15:27, schrieb Eric Dumazet:
>> >>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>> >>>
>> >>>> Do not know how to do it better. As for percpu variable, not only
>> >>>> origdev would have to be remembered but also probably skb pointer to
>> >>>> know if it's the first run on the skb or not. Can't really figure out a
>> >>>> better solution. Can you?
>> >>>
>> >>> I'll try and let you know.
>> >>
>> >> Why not simply do a lookup on skb->iif?
>> > 
>> > Well I was trying to avoid iterating over list of devices for each
>> > incoming frame.
>> > 
>> 
>> Well, there are a couple of holes on 64 bit, perhaps you can rearrange
>> things and eliminate either iif or input_dev without increasing size
>> since they appear to be redundant.
>
>Jiri
>
>I dont understand why netif_rx() is needed in your patch.

I used netif_rx() because bridge and macvlan does that too. I did not see
a reason to not to do the same.

>
>Can we stack 10 bond devices or so ???
>
>If we avoid this stage and call the real thing (netif_receive_skb()),
>then we dont need adding a field in each skb, since it can be carried by
>a global variable (per cpu of course)
>
I'm probably missing something. How do netif_receive_skb() and
netif_rx() differ in this point of view, since both are calling:
"ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);"
?

Still I see a problem with the percpu global variable. We would have to
store skb pointer there as well and in each __netif_receive_skb() call it
would have to be checked if it's different from the current one.
In that case store new skb and orig_Dev.

Leaving aside that global variables are evil in general, I still think
this is not nicer solution then to add skb->input_dev (although I
understand your arguments).


>bond_handle_frame() being called from __netif_receive_skb() I believe it
>can use netif_receive_skb() instead of netif_rx().
>
>Same remark for vlan_on_bond_hook()
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 18, 2011, 7:17 p.m. UTC | #9
Le vendredi 18 février 2011 à 19:47 +0100, Jiri Pirko a écrit :
> Fri, Feb 18, 2011 at 05:14:30PM CET, eric.dumazet@gmail.com wrote:
> >Le vendredi 18 février 2011 à 16:50 +0100, Patrick McHardy a écrit :
> >> On 18.02.2011 15:58, Jiri Pirko wrote:
> >> > Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
> >> >> Am 18.02.2011 15:27, schrieb Eric Dumazet:
> >> >>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
> >> >>>
> >> >>>> Do not know how to do it better. As for percpu variable, not only
> >> >>>> origdev would have to be remembered but also probably skb pointer to
> >> >>>> know if it's the first run on the skb or not. Can't really figure out a
> >> >>>> better solution. Can you?
> >> >>>
> >> >>> I'll try and let you know.
> >> >>
> >> >> Why not simply do a lookup on skb->iif?
> >> > 
> >> > Well I was trying to avoid iterating over list of devices for each
> >> > incoming frame.
> >> > 
> >> 
> >> Well, there are a couple of holes on 64 bit, perhaps you can rearrange
> >> things and eliminate either iif or input_dev without increasing size
> >> since they appear to be redundant.
> >
> >Jiri
> >
> >I dont understand why netif_rx() is needed in your patch.
> 
> I used netif_rx() because bridge and macvlan does that too. I did not see
> a reason to not to do the same.
> 
> >
> >Can we stack 10 bond devices or so ???
> >
> >If we avoid this stage and call the real thing (netif_receive_skb()),
> >then we dont need adding a field in each skb, since it can be carried by
> >a global variable (per cpu of course)
> >
> I'm probably missing something. How do netif_receive_skb() and
> netif_rx() differ in this point of view, since both are calling:
> "ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);"
> ?
> 
> Still I see a problem with the percpu global variable. We would have to
> store skb pointer there as well and in each __netif_receive_skb() call it
> would have to be checked if it's different from the current one.
> In that case store new skb and orig_Dev.
> 
> Leaving aside that global variables are evil in general, I still think
> this is not nicer solution then to add skb->input_dev (although I
> understand your arguments).

Really I must miss something about "global variables" thing/fear.

Kernel is full of global variables, they are not evil if properly used.

Take a look at net/core/dev.c :

static DEFINE_PER_CPU(int, xmit_recursion);

For an example of what I have in mind.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 7:28 p.m. UTC | #10
Fri, Feb 18, 2011 at 08:17:37PM CET, eric.dumazet@gmail.com wrote:
>Le vendredi 18 février 2011 à 19:47 +0100, Jiri Pirko a écrit :
>> Fri, Feb 18, 2011 at 05:14:30PM CET, eric.dumazet@gmail.com wrote:
>> >Le vendredi 18 février 2011 à 16:50 +0100, Patrick McHardy a écrit :
>> >> On 18.02.2011 15:58, Jiri Pirko wrote:
>> >> > Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>> >> >> Am 18.02.2011 15:27, schrieb Eric Dumazet:
>> >> >>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>> >> >>>
>> >> >>>> Do not know how to do it better. As for percpu variable, not only
>> >> >>>> origdev would have to be remembered but also probably skb pointer to
>> >> >>>> know if it's the first run on the skb or not. Can't really figure out a
>> >> >>>> better solution. Can you?
>> >> >>>
>> >> >>> I'll try and let you know.
>> >> >>
>> >> >> Why not simply do a lookup on skb->iif?
>> >> > 
>> >> > Well I was trying to avoid iterating over list of devices for each
>> >> > incoming frame.
>> >> > 
>> >> 
>> >> Well, there are a couple of holes on 64 bit, perhaps you can rearrange
>> >> things and eliminate either iif or input_dev without increasing size
>> >> since they appear to be redundant.
>> >
>> >Jiri
>> >
>> >I dont understand why netif_rx() is needed in your patch.
>> 
>> I used netif_rx() because bridge and macvlan does that too. I did not see
>> a reason to not to do the same.
>> 
>> >
>> >Can we stack 10 bond devices or so ???
>> >
>> >If we avoid this stage and call the real thing (netif_receive_skb()),
>> >then we dont need adding a field in each skb, since it can be carried by
>> >a global variable (per cpu of course)
>> >
>> I'm probably missing something. How do netif_receive_skb() and
>> netif_rx() differ in this point of view, since both are calling:
>> "ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);"
>> ?
>> 
>> Still I see a problem with the percpu global variable. We would have to
>> store skb pointer there as well and in each __netif_receive_skb() call it
>> would have to be checked if it's different from the current one.
>> In that case store new skb and orig_Dev.
>> 
>> Leaving aside that global variables are evil in general, I still think
>> this is not nicer solution then to add skb->input_dev (although I
>> understand your arguments).
>
>Really I must miss something about "global variables" thing/fear.
>
>Kernel is full of global variables, they are not evil if properly used.

I know. But that doesn't mean it's ok. But I see your point.

>
>Take a look at net/core/dev.c :
>
>static DEFINE_PER_CPU(int, xmit_recursion);
>
>For an example of what I have in mind.

Yes I saw this. We would have to do something like:

struct skb_rx_context {
	struct sk_buff *skb;
	struct net_device *orig_dev;
};

static DEFINE_PER_CPU(struct skb_rx_context, skb_rx_context);

and then in __netif_receive_skb():

struct skb_rx_context *cont = __this_cpu_read(skb_rx_context);

if (cont->skb != skb) {
	cont->skb = skb;
	orig_dev = cont->orig_dev = skb->dev;
} else {
	orig_dev = cont->orig_dev;
}


Does this make sense?

>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Feb. 18, 2011, 7:58 p.m. UTC | #11
Le vendredi 18 février 2011 à 20:28 +0100, Jiri Pirko a écrit :

> Yes I saw this. We would have to do something like:
> 
> struct skb_rx_context {
> 	struct sk_buff *skb;
> 	struct net_device *orig_dev;
> };
> 
> static DEFINE_PER_CPU(struct skb_rx_context, skb_rx_context);
> 
> and then in __netif_receive_skb():
> 
> struct skb_rx_context *cont = __this_cpu_read(skb_rx_context);
> 
> if (cont->skb != skb) {
> 	cont->skb = skb;
> 	orig_dev = cont->orig_dev = skb->dev;
> } else {
> 	orig_dev = cont->orig_dev;
> }
> 
> 
> Does this make sense?

Well, yes, something like that, but I think you dont need to keep a
pointer to current skb. (It would not work if one handled/freed, same
'skb pointer' is reused a bit later)

Sorry, I wont have time to look at this right now, its now 21h00 in
France, time to get some time with family ;)

See you !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 8:03 p.m. UTC | #12
Fri, Feb 18, 2011 at 08:58:26PM CET, eric.dumazet@gmail.com wrote:
>Le vendredi 18 février 2011 à 20:28 +0100, Jiri Pirko a écrit :
>
>> Yes I saw this. We would have to do something like:
>> 
>> struct skb_rx_context {
>> 	struct sk_buff *skb;
>> 	struct net_device *orig_dev;
>> };
>> 
>> static DEFINE_PER_CPU(struct skb_rx_context, skb_rx_context);
>> 
>> and then in __netif_receive_skb():
>> 
>> struct skb_rx_context *cont = __this_cpu_read(skb_rx_context);
>> 
>> if (cont->skb != skb) {
>> 	cont->skb = skb;
>> 	orig_dev = cont->orig_dev = skb->dev;
>> } else {
>> 	orig_dev = cont->orig_dev;
>> }
>> 
>> 
>> Does this make sense?
>
>Well, yes, something like that, but I think you dont need to keep a
>pointer to current skb. (It would not work if one handled/freed, same
>'skb pointer' is reused a bit later)

Well I think I need. How else should I distinguish that new skb (first
time in __netif_receive_skb) is there and I need to remember orig_dev?

>
>Sorry, I wont have time to look at this right now, its now 21h00 in
>France, time to get some time with family ;)

Np, same time here in CZ :)


>
>See you !
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 18, 2011, 8:06 p.m. UTC | #13
From: Jiri Pirko <jpirko@redhat.com>
Date: Fri, 18 Feb 2011 15:58:51 +0100

> Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>>Am 18.02.2011 15:27, schrieb Eric Dumazet:
>>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>>> 
>>>> Do not know how to do it better. As for percpu variable, not only
>>>> origdev would have to be remembered but also probably skb pointer to
>>>> know if it's the first run on the skb or not. Can't really figure out a
>>>> better solution. Can you?
>>> 
>>> I'll try and let you know.
>>
>>Why not simply do a lookup on skb->iif?
> 
> Well I was trying to avoid iterating over list of devices for each
> incoming frame.

It is not list, it is hash :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Feb. 18, 2011, 8:13 p.m. UTC | #14
Fri, Feb 18, 2011 at 09:06:56PM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Fri, 18 Feb 2011 15:58:51 +0100
>
>> Fri, Feb 18, 2011 at 03:46:45PM CET, kaber@trash.net wrote:
>>>Am 18.02.2011 15:27, schrieb Eric Dumazet:
>>>> Le vendredi 18 février 2011 à 15:14 +0100, Jiri Pirko a écrit :
>>>> 
>>>>> Do not know how to do it better. As for percpu variable, not only
>>>>> origdev would have to be remembered but also probably skb pointer to
>>>>> know if it's the first run on the skb or not. Can't really figure out a
>>>>> better solution. Can you?
>>>> 
>>>> I'll try and let you know.
>>>
>>>Why not simply do a lookup on skb->iif?
>> 
>> Well I was trying to avoid iterating over list of devices for each
>> incoming frame.
>
>It is not list, it is hash :-)

Even if, Do you think that this would be ok?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 77e3c6a..a856a11 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1423,6 +1423,68 @@  static void bond_setup_by_slave(struct net_device *bond_dev,
 	bond->setup_by_slave = 1;
 }
 
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+static bool bond_should_deliver_exact_match(struct sk_buff *skb,
+					    struct net_device *slave_dev,
+					    struct net_device *bond_dev)
+{
+	if (slave_dev->priv_flags & IFF_SLAVE_INACTIVE) {
+		if (slave_dev->priv_flags & IFF_SLAVE_NEEDARP &&
+		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
+			return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+		    skb->pkt_type != PACKET_BROADCAST &&
+		    skb->pkt_type != PACKET_MULTICAST)
+				return false;
+
+		if (bond_dev->priv_flags & IFF_MASTER_8023AD &&
+		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
+			return false;
+
+		return true;
+	}
+	return false;
+}
+
+static struct sk_buff *bond_handle_frame(struct sk_buff *skb)
+{
+	struct net_device *slave_dev;
+	struct net_device *bond_dev;
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return NULL;
+	slave_dev = skb->dev;
+	bond_dev = ACCESS_ONCE(slave_dev->master);
+	if (unlikely(!bond_dev))
+		return skb;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ARPMON)
+		slave_dev->last_rx = jiffies;
+
+	if (bond_should_deliver_exact_match(skb, slave_dev, bond_dev)) {
+		skb->deliver_no_wcard = 1;
+		return skb;
+	}
+
+	skb->dev = bond_dev;
+
+	if (bond_dev->priv_flags & IFF_MASTER_ALB &&
+	    bond_dev->priv_flags & IFF_BRIDGE_PORT &&
+	    skb->pkt_type == PACKET_HOST) {
+		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
+
+		memcpy(dest, bond_dev->dev_addr, ETH_ALEN);
+	}
+
+	netif_rx(skb);
+	return NULL;
+}
+
 /* enslave device <slave> to bond device <master> */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1599,11 +1661,17 @@  int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 		pr_debug("Error %d calling netdev_set_bond_master\n", res);
 		goto err_restore_mac;
 	}
+	res = netdev_rx_handler_register(slave_dev, bond_handle_frame, NULL);
+	if (res) {
+		pr_debug("Error %d calling netdev_rx_handler_register\n", res);
+		goto err_unset_master;
+	}
+
 	/* open the slave since the application closed it */
 	res = dev_open(slave_dev);
 	if (res) {
 		pr_debug("Opening slave %s failed\n", slave_dev->name);
-		goto err_unset_master;
+		goto err_unreg_rxhandler;
 	}
 
 	new_slave->dev = slave_dev;
@@ -1811,6 +1879,9 @@  int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 err_close:
 	dev_close(slave_dev);
 
+err_unreg_rxhandler:
+	netdev_rx_handler_unregister(slave_dev);
+
 err_unset_master:
 	netdev_set_bond_master(slave_dev, NULL);
 
@@ -1992,6 +2063,7 @@  int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		netif_addr_unlock_bh(bond_dev);
 	}
 
+	netdev_rx_handler_unregister(slave_dev);
 	netdev_set_bond_master(slave_dev, NULL);
 
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -2114,6 +2186,7 @@  static int bond_release_all(struct net_device *bond_dev)
 			netif_addr_unlock_bh(bond_dev);
 		}
 
+		netdev_rx_handler_unregister(slave_dev);
 		netdev_set_bond_master(slave_dev, NULL);
 
 		/* close slave before restoring its mac address */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 31f02d0..9f3af5d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -267,6 +267,7 @@  typedef unsigned char *sk_buff_data_t;
  *	@sk: Socket we are owned by
  *	@tstamp: Time we arrived
  *	@dev: Device we arrived on/are leaving by
+ *	@input_dev: Original device on which we arrived
  *	@transport_header: Transport layer header
  *	@network_header: Network layer header
  *	@mac_header: Link layer header
@@ -325,6 +326,7 @@  struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*input_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/net/core/dev.c b/net/core/dev.c
index 4f69439..9d2f485 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1530,12 +1530,17 @@  int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
+static inline int __deliver_skb(struct sk_buff *skb,
+				struct packet_type *pt_prev)
+{
+	return pt_prev->func(skb, skb->dev, pt_prev, skb->input_dev);
+}
+
 static inline int deliver_skb(struct sk_buff *skb,
-			      struct packet_type *pt_prev,
-			      struct net_device *orig_dev)
+			      struct packet_type *pt_prev)
 {
 	atomic_inc(&skb->users);
-	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	return __deliver_skb(skb, pt_prev);
 }
 
 /*
@@ -1558,7 +1563,7 @@  static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		    (ptype->af_packet_priv == NULL ||
 		     (struct sock *)ptype->af_packet_priv != skb->sk)) {
 			if (pt_prev) {
-				deliver_skb(skb2, pt_prev, skb->dev);
+				deliver_skb(skb2, pt_prev);
 				pt_prev = ptype;
 				continue;
 			}
@@ -1591,7 +1596,7 @@  static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 	if (pt_prev)
-		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+		__deliver_skb(skb2, pt_prev);
 	rcu_read_unlock();
 }
 
@@ -3021,8 +3026,7 @@  static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 }
 
 static inline struct sk_buff *handle_ing(struct sk_buff *skb,
-					 struct packet_type **pt_prev,
-					 int *ret, struct net_device *orig_dev)
+					 struct packet_type **pt_prev, int *ret)
 {
 	struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
 
@@ -3030,7 +3034,7 @@  static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		goto out;
 
 	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*ret = deliver_skb(skb, *pt_prev);
 		*pt_prev = NULL;
 	}
 
@@ -3092,63 +3096,30 @@  void netdev_rx_handler_unregister(struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 
-static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
-					      struct net_device *master)
-{
-	if (skb->pkt_type == PACKET_HOST) {
-		u16 *dest = (u16 *) eth_hdr(skb)->h_dest;
-
-		memcpy(dest, master->dev_addr, ETH_ALEN);
-	}
-}
-
-/* On bonding slaves other than the currently active slave, suppress
- * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
- * ARP on active-backup slaves with arp_validate enabled.
- */
-static int __skb_bond_should_drop(struct sk_buff *skb,
-				  struct net_device *master)
+static void vlan_on_bond_hook(struct sk_buff *skb)
 {
-	struct net_device *dev = skb->dev;
-
-	if (master->priv_flags & IFF_MASTER_ARPMON)
-		dev->last_rx = jiffies;
-
-	if ((master->priv_flags & IFF_MASTER_ALB) &&
-	    (master->priv_flags & IFF_BRIDGE_PORT)) {
-		/* Do address unmangle. The local destination address
-		 * will be always the one master has. Provides the right
-		 * functionality in a bridge.
-		 */
-		skb_bond_set_mac_by_master(skb, master);
-	}
-
-	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
-		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
-		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
-			return 0;
-
-		if (master->priv_flags & IFF_MASTER_ALB) {
-			if (skb->pkt_type != PACKET_BROADCAST &&
-			    skb->pkt_type != PACKET_MULTICAST)
-				return 0;
-		}
-		if (master->priv_flags & IFF_MASTER_8023AD &&
-		    skb->protocol == __cpu_to_be16(ETH_P_SLOW))
-			return 0;
+	/*
+	 * Make sure ARP frames received on VLAN interfaces stacked on
+	 * bonding interfaces still make their way to any base bonding
+	 * device that may have registered for a specific ptype.
+	 */
+	if (skb->dev->priv_flags & IFF_802_1Q_VLAN &&
+	    vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING &&
+	    skb->protocol == htons(ETH_P_ARP)) {
+		struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
 
-		return 1;
+		if (!skb2)
+			return;
+		skb2->dev = vlan_dev_real_dev(skb->dev);
+		netif_rx(skb2);
 	}
-	return 0;
 }
 
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
 	rx_handler_func_t *rx_handler;
-	struct net_device *orig_dev;
-	struct net_device *null_or_orig;
-	struct net_device *orig_or_bond;
+	struct net_device *null_or_dev;
 	int ret = NET_RX_DROP;
 	__be16 type;
 
@@ -3164,29 +3135,8 @@  static int __netif_receive_skb(struct sk_buff *skb)
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
 
-	/*
-	 * bonding note: skbs received on inactive slaves should only
-	 * be delivered to pkt handlers that are exact matches.  Also
-	 * the deliver_no_wcard flag will be set.  If packet handlers
-	 * are sensitive to duplicate packets these skbs will need to
-	 * be dropped at the handler.
-	 */
-	null_or_orig = NULL;
-	orig_dev = skb->dev;
-	if (skb->deliver_no_wcard)
-		null_or_orig = orig_dev;
-	else if (netif_is_bond_slave(orig_dev)) {
-		struct net_device *bond_master = ACCESS_ONCE(orig_dev->master);
-
-		if (likely(bond_master)) {
-			if (__skb_bond_should_drop(skb, bond_master)) {
-				skb->deliver_no_wcard = 1;
-				/* deliver only exact match */
-				null_or_orig = orig_dev;
-			} else
-				skb->dev = bond_master;
-		}
-	}
+	if (!skb->input_dev)
+		skb->input_dev = skb->dev;
 
 	__this_cpu_inc(softnet_data.processed);
 	skb_reset_network_header(skb);
@@ -3205,26 +3155,24 @@  static int __netif_receive_skb(struct sk_buff *skb)
 #endif
 
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
-		if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
-		    ptype->dev == orig_dev) {
+		if (!ptype->dev || ptype->dev == skb->dev) {
 			if (pt_prev)
-				ret = deliver_skb(skb, pt_prev, orig_dev);
+				ret = deliver_skb(skb, pt_prev);
 			pt_prev = ptype;
 		}
 	}
 
 #ifdef CONFIG_NET_CLS_ACT
-	skb = handle_ing(skb, &pt_prev, &ret, orig_dev);
+	skb = handle_ing(skb, &pt_prev, &ret);
 	if (!skb)
 		goto out;
 ncls:
 #endif
 
-	/* Handle special case of bridge or macvlan */
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(skb, pt_prev);
 			pt_prev = NULL;
 		}
 		skb = rx_handler(skb);
@@ -3234,7 +3182,7 @@  ncls:
 
 	if (vlan_tx_tag_present(skb)) {
 		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
+			ret = deliver_skb(skb, pt_prev);
 			pt_prev = NULL;
 		}
 		if (vlan_hwaccel_do_receive(&skb)) {
@@ -3244,32 +3192,24 @@  ncls:
 			goto out;
 	}
 
-	/*
-	 * Make sure frames received on VLAN interfaces stacked on
-	 * bonding interfaces still make their way to any base bonding
-	 * device that may have registered for a specific ptype.  The
-	 * handler may have to adjust skb->dev and orig_dev.
-	 */
-	orig_or_bond = orig_dev;
-	if ((skb->dev->priv_flags & IFF_802_1Q_VLAN) &&
-	    (vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING)) {
-		orig_or_bond = vlan_dev_real_dev(skb->dev);
-	}
+	vlan_on_bond_hook(skb);
+
+	/* deliver only exact match when indicated */
+	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
 
 	type = skb->protocol;
 	list_for_each_entry_rcu(ptype,
 			&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
-		if (ptype->type == type && (ptype->dev == null_or_orig ||
-		     ptype->dev == skb->dev || ptype->dev == orig_dev ||
-		     ptype->dev == orig_or_bond)) {
+		if (ptype->type == type &&
+		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
 			if (pt_prev)
-				ret = deliver_skb(skb, pt_prev, orig_dev);
+				ret = deliver_skb(skb, pt_prev);
 			pt_prev = ptype;
 		}
 	}
 
 	if (pt_prev) {
-		ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+		ret = __deliver_skb(skb, pt_prev);
 	} else {
 		atomic_long_inc(&skb->dev->rx_dropped);
 		kfree_skb(skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 14cf560..e19eabe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -508,6 +508,7 @@  static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 {
 	new->tstamp		= old->tstamp;
 	new->dev		= old->dev;
+	new->input_dev		= old->input_dev;
 	new->transport_header	= old->transport_header;
 	new->network_header	= old->network_header;
 	new->mac_header		= old->mac_header;