diff mbox

[net-next-2.6] net: reinject arps into bonding slave instead of master

Message ID 20110307224338.GU11864@gospo.rdu.redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Andy Gospodarek March 7, 2011, 10:43 p.m. UTC
On Mon, Mar 07, 2011 at 01:51:00PM +0100, Jiri Pirko wrote:
> Recent patch "bonding: move processing of recv handlers into
> handle_frame()" caused a regression on following net scheme:
> 
> eth0 - bond0 - bond0.5
> 
> where arp monitoring is happening over vlan. This patch fixes it by
> reinjecting the arp packet into bonding slave device so the bonding
> rx_handler can pickup and process it.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  net/core/dev.c |    8 ++++----
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c71bd18..3d88458 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3094,12 +3094,12 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
>  
> -static void vlan_on_bond_hook(struct sk_buff *skb)
> +static void vlan_on_bond_hook(struct sk_buff *skb, struct net_device *orig_dev)
>  {
>  	/*
>  	 * Make sure ARP frames received on VLAN interfaces stacked on
>  	 * bonding interfaces still make their way to any base bonding
> -	 * device that may have registered for a specific ptype.
> +	 * device by reinjecting the frame into bonding slave (orig_dev)
>  	 */
>  	if (skb->dev->priv_flags & IFF_802_1Q_VLAN &&
>  	    vlan_dev_real_dev(skb->dev)->priv_flags & IFF_BONDING &&
> @@ -3108,7 +3108,7 @@ static void vlan_on_bond_hook(struct sk_buff *skb)
>  
>  		if (!skb2)
>  			return;
> -		skb2->dev = vlan_dev_real_dev(skb->dev);
> +		skb2->dev = orig_dev;
>  		netif_rx(skb2);
>  	}
>  }
> @@ -3202,7 +3202,7 @@ ncls:
>  			goto out;
>  	}
>  
> -	vlan_on_bond_hook(skb);
> +	vlan_on_bond_hook(skb, orig_dev);
>  
>  	/* deliver only exact match when indicated */
>  	null_or_dev = deliver_exact ? skb->dev : NULL;

This patch doesn't work.

My setup has bond0.100 -> bond0 -> eth2 and eth3.  ARP monitoring is
enabled as is arp_valiate.

The initial problem was just that just before vlan_on_bond_hook is
called, skb->dev = bond0.100 and orig_dev = eth2.   (This is after
running goto another_route and having been called back through
__netif_receive_skb since vlan_hwaccel_do_receive it true.)

Now vlan_on_bond_hook is called and we have 2 skbs.

The original skb still have skb->dev = bond0.100 and orig_dev = eth2.
Since bond_arp_rcv is registered for traffic only to bond0, the handler
is not hit and the frame is dropped (or processed by another handler).

The cloned skb has skb->dev = bond0 and is put back on the receive queue
and comes back through __netif_receive_skb.  This frame will match the
ptype entry for bond_arp_rcv, but since orig_dev = bond0 in this case,
the code in bond_arp_rcv will not handle the frame.  

If we truly want to track the original interface that received the
frame, the following is a better option.  With the recursive nature of
__netif_receive_skb at this point, we should really consider setting
orig_dev from skb_iif rather than just from skb->dev.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Nicolas de Pesloüan March 7, 2011, 11:09 p.m. UTC | #1
Le 07/03/2011 23:43, Andy Gospodarek a écrit :
> On Mon, Mar 07, 2011 at 01:51:00PM +0100, Jiri Pirko wrote:
>> Recent patch "bonding: move processing of recv handlers into
>> handle_frame()" caused a regression on following net scheme:
>>
>> eth0 - bond0 - bond0.5
>>
>> where arp monitoring is happening over vlan. This patch fixes it by
>> reinjecting the arp packet into bonding slave device so the bonding
>> rx_handler can pickup and process it.
>>
>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>> ---
>>   net/core/dev.c |    8 ++++----
>>   1 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index c71bd18..3d88458 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3094,12 +3094,12 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>>   }
>>   EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
>>
>> -static void vlan_on_bond_hook(struct sk_buff *skb)
>> +static void vlan_on_bond_hook(struct sk_buff *skb, struct net_device *orig_dev)
>>   {
>>   	/*
>>   	 * Make sure ARP frames received on VLAN interfaces stacked on
>>   	 * bonding interfaces still make their way to any base bonding
>> -	 * device that may have registered for a specific ptype.
>> +	 * device by reinjecting the frame into bonding slave (orig_dev)
>>   	 */
>>   	if (skb->dev->priv_flags&  IFF_802_1Q_VLAN&&
>>   	vlan_dev_real_dev(skb->dev)->priv_flags&  IFF_BONDING&&
>> @@ -3108,7 +3108,7 @@ static void vlan_on_bond_hook(struct sk_buff *skb)
>>
>>   		if (!skb2)
>>   			return;
>> -		skb2->dev = vlan_dev_real_dev(skb->dev);
>> +		skb2->dev = orig_dev;
>>   		netif_rx(skb2);
>>   	}
>>   }
>> @@ -3202,7 +3202,7 @@ ncls:
>>   			goto out;
>>   	}
>>
>> -	vlan_on_bond_hook(skb);
>> +	vlan_on_bond_hook(skb, orig_dev);
>>
>>   	/* deliver only exact match when indicated */
>>   	null_or_dev = deliver_exact ? skb->dev : NULL;
>
> This patch doesn't work.
>
> My setup has bond0.100 ->  bond0 ->  eth2 and eth3.  ARP monitoring is
> enabled as is arp_valiate.
>
> The initial problem was just that just before vlan_on_bond_hook is
> called, skb->dev = bond0.100 and orig_dev = eth2.   (This is after
> running goto another_route and having been called back through
> __netif_receive_skb since vlan_hwaccel_do_receive it true.)
>
> Now vlan_on_bond_hook is called and we have 2 skbs.
>
> The original skb still have skb->dev = bond0.100 and orig_dev = eth2.
> Since bond_arp_rcv is registered for traffic only to bond0, the handler
> is not hit and the frame is dropped (or processed by another handler).

After Jiri's last patch series, bond_arp_rcv() is not registered anymore as a protocol handler on 
bond0, but directly called from inside bond_handle_frame(), through bond->recv_probe.

Because bond_handler_frame() is a rx_handler for the slave interfaces, bond_arp_rcv() is now called 
at the slave level and not a the master level anymore.

Hence this patch and the reason I thought it should work.

Did you tested this patch with Jiri's previous patches applied before?

> The cloned skb has skb->dev = bond0 and is put back on the receive queue
> and comes back through __netif_receive_skb.  This frame will match the
> ptype entry for bond_arp_rcv, but since orig_dev = bond0 in this case,
> the code in bond_arp_rcv will not handle the frame.

I definitely hate all those unnecessary reinjects from rx_handler. The another_round loop is 
designed to allow for stacking inside __netif_receive_skb().

Jiri apparently has another (better) solution in mind. I hope to see it, but Jiri arguably want some 
of the patchs in the queue to flow before adding more.

Does someone had a look at my proposal of a late_delivery property for packet_type, previously in 
this thread, to handle the situation where a given protocol handler registered on a particular 
device would like to receive the final skb instead of the one at the time it crossed that particular 
device?

>
> If we truly want to track the original interface that received the
> frame, the following is a better option.  With the recursive nature of
> __netif_receive_skb at this point, we should really consider setting
> orig_dev from skb_iif rather than just from skb->dev.

Or we can remove all this orig_dev stuff...

	Nicolas.

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 30440e7..500fdbc 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3135,7 +3135,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
>   	if (!skb->skb_iif)
>   		skb->skb_iif = skb->dev->ifindex;
> -	orig_dev = skb->dev;
>
>   	skb_reset_network_header(skb);
>   	skb_reset_transport_header(skb);
> @@ -3145,6 +3144,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
>   	rcu_read_lock();
>
> +	orig_dev = dev_get_by_index_rcu(dev_net(skb->dev),skb->skb_iif);
>   another_round:
>
>   	__this_cpu_inc(softnet_data.processed);
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek March 8, 2011, 2:43 a.m. UTC | #2
On Tue, Mar 08, 2011 at 12:09:48AM +0100, Nicolas de Pesloüan wrote:
>
> After Jiri's last patch series, bond_arp_rcv() is not registered anymore 
> as a protocol handler on bond0, but directly called from inside 
> bond_handle_frame(), through bond->recv_probe.
>
> Because bond_handler_frame() is a rx_handler for the slave interfaces, 
> bond_arp_rcv() is now called at the slave level and not a the master 
> level anymore.
>
> Hence this patch and the reason I thought it should work.
>
> Did you tested this patch with Jiri's previous patches applied before?
>

No, I have not tested them yet.  I looked at this problem first as
Jiri's patch did not fix a regression in code that is actually in
net-next right now.

I don't have a problem with the work you and Jiri are doing on this
latest patch series, but the frequency of patch revisions and now
regressions concerns me.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan March 8, 2011, 9:34 p.m. UTC | #3
Le 08/03/2011 03:43, Andy Gospodarek a écrit :
> On Tue, Mar 08, 2011 at 12:09:48AM +0100, Nicolas de Pesloüan wrote:
>>
>> After Jiri's last patch series, bond_arp_rcv() is not registered anymore
>> as a protocol handler on bond0, but directly called from inside
>> bond_handle_frame(), through bond->recv_probe.
>>
>> Because bond_handler_frame() is a rx_handler for the slave interfaces,
>> bond_arp_rcv() is now called at the slave level and not a the master
>> level anymore.
>>
>> Hence this patch and the reason I thought it should work.
>>
>> Did you tested this patch with Jiri's previous patches applied before?
>>
>
> No, I have not tested them yet.  I looked at this problem first as
> Jiri's patch did not fix a regression in code that is actually in
> net-next right now.

For as far as I understand, the regression was introduced by Jiri's patches that are not yet 
accepted in net-next-2.6, so this path definitely need the full patch series to be applied before. 
And there is no special urgency to fix this regression, because it doesn't "exist" yet anywhere 
except in some sandboxes.

This patch change the reinjection to inject at the slave level instead of the master level, because 
in Jiri's patch series, ARP handling moved from a normal protocol handler registered at the master 
level to the rx_handler at the slave level.

Jiri, do I miss something ?

> I don't have a problem with the work you and Jiri are doing on this
> latest patch series, but the frequency of patch revisions and now
> regressions concerns me.

I agree with your concerns, and this is the reason why Jiri and I are working hard to try and fight 
every possible problems in the whole package of patch. I think that, at the end, Jiri and I will 
have to properly resubmit the whole thing, for the convenient of those who will have to decide to 
accept it or not.

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 30440e7..500fdbc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3135,7 +3135,6 @@  static int __netif_receive_skb(struct sk_buff *skb)
 
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
-	orig_dev = skb->dev;
 
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
@@ -3145,6 +3144,7 @@  static int __netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
+	orig_dev = dev_get_by_index_rcu(dev_net(skb->dev),skb->skb_iif);
 another_round:
 
 	__this_cpu_inc(softnet_data.processed);