diff mbox

net: take care of bonding in build_skb_flow_key (v2)

Message ID 1452070997-10395-1-git-send-email-wen.gang.wang@oracle.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Wengang Wang Jan. 6, 2016, 9:03 a.m. UTC
In a bonding setup, we determine the fragment size according to the MTU and
PMTU associated with the bonding master. If the slave finds that the fragment
size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
passing _skb_ and _pmtu_, to try to update the path MTU.
The problem is that the target device that ip_rt_update_pmtu() actually
tries to update is the slave (skb->dev), not the master. Since no PMTU
change happens on the master, the fragment size for later packets doesn't
change, so all later fragments/packets are dropped too.

The fix is to let build_skb_flow_key() translate the device index from the
bonding slave to the master. That makes the master the target device whose
PMTU ip_rt_update_pmtu() tries to update.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/ipv4/route.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Zhu Yanjun Jan. 6, 2016, 9:43 a.m. UTC | #1
On 01/06/2016 05:03 PM, Wengang Wang wrote:
> In a bonding setting, we determines fragment size according to MTU and
s/determines/determine

Thanks a lot.
Zhu Yanjun
> PMTU associated to the bonding master. If the slave finds the fragment
> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
> passing _skb_ and _pmtu_, trying to update the path MTU.
> Problem is that the target device that function ip_rt_update_pmtu actually
> tries to update is the slave (skb->dev), not the master. Thus since no
> PMTU change happens on master, the fragment size for later packets doesn't
> change so all later fragments/packets are dropped too.
>
> The fix is letting build_skb_flow_key() take care of the transition of
> device index from bonding slave to the master. That makes the master become
> the target device that ip_rt_update_pmtu tries to update PMTU to.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>   net/ipv4/route.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 85f184e..fffc7e6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -523,10 +523,18 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
>   			       const struct sock *sk)
>   {
>   	const struct iphdr *iph = ip_hdr(skb);
> -	int oif = skb->dev->ifindex;
>   	u8 tos = RT_TOS(iph->tos);
> +	struct net_device *master;
>   	u8 prot = iph->protocol;
>   	u32 mark = skb->mark;
> +	int oif;
> +
> +	if (skb->dev->flags & IFF_SLAVE) {
> +		master = netdev_master_upper_dev_get(skb->dev);
> +		oif = master->ifindex;
> +	} else {
> +		oif = skb->dev->ifindex;
> +	}
>   
>   	__build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
>   }

Jay Vosburgh Jan. 12, 2016, 8:22 p.m. UTC | #2
Wengang Wang <wen.gang.wang@oracle.com> wrote:

>In a bonding setting, we determines fragment size according to MTU and
>PMTU associated to the bonding master. If the slave finds the fragment
>size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>passing _skb_ and _pmtu_, trying to update the path MTU.
>Problem is that the target device that function ip_rt_update_pmtu actually
>tries to update is the slave (skb->dev), not the master. Thus since no
>PMTU change happens on master, the fragment size for later packets doesn't
>change so all later fragments/packets are dropped too.
>
>The fix is letting build_skb_flow_key() take care of the transition of
>device index from bonding slave to the master. That makes the master become
>the target device that ip_rt_update_pmtu tries to update PMTU to.

	Does the team driver have the equivalent issue?

>Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>---
> net/ipv4/route.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
>diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>index 85f184e..fffc7e6 100644
>--- a/net/ipv4/route.c
>+++ b/net/ipv4/route.c
>@@ -523,10 +523,18 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
> 			       const struct sock *sk)
> {
> 	const struct iphdr *iph = ip_hdr(skb);
>-	int oif = skb->dev->ifindex;
> 	u8 tos = RT_TOS(iph->tos);
>+	struct net_device *master;
> 	u8 prot = iph->protocol;
> 	u32 mark = skb->mark;
>+	int oif;
>+
>+	if (skb->dev->flags & IFF_SLAVE) {
>+		master = netdev_master_upper_dev_get(skb->dev);
>+		oif = master->ifindex;
>+	} else {
>+		oif = skb->dev->ifindex;
>+	}

	netdev_master_upper_dev_get() requires RTNL to be held; I don't
see that all callers to build_skb_flow_key will do so.

	I also believe the above would dereference a NULL pointer if an
eql device is configured, as it uses IFF_SLAVE but doesn't use the
upper/lower device infrastructure, thus, netdev_master_upper_dev_get()
would likely return NULL for eql.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Wengang Wang Jan. 20, 2016, 4:56 a.m. UTC | #3
On 2016-01-13 04:22, Jay Vosburgh wrote:
> Wengang Wang <wen.gang.wang@oracle.com> wrote:
>
>> In a bonding setting, we determines fragment size according to MTU and
>> PMTU associated to the bonding master. If the slave finds the fragment
>> size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
>> passing _skb_ and _pmtu_, trying to update the path MTU.
>> Problem is that the target device that function ip_rt_update_pmtu actually
>> tries to update is the slave (skb->dev), not the master. Thus since no
>> PMTU change happens on master, the fragment size for later packets doesn't
>> change so all later fragments/packets are dropped too.
>>
>> The fix is letting build_skb_flow_key() take care of the transition of
>> device index from bonding slave to the master. That makes the master become
>> the target device that ip_rt_update_pmtu tries to update PMTU to.
> 	Does the team driver have the equivalent issue?
I haven't tested team. If team needs it, that can be a separate fix.

>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>> ---
>> net/ipv4/route.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 85f184e..fffc7e6 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -523,10 +523,18 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
>> 			       const struct sock *sk)
>> {
>> 	const struct iphdr *iph = ip_hdr(skb);
>> -	int oif = skb->dev->ifindex;
>> 	u8 tos = RT_TOS(iph->tos);
>> +	struct net_device *master;
>> 	u8 prot = iph->protocol;
>> 	u32 mark = skb->mark;
>> +	int oif;
>> +
>> +	if (skb->dev->flags & IFF_SLAVE) {
>> +		master = netdev_master_upper_dev_get(skb->dev);
>> +		oif = master->ifindex;
>> +	} else {
>> +		oif = skb->dev->ifindex;
>> +	}
> 	netdev_master_upper_dev_get() requires RTNL to be held; I don't
> see that all callers to build_skb_flow_key will do so.
Yep, it needs a rtnl_lock/rtnl_unlock pair.
> 	I also believe the above would dereference a NULL pointer if an
> eql device is configured, as it uses IFF_SLAVE but doesn't use the
> upper/lower device infrastructure, thus, netdev_master_upper_dev_get()
> would likely return NULL for eql.

If what you said is true, I would like to think that's a misuse on eql's part :)
Anyway, I will send a v3 that takes care of this too.

thanks,
wengang

>
> 	-J
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 85f184e..fffc7e6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -523,10 +523,18 @@  static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
 			       const struct sock *sk)
 {
 	const struct iphdr *iph = ip_hdr(skb);
-	int oif = skb->dev->ifindex;
 	u8 tos = RT_TOS(iph->tos);
+	struct net_device *master;
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
+	int oif;
+
+	if (skb->dev->flags & IFF_SLAVE) {
+		master = netdev_master_upper_dev_get(skb->dev);
+		oif = master->ifindex;
+	} else {
+		oif = skb->dev->ifindex;
+	}
 
 	__build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
 }
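
The review comments on this patch point out that the master lookup can
return NULL (for example for an eql device, which sets IFF_SLAVE without
using the upper/lower device infrastructure). The following is a minimal
userspace sketch of the slave-to-master index selection with a NULL
fallback. It is not kernel code: struct model_netdev, MODEL_IFF_SLAVE,
model_master_get() and model_flow_oif() are hypothetical stand-ins
invented for illustration, and the sketch does not address the RTNL
locking concern. In the kernel, the RCU variant
netdev_master_upper_dev_get_rcu() under rcu_read_lock() might be one way
to avoid requiring RTNL, but that is an assumption here, not something the
thread settles on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; stands in for IFF_SLAVE. */
#define MODEL_IFF_SLAVE 0x800u

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct model_netdev {
	int ifindex;
	unsigned int flags;
	const struct model_netdev *master; /* NULL if no upper device is linked */
};

/*
 * Models a master lookup that may legitimately return NULL, as the
 * review suggests would happen for an eql slave.
 */
static const struct model_netdev *
model_master_get(const struct model_netdev *dev)
{
	return dev->master;
}

/*
 * Pick the interface index to put in the flow key: the master's index
 * for a slave that has a linked master, otherwise the device's own.
 */
static int model_flow_oif(const struct model_netdev *dev)
{
	assert(dev != NULL);

	if (dev->flags & MODEL_IFF_SLAVE) {
		const struct model_netdev *master = model_master_get(dev);

		/* Fall back to the device itself instead of dereferencing NULL. */
		if (master != NULL)
			return master->ifindex;
	}

	return dev->ifindex;
}
```

With this shape, a bonding slave resolves to its master's index, while a
device that sets the slave flag but has no linked master keeps its own
index rather than crashing.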