Patchwork ipv4: fix ipsec forward performance regression

login
register
mail settings
Submitter Yan, Zheng
Date Oct. 23, 2011, 7:58 a.m.
Message ID <4EA3C91C.3090801@intel.com>
Download mbox | patch
Permalink /patch/121213/
State Accepted
Delegated to: David Miller
Headers show

Comments

Yan, Zheng - Oct. 23, 2011, 7:58 a.m.
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
---
 net/ipv4/xfrm4_policy.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)
Eric Dumazet - Oct. 23, 2011, 9:03 a.m.
Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> It makes xfrm4_fill_dst() modify wrong data structure.
> 
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
>  net/ipv4/xfrm4_policy.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index fc5368a..a0b4c5d 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
>  	struct rtable *rt = (struct rtable *)xdst->route;
>  	const struct flowi4 *fl4 = &fl->u.ip4;
>  
> -	rt->rt_key_dst = fl4->daddr;
> -	rt->rt_key_src = fl4->saddr;
> -	rt->rt_key_tos = fl4->flowi4_tos;
> -	rt->rt_route_iif = fl4->flowi4_iif;
> -	rt->rt_iif = fl4->flowi4_iif;
> -	rt->rt_oif = fl4->flowi4_oif;
> -	rt->rt_mark = fl4->flowi4_mark;
> +	xdst->u.rt.rt_key_dst = fl4->daddr;
> +	xdst->u.rt.rt_key_src = fl4->saddr;
> +	xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
> +	xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
> +	xdst->u.rt.rt_iif = fl4->flowi4_iif;
> +	xdst->u.rt.rt_oif = fl4->flowi4_oif;
> +	xdst->u.rt.rt_mark = fl4->flowi4_mark;
>  
>  	xdst->u.dst.dev = dev;
>  	dev_hold(dev);

Good catch, thanks !

Reported-by: Kim Phillips <kim.phillips@freescale.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov - Oct. 23, 2011, 2:52 p.m.
Hello,

On Sun, 23 Oct 2011, Yan, Zheng wrote:

> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> It makes xfrm4_fill_dst() modify wrong data structure.
> 
> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
> ---
>  net/ipv4/xfrm4_policy.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index fc5368a..a0b4c5d 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
>  	struct rtable *rt = (struct rtable *)xdst->route;
>  	const struct flowi4 *fl4 = &fl->u.ip4;
>  
> -	rt->rt_key_dst = fl4->daddr;
> -	rt->rt_key_src = fl4->saddr;
> -	rt->rt_key_tos = fl4->flowi4_tos;
> -	rt->rt_route_iif = fl4->flowi4_iif;
> -	rt->rt_iif = fl4->flowi4_iif;
> -	rt->rt_oif = fl4->flowi4_oif;
> -	rt->rt_mark = fl4->flowi4_mark;
> +	xdst->u.rt.rt_key_dst = fl4->daddr;
> +	xdst->u.rt.rt_key_src = fl4->saddr;
> +	xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
> +	xdst->u.rt.rt_route_iif = fl4->flowi4_iif;

	May be I'm missing something but I don't see where
flowi4_iif is set for the forwarding case. __xfrm_route_forward
calls xfrm_decode_session which does not appear to set
flowi4_iif. When providing fl4 for output routes flowi4_iif
is always set to 0, so it represents rt_route_iif. But
then there are 2 variants for __ip_route_output_key:

- ip_route_output_slow sets flowi4_iif to loopback and
flowi4_oif to outdev during lookup but never restores them
to original values. It is assumed that caller uses outdev
from dst, not from flowi4_oif.

- for cached route we do not update flowi4_iif and flowi4_oif
in __ip_route_output_key, so the resulting fl4 can not be
used for these values. I assume, the current rules are that
only fl4.saddr and daddr are updated while flowi4_iif and
flowi4_oif are not. It looks wrong flowi code to rely on them.

	Currently, we have 3 values for devices:

rt_iif: indev for input routes, resulting outdev for output routes
which plays the role as indev for loopback traffic.

rt_oif: original outdev key, 0 for input routes, can be 0 for
output routes if socket is not bound to oif

rt_route_iif: indev for input routes, 0 for output routes

	With above rules for flowi4_iif and flowi4_oif
it is impossible to select value for rt_iif from fl4.

	I don't know the xfrm code well, may be after the
mentioned change we damaged rt_oif and rt_route_iif values
for cached dst which can lead to using slow path all the time.
Even if rt_intern_hash() avoids caching similar dsts multiple
times, if cached entry is damaged we will add more and
more new entries after every damage.

> +	xdst->u.rt.rt_iif = fl4->flowi4_iif;
> +	xdst->u.rt.rt_oif = fl4->flowi4_oif;
> +	xdst->u.rt.rt_mark = fl4->flowi4_mark;
>  
>  	xdst->u.dst.dev = dev;
>  	dev_hold(dev);

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yan, Zheng - Oct. 24, 2011, 12:41 a.m.
On 10/23/2011 10:52 PM, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Sun, 23 Oct 2011, Yan, Zheng wrote:
> 
>> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
>> It makes xfrm4_fill_dst() modify wrong data structure.
>>
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>> ---
>>  net/ipv4/xfrm4_policy.c |   14 +++++++-------
>>  1 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
>> index fc5368a..a0b4c5d 100644
>> --- a/net/ipv4/xfrm4_policy.c
>> +++ b/net/ipv4/xfrm4_policy.c
>> @@ -79,13 +79,13 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
>>  	struct rtable *rt = (struct rtable *)xdst->route;
>>  	const struct flowi4 *fl4 = &fl->u.ip4;
>>  
>> -	rt->rt_key_dst = fl4->daddr;
>> -	rt->rt_key_src = fl4->saddr;
>> -	rt->rt_key_tos = fl4->flowi4_tos;
>> -	rt->rt_route_iif = fl4->flowi4_iif;
>> -	rt->rt_iif = fl4->flowi4_iif;
>> -	rt->rt_oif = fl4->flowi4_oif;
>> -	rt->rt_mark = fl4->flowi4_mark;
>> +	xdst->u.rt.rt_key_dst = fl4->daddr;
>> +	xdst->u.rt.rt_key_src = fl4->saddr;
>> +	xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
>> +	xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
> 
> 	May be I'm missing something but I don't see where
> flowi4_iif is set for the forwarding case. __xfrm_route_forward
> calls xfrm_decode_session which does not appear to set
> flowi4_iif. When providing fl4 for output routes flowi4_iif
> is always set to 0, so it represents rt_route_iif. But
> then there are 2 variants for __ip_route_output_key:
> 
> - ip_route_output_slow sets flowi4_iif to loopback and
> flowi4_oif to outdev during lookup but never restores them
> to original values. It is assumed that caller uses outdev
> from dst, not from flowi4_oif.
> 
> - for cached route we do not update flowi4_iif and flowi4_oif
> in __ip_route_output_key, so the resulting fl4 can not be
> used for these values. I assume, the current rules are that
> only fl4.saddr and daddr are updated while flowi4_iif and
> flowi4_oif are not. It looks wrong flowi code to rely on them.
> 
> 	Currently, we have 3 values for devices:
> 
> rt_iif: indev for input routes, resulting outdev for output routes
> which plays the role as indev for loopback traffic.
> 
> rt_oif: original outdev key, 0 for input routes, can be 0 for
> output routes if socket is not bound to oif
> 
> rt_route_iif: indev for input routes, 0 for output routes
> 
> 	With above rules for flowi4_iif and flowi4_oif
> it is impossible to select value for rt_iif from fl4.
> 
> 	I don't know the xfrm code well, may be after the

Neither do I. My understanding is that xfrm_dst(s) are managed by the
flow cache (net/core/flow.c). We don't put them into the routing cache.

Regards
Yan, Zheng 

> mentioned change we damaged rt_oif and rt_route_iif values
> for cached dst which can lead to using slow path all the time.
> Even if rt_intern_hash() avoids caching similar dsts multiple
> times, if cached entry is damaged we will add more and
> more new entries after every damage.
> 
>> +	xdst->u.rt.rt_iif = fl4->flowi4_iif;
>> +	xdst->u.rt.rt_oif = fl4->flowi4_oif;
>> +	xdst->u.rt.rt_mark = fl4->flowi4_mark;
>>  
>>  	xdst->u.dst.dev = dev;
>>  	dev_hold(dev);
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 24, 2011, 7:01 a.m.
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Mon, 24 Oct 2011 08:41:53 +0800

> My understanding is that xfrm_dst(s) are managed by the flow cache
> (net/core/flow.c). We don't put them into the routing cache.

This is correct.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 24, 2011, 7:02 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 23 Oct 2011 11:03:10 +0200

> Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
>> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
>> It makes xfrm4_fill_dst() modify wrong data structure.
>> 
>> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
 ...
> Reported-by: Kim Phillips <kim.phillips@freescale.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kim Phillips - Nov. 1, 2011, 11:50 p.m.
On Mon, 24 Oct 2011 03:02:03 -0400
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 23 Oct 2011 11:03:10 +0200
> 
> > Le dimanche 23 octobre 2011 à 15:58 +0800, Yan, Zheng a écrit :
> >> There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
> >> It makes xfrm4_fill_dst() modify wrong data structure.
> >> 
> >> Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
>  ...
> > Reported-by: Kim Phillips <kim.phillips@freescale.com>
> > 
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied, thanks everyone.

To: <stable@kernel.org>

-stable maintainers, please consider the following two upstream
commits for inclusion in upcoming v3.0.x [1] stable releases:

v3.0.8 plus this:

upstream commit b73233960a59ee66e09d642f13d0592b13651e94
(ipv4: fix ipsec forward performance regression)

increases IPSec fwding performance from 0.2kpps to ~3.5kpps.

Adding this:

upstream commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0
(net: Handle different key sizes between address families in flow
cache)

to that, brings it back up to 2.6.38 levels, i.e., ~44kpps.

note that for v2.6.39.4 (.39 is the first kernel version with the
40->0.2kpps regression), commit b732339 depends on a slew of
commits, presumably ending with commit 5e2b61f: ipv4: Remove flowi
from struct rtable.

However it appears commit aa1c366e alone will restore almost all
the performance (~42kpps) on that kernel version.

So to summarize, please cherry-pick:

v2.6.39.x: aa1c366: net: Handle different key sizes between address families in flow cache
v3.0.x: aa1c366: net: Handle different key sizes between address families in flow cache
v3.0.x: b732339: ipv4: fix ipsec forward performance regression
v3.1.x: b732339: ipv4: fix ipsec forward performance regression

[All figures are based on a p2020ds board configured to rx, encrypt
and forward 64-byte packets.]

Thanks,

Kim

[1] initial kernel in long-term stable series for the embedded
industry (http://lwn.net/Articles/464834/)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Nov. 2, 2011, 12:34 a.m.
From: Kim Phillips <kim.phillips@freescale.com>
Date: Tue, 1 Nov 2011 18:50:22 -0500

> -stable maintainers, please consider the following two upstream
> commits for inclusion in upcoming v3.0.x [1] stable releases:

I submit networking stable fixes as needed, so please make such
requests to me.

I haven't submitted this change yet because I want it to sit a bit
longer in Linus's tree to make sure there aren't any unwanted
side-effects.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kim Phillips - Nov. 3, 2011, 6:58 p.m.
On Tue, 1 Nov 2011 20:34:30 -0400
David Miller <davem@davemloft.net> wrote:

> From: Kim Phillips <kim.phillips@freescale.com>
> Date: Tue, 1 Nov 2011 18:50:22 -0500
> 
> > -stable maintainers, please consider the following two upstream
> > commits for inclusion in upcoming v3.0.x [1] stable releases:
> 
> I submit networking stable fixes as needed, so please make such
> requests to me.
> 
> I haven't submitted this change yet because I want it to sit a bit
> longer in Linus's tree to make sure there aren't any unwanted
> side-effects.

I see b732339 (ipv4: fix ipsec forward performance regression) has
been added to the 3.0- and 3.1-stable trees.

b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps.

However, adding upstream commit aa1c366 (net: Handle different key
sizes between address families in flow cache) brings performance
back up to 2.6.38 levels, i.e., ~44kpps.

So can aa1c366 also be queued to the v2.6.39 and v3.0 stable
trees?

Thanks,

Kim

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Nov. 4, 2011, 2:43 a.m.
From: Kim Phillips <kim.phillips@freescale.com>
Date: Thu, 3 Nov 2011 13:58:51 -0500

> I see b732339 (ipv4: fix ipsec forward performance regression) has
> been added to the 3.0- and 3.1-stable trees.
> 
> b732339 increases IPSec fwding performance from 0.2kpps to ~3.5kpps.
> 
> However, adding upstream commit aa1c366 (net: Handle different key
> sizes between address families in flow cache) brings performance
> back up to 2.6.38 levels, i.e., ~44kpps.
> 
> So can aa1c366 also be queued to the v2.6.39 and v3.0 stable
> trees?

1) stable@kernel.org no longer exists and will just go into the bit
   bucket, it's now stable@vger.kernel.org

2) The patch in question doesn't apply cleanly to v3.0.8 -stable so
   if it's important to you you'll need to submit a backport.

3) 2.6.39 is no longer a maintained -stable kernel series.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index fc5368a..a0b4c5d 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -79,13 +79,13 @@  static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	struct rtable *rt = (struct rtable *)xdst->route;
 	const struct flowi4 *fl4 = &fl->u.ip4;
 
-	rt->rt_key_dst = fl4->daddr;
-	rt->rt_key_src = fl4->saddr;
-	rt->rt_key_tos = fl4->flowi4_tos;
-	rt->rt_route_iif = fl4->flowi4_iif;
-	rt->rt_iif = fl4->flowi4_iif;
-	rt->rt_oif = fl4->flowi4_oif;
-	rt->rt_mark = fl4->flowi4_mark;
+	xdst->u.rt.rt_key_dst = fl4->daddr;
+	xdst->u.rt.rt_key_src = fl4->saddr;
+	xdst->u.rt.rt_key_tos = fl4->flowi4_tos;
+	xdst->u.rt.rt_route_iif = fl4->flowi4_iif;
+	xdst->u.rt.rt_iif = fl4->flowi4_iif;
+	xdst->u.rt.rt_oif = fl4->flowi4_oif;
+	xdst->u.rt.rt_mark = fl4->flowi4_mark;
 
 	xdst->u.dst.dev = dev;
 	dev_hold(dev);