diff mbox

[RFC,ipsec] xfrm: use the right dev to fill xdst

Message ID 1365088362-4318-1-git-send-email-nicolas.dichtel@6wind.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Nicolas Dichtel April 4, 2013, 3:12 p.m. UTC
Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
interface, hence in6_dev_get(dev) returns NULL.

After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
xfrm_bundle_create), it seems that previously we were using dev from the route,
for both IPv4 and IPv6.

In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
always to the same device.

By analogy, I made the same change for IPv4 side (only IPv6 part is tested).

Reported-by: Daniel Baluta <dbaluta@ixiacom.com>
Tested-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

This patch is only a RFC, it needs more tests. Any comments/help is welcome to
understand if the patch do the right thing or if the bug if somewere else.

If the patch is correct, I can also remove the argument dev from
xfrm[4|6]_fill_dst, because it will not be used anymore.

FYI, the initial thread for commit bc8e4b954e46 can be found here:
http://kerneltrap.org/mailarchive/linux-netdev/2010/4/15/6274817

 net/ipv4/xfrm4_policy.c | 4 ++--
 net/ipv6/xfrm6_policy.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

Comments

Steffen Klassert April 5, 2013, 9:46 a.m. UTC | #1
On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
> interface, hence in6_dev_get(dev) returns NULL.

Can you give some informations on how to reproduce this? I'm running
interfamily tunnels on our testing environment and it seems to
work fine.

> 
> After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
> xfrm_bundle_create), it seems that previously we were using dev from the route,
> for both IPv4 and IPv6.

I think this was the right way. We need to attach the dev from the
corresponding route to the xdst.

> 
> In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
> always to the same device.

The way we do it now can be problematic for tunnel in tunnel scenarios too.
We assign the dev from the first tunnel route to all the bundle entries,
this looks really wrong.

I think your patch is correct, but I want understand the breaking 
scenario first.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Baluta April 5, 2013, 12:59 p.m. UTC | #2
On Fri, Apr 5, 2013 at 12:46 PM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
>> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
>> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
>> interface, hence in6_dev_get(dev) returns NULL.
>
> Can you give some informations on how to reproduce this? I'm running
> interfamily tunnels on our testing environment and it seems to
> work fine.

I can hit this in our setup while using some internal custom simulated
interfaces.

Anyhow, this should be reproducible with a classic IPv6 IPsec over
IPv4 test.  Please make sure
that the IPv4 interface doesn't have an IPv6 address set up.

Quoting from commit bc8e4b954e46 (xfrm6: ensure to use the same dev
when building a bundle):

-       xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
+       xdst->u.rt6.rt6i_idev = in6_dev_get(dev);

dev points to IPv4 endpoint and if it doesn't have an IPv6 address
associated then
in6_dev_get(dev) will return NULL.

>
>>
>> After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
>> xfrm_bundle_create), it seems that previously we were using dev from the route,
>> for both IPv4 and IPv6.
>
> I think this was the right way. We need to attach the dev from the
> corresponding route to the xdst.
>
>>
>> In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
>> always to the same device.
>
> The way we do it now can be problematic for tunnel in tunnel scenarios too.
> We assign the dev from the first tunnel route to all the bundle entries,
> this looks really wrong.
>
> I think your patch is correct, but I want understand the breaking
> scenario first.

thanks,
Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert April 8, 2013, 11:42 a.m. UTC | #3
On Fri, Apr 05, 2013 at 03:59:59PM +0300, Daniel Baluta wrote:
> On Fri, Apr 5, 2013 at 12:46 PM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
> >> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
> >> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
> >> interface, hence in6_dev_get(dev) returns NULL.
> >
> > Can you give some informations on how to reproduce this? I'm running
> > interfamily tunnels on our testing environment and it seems to
> > work fine.
> 
> I can hit this in our setup while using some internal custom simulated
> interfaces.
> 
> Anyhow, this should be reproducible with a classic IPv6 IPsec over
> IPv4 test.  Please make sure
> that the IPv4 interface doesn't have an IPv6 address set up.
> 
> Quoting from commit bc8e4b954e46 (xfrm6: ensure to use the same dev
> when building a bundle):
> 
> -       xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
> +       xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
> 
> dev points to IPv4 endpoint and if it doesn't have an IPv6 address
> associated then
> in6_dev_get(dev) will return NULL.

I have ipv6 compiled into the kernel. So when I set up a netdevice, a
struct inet6_dev is allocated and associated. Therefore I have always
a valid pointer, even if I disable ipv6 for that device. That's probaply
why I can't reproduce it. I'll change my configuration and try again.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert April 9, 2013, 12:47 p.m. UTC | #4
On Fri, Apr 05, 2013 at 03:59:59PM +0300, Daniel Baluta wrote:
> On Fri, Apr 5, 2013 at 12:46 PM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
> >> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
> >> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
> >> interface, hence in6_dev_get(dev) returns NULL.
> >
> > Can you give some informations on how to reproduce this? I'm running
> > interfamily tunnels on our testing environment and it seems to
> > work fine.
> 
> I can hit this in our setup while using some internal custom simulated
> interfaces.
> 
> Anyhow, this should be reproducible with a classic IPv6 IPsec over
> IPv4 test.  Please make sure
> that the IPv4 interface doesn't have an IPv6 address set up.
> 
> Quoting from commit bc8e4b954e46 (xfrm6: ensure to use the same dev
> when building a bundle):
> 
> -       xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
> +       xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
> 
> dev points to IPv4 endpoint and if it doesn't have an IPv6 address
> associated then
> in6_dev_get(dev) will return NULL.

Hm, inet6_init() registers addrconf_notify() as a netdevice notifier
function. So addrconf_notify() is called whenever a netdevice is
registered. When looking at addrconf_notify(), there are only two
cases when the net_device has no inet6_dev assigned. This is either
on error, or if the device mtu is smaller than IPV6_MIN_MTU (i.e. 1280).

I can reproduce the behaviour you describe if I set the mtu of the
ipv4 device to a value below IPV6_MIN_MTU, but in no other case.

Is it possible that your ipv4 device has a mtu below IPV6_MIN_MTU?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 9, 2013, 5:21 p.m. UTC | #5
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 9 Apr 2013 14:47:35 +0200

> Hm, inet6_init() registers addrconf_notify() as a netdevice notifier
> function. So addrconf_notify() is called whenever a netdevice is
> registered. When looking at addrconf_notify(), there are only two
> cases when the net_device has no inet6_dev assigned. This is either
> on error, or if the device mtu is smaller than IPV6_MIN_MTU (i.e. 1280).
> 
> I can reproduce the behaviour you describe if I set the mtu of the
> ipv4 device to a value below IPV6_MIN_MTU, but in no other case.
> 
> Is it possible that your ipv4 device has a mtu below IPV6_MIN_MTU?

Like Steffen I am also curious how you are able to create a device
with no ipv6 device information attached, yet still have the ipv6
module loaded to the point where the ipv6 ipsec paths can execute.

If you're forcing this in an unnatural way or with localized changes,
I don't think we have anything to really fix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Baluta April 9, 2013, 5:31 p.m. UTC | #6
On Tue, Apr 9, 2013 at 8:21 PM, David Miller <davem@davemloft.net> wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Tue, 9 Apr 2013 14:47:35 +0200
>
>> Hm, inet6_init() registers addrconf_notify() as a netdevice notifier
>> function. So addrconf_notify() is called whenever a netdevice is
>> registered. When looking at addrconf_notify(), there are only two
>> cases when the net_device has no inet6_dev assigned. This is either
>> on error, or if the device mtu is smaller than IPV6_MIN_MTU (i.e. 1280).
>>
>> I can reproduce the behaviour you describe if I set the mtu of the
>> ipv4 device to a value below IPV6_MIN_MTU, but in no other case.
>>
>> Is it possible that your ipv4 device has a mtu below IPV6_MIN_MTU?
>
> Like Steffen I am also curious how you are able to create a device
> with no ipv6 device information attached, yet still have the ipv6
> module loaded to the point where the ipv6 ipsec paths can execute.
>
> If you're forcing this in an unnatural way or with localized changes,
> I don't think we have anything to really fix.

Hi Dave, Steffen,

As I mentioned earlier in this thread we are using some custom kernel
modules that create the interfaces.

It's likely that these interfaces, for memory saving purposes, to skip attaching
ipv6 device information.

Anyhow, i still think that there is something wrong with commit 25ee3286dcbc
([IPSEC]: Merge common code into xfrm_bundle_create).

The code for xfrm is not easy to understand, so for this reason i
pointed out the problem to Nicolas in
the first place.

Thanks a lot.

Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 9, 2013, 5:33 p.m. UTC | #7
From: Daniel Baluta <daniel.baluta@gmail.com>
Date: Tue, 9 Apr 2013 20:31:30 +0300

> As I mentioned earlier in this thread we are using some custom
> kernel modules that create the interfaces.
> 
> It's likely that these interfaces, for memory saving purposes, to
> skip attaching ipv6 device information.

So this is not an upstream bug.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Baluta April 9, 2013, 6:18 p.m. UTC | #8
On Tue, Apr 9, 2013 at 8:33 PM, David Miller <davem@davemloft.net> wrote:
> From: Daniel Baluta <daniel.baluta@gmail.com>
> Date: Tue, 9 Apr 2013 20:31:30 +0300
>
>> As I mentioned earlier in this thread we are using some custom
>> kernel modules that create the interfaces.
>>
>> It's likely that these interfaces, for memory saving purposes, to
>> skip attaching ipv6 device information.
>
> So this is not an upstream bug.

Correct.

With conditions mentioned by Steffen, in upstream, each net_device
has an in6_device assigned so we won't hit the problem.

Daniel.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert April 10, 2013, 11:29 a.m. UTC | #9
On Tue, Apr 09, 2013 at 09:18:33PM +0300, Daniel Baluta wrote:
> On Tue, Apr 9, 2013 at 8:33 PM, David Miller <davem@davemloft.net> wrote:
> > From: Daniel Baluta <daniel.baluta@gmail.com>
> > Date: Tue, 9 Apr 2013 20:31:30 +0300
> >
> >> As I mentioned earlier in this thread we are using some custom
> >> kernel modules that create the interfaces.
> >>
> >> It's likely that these interfaces, for memory saving purposes, to
> >> skip attaching ipv6 device information.
> >
> > So this is not an upstream bug.
> 
> Correct.
> 
> With conditions mentioned by Steffen, in upstream, each net_device
> has an in6_device assigned so we won't hit the problem.
> 

We have an in6_device assigned in almost all of the cases, but I doubt
it is always the right one.

Let's say we want to tunnel ipv6 over ipv4. We do an ipv6 route lookup
that returns a route with output device, say eth0. With that route,
we do an IPsec route lookup and we get an ipv4 tunnel route with output
device eth1.

When we create the xfrm_dst in xfrm_bundle_create(), we copy the
informations from the original ipv6 route, because we pass the new
allocated IPsec route back to the ipv6 layer. But the device is taken
from the ipv4 tunnel route (eth1 instead of eth0), so we pass a
dst_entry with a ipv4 device assigned to the ipv6 layer.

For that reason, xfrm6_fill_dst() complains about a NULL in6_device,
when the mtu of the ipv4 device (passed to xfrm6_fill_dst()) is
below IPV6_MIN_MTU.

I think there is to fix something, but this needs some more research
before we can do anything about that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Baluta April 10, 2013, 11:39 a.m. UTC | #10
On 04/10/2013 02:29 PM, Steffen Klassert wrote:
> On Tue, Apr 09, 2013 at 09:18:33PM +0300, Daniel Baluta wrote:
>> On Tue, Apr 9, 2013 at 8:33 PM, David Miller<davem@davemloft.net>  wrote:
>>> From: Daniel Baluta<daniel.baluta@gmail.com>
>>> Date: Tue, 9 Apr 2013 20:31:30 +0300
>>>
>>>> As I mentioned earlier in this thread we are using some custom
>>>> kernel modules that create the interfaces.
>>>>
>>>> It's likely that these interfaces, for memory saving purposes, to
>>>> skip attaching ipv6 device information.
>>> So this is not an upstream bug.
>> Correct.
>>
>> With conditions mentioned by Steffen, in upstream, each net_device
>> has an in6_device assigned so we won't hit the problem.
>>
> We have an in6_device assigned in almost all of the cases, but I doubt
> it is always the right one.
>
> Let's say we want to tunnel ipv6 over ipv4. We do an ipv6 route lookup
> that returns a route with output device, say eth0. With that route,
> we do an IPsec route lookup and we get an ipv4 tunnel route with output
> device eth1.
>
> When we create the xfrm_dst in xfrm_bundle_create(), we copy the
> informations from the original ipv6 route, because we pass the new
> allocated IPsec route back to the ipv6 layer. But the device is taken
> from the ipv4 tunnel route (eth1 instead of eth0), so we pass a
> dst_entry with a ipv4 device assigned to the ipv6 layer.
>
> For that reason, xfrm6_fill_dst() complains about a NULL in6_device,
> when the mtu of the ipv4 device (passed to xfrm6_fill_dst()) is
> below IPV6_MIN_MTU.
>
> I think there is to fix something, but this needs some more research
> before we can do anything about that.
>
I will try  to further look into this. If you have any scripts
that can ease IPSec tunnels setup please do share.

thanks,
daniel.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 9a459be..3cffae9 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -81,8 +81,8 @@  static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 
 	xdst->u.rt.rt_iif = fl4->flowi4_iif;
 
-	xdst->u.dst.dev = dev;
-	dev_hold(dev);
+	xdst->u.dst.dev = rt->dst.dev;
+	dev_hold(rt->dst.dev);
 
 	/* Sheit... I remember I did this right. Apparently,
 	 * it was magically lost, so this code needs audit */
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 4ef7bdb..680b890 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -99,10 +99,10 @@  static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 {
 	struct rt6_info *rt = (struct rt6_info*)xdst->route;
 
-	xdst->u.dst.dev = dev;
-	dev_hold(dev);
+	xdst->u.dst.dev = rt->dst.dev;
+	dev_hold(rt->dst.dev);
 
-	xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
+	xdst->u.rt6.rt6i_idev = in6_dev_get(rt->dst.dev);
 	if (!xdst->u.rt6.rt6i_idev)
 		return -ENODEV;