diff mbox

Multicast Fails Over Multipoint GRE Tunnel

Message ID 4D7F9592.5050408@iki.fi
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Timo Teras March 15, 2011, 4:36 p.m. UTC
On 03/15/2011 05:34 PM, Eric Dumazet wrote:
> Le lundi 14 mars 2011 à 16:34 -0700, Doug Kehn a écrit :
>> I'm running kernel version 2.6.36 on ARM XSCALE (big-endian) and multicast over a multipoint GRE tunnel isn't working.  For my architecture, this worked on 2.6.26.8.  For x86, multicast over a multipoint GRE tunnel worked with kernel version 2.6.31 but failed with version 2.6.35.  Multicast over a multipoint GRE tunnel fails because ipgre_header() fails the 'if (iph->daddr)' check and reutrns -t->hlen.  ipgre_header() is being called, from neigh_connected_output(), with a non-null daddr; the contents of daddr is zero.
>>
>> Reverting the ip_gre.c patch posted in http://marc.info/?l=linux-netdev&m=126762491525281&w=2 resolves the problem.  (Reviewing the HEAD of net-next-2.6 it appears that ipgre_header() remains unchanged from 2.6.36.)
>>
>> The configuration used to discover/diagnose the problem:
>>
>> ip tunnel add tun1 mode gre key 11223344 ttl 64 csum remote any
>> ip link set dev tun1 up
>> ip link set dev tun1 multicast on
>> ip addr flush dev tun1
>> ip addr add 10.40.92.114/24 broadcast 10.40.92.255 dev tun1
>>
>> 12: tun1: <MULTICAST,NOARP,UP,10000> mtu 1468 qdisc noqueue
>>     link/gre 0.0.0.0 brd 0.0.0.0
>>     inet 10.40.92.114/24 brd 10.40.92.255 scope global tun1
>>
>> Then attempt:
>> ping -I tun1 224.0.0.9
>>
>> Are additional configuration steps now required for multicast over multipoint GRE tunnel or is ipgre_header() in error?
> 
> Hi Doug
> 
> CC Timo Teras <timo.teras@iki.fi>
> 
> I would do a partial revert of Timo patch, but this means initial
> concern should be addressed ?
> 
> (Timo mentioned : 
> 	If the NOARP packets are not dropped, ipgre_tunnel_xmit() will
> 	take rt->rt_gateway (= NBMA IP) and use that for route
> 	look up (and may lead to bogus xfrm acquires).)
> 
> 
> Is the following works for you ?

I have memory that _header() is called with daddr being valid pointer,
but pointing to zero memory. So basically my situation would break with
this.

The above configuration would be fixable by setting broadcast to tun1
interface explicitly. But I'm not sure if the above configuration is
somehow different that it'd need to work.

Basically how things work on send path is:
 1. arp_constructor maps multicast address to NUD_NOARP
 2. arp_mc_map in turn copies dev->broadcast to haddr
    (so you get the ip packet pointing to zero ip)
    - i assumed normally gre tunnels would have the
      broadcast address set
 3. ipgre_tunnel_xmit checks for daddr==0 and uses the
    rt_gateway then, which would map to the multicast
    address

So I assume that the above ping command not working would end up sending
gre encapsulated packet where both the inner and outer IP address is the
same multicast address?

Looks like my patch also broke the default behaviour that if one has
such GRE tunnel, and you send unicast packets, it would default the link
destination to be same as the inner destination. Not sure if this would
be useful.

So basically my situation is undistinguishable from the above one. The
only difference is that the above problems only with multicast, and I'm
having with unicast.

I think the fundamental problem is that arp_mc_map maps the multicast
address to zeroes (due to device broadcast being zero).

Could we instead maybe do something like:

                        memcpy(haddr, dev->broadcast, dev->addr_len);


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Timo Teras March 15, 2011, 6:28 p.m. UTC | #1
On 03/15/2011 06:36 PM, Timo Teräs wrote:
> On 03/15/2011 05:34 PM, Eric Dumazet wrote:
>> (Timo mentioned : 
>> 	If the NOARP packets are not dropped, ipgre_tunnel_xmit() will
>> 	take rt->rt_gateway (= NBMA IP) and use that for route
>> 	look up (and may lead to bogus xfrm acquires).)
>>
>> Is the following works for you ?
> 
> I have memory that _header() is called with daddr being valid pointer,
> but pointing to zero memory. So basically my situation would break with
> this.

Ok, I've gone through now the code paths. And I believe I made
originally the assumption that ipgre_tunnel_xmit would should not ever
get tiph->daddr == 0 if we got ipgre_header() call.

However, what actually happens is for (NOARP interfaces) in arp.c:
 - unicast traffic gets NOARP entries mapped to dev->dev_addr (in gre
case it's the tunnel 'local' address)
 - multicast gets mapped to dev->broadcast

And if we create gre tunnel without local or remote address we end up
getting the NOARP entries with hwaddr 0.0.0.0.

Now, for unicast traffic it's mostly pointless. If the tunnel was
locally bound the packets would never leave: they'd get NOARP entry for
local address. And if it's locally unbound, the packets get rt_gateway,
which is pretty confusing routing wise (it apparently assumes your link
device has same ipv4 subnet as the gre device).

On multicast side it makes a bit more sense to map multicast groups. And
this happened implicitly.

IMHO, we should fix the arp code in ipv4 and ipv6 to do proper
ARPHRD_IPGRE mappings so that the _header() gets called with proper
data. I think the multicast-to-same-multicast group mapping makes sense.
But do not really know what to do with unicast packets sent to gre
interface with NOARP and no link broadcast IP address.

Actually this was my problem: the unicast packets for gre interface with
NOARP flag resulted in trying to send packets out. So I could probably
just fix this by creating my gre interface *with* the ARP flag in the
first place.

But is there any sensible thing to do with the unicast packets in above
case? I think those should be just dropped. Or does someone think that
it'd ever make sense to take the inner unicast address and use it as the
outer address too? If so, my patch should be just reverted.

My honest thought is to keep the ip_gre header check as it is currently
and fix arp code in ipv4 / neighbour code in ipv6 to do the proper NOARP
mappings as needed. We might be able to get rid of the huge protocol
dependent "tiph->daddr == 0" check in xmit path this way, and make sure
that the header is set properly.

This would also allow us to see proper NOARP entries when doing "ip
neigh show nud noarp". Now it will just show 0.0.0.0 entries with gre
devices without telling where to the packets are actually being sent to.

Any thoughts?

Cheers,
  Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Kehn March 15, 2011, 9:33 p.m. UTC | #2
Hi Timo,


--- On Tue, 3/15/11, Timo Teräs <timo.teras@iki.fi> wrote:

> From: Timo Teräs <timo.teras@iki.fi>
> Subject: Re: Multicast Fails Over Multipoint GRE Tunnel
> To: "Eric Dumazet" <eric.dumazet@gmail.com>
> Cc: "Doug Kehn" <rdkehn@yahoo.com>, netdev@vger.kernel.org
> Date: Tuesday, March 15, 2011, 12:36 PM
> On 03/15/2011 05:34 PM, Eric Dumazet
> wrote:
> > Le lundi 14 mars 2011 à 16:34 -0700, Doug Kehn a
> écrit :
> >> I'm running kernel version 2.6.36 on ARM XSCALE
> (big-endian) and multicast over a multipoint GRE tunnel
> isn't working.  For my architecture, this worked on
> 2.6.26.8.  For x86, multicast over a multipoint GRE
> tunnel worked with kernel version 2.6.31 but failed with
> version 2.6.35.  Multicast over a multipoint GRE tunnel
> fails because ipgre_header() fails the 'if (iph->daddr)'
> check and reutrns -t->hlen.  ipgre_header() is being
> called, from neigh_connected_output(), with a non-null
> daddr; the contents of daddr is zero.
> >>
> >> Reverting the ip_gre.c patch posted in http://marc.info/?l=linux-netdev&m=126762491525281&w=2
> resolves the problem.  (Reviewing the HEAD of
> net-next-2.6 it appears that ipgre_header() remains
> unchanged from 2.6.36.)
> >>
> >> The configuration used to discover/diagnose the
> problem:
> >>
> >> ip tunnel add tun1 mode gre key 11223344 ttl 64
> csum remote any
> >> ip link set dev tun1 up
> >> ip link set dev tun1 multicast on
> >> ip addr flush dev tun1
> >> ip addr add 10.40.92.114/24 broadcast 10.40.92.255
> dev tun1
> >>
> >> 12: tun1: <MULTICAST,NOARP,UP,10000> mtu
> 1468 qdisc noqueue
> >>     link/gre 0.0.0.0 brd
> 0.0.0.0
> >>     inet 10.40.92.114/24 brd
> 10.40.92.255 scope global tun1
> >>
> >> Then attempt:
> >> ping -I tun1 224.0.0.9
> >>
> >> Are additional configuration steps now required
> for multicast over multipoint GRE tunnel or is
> ipgre_header() in error?
> > 
> > Hi Doug
> > 
> > CC Timo Teras <timo.teras@iki.fi>
> > 
> > I would do a partial revert of Timo patch, but this
> means initial
> > concern should be addressed ?
> > 
> > (Timo mentioned : 
> >     If the NOARP packets are not
> dropped, ipgre_tunnel_xmit() will
> >     take rt->rt_gateway (= NBMA IP)
> and use that for route
> >     look up (and may lead to bogus xfrm
> acquires).)
> > 
> > 
> > Is the following works for you ?
> 
> I have memory that _header() is called with daddr being
> valid pointer,
> but pointing to zero memory. So basically my situation
> would break with
> this.
> 
> The above configuration would be fixable by setting
> broadcast to tun1
> interface explicitly. But I'm not sure if the above
> configuration is
> somehow different that it'd need to work.
> 
> Basically how things work on send path is:
>  1. arp_constructor maps multicast address to NUD_NOARP
>  2. arp_mc_map in turn copies dev->broadcast to haddr
>     (so you get the ip packet pointing to zero
> ip)
>     - i assumed normally gre tunnels would have
> the
>       broadcast address set
>  3. ipgre_tunnel_xmit checks for daddr==0 and uses the
>     rt_gateway then, which would map to the
> multicast
>     address
> 
> So I assume that the above ping command not working would
> end up sending
> gre encapsulated packet where both the inner and outer IP
> address is the
> same multicast address?
> 
> Looks like my patch also broke the default behaviour that
> if one has
> such GRE tunnel, and you send unicast packets, it would
> default the link
> destination to be same as the inner destination. Not sure
> if this would
> be useful.
> 
> So basically my situation is undistinguishable from the
> above one. The
> only difference is that the above problems only with
> multicast, and I'm
> having with unicast.
> 
> I think the fundamental problem is that arp_mc_map maps the
> multicast
> address to zeroes (due to device broadcast being zero).
> 
> Could we instead maybe do something like:
> 
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 7927589..372448a 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -215,6 +215,12 @@ int arp_mc_map(__be32 addr, u8 *haddr,
> struct
> net_device *dev, int dir)
>         case ARPHRD_INFINIBAND:
>                
> ip_ib_mc_map(addr, dev->broadcast, haddr);
>                
> return 0;
> +       case ARPHRD_IPGRE:
> +           
>    if (dev->broadcast)
> +               
>        memcpy(haddr,
> dev->broadcast, dev->addr_len);
> +           
>    else
> +               
>        memcpy(haddr, &addr,
> sizeof(addr));
> +           
>    return 0;
>         default:
>                 if
> (dir) {
>                
>         memcpy(haddr, dev->broadcast,
> dev->addr_len);
> 


I wasn't sure if the above patch was in addition too or in lieu of the partial revert of ipgre_header() suggested by Eric; both cases were attempted.

The multicast scenario described does not work if only the arp_mc_map() patch is applied.

The multicast scenario described does work if the partial revert of ipgre_header(), suggested by Eric, and the arp_mc_map patch are applied.

Regards,
...doug



      
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Kehn March 15, 2011, 9:35 p.m. UTC | #3
Hi Timo,


--- On Tue, 3/15/11, Timo Teräs <timo.teras@iki.fi> wrote:

> From: Timo Teräs <timo.teras@iki.fi>
> Subject: Re: Multicast Fails Over Multipoint GRE Tunnel
> To: "Eric Dumazet" <eric.dumazet@gmail.com>
> Cc: "Doug Kehn" <rdkehn@yahoo.com>, netdev@vger.kernel.org
> Date: Tuesday, March 15, 2011, 12:36 PM
> On 03/15/2011 05:34 PM, Eric Dumazet
> wrote:
> > Le lundi 14 mars 2011 à 16:34 -0700, Doug Kehn a
> écrit :
> >> I'm running kernel version 2.6.36 on ARM XSCALE
> (big-endian) and multicast over a multipoint GRE tunnel
> isn't working.  For my architecture, this worked on
> 2.6.26.8.  For x86, multicast over a multipoint GRE
> tunnel worked with kernel version 2.6.31 but failed with
> version 2.6.35.  Multicast over a multipoint GRE tunnel
> fails because ipgre_header() fails the 'if (iph->daddr)'
> check and reutrns -t->hlen.  ipgre_header() is being
> called, from neigh_connected_output(), with a non-null
> daddr; the contents of daddr is zero.
> >>
> >> Reverting the ip_gre.c patch posted in http://marc.info/?l=linux-netdev&m=126762491525281&w=2
> resolves the problem.  (Reviewing the HEAD of
> net-next-2.6 it appears that ipgre_header() remains
> unchanged from 2.6.36.)
> >>
> >> The configuration used to discover/diagnose the
> problem:
> >>
> >> ip tunnel add tun1 mode gre key 11223344 ttl 64
> csum remote any
> >> ip link set dev tun1 up
> >> ip link set dev tun1 multicast on
> >> ip addr flush dev tun1
> >> ip addr add 10.40.92.114/24 broadcast 10.40.92.255
> dev tun1
> >>
> >> 12: tun1: <MULTICAST,NOARP,UP,10000> mtu
> 1468 qdisc noqueue
> >>     link/gre 0.0.0.0 brd
> 0.0.0.0
> >>     inet 10.40.92.114/24 brd
> 10.40.92.255 scope global tun1
> >>
> >> Then attempt:
> >> ping -I tun1 224.0.0.9
> >>
> >> Are additional configuration steps now required
> for multicast over multipoint GRE tunnel or is
> ipgre_header() in error?
> > 
> > Hi Doug
> > 
> > CC Timo Teras <timo.teras@iki.fi>
> > 
> > I would do a partial revert of Timo patch, but this
> means initial
> > concern should be addressed ?
> > 
> > (Timo mentioned : 
> >     If the NOARP packets are not
> dropped, ipgre_tunnel_xmit() will
> >     take rt->rt_gateway (= NBMA IP)
> and use that for route
> >     look up (and may lead to bogus xfrm
> acquires).)
> > 
> > 
> > Is the following works for you ?
> 
> I have memory that _header() is called with daddr being
> valid pointer,
> but pointing to zero memory. So basically my situation
> would break with
> this.
> 
> The above configuration would be fixable by setting
> broadcast to tun1
> interface explicitly. But I'm not sure if the above
> configuration is
> somehow different that it'd need to work.
> 
> Basically how things work on send path is:
>  1. arp_constructor maps multicast address to NUD_NOARP
>  2. arp_mc_map in turn copies dev->broadcast to haddr
>     (so you get the ip packet pointing to zero
> ip)
>     - i assumed normally gre tunnels would have
> the
>       broadcast address set
>  3. ipgre_tunnel_xmit checks for daddr==0 and uses the
>     rt_gateway then, which would map to the
> multicast
>     address
> 
> So I assume that the above ping command not working would
> end up sending
> gre encapsulated packet where both the inner and outer IP
> address is the
> same multicast address?
> 
> Looks like my patch also broke the default behaviour that
> if one has
> such GRE tunnel, and you send unicast packets, it would
> default the link
> destination to be same as the inner destination. Not sure
> if this would
> be useful.
> 
> So basically my situation is undistinguishable from the
> above one. The
> only difference is that the above problems only with
> multicast, and I'm
> having with unicast.
> 
> I think the fundamental problem is that arp_mc_map maps the
> multicast
> address to zeroes (due to device broadcast being zero).
> 
> Could we instead maybe do something like:
> 
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 7927589..372448a 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -215,6 +215,12 @@ int arp_mc_map(__be32 addr, u8 *haddr,
> struct
> net_device *dev, int dir)
>         case ARPHRD_INFINIBAND:
>                
> ip_ib_mc_map(addr, dev->broadcast, haddr);
>                
> return 0;
> +       case ARPHRD_IPGRE:
> +           
>    if (dev->broadcast)
> +               
>        memcpy(haddr,
> dev->broadcast, dev->addr_len);
> +           
>    else
> +               
>        memcpy(haddr, &addr,
> sizeof(addr));
> +           
>    return 0;
>         default:
>                 if
> (dir) {
>                
>         memcpy(haddr, dev->broadcast,
> dev->addr_len);
> 


I wasn't sure if the above patch was in addition too or in lieu of the partial revert of ipgre_header() suggested by Eric; both cases were attempted.

The multicast scenario described does not work if only the arp_mc_map() patch is applied.

The multicast scenario described does work if the partial revert of ipgre_header(), suggested by Eric, and the arp_mc_map patch are applied.

Regards,
...doug



      
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 7927589..372448a 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -215,6 +215,12 @@  int arp_mc_map(__be32 addr, u8 *haddr, struct
net_device *dev, int dir)
        case ARPHRD_INFINIBAND:
                ip_ib_mc_map(addr, dev->broadcast, haddr);
                return 0;
+       case ARPHRD_IPGRE:
+               if (dev->broadcast)
+                       memcpy(haddr, dev->broadcast, dev->addr_len);
+               else
+                       memcpy(haddr, &addr, sizeof(addr));
+               return 0;
        default:
                if (dir) {