[net] net: vxlan: fix crash when interface is created with no group

Message ID 20140320.160229.857536522237793124.davem@davemloft.net
State RFC, archived
Delegated to: David Miller

Commit Message

David Miller March 20, 2014, 8:02 p.m. UTC
From: Mike Rapoport <mike.rapoport@ravellosystems.com>
Date: Mon, 17 Mar 2014 13:17:30 +0200

> If the vxlan interface is created without group definition, there is a
> panic on the first packet reception:
 ...
> The crash occurs because vxlan_rcv decides on protocol version of outer
> packet using vxlan->default_dst.remote_ip.sa.sa_family field which is
> not initialized if no multicast group was specified at interface
> creation time. This causes vxlan driver to always assume that outer
> packet is IPv6.
> 
> Using IP protocol version from skb instead of default destination
> address family fixes the problem.
> 
> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>

Thinking some more, I'd like to propose an alternate version of this fix.

Any objections to this?  I think it maintains the pre-ipv6-support
behavior.  I know there may be some concerns about supporting multiple
families on the same socket, but I'm not so sure the code is able to
support that right now anyways.
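
For illustration only, here is a user-space sketch of why the old test
misfires and what keying on the socket family changes. The toy_* types and
functions are hypothetical stand-ins, not the kernel's structures; the only
thing taken from the thread is that the default destination family is left
unset (AF_UNSPEC) when no group is given, while the socket family is always
known at creation time.

```c
#include <sys/socket.h>	/* AF_UNSPEC, AF_INET, AF_INET6 */

enum toy_path { TOY_PATH_V4, TOY_PATH_V6 };

/* Hypothetical stand-ins for the device and the receive socket. */
struct toy_dev  { unsigned short default_dst_family; };	/* unset without a group */
struct toy_sock { unsigned short family; };		/* set at socket creation */

/* Old test: anything that is not AF_INET is treated as IPv6. */
static enum toy_path toy_path_old(const struct toy_dev *dev)
{
	return dev->default_dst_family == AF_INET ? TOY_PATH_V4 : TOY_PATH_V6;
}

/* Proposed test: same shape, but the field it reads is always initialized. */
static enum toy_path toy_path_new(const struct toy_sock *vs)
{
	return vs->family == AF_INET ? TOY_PATH_V4 : TOY_PATH_V6;
}
```

With default_dst_family left at AF_UNSPEC the old test takes the IPv6 path
even for an IPv4 socket; the new test follows whatever family the socket was
created with.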

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Stevens March 20, 2014, 8:47 p.m. UTC | #1
>From: David Miller 

>Any objections to this? I think it maintains the pre-ipv6-support
>behavior. I know there may be some concerns about supporting multiple
>families on the same socket, but I'm not so sure the code is able to
>support that right now anyways.

I'm ok with the idea of determining the AF from the socket -- mixed
support, if useful, can be added later. But the patch needs to then
check and drop packets encapsulated with the wrong address family.

And it still shouldn't assume !v4 means v6.

[apologies for spacing; doing this from a web browser...]
So, I think we need something like:

     if (vs->family == AF_INET && skb->protocol == htons(ETH_P_IP)) {
        ...
     } else if (vs->family == AF_INET6 && skb->protocol == htons(ETH_P_IPV6)) {
        ...
     } else {
        goto drop;
     }


                                                          +-DLS

Mike Rapoport March 21, 2014, 5:06 a.m. UTC | #2
On Thu, Mar 20, 2014 at 10:02 PM, David Miller <davem@davemloft.net> wrote:
> From: Mike Rapoport <mike.rapoport@ravellosystems.com>
> Date: Mon, 17 Mar 2014 13:17:30 +0200
>
>> If the vxlan interface is created without group definition, there is a
>> panic on the first packet reception:
>  ...
>> The crash occurs because vxlan_rcv decides on protocol version of outer
>> packet using vxlan->default_dst.remote_ip.sa.sa_family field which is
>> not initialized if no multicast group was specified at interface
>> creation time. This causes vxlan driver to always assume that outer
>> packet is IPv6.
>>
>> Using IP protocol version from skb instead of default destination
>> address family fixes the problem.
>>
>> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
>
> Thinking some more, I'd like to propose an alternate version of this fix.
>
> Any objections to this?  I think it maintains the pre-ipv6-support
> behavior.  I know there may be some concerns about supporting multiple
> families on the same socket, but I'm not so sure the code is able to
> support that right now anyways.

No objections from my side either.

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a7eb3f2..3a23623 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1206,7 +1206,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>                 goto drop;
>
>         /* Re-examine inner Ethernet packet */
> -       if (remote_ip->sa.sa_family == AF_INET) {
> +       if (vs->family == AF_INET) {
>                 oip = ip_hdr(skb);
>                 saddr.sin.sin_addr.s_addr = oip->saddr;
>                 saddr.sa.sa_family = AF_INET;
> @@ -2409,10 +2409,13 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
>
>         INIT_WORK(&vs->del_work, vxlan_del_work);
>
> -       if (ipv6)
> +       if (ipv6) {
> +               vs->family = AF_INET6;
>                 sock = create_v6_sock(net, port);
> -       else
> +       } else {
> +               vs->family = AF_INET;
>                 sock = create_v4_sock(net, port);
> +       }
>         if (IS_ERR(sock)) {
>                 kfree(vs);
>                 return ERR_CAST(sock);
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 5deef1a..6f00731 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -16,6 +16,7 @@ struct vxlan_sock {
>         struct hlist_node hlist;
>         vxlan_rcv_t      *rcv;
>         void             *data;
> +       __u16             family;
>         struct work_struct del_work;
>         struct socket    *sock;
>         struct rcu_head   rcu;
Mike Rapoport March 21, 2014, 10:22 a.m. UTC | #3
On Thu, Mar 20, 2014 at 10:47 PM, David Stevens <dlstevens@us.ibm.com> wrote:
>
>>From: David Miller
>
>>Any objections to this? I think it maintains the pre-ipv6-support
>>behavior. I know there may be some concerns about supporting multiple
>>families on the same socket, but I'm not so sure the code is able to
>>support that right now anyways.
>
> I'm ok with the idea of determining the AF from the socket -- mixed
> support, if useful, can be added later. But the patch needs to then
> check and drop packets encapsulated with the wrong address family.
>
> And it still shouldn't assume !v4 means v6.
>
> [apologies for spacing; doing this from a web browser...]
> So, I think we need something like:
>
>      if (vs->family == AF_INET && skb->protocol == htons(ETH_P_IP)) {
>         ...
>      } else if (vs->family == AF_INET6 && skb->protocol == htons(ETH_P_IPV6)) {
>         ...
>      } else {
>         goto drop;
>      }
>

Checking skb->protocol will drop ARP requests. What about using
ip_hdr(skb)->version?

>                                                           +-DLS
>
David Stevens March 21, 2014, 11:22 a.m. UTC | #4
-----Mike Rapoport <mike.rapoport@ravellosystems.com> wrote: -----


>Checking skb->protocol will drop ARP requests. What about using
>ip_hdr(skb)->version?

Mike, ip_hdr() here is the outer packet, so it's got to be a UDP packet--
we just don't know if it's UDP/IP or UDP/IPv6 when it is bound to INADDR_ANY,
since both can be delivered. It could use version in this case, because
both possible protocols have version in the same place, but I think it's
more correct to use the MAC layer protocol rather than relying on the
fact that IPv4 and IPv6 have "version" in the same spot. "It could be ARP"
would be the argument for NOT using the version in places where it really
could be ARP, even though that isn't the case here.
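
To make the "version in the same spot" point concrete: in both IPv4 and IPv6
the version is the top four bits of the first header byte. A minimal
user-space sketch follows; toy_ip_version is a hypothetical helper working
on a raw byte buffer, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the IP version nibble of a raw header, or 0 if there is no
 * byte to read. For a well-formed packet this is 4 or 6.
 */
static unsigned int toy_ip_version(const uint8_t *hdr, size_t len)
{
	if (hdr == NULL || len == 0)
		return 0;
	return hdr[0] >> 4;	/* top nibble of byte 0 in both v4 and v6 */
}
```

A typical IPv4 header starts with 0x45 (version 4, header length 5 words)
and an IPv6 header with 0x6X, so the same read works for either, which is
exactly the property being relied on.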

vxlan_rcv() is only called for VXLAN encapsulated packets sent to the bound
UDP port.

So, if "vs->family" holds the one we want to support, we can't just blindly
assume the received packet is IPv4, for example, and start accessing
IPv4 fields, because it could be an IPv6 packet. We have to check the
packet type too. And if it's not the one we bound to, drop it.

That's what the code snippet I outlined is trying to do.
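
Spelled out as a user-space sketch, the check looks roughly like this. The
TOY_* names and toy_classify are made up for illustration; the ethertype
values 0x0800 and 0x86DD are the standard ones for IPv4 and IPv6, and the
protocol argument is assumed to be in network byte order, as skb->protocol
is.

```c
#include <arpa/inet.h>	/* htons */
#include <stdint.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

#define TOY_ETH_P_IP	0x0800	/* IPv4 ethertype */
#define TOY_ETH_P_IPV6	0x86DD	/* IPv6 ethertype */

enum toy_verdict { TOY_RX_V4, TOY_RX_V6, TOY_DROP };

/*
 * Accept a packet only when its outer protocol matches the family the
 * socket was bound with; everything else is dropped rather than being
 * assumed to be "the other" family.
 */
static enum toy_verdict toy_classify(unsigned short sock_family,
				     uint16_t outer_proto_be)
{
	if (sock_family == AF_INET && outer_proto_be == htons(TOY_ETH_P_IP))
		return TOY_RX_V4;
	if (sock_family == AF_INET6 && outer_proto_be == htons(TOY_ETH_P_IPV6))
		return TOY_RX_V6;
	return TOY_DROP;
}
```

The explicit final drop is the part that avoids "!v4 means v6": an IPv6
packet arriving on an AF_INET socket, or an unset family, ends up in
TOY_DROP instead of being parsed with the wrong header layout.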

                                         +-DLS



Mike Rapoport March 21, 2014, 3:31 p.m. UTC | #5
On Fri, Mar 21, 2014 at 1:22 PM, David Stevens <dlstevens@us.ibm.com> wrote:
>
> -----Mike Rapoport <mike.rapoport@ravellosystems.com> wrote: -----
>
> Mike, ip_hdr() here is the outer packet, so it's got to be a UDP packet--
> we just don't know if it's UDP/IP or UDP/IPv6 when it is bound to INADDR_ANY,
> since both can be delivered. It could use version in this case, because
> both possible protocols have version in the same place, but I think it's
> more correct to use the MAC layer protocol rather than relying on the
> fact that IPv4 and IPv6 have "version" in the same spot. "It could be ARP"
> would be the argument for NOT using the version in places where it really
> could be ARP, even though that isn't the case here.
>
> vxlan_rcv() is only called for VXLAN encapsulated packets sent to the bound
> UDP port.
>
> So, if "vs->family" holds the one we want to support, we can't just blindly
> assume the received packet is IPv4, for example, and start accessing
> IPv4 fields, because it could be an IPv6 packet. We have to check the
> packet type too. And if it's not the one we bound to, drop it.
>
> That's what the code snippet I outlined is trying to do.
>

David,

I've tried your snippet with IPv4 and all ARP replies get dropped.
And if I enable IPv6 I still get crashes in ipv6_rcv.
It seems to me that at the time vxlan_rcv gets outer IP header, the
SKB contains mixed information of outer and inner packets.
I'll continue to look into it.


>                                          +-DLS
Mike Rapoport March 23, 2014, 9:27 a.m. UTC | #6
On Fri, Mar 21, 2014 at 05:22:06AM -0600, David Stevens wrote:
> -----Mike Rapoport <mike.rapoport@ravellosystems.com> wrote: -----
>
> >Checking skb->protocol will drop ARP requests. What about using
> >ip_hdr(skb)->version?
>
> Mike, ip_hdr() here is the outer packet, so it's got to be a UDP packet--
> we just don't know if it's UDP/IP or UDP/IPv6 when it is bound to INADDR_ANY,
> since both can be delivered. It could use version in this case, because
> both possible protocols have version in the same place, but I think it's
> more correct to use the MAC layer protocol rather than relying on the
> fact that IPv4 and IPv6 have "version" in the same spot. "It could be ARP"
> would be the argument for NOT using the version in places where it really
> could be ARP, even though that isn't the case here.
>
> vxlan_rcv() is only called for VXLAN encapsulated packets sent to the bound
> UDP port.
>
> So, if "vs->family" holds the one we want to support, we can't just blindly
> assume the received packet is IPv4, for example, and start accessing
> IPv4 fields, because it could be an IPv6 packet. We have to check the
> packet type too. And if it's not the one we bound to, drop it.
>
> That's what the code snippet I outlined is trying to do.
>
>                                          +-DLS
>

I believe I've grokked what's going on in vxlan_udp_encap_recv and
vxlan_rcv. There are actually two unrelated problems:

1) When the vxlan is configured with IPv4 group it crashes when it
starts to receive IPv6 IGMP packets encapsulated into IPv4 vxlan
packets. This happens because when ipv6_rcv handles the inner packet,
the skb->dst still references outer IPv4 info. The very old vxlan code
had skb_dst_drop call in vxlan_udp_encap_recv, which was removed when
vxlan was refactored to use iptunnel_pull_header (commit
7ce04758279514ca1d8ebfe322508a4a430fe2c8: "vxlan: Restructure vxlan
receive"). The iptunnel_pull_header called skb_dst_drop until recent
commit 10ddceb22bab11dab10ba645c7df2e4a8e7a5db5 ("ip_tunnel:multicast
process cause panic due to skb->_skb_refdst NULL pointer").
The simplest fix, I think, would be to restore call to skb_dst_drop in
vxlan_udp_encap_recv.

2) When the vxlan is using custom configuration and the vxlan interface
is created without group definition, the vxlan_rcv always takes IPv6
path because the decision is based on default_dst.remote_ip.sa.sa_family which is
AF_UNSPEC in this case. The code snippet proposed by David S. is not
working because by the time vxlan_rcv checks the outer protocol the
skb->protocol is already set to that of the inner packet in
iptunnel_pull_header. So, to have proper check for packet type in
vxlan_rcv we can either check outer IP header version or pass outer
skb->protocol to vxlan_rcv.
I personally favor checking ip_hdr(skb)->version despite David S.'s
objection above. The version field is not coincidentally at the same
spot for both v4 and v6, and checking the version keeps the code simpler
and cleaner, IMHO.
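
To illustrate the second option (passing the outer protocol along), here is
a toy user-space model. struct toy_skb, toy_pull_header and toy_decap are
hypothetical; the only behaviour borrowed from the discussion above is that
the header pull overwrites the protocol field with that of the inner packet.
Values are kept in host byte order to keep the sketch short.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one skb field that matters here. */
struct toy_skb { uint16_t protocol; };

/* Models the header pull: afterwards protocol describes the inner frame. */
static void toy_pull_header(struct toy_skb *skb, uint16_t inner_proto)
{
	skb->protocol = inner_proto;
}

/*
 * Capture the outer protocol before it is overwritten and return it, so
 * the receive handler can still check the outer packet type.
 */
static uint16_t toy_decap(struct toy_skb *skb, uint16_t inner_proto)
{
	uint16_t outer_proto = skb->protocol;	/* save before the pull */

	toy_pull_header(skb, inner_proto);
	return outer_proto;
}
```

The cost of this variant is an extra argument threaded through to the
receive handler, which is the trade-off against simply reading the version
nibble of the outer header.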

Waiting for your comments,

Mike.

--
[1] http://www.spinics.net/lists/netdev/msg276490.html
Pravin B Shelar March 24, 2014, 5:09 a.m. UTC | #7
On Sun, Mar 23, 2014 at 2:27 AM, Mike Rapoport
<mike.rapoport@ravellosystems.com> wrote:
> On Fri, Mar 21, 2014 at 05:22:06AM -0600, David Stevens wrote:
>> -----Mike Rapoport <mike.rapoport@ravellosystems.com> wrote: -----
>>
>> >Checking skb->protocol will drop ARP requests. What about using
>> >ip_hdr(skb)->version?
>>
>> Mike, ip_hdr() here is the outer packet, so it's got to be a UDP packet--
>> we just don't know if it's UDP/IP or UDP/IPv6 when it is bound to INADDR_ANY,
>> since both can be delivered. It could use version in this case, because
>> both possible protocols have version in the same place, but I think it's
>> more correct to use the MAC layer protocol rather than relying on the
>> fact that IPv4 and IPv6 have "version" in the same spot. "It could be ARP"
>> would be the argument for NOT using the version in places where it really
>> could be ARP, even though that isn't the case here.
>>
>> vxlan_rcv() is only called for VXLAN encapsulated packets sent to the bound
>> UDP port.
>>
>> So, if "vs->family" holds the one we want to support, we can't just blindly
>> assume the received packet is IPv4, for example, and start accessing
>> IPv4 fields, because it could be an IPv6 packet. We have to check the
>> packet type too. And if it's not the one we bound to, drop it.
>>
>> That's what the code snippet I outlined is trying to do.
>>
>>                                          +-DLS
>>
>
> I believe I've grokked what's going on in vxlan_udp_encap_recv and
> vxlan_rcv. There are actually two unrelated problems:
>
> 1) When the vxlan is configured with IPv4 group it crashes when it
> starts to receive IPv6 IGMP packets encapsulated into IPv4 vxlan
> packets. This happens because when ipv6_rcv handles the inner packet,
> the skb->dst still references outer IPv4 info. The very old vxlan code
> had skb_dst_drop call in vxlan_udp_encap_recv, which was removed when
> vxlan was refactored to use iptunnel_pull_header (commit
> 7ce04758279514ca1d8ebfe322508a4a430fe2c8: "vxlan: Restructure vxlan
> receive"). The iptunnel_pull_header called skb_dst_drop until recent
> commit 10ddceb22bab11dab10ba645c7df2e4a8e7a5db5 ("ip_tunnel:multicast
> process cause panic due to skb->_skb_refdst NULL pointer").
> The simplest fix, I think, would be to restore call to skb_dst_drop in
> vxlan_udp_encap_recv.
>
iptunnel_pull_header() is used in multiple tunnel implementations, so
we need to drop the dst in that function.
I sent out a fix: http://marc.info/?l=linux-netdev&m=139563761515707
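
As a rough picture of why the drop belongs in the shared helper, consider
this toy user-space model. All toy_* names are hypothetical, and the real
skb_dst_drop also releases a reference, which is not modelled here.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a cached route and a packet that points at it. */
struct toy_dst { int family; };
struct toy_pkt { const struct toy_dst *dst; };

/* Forget the cached outer route so the inner stack does its own lookup. */
static void toy_dst_drop(struct toy_pkt *pkt)
{
	pkt->dst = NULL;
}

/*
 * Shared decapsulation helper: every tunnel type that calls it gets the
 * stale outer dst cleared, instead of each caller having to remember.
 */
static void toy_pull_header(struct toy_pkt *pkt)
{
	/* ... outer headers would be stripped here ... */
	toy_dst_drop(pkt);
}
```

Doing it once in the helper covers every tunnel that uses it; restoring the
call only in vxlan would leave the other callers with the same stale
pointer.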

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a7eb3f2..3a23623 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1206,7 +1206,7 @@  static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	/* Re-examine inner Ethernet packet */
-	if (remote_ip->sa.sa_family == AF_INET) {
+	if (vs->family == AF_INET) {
 		oip = ip_hdr(skb);
 		saddr.sin.sin_addr.s_addr = oip->saddr;
 		saddr.sa.sa_family = AF_INET;
@@ -2409,10 +2409,13 @@  static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 
 	INIT_WORK(&vs->del_work, vxlan_del_work);
 
-	if (ipv6)
+	if (ipv6) {
+		vs->family = AF_INET6;
 		sock = create_v6_sock(net, port);
-	else
+	} else {
+		vs->family = AF_INET;
 		sock = create_v4_sock(net, port);
+	}
 	if (IS_ERR(sock)) {
 		kfree(vs);
 		return ERR_CAST(sock);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 5deef1a..6f00731 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -16,6 +16,7 @@  struct vxlan_sock {
 	struct hlist_node hlist;
 	vxlan_rcv_t	 *rcv;
 	void		 *data;
+	__u16		  family;
 	struct work_struct del_work;
 	struct socket	 *sock;
 	struct rcu_head	  rcu;