Message ID | 20190815060904.19426-1-liuhangbin@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode | expand |
On 8/15/19 8:09 AM, Hangbin Liu wrote: > When we send a packet larger than PMTU, we need to reply with > icmp_send(ICMP_FRAG_NEEDED) or icmpv6_send(ICMPV6_PKT_TOOBIG). > > But in collect_md mode, kernel will crash while accessing the dst dev > as __metadata_dst_init() init dst->dev to NULL by default. Here is what > the code path looks like, for GRE: > > - ip6gre_tunnel_xmit > - ip6gre_xmit_ipv4 > - __gre6_xmit > - ip6_tnl_xmit > - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE > - icmp_send > - net = dev_net(rt->dst.dev); <-- here > - ip6gre_xmit_ipv6 > - __gre6_xmit > - ip6_tnl_xmit > - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE > - icmpv6_send > ... > - decode_session4 > - oif = skb_dst(skb)->dev->ifindex; <-- here > - decode_session6 > - oif = skb_dst(skb)->dev->ifindex; <-- here > > Fix it by updating the dst dev if not set. > > The reproducer is easy: > > ovs-vsctl add-br br0 > ip link set br0 up > ovs-vsctl add-port br0 gre0 -- \ > set interface gre0 type=gre options:remote_ip=$dst_addr > ip link set gre0 up > ip addr add ${local_gre6}/64 dev br0 > ping6 $remote_gre6 -s 1500 > > Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit") > Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") > Reviewed-by: Stefano Brivio <sbrivio@redhat.com> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> > --- > net/ipv4/ip_tunnel.c | 3 +++ > net/ipv6/ip6_tunnel.c | 13 +++++++++---- > 2 files changed, 12 insertions(+), 4 deletions(-) > > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c > index 38c02bb62e2c..c6713c7287df 100644 > --- a/net/ipv4/ip_tunnel.c > +++ b/net/ipv4/ip_tunnel.c > @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, > goto tx_error; > } > > + if (skb_dst(skb) && !skb_dst(skb)->dev) > + skb_dst(skb)->dev = rt->dst.dev; > + IMO this looks wrong. This dst seems shared. Once set, we will reuse the same dev ? If intended, why not doing this in __metadata_dst_init() instead of in the fast path ? > if (key->tun_flags & TUNNEL_DONT_FRAGMENT) > df = htons(IP_DF); > if (tnl_update_pmtu(dev, skb, rt, df, inner_iph, tunnel_hlen,
Hi Eric, Thanks for the review. On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote: > > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c > > index 38c02bb62e2c..c6713c7287df 100644 > > --- a/net/ipv4/ip_tunnel.c > > +++ b/net/ipv4/ip_tunnel.c > > @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, > > goto tx_error; > > } > > > > + if (skb_dst(skb) && !skb_dst(skb)->dev) > > + skb_dst(skb)->dev = rt->dst.dev; > > + > > > IMO this looks wrong. > This dst seems shared. If the dst is shared, it may cause some problem. Could you point me where the dst may be shared possibly? > Once set, we will reuse the same dev ? If yes, how about just set the skb dst to rt->dst, as the iptunnel_xmit would do later. skb_dst_drop(skb); skb_dst_set(skb, &rt->dst); or do you have any other idea? > > If intended, why not doing this in __metadata_dst_init() instead of in the fast path ? I'm afraid we couldn't do this, I didn't find a way to init dev in __metadata_dst_init(). Do you? Thanks Hangbin
On Fri, Aug 16, 2019 at 11:24:18AM +0800, Hangbin Liu wrote: > If yes, how about just set the skb dst to rt->dst, as the > iptunnel_xmit would do later. > > skb_dst_drop(skb); > skb_dst_set(skb, &rt->dst); > Tested and this donesn't work good....
On 8/16/19 5:24 AM, Hangbin Liu wrote: > Hi Eric, > > Thanks for the review. > On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote: >>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c >>> index 38c02bb62e2c..c6713c7287df 100644 >>> --- a/net/ipv4/ip_tunnel.c >>> +++ b/net/ipv4/ip_tunnel.c >>> @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, >>> goto tx_error; >>> } >>> >>> + if (skb_dst(skb) && !skb_dst(skb)->dev) >>> + skb_dst(skb)->dev = rt->dst.dev; >>> + >> >> >> IMO this looks wrong. >> This dst seems shared. > > If the dst is shared, it may cause some problem. Could you point me where the > dst may be shared possibly? > dst are inherently shared. This is why we have a refcount on them. Only when the dst has been allocated by the current thread we can make changes on them.
On Fri, Aug 16, 2019 at 10:23:55AM +0200, Eric Dumazet wrote: > > > On 8/16/19 5:24 AM, Hangbin Liu wrote: > > Hi Eric, > > > > Thanks for the review. > > On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote: > >>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c > >>> index 38c02bb62e2c..c6713c7287df 100644 > >>> --- a/net/ipv4/ip_tunnel.c > >>> +++ b/net/ipv4/ip_tunnel.c > >>> @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, > >>> goto tx_error; > >>> } > >>> > >>> + if (skb_dst(skb) && !skb_dst(skb)->dev) > >>> + skb_dst(skb)->dev = rt->dst.dev; > >>> + > >> > >> > >> IMO this looks wrong. > >> This dst seems shared. > > > > If the dst is shared, it may cause some problem. Could you point me where the > > dst may be shared possibly? > > > > dst are inherently shared. > > This is why we have a refcount on them. > > Only when the dst has been allocated by the current thread we can make changes on them. > OK, I see now. Then how about fix the issue in __icmp_send and decode_session{4,6}. The fix in there is safe, as in __icmp_send() we only want to get net, dev_net(skb_in->dev) could also do the work, just as icmp6_send() does. For decode_session{4,6} the oif is also not needed in this scenario as this is called by xfrm_decode_session_reverse(), we only need the skb_iif fl4->flowi4_oif = reverse ? skb->skb_iif : oif; I also need to check more code in OVS.. diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 1510e951f451..95d803543df5 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -582,7 +582,11 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, if (!rt) goto out; - net = dev_net(rt->dst.dev); + + if (skb_in->dev) + net = dev_net(skb_in->dev); + else + goto out; /* * Find the original header. It is expected to be valid, of course. diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 8ca637a72697..ec94f5795ea4 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -3269,7 +3269,7 @@ decode_session4(struct sk_buff *skb, struct flowi *fl, bool reverse) struct flowi4 *fl4 = &fl->u.ip4; int oif = 0; - if (skb_dst(skb)) + if (skb_dst(skb) && skb_dst(skb)->dev) oif = skb_dst(skb)->dev->ifindex; memset(fl4, 0, sizeof(struct flowi4)); @@ -3387,7 +3387,7 @@ decode_session6(struct sk_buff *skb, struct flowi *fl, bool reverse) nexthdr = nh[nhoff]; - if (skb_dst(skb)) + if (skb_dst(skb) && skb_dst(skb)->dev) oif = skb_dst(skb)->dev->ifindex; memset(fl6, 0, sizeof(struct flowi6)); Thanks Hangbin
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 38c02bb62e2c..c6713c7287df 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev, goto tx_error; } + if (skb_dst(skb) && !skb_dst(skb)->dev) + skb_dst(skb)->dev = rt->dst.dev; + if (key->tun_flags & TUNNEL_DONT_FRAGMENT) df = htons(IP_DF); if (tnl_update_pmtu(dev, skb, rt, df, inner_iph, tunnel_hlen, diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 754a484d35df..6ccf8f0eb8e7 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1109,10 +1109,15 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield, dst = NULL; goto tx_err_link_failure; } - if (t->parms.collect_md && ipv6_addr_any(&fl6->saddr) && - ipv6_dev_get_saddr(net, ip6_dst_idev(dst)->dev, - &fl6->daddr, 0, &fl6->saddr)) - goto tx_err_link_failure; + if (t->parms.collect_md) { + if (ipv6_addr_any(&fl6->saddr) && + ipv6_dev_get_saddr(net, ip6_dst_idev(dst)->dev, + &fl6->daddr, 0, &fl6->saddr)) + goto tx_err_link_failure; + + if (skb_dst(skb) && !skb_dst(skb)->dev) + skb_dst(skb)->dev = dst->dev; + } ndst = dst; }