Message ID | 20160709132230.GD2067@breakpoint.cc |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, 9 Jul 2016 15:22:30 +0200, fw@strlen.de wrote: > Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote: > > I'd appreciate any suggestion how to determine traffic is local OTHER > > THAN testing IPSKB_FORWARDED; If we have such a way, there wouldn't be an > > impact on local traffic. > > > > > What about setting IPCB FORWARD flag in iptunnel_xmit if > > > skb->skb_iif != 0... instead? > > > > Can you please elaborate? > > diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c > --- a/net/ipv4/ip_tunnel_core.c > +++ b/net/ipv4/ip_tunnel_core.c > @@ -65,6 +65,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, > struct net_device *dev = skb->dev; > struct iphdr *iph; > int err; > + bool fwd = skb->skb_iif > 0; > > skb_scrub_packet(skb, xnet); > > @@ -72,6 +73,9 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, > skb_dst_set(skb, &rt->dst); > memset(IPCB(skb), 0, sizeof(*IPCB(skb))); > > + if (fwd) > + IPCB(skb)->flags = IPSKB_FORWARDED; > + Thanks. OK with the general idea; However I don't want to abuse IPSKB_FORWARDED as the skb is not really "ip forwarded", and it might have undesirable affects, such as in 'ip_skb_dst_mtu' which follows. How about a new IPSKB bit flag, say IPSKB_TUNNEL_FORWARDED, or a different way of marking? Regards, Shmulik
Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote: > OK with the general idea; However I don't want to abuse IPSKB_FORWARDED > as the skb is not really "ip forwarded", and it might have undesirable > affects, such as in 'ip_skb_dst_mtu' which follows. I don't see a problem with that. Hannes?
On 11.07.2016 04:15, Florian Westphal wrote: > Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote: >> OK with the general idea; However I don't want to abuse IPSKB_FORWARDED >> as the skb is not really "ip forwarded", and it might have undesirable >> affects, such as in 'ip_skb_dst_mtu' which follows. > > I don't see a problem with that. Hannes? If we segment/fragment along the way we need to use the non-pmtu/interface-mtu, so this change would be correct in regard to the mtu selection. Bye, Hannes
On Sat, 9 Jul 2016 15:22:30 +0200 Florian Westphal <fw@strlen.de> wrote: > Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote: > > I'd appreciate any suggestion how to determine traffic is local OTHER > > THAN testing IPSKB_FORWARDED; If we have such a way, there wouldn't be an > > impact on local traffic. > > > > > What about setting IPCB FORWARD flag in iptunnel_xmit if > > > skb->skb_iif != 0... instead? I've came up with a suggestion that does not abuse IPSKB_FORWARDED, while properly addressing the use case (and similar ones), without introducing the cost of entering 'skb_gso_validate_mtu' in the local case. How about: @@ -220,12 +220,15 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned int mtu) { netdev_features_t features; + int local_trusted_gso; struct sk_buff *segs; int ret = 0; - /* common case: locally created skb or seglen is <= mtu */ - if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) || - skb_gso_validate_mtu(skb, mtu)) + local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 && + !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY); + /* common case: locally created skb from a trusted gso source or + * seglen is <= mtu */ + if (local_trusted_gso || skb_gso_validate_mtu(skb, mtu)) return ip_finish_output2(net, sk, skb); /* Slowpath - GSO segment length is exceeding the dst MTU. This well addresses the usecase where we have gso-skb arriving from an untrusted source, thus its gso_size is out of our control (e.g. tun/tap, macvtap, af_packet, xen-netfront...). Locally "gso trusted" skbs (the common case) will NOT suffer the additional (possibly costy) call to 'skb_gso_validate_mtu'. Also, if IPSKB_FORWARDED is true, behavior stays exactly the same. Regards, Shmulik
Hi Florian, Hannes, On Tue, 12 Jul 2016 08:56:56 +0300 Shmulik Ladkani <shmulik.ladkani@ravellosystems.com> wrote: > On Sat, 9 Jul 2016 15:22:30 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > What about setting IPCB FORWARD flag in iptunnel_xmit if > > > > skb->skb_iif != 0... instead? > > I've came up with a suggestion that does not abuse IPSKB_FORWARDED, > while properly addressing the use case (and similar ones), without > introducing the cost of entering 'skb_gso_validate_mtu' in the local > case. > > How about: > > @@ -220,12 +220,15 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk, > struct sk_buff *skb, unsigned int mtu) > { > netdev_features_t features; > + int local_trusted_gso; > struct sk_buff *segs; > int ret = 0; > > - /* common case: locally created skb or seglen is <= mtu */ > - if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) || > - skb_gso_validate_mtu(skb, mtu)) > + local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 && > + !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY); > + /* common case: locally created skb from a trusted gso source or > + * seglen is <= mtu */ > + if (local_trusted_gso || skb_gso_validate_mtu(skb, mtu)) > return ip_finish_output2(net, sk, skb); > > /* Slowpath - GSO segment length is exceeding the dst MTU. > > This well addresses the usecase where we have gso-skb arriving from an > untrusted source, thus its gso_size is out of our control (e.g. tun/tap, > macvtap, af_packet, xen-netfront...). > > Locally "gso trusted" skbs (the common case) will NOT suffer the > additional (possibly costy) call to 'skb_gso_validate_mtu'. > > Also, if IPSKB_FORWARDED is true, behavior stays exactly the same. Any commnets regarding the latest suggestion above? I'd like to post it as v2 - if it is in the right direction. It handles the problem of gso_size values which are not in host's control, it addresses the usecase described, and has a benefit of not overloading IPSKB_FORWARDED with a new semantic that might be hard to maintain. PS: Also, if we'd like to pinpoint it even further, we can: local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 && (!sk || !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY)); Which ensures only the following conditions go to the expensive skb_gso_validate_mtu: 1. IPSKB_FORWARDED is on 2. IPSKB_FORWARDED is off, but sk exists and gso_size is untrusted. Meaning: we have a packet arriving from higher layers (sk is set) with a gso_size out of host's control. This fine-tuining leaves standard l2 bridging case (e.g 2x taps bridged) of a gso skb unaffected, as sk would be NULL. Many thanks, Shmulik
Hello Shmulik, On Wed, Jul 13, 2016, at 16:00, Shmulik Ladkani wrote: > Hi Florian, Hannes, > > On Tue, 12 Jul 2016 08:56:56 +0300 Shmulik Ladkani > <shmulik.ladkani@ravellosystems.com> wrote: > > On Sat, 9 Jul 2016 15:22:30 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > What about setting IPCB FORWARD flag in iptunnel_xmit if > > > > > skb->skb_iif != 0... instead? > > > > I've came up with a suggestion that does not abuse IPSKB_FORWARDED, > > while properly addressing the use case (and similar ones), without > > introducing the cost of entering 'skb_gso_validate_mtu' in the local > > case. > > > > How about: > > > > @@ -220,12 +220,15 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk, > > struct sk_buff *skb, unsigned int mtu) > > { > > netdev_features_t features; > > + int local_trusted_gso; > > struct sk_buff *segs; > > int ret = 0; > > > > - /* common case: locally created skb or seglen is <= mtu */ > > - if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) || > > - skb_gso_validate_mtu(skb, mtu)) > > + local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 && > > + !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY); > > + /* common case: locally created skb from a trusted gso source or > > + * seglen is <= mtu */ > > + if (local_trusted_gso || skb_gso_validate_mtu(skb, mtu)) > > return ip_finish_output2(net, sk, skb); > > > > /* Slowpath - GSO segment length is exceeding the dst MTU. > > > > This well addresses the usecase where we have gso-skb arriving from an > > untrusted source, thus its gso_size is out of our control (e.g. tun/tap, > > macvtap, af_packet, xen-netfront...). > > > > Locally "gso trusted" skbs (the common case) will NOT suffer the > > additional (possibly costy) call to 'skb_gso_validate_mtu'. > > > > Also, if IPSKB_FORWARDED is true, behavior stays exactly the same. Sorry for the late reply, I am right now travelling and can't review that closely. > Any commnets regarding the latest suggestion above? > I'd like to post it as v2 - if it is in the right direction. > > It handles the problem of gso_size values which are not in host's > control, it addresses the usecase described, and has a benefit of not > overloading IPSKB_FORWARDED with a new semantic that might be hard to > maintain. I liked the fact that setting IPSKB_FORWARDED was only contained in vxlan and as such wouldn't have as much impact. It was more logically easy to review for me actually. > PS: > Also, if we'd like to pinpoint it even further, we can: > > local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 && > (!sk || !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY)); This also looks valid but too random. It seems to be a mix of random conditions to make it work. ;) > > Which ensures only the following conditions go to the expensive > skb_gso_validate_mtu: > > 1. IPSKB_FORWARDED is on > 2. IPSKB_FORWARDED is off, but sk exists and gso_size is untrusted. > Meaning: we have a packet arriving from higher layers (sk is set) > with a gso_size out of host's control. When can this really happen? In general we don't want to refragment gso skb's and I think we can only make an exception for vxlan or udp. > This fine-tuining leaves standard l2 bridging case (e.g 2x taps bridged) > of a gso skb unaffected, as sk would be NULL. Bridging does not in general orphan the socket, no? Bye, Hannes
Hi, On Thu, 14 Jul 2016 15:12:07 +0200, hannes@stressinduktion.org wrote: > I liked the fact that setting IPSKB_FORWARDED was only contained in > vxlan and as such wouldn't have as much impact. It was more logically > easy to review for me actually. I agree here. It is rather safe and to the point. I'm trying to exaust other alternatives because it has one potential drawback: the name IPSKB_FORWARDED suggests ipv4 forwarding had happened. Indeed, current setters of IPSKB_FORWARDED are ip_forward and ip_mr_forward. If we set IPSKB_FORWARDED in iptunnel_xmit, with packet not being ipv4 forwarded (e.g. bridged from some ingress device to a tunnel device), it presents a nuance whose impact is yet to be determined. For example, what about a packet that gets encapsulated and sent to a multicast destination? The condition controlling mc loop-back in ip_mc_output is affected by the flag. > > Which ensures only the following conditions go to the expensive > > skb_gso_validate_mtu: > > > > 1. IPSKB_FORWARDED is on > > 2. IPSKB_FORWARDED is off, but sk exists and gso_size is untrusted. > > Meaning: we have a packet arriving from higher layers (sk is set) > > with a gso_size out of host's control. > > When can this really happen? In general we don't want to refragment gso > skb's and I think we can only make an exception for vxlan or udp. When IPSKB_FORWARDED is off, we'll get SKB_GSO_DODGY if packet originally arrived from tap/macvtap/packet and it did NOT pass ipv4 forwarding (e.g bridges: tap0 to eth0 bridge, or tap0 to vxlan0 bridge). The rationale: in the SKB_GSO_DODGY cases, the gso_size is given by the user's virtio-net header, which is not in kernel's control. This exactly resembles the usecase: tap0 gives packets with gso_size unsuitable for encapsulation and segmentation. I have no control on the source that gives those packets. If (1) it does not make sense, or (2) considered too broad-spectrum to asses, then we can go with the safer IPSKB_FORWARDED approach. Let me know. Regards, Shmulik
Hello, On Thu, Jul 14, 2016, at 16:13, Shmulik Ladkani wrote: > Hi, > > On Thu, 14 Jul 2016 15:12:07 +0200, hannes@stressinduktion.org wrote: > > I liked the fact that setting IPSKB_FORWARDED was only contained in > > vxlan and as such wouldn't have as much impact. It was more logically > > easy to review for me actually. > > I agree here. It is rather safe and to the point. > > I'm trying to exaust other alternatives because it has one potential > drawback: the name IPSKB_FORWARDED suggests ipv4 forwarding had > happened. Indeed, current setters of IPSKB_FORWARDED are ip_forward and > ip_mr_forward. > > If we set IPSKB_FORWARDED in iptunnel_xmit, with packet not being ipv4 > forwarded (e.g. bridged from some ingress device to a tunnel device), it > presents a nuance whose impact is yet to be determined. The reason why I rather don't want to see a general solution is that currently gre and ipip are pmtu-safe and I rather would like to keep the old behavior that gre and ipip packets don't get treated alike. I couldn't check the current throughly but it seems they would also be affected by this change. My idea was to put those IPSKB_FORWARD just into the vxlan and geneve endpoints which could be seen as UDP endpoints and don't forward and care about pmtu events. > For example, what about a packet that gets encapsulated and sent to a > multicast destination? The condition controlling mc loop-back in > ip_mc_output is affected by the flag. I have to look at that more closely after my trip. What about adding a new flag with a proper name. It shouldn't affect performance in any way if you check for two bits instead of just the FORWARDED one. > > > Which ensures only the following conditions go to the expensive > > > skb_gso_validate_mtu: > > > > > > 1. IPSKB_FORWARDED is on > > > 2. IPSKB_FORWARDED is off, but sk exists and gso_size is untrusted. > > > Meaning: we have a packet arriving from higher layers (sk is set) > > > with a gso_size out of host's control. > > > > When can this really happen? In general we don't want to refragment gso > > skb's and I think we can only make an exception for vxlan or udp. > > When IPSKB_FORWARDED is off, we'll get SKB_GSO_DODGY if packet > originally arrived from tap/macvtap/packet and it did NOT pass ipv4 > forwarding (e.g bridges: tap0 to eth0 bridge, or tap0 to vxlan0 bridge). > > The rationale: in the SKB_GSO_DODGY cases, the gso_size is given by > the user's virtio-net header, which is not in kernel's control. But we can assume it being correct based on the mtu of the interface which is not in control of the current namespace instance or in another vm. We should act accordingly to normal ethernet bridging here, IMHO. > This exactly resembles the usecase: tap0 gives packets with gso_size > unsuitable for encapsulation and segmentation. I have no control on > the source that gives those packets. My argumentation is that vxlan and geneve can be seen as endpoints and don't have proper pmtu handling. Thus I would be fine with adding hacks to just make it working. For GRE I would like to keep strict internet pmtu handling and also keep the normal ethernet semantics in place, that we should drop packets if another interface did transmit on an ethernet bridge with a too big frame size. > If (1) it does not make sense, or (2) considered too broad-spectrum to > asses, then we can go with the safer IPSKB_FORWARDED approach. Please have a look at my reasoning. I probably can follow-up more deeply from Sunday on. Thanks, Hannes
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -65,6 +65,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, struct net_device *dev = skb->dev; struct iphdr *iph; int err; + bool fwd = skb->skb_iif > 0; skb_scrub_packet(skb, xnet); @@ -72,6 +73,9 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb, skb_dst_set(skb, &rt->dst); memset(IPCB(skb), 0, sizeof(*IPCB(skb))); + if (fwd) + IPCB(skb)->flags = IPSKB_FORWARDED; + /* Push down and install the IP header. */ skb_push(skb, sizeof(struct iphdr)); skb_reset_network_header(skb);