Message ID | 20190111144357.9700-1-nicolas.dichtel@6wind.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] af_packet: fix raw sockets over 6in4 tunnel | expand |
On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote: > > Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in > SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: > > Here is a example of the setup: > $ ip link set ntfp2 up > $ ip addr add 10.125.0.1/24 dev ntfp2 > $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2 > $ ip addr add fd00:cafe:cafe::1/128 dev tun1 > $ ip link set dev tun1 up > $ ip route add fd00:200::/64 dev tun1 > $ scapy > >>> p = [] > >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() > >>> send(p, count=1, inter=0.1) > >>> quit() > $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" > TX: bytes packets errors dropped carrier collsns > 0 0 1 0 0 0 > > The problem is that the network offset is set to the hard_header_len of the > output device (tun1, ie 14 + 20) and in our case, because the packet is > small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes > (ipv6 header) starting from the network offset). Let's reset the network > offset so that it points to the beginning of the L3 data, ie skb->data. This only just landed in my inbox, sorry for the delay. So far I'm not reproducing the issue, but I'm trying and am having a look. Which pskb_inet_may_pull() is failing?
On Fri, Jan 11, 2019 at 4:29 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel > <nicolas.dichtel@6wind.com> wrote: > > > > Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in > > SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: > > > > Here is a example of the setup: > > $ ip link set ntfp2 up > > $ ip addr add 10.125.0.1/24 dev ntfp2 > > $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2 > > $ ip addr add fd00:cafe:cafe::1/128 dev tun1 > > $ ip link set dev tun1 up > > $ ip route add fd00:200::/64 dev tun1 > > $ scapy > > >>> p = [] > > >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() > > >>> send(p, count=1, inter=0.1) > > >>> quit() > > $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" > > TX: bytes packets errors dropped carrier collsns > > 0 0 1 0 0 0 > > > > The problem is that the network offset is set to the hard_header_len of the > > output device (tun1, ie 14 + 20) and in our case, because the packet is > > small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes > > (ipv6 header) starting from the network offset). Let's reset the network > > offset so that it points to the beginning of the L3 data, ie skb->data. > > This only just landed in my inbox, sorry for the delay. So far I'm not > reproducing the issue, but I'm trying and am having a look. > > Which pskb_inet_may_pull() is failing? Okay. I see what's going on. Commit cb9f1b783850 converted a pskb_may_pull ipv6hdr, pulling from skb->data, to pskb_network_may_pull, pulling from skb->network_header. Normally packet sockets with SOCK_RAW write a fixed size hardware header and place skb->network_header directly after that. Dropping these in sit_tunnel_xmit if they fail pskb_inet_may_pull is correct. But with devices with variable length hardware headers like sit, it is possible to cook packets that are shorter than the upper bound hhlen. In that case, we have no way of knowing where the variable length header ends (short of device specific parsing if implemented, see also header_ops.validate), so we set the network header to the same as the mac header as of commit 993675a3100b1. This is not a good thing to do in general, and not needed in the common case. The issue here is that the icmpv6 packet (48B) exceeds the sit hhlen (34), so is not caught by that workaround. We need to extend it to cases where the variable length hhlen is less than its upper bound in such a way that the packet as a whole exceeds hhlen, but by so little that the L3 header still starts before dev->hard_header_len. The simplest, to handle ip validation, is to extend the check from if (len < reserve) skb_reset_network_header(skb); to one that accounts for requiring a fully formed protocol header } else if (reserve) { + int l3_min_len = reserve; + skb_reserve(skb, -reserve); - if (len < reserve) + + if (proto == htons(ETH_P_IPV6)) + l3_min_len += sizeof(struct ipv6hdr); + else if (proto == htons(ETH_P_IP)) + l3_min_len += sizeof(struct iphdr); + + if (len < l3_min_len) skb_reset_network_header(skb); } Instead of special casing protocols, it might be cleaner to convert dev_validate_header to return instead of a pass/fail bool the hardware header length. And then update skb->network_header accordingly. That would definitely only be for net-next. For completeness, my variant of your test on top of v5.0-rc1 " ip netns add test ip netns exec test bash ip link set lo up ip addr add 10.0.0.1/24 dev lo ip tunnel add tun1 mode sit ttl 64 local 10.0.0.1 remote 10.0.0.2 dev lo ip addr add fd00:cafe:cafe::1/128 dev tun1 ip link set dev tun1 up ip route add fd00:200::/64 dev tun1 tcpdump -vvv -i tun1 -n & sleep 0.4 scapy p = [] p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() send(p, count=1, inter=0.1) quit() "
Le 12/01/2019 à 19:00, Willem de Bruijn a écrit : > On Fri, Jan 11, 2019 at 4:29 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: >> >> On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel >> <nicolas.dichtel@6wind.com> wrote: >>> >>> Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in >>> SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: >>> >>> Here is a example of the setup: >>> $ ip link set ntfp2 up >>> $ ip addr add 10.125.0.1/24 dev ntfp2 >>> $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2 >>> $ ip addr add fd00:cafe:cafe::1/128 dev tun1 >>> $ ip link set dev tun1 up >>> $ ip route add fd00:200::/64 dev tun1 >>> $ scapy >>>>>> p = [] >>>>>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() >>>>>> send(p, count=1, inter=0.1) >>>>>> quit() >>> $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" >>> TX: bytes packets errors dropped carrier collsns >>> 0 0 1 0 0 0 >>> >>> The problem is that the network offset is set to the hard_header_len of the >>> output device (tun1, ie 14 + 20) and in our case, because the packet is >>> small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes >>> (ipv6 header) starting from the network offset). Let's reset the network >>> offset so that it points to the beginning of the L3 data, ie skb->data. >> >> This only just landed in my inbox, sorry for the delay. So far I'm not >> reproducing the issue, but I'm trying and am having a look. >> >> Which pskb_inet_may_pull() is failing? > > Okay. I see what's going on. > > Commit cb9f1b783850 converted a pskb_may_pull ipv6hdr, pulling from > skb->data, to pskb_network_may_pull, pulling from skb->network_header. > > Normally packet sockets with SOCK_RAW write a fixed size hardware > header and place skb->network_header directly after that. Dropping > these in sit_tunnel_xmit if they fail pskb_inet_may_pull is correct. > > But with devices with variable length hardware headers like sit, it is > possible to cook packets that are shorter than the upper bound hhlen. > In that case, we have no way of knowing where the variable length > header ends (short of device specific parsing if implemented, see also > header_ops.validate), so we set the network header to the same as the > mac header as of commit 993675a3100b1. > > This is not a good thing to do in general, and not needed in the > common case. Sorry, but I fail to understand what we try to achieve here. In this case, the packet socket builds a packet without a hard header: <-------48B---------> <--40B---> +--------+----------+ |IPv6 hdr|ICMPv6 hdr| +--------+----------+ ^ skb->data ^ nh offset: 34 (ie dev->hard_header_len, ie ethernet hdr + ipv4 hdr) The nh offset does not match any data in the packet. dev_validate_header() checks that datalen (ie 48) is >= dev->hard_header_len (ie 34). Still no hard header here. Then, dev->xmit (ie sit_tunnel_xmit()) is called. This function calls pskb_inet_may_pull() which tries to pull 40B from the nh offset (34). But the nh offset still points somewhere in the ipv6 hdr, thus the test fails (34 + 40 > 48). I don't understand why it is wrong to set the nh offset to 0 (skb->data), ie exactly where the ipv6 header starts. Regards, Nicolas
On Mon, Jan 14, 2019 at 9:51 AM Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote: > > Le 12/01/2019 à 19:00, Willem de Bruijn a écrit : > > On Fri, Jan 11, 2019 at 4:29 PM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > >> > >> On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel > >> <nicolas.dichtel@6wind.com> wrote: > >>> > >>> Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in > >>> SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: > >>> > >>> Here is a example of the setup: > >>> $ ip link set ntfp2 up > >>> $ ip addr add 10.125.0.1/24 dev ntfp2 > >>> $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2 > >>> $ ip addr add fd00:cafe:cafe::1/128 dev tun1 > >>> $ ip link set dev tun1 up > >>> $ ip route add fd00:200::/64 dev tun1 > >>> $ scapy > >>>>>> p = [] > >>>>>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() > >>>>>> send(p, count=1, inter=0.1) > >>>>>> quit() > >>> $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" > >>> TX: bytes packets errors dropped carrier collsns > >>> 0 0 1 0 0 0 > >>> > >>> The problem is that the network offset is set to the hard_header_len of the > >>> output device (tun1, ie 14 + 20) and in our case, because the packet is > >>> small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes > >>> (ipv6 header) starting from the network offset). Let's reset the network > >>> offset so that it points to the beginning of the L3 data, ie skb->data. > >> > >> This only just landed in my inbox, sorry for the delay. So far I'm not > >> reproducing the issue, but I'm trying and am having a look. > >> > >> Which pskb_inet_may_pull() is failing? > > > > Okay. I see what's going on. > > > > Commit cb9f1b783850 converted a pskb_may_pull ipv6hdr, pulling from > > skb->data, to pskb_network_may_pull, pulling from skb->network_header. > > > > Normally packet sockets with SOCK_RAW write a fixed size hardware > > header and place skb->network_header directly after that. Dropping > > these in sit_tunnel_xmit if they fail pskb_inet_may_pull is correct. > > > > But with devices with variable length hardware headers like sit, it is > > possible to cook packets that are shorter than the upper bound hhlen. > > In that case, we have no way of knowing where the variable length > > header ends (short of device specific parsing if implemented, see also > > header_ops.validate), so we set the network header to the same as the > > mac header as of commit 993675a3100b1. > > > > This is not a good thing to do in general, and not needed in the > > common case. > Sorry, but I fail to understand what we try to achieve here. In this case, the > packet socket builds a packet without a hard header: > > <-------48B---------> > <--40B---> > +--------+----------+ > |IPv6 hdr|ICMPv6 hdr| > +--------+----------+ > ^ skb->data > ^ nh offset: 34 (ie dev->hard_header_len, ie ethernet hdr + ipv4 hdr) > > The nh offset does not match any data in the packet. > > dev_validate_header() checks that datalen (ie 48) is >= dev->hard_header_len (ie > 34). Still no hard header here. > > Then, dev->xmit (ie sit_tunnel_xmit()) is called. This function calls > pskb_inet_may_pull() which tries to pull 40B from the nh offset (34). But the nh > offset still points somewhere in the ipv6 hdr, thus the test fails (34 + 40 > 48). > I don't understand why it is wrong to set the nh offset to 0 (skb->data), ie > exactly where the ipv6 header starts. It is wrong because for other devices l2 header length is not zero, so this incorrectly sets skb->network_header to the start of the link layer header on all those devices. A one line variant of the above would be - if (len < reserve) + if (len < reserve + sizeof(struct ipv6hdr)) > > Regards, > Nicolas
On Mon, Jan 14, 2019 at 10:15 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Mon, Jan 14, 2019 at 9:51 AM Nicolas Dichtel > <nicolas.dichtel@6wind.com> wrote: > > > > Le 12/01/2019 à 19:00, Willem de Bruijn a écrit : > > > On Fri, Jan 11, 2019 at 4:29 PM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > >> > > >> On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel > > >> <nicolas.dichtel@6wind.com> wrote: > > >>> > > >>> Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in > > >>> SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: > > >>> > > >>> Here is a example of the setup: > > >>> $ ip link set ntfp2 up > > >>> $ ip addr add 10.125.0.1/24 dev ntfp2 > > >>> $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2 > > >>> $ ip addr add fd00:cafe:cafe::1/128 dev tun1 > > >>> $ ip link set dev tun1 up > > >>> $ ip route add fd00:200::/64 dev tun1 > > >>> $ scapy > > >>>>>> p = [] > > >>>>>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() > > >>>>>> send(p, count=1, inter=0.1) > > >>>>>> quit() > > >>> $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" > > >>> TX: bytes packets errors dropped carrier collsns > > >>> 0 0 1 0 0 0 > > >>> > > >>> The problem is that the network offset is set to the hard_header_len of the > > >>> output device (tun1, ie 14 + 20) and in our case, because the packet is > > >>> small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes > > >>> (ipv6 header) starting from the network offset). Let's reset the network > > >>> offset so that it points to the beginning of the L3 data, ie skb->data. > > >> > > >> This only just landed in my inbox, sorry for the delay. So far I'm not > > >> reproducing the issue, but I'm trying and am having a look. > > >> > > >> Which pskb_inet_may_pull() is failing? > > > > > > Okay. I see what's going on. > > > > > > Commit cb9f1b783850 converted a pskb_may_pull ipv6hdr, pulling from > > > skb->data, to pskb_network_may_pull, pulling from skb->network_header. > > > > > > Normally packet sockets with SOCK_RAW write a fixed size hardware > > > header and place skb->network_header directly after that. Dropping > > > these in sit_tunnel_xmit if they fail pskb_inet_may_pull is correct. > > > > > > But with devices with variable length hardware headers like sit, it is > > > possible to cook packets that are shorter than the upper bound hhlen. > > > In that case, we have no way of knowing where the variable length > > > header ends (short of device specific parsing if implemented, see also > > > header_ops.validate), so we set the network header to the same as the > > > mac header as of commit 993675a3100b1. > > > > > > This is not a good thing to do in general, and not needed in the > > > common case. > > Sorry, but I fail to understand what we try to achieve here. In this case, the > > packet socket builds a packet without a hard header: > > > > <-------48B---------> > > <--40B---> > > +--------+----------+ > > |IPv6 hdr|ICMPv6 hdr| > > +--------+----------+ > > ^ skb->data > > ^ nh offset: 34 (ie dev->hard_header_len, ie ethernet hdr + ipv4 hdr) > > > > The nh offset does not match any data in the packet. > > > > dev_validate_header() checks that datalen (ie 48) is >= dev->hard_header_len (ie > > 34). Still no hard header here. > > > > Then, dev->xmit (ie sit_tunnel_xmit()) is called. This function calls > > pskb_inet_may_pull() which tries to pull 40B from the nh offset (34). But the nh > > offset still points somewhere in the ipv6 hdr, thus the test fails (34 + 40 > 48). > > I don't understand why it is wrong to set the nh offset to 0 (skb->data), ie > > exactly where the ipv6 header starts. > > It is wrong because for other devices l2 header length is not zero, so > this incorrectly sets skb->network_header to the start of the link > layer header on all those devices. > > A one line variant of the above would be > > - if (len < reserve) > + if (len < reserve + sizeof(struct ipv6hdr)) and exclude the majority of devices with fixed hard header len. Those require len to be >= reserve, so this workaround does not apply to them: if (len < reserve + sizeof(struct ipv6hdr) && dev->min_header_len != dev->hard_header_len)
Le 14/01/2019 à 16:38, Willem de Bruijn a écrit : > On Mon, Jan 14, 2019 at 10:15 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: [snip] >> It is wrong because for other devices l2 header length is not zero, so >> this incorrectly sets skb->network_header to the start of the link >> layer header on all those devices. Ok, thank you for the details. >> >> A one line variant of the above would be >> >> - if (len < reserve) >> + if (len < reserve + sizeof(struct ipv6hdr)) > > and exclude the majority of devices with fixed hard header len. Those > require len to be >= reserve, so this workaround does not apply to > them: > > if (len < reserve + sizeof(struct ipv6hdr) && > dev->min_header_len != dev->hard_header_len) > And what about: - if (len < reserve) + if (dev->min_header_len != dev->hard_header_len) skb_reset_network_header(skb); ?
On Tue, Jan 15, 2019 at 4:49 AM Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote: > > Le 14/01/2019 à 16:38, Willem de Bruijn a écrit : > > On Mon, Jan 14, 2019 at 10:15 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > [snip] > >> It is wrong because for other devices l2 header length is not zero, so > >> this incorrectly sets skb->network_header to the start of the link > >> layer header on all those devices. > Ok, thank you for the details. > > >> > >> A one line variant of the above would be > >> > >> - if (len < reserve) > >> + if (len < reserve + sizeof(struct ipv6hdr)) > > > > and exclude the majority of devices with fixed hard header len. Those > > require len to be >= reserve, so this workaround does not apply to > > them: > > > > if (len < reserve + sizeof(struct ipv6hdr) && > > dev->min_header_len != dev->hard_header_len) > > > And what about: > - if (len < reserve) > + if (dev->min_header_len != dev->hard_header_len) > skb_reset_network_header(skb); > ? As a fix for the regression introduced by cb9f1b783850 ("ip: validate header length on virtual device xmit"), the test I proposed is a bit more narrow by only applying this ugly workaround in cases where that test might have started failing. By limiting to short packets we also avoid in the common case reading dev->min_header_len, which may not be cached (btw, we should use reserve instead of dev->hard_header_len for the same reason). But the length restriction does look rather arbitrary, so there is something to say for your suggestion to apply this uniformly to devices with variable length. Note also the concurrent discussion in http://patchwork.ozlabs.org/patch/1024489/ about extending header_ops with a hard header parser which may return the exist header length. That is only for net-next, but it will allow us to set skb->network_header correctly even for these devices where (dev->min_header_len != dev->hard_header_len).
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d0945253f43b..911c3870e66d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2887,8 +2887,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) goto out_free; } else if (reserve) { skb_reserve(skb, -reserve); - if (len < reserve) - skb_reset_network_header(skb); + skb_reset_network_header(skb); } /* Returns -EFAULT on error */