Message ID | CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [v1,1/1] veth: Do not drop packets larger then the mtu set on the receiving side | expand |
On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote: > commit affede4a779420bd8510ab937251a3796d3228df > Author: Fredrik Gustavsson <gustfred@gmail.com> > Date: Tue Jan 8 11:21:39 2019 +0100 > > veth: Do not drop packets larger then the mtu set on the receiving side > > Currently veth drops all packets larger then the mtu set on the receiving > end of the pair. This is inconsistent with most hardware ethernet drivers > that happily receives packets up the the ethernet MTU independent of the > configured MTU. I agree with your argument, but i wonder if there could be a better implementation. ____dev_forward_skb() is on the hot path, so your additional check is going to slow down packet forwarding for everybody, not just veth. is_skb_forwardable() also does some additional checks which you are now skipping. Is that O.K? Since we are talking about a veth pair here, you have access to both ends of the link. Maybe consider if the mtu is changed on one end, you also change it on the other? Lets see what others think of that. Andrew
On Thu, Jan 10, 2019 at 6:21 AM Andrew Lunn <andrew@lunn.ch> wrote: > > On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote: > > commit affede4a779420bd8510ab937251a3796d3228df > > Author: Fredrik Gustavsson <gustfred@gmail.com> > > Date: Tue Jan 8 11:21:39 2019 +0100 > > > > veth: Do not drop packets larger then the mtu set on the receiving side > > > > Currently veth drops all packets larger then the mtu set on the receiving > > end of the pair. This is inconsistent with most hardware ethernet drivers > > that happily receives packets up the the ethernet MTU independent of the > > configured MTU. > > I agree with your argument, but i wonder if there could be a better > implementation. > > ____dev_forward_skb() is on the hot path, so your additional check is > going to slow down packet forwarding for everybody, not just veth. > is_skb_forwardable() also does some additional checks which you are > now skipping. Is that O.K? > > Since we are talking about a veth pair here, you have access to both > ends of the link. Maybe consider if the mtu is changed on one end, you > also change it on the other? > > Lets see what others think of that. > adding a IFF_VETH just for this seems like an overkill. especially when there are other ways to indicate a virtual device type like rtnetlink kind. Also, its best to keep such checks local to veth, maybe with something along the lines of what Andrew suggests above.
On 2019/01/10 22:26, Fredrik Gustavsson wrote: > commit affede4a779420bd8510ab937251a3796d3228df > Author: Fredrik Gustavsson <gustfred@gmail.com> > Date: Tue Jan 8 11:21:39 2019 +0100 > > veth: Do not drop packets larger then the mtu set on the receiving side > > Currently veth drops all packets larger then the mtu set on the receiving > end of the pair. This is inconsistent with most hardware ethernet drivers > that happily receives packets up the the ethernet MTU independent of the > configured MTU. > > There should not be a need for dropping IP packets at receiver with > size > configured IP MTU, IP MTU is for fragmentation at sender side. > And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer. But with current veth behavior setting MTU effectively sets MRU as well. This may be being used to drop unexpectedly long packets e.g. from containers on container host. If we unconditionally change this behavior it can cause regression on some environment. This should be an option at least. BTW there was a similar precedent attempt. You might want to take a look. https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html
Den tors 10 jan. 2019 kl 16:39 skrev Roopa Prabhu <roopa@cumulusnetworks.com>: > > On Thu, Jan 10, 2019 at 6:21 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > > On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote: > > > commit affede4a779420bd8510ab937251a3796d3228df > > > Author: Fredrik Gustavsson <gustfred@gmail.com> > > > Date: Tue Jan 8 11:21:39 2019 +0100 > > > > > > veth: Do not drop packets larger then the mtu set on the receiving side > > > > > > Currently veth drops all packets larger then the mtu set on the receiving > > > end of the pair. This is inconsistent with most hardware ethernet drivers > > > that happily receives packets up the the ethernet MTU independent of the > > > configured MTU. > > > > I agree with your argument, but i wonder if there could be a better > > implementation. > > > > ____dev_forward_skb() is on the hot path, so your additional check is > > going to slow down packet forwarding for everybody, not just veth. > > is_skb_forwardable() also does some additional checks which you are > > now skipping. Is that O.K? > > > > Since we are talking about a veth pair here, you have access to both > > ends of the link. Maybe consider if the mtu is changed on one end, you > > also change it on the other? > > > > Lets see what others think of that. > > > > adding a IFF_VETH just for this seems like an overkill. especially > when there are other ways to indicate a virtual device type like > rtnetlink kind. > Also, its best to keep such checks local to veth, maybe with something > along the lines of what Andrew suggests above. Andrew: Nice to hear that my arguments were good. You are totally right about the test in is_skb_forwardable The one that is missed now is: if (!(dev->flags & IFF_UP)) So of course moving the check inside is_skb_forwadable instead would solved that. The example provide was to show how the behaviour could be provoked. For the actual use case I can not change to MTU on one of the interface. Roopa: Overkill or not seems like a slim patch this way. Of course dev_forward_skb version just for veth would of course be possible to do. Just thought that kind of change would not be accepted with duplicated code. Or do you mean in some other way to do it? About the point of slowing down (not sure how much we actaully are talking about) is a really good one. Since this is not only for veth. Is it then better to do if for just veth with a veth specific dev_forward_skb?
Den fre 11 jan. 2019 kl 05:23 skrev Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>: > > On 2019/01/10 22:26, Fredrik Gustavsson wrote: > > commit affede4a779420bd8510ab937251a3796d3228df > > Author: Fredrik Gustavsson <gustfred@gmail.com> > > Date: Tue Jan 8 11:21:39 2019 +0100 > > > > veth: Do not drop packets larger then the mtu set on the receiving side > > > > Currently veth drops all packets larger then the mtu set on the receiving > > end of the pair. This is inconsistent with most hardware ethernet drivers > > that happily receives packets up the the ethernet MTU independent of the > > configured MTU. > > > > There should not be a need for dropping IP packets at receiver with > > size > configured IP MTU, IP MTU is for fragmentation at sender side. > > And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer. > > But with current veth behavior setting MTU effectively sets MRU as well. > This may be being used to drop unexpectedly long packets e.g. from > containers on container host. If we unconditionally change this behavior > it can cause regression on some environment. This should be an option at > least. > > BTW there was a similar precedent attempt. You might want to take a look. > > https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html > https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html > > -- > Toshiaki Makita > Good argument, but do you agree that it shouldn't be working like it does? But not sure if such a case would exist but breaking compatitbility have been discussed in other threads. So your saying that people have actually used it as a receiving limit for IP packets? Nice that you found that older thread it is also availble here: https://lkml.org/lkml/2017/5/12/254 Were I thought the ending argument was that if it shouldn't be dropped don't drop it. And that introducing a MRU parameter was just making things more complicated. But for my use case either way would work.
> > adding a IFF_VETH just for this seems like an overkill. especially > > when there are other ways to indicate a virtual device type like > > rtnetlink kind. > > Also, its best to keep such checks local to veth, maybe with something > > along the lines of what Andrew suggests above. > > Andrew: Nice to hear that my arguments were good. With DSA switches we somewhat rely on this behaviour, because there is an extra header in the frame. I recently added code to automagically increase the MTU, but that does not always work. > For the actual use case I can not change to MTU on one of the > interface. Please could you explain why. Andrew
On 2019/01/11 22:13, Fredrik Gustavsson wrote: > Den fre 11 jan. 2019 kl 05:23 skrev Toshiaki Makita > <makita.toshiaki@lab.ntt.co.jp>: >> >> On 2019/01/10 22:26, Fredrik Gustavsson wrote: >>> commit affede4a779420bd8510ab937251a3796d3228df >>> Author: Fredrik Gustavsson <gustfred@gmail.com> >>> Date: Tue Jan 8 11:21:39 2019 +0100 >>> >>> veth: Do not drop packets larger then the mtu set on the receiving side >>> >>> Currently veth drops all packets larger then the mtu set on the receiving >>> end of the pair. This is inconsistent with most hardware ethernet drivers >>> that happily receives packets up the the ethernet MTU independent of the >>> configured MTU. >>> >>> There should not be a need for dropping IP packets at receiver with >>> size > configured IP MTU, IP MTU is for fragmentation at sender side. >>> And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer. >> >> But with current veth behavior setting MTU effectively sets MRU as well. >> This may be being used to drop unexpectedly long packets e.g. from >> containers on container host. If we unconditionally change this behavior >> it can cause regression on some environment. This should be an option at >> least. >> >> BTW there was a similar precedent attempt. You might want to take a look. >> >> https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html >> https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html >> > Good argument, but do you agree that it shouldn't be working like it > does? Not sure. As discussed in the previous thread I think some hardware can limit MRU < 1500 by configuring MTU, so to me it is not especially odd. > But not sure if such a case would exist but breaking > compatitbility have been discussed in other threads. So your saying > that people have actually used it as a receiving limit for IP packets? No, I'm saying it can be used in that way. Even though we did not confirm if there really are people using features in a certain way, we usually assume there are, as long as the usage is possible, because so many people use Linux.
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 890fa5b905e2..74460278a621 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -1161,6 +1161,7 @@ static void veth_setup(struct net_device *dev) dev->priv_flags |= IFF_LIVE_ADDR_CHANGE; dev->priv_flags |= IFF_NO_QUEUE; dev->priv_flags |= IFF_PHONY_HEADROOM; + dev->priv_flags |= IFF_VETH; dev->netdev_ops = &veth_netdev_ops; dev->ethtool_ops = &veth_ethtool_ops; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 857f8abf7b91..75465b3e4ecb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1487,6 +1487,7 @@ struct net_device_ops { * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook * @IFF_FAILOVER: device is a failover master device * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device + * @IFF_VETH: Veth device */ enum netdev_priv_flags { IFF_802_1Q_VLAN = 1<<0, @@ -1518,6 +1519,7 @@ enum netdev_priv_flags { IFF_NO_RX_HANDLER = 1<<26, IFF_FAILOVER = 1<<27, IFF_FAILOVER_SLAVE = 1<<28, + IFF_VETH = 1<<29, }; #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN @@ -1548,6 +1550,7 @@ enum netdev_priv_flags { #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER #define IFF_FAILOVER IFF_FAILOVER #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE +#define IFF_VETH IFF_VETH /** * struct net_device - The DEVICE structure. @@ -3650,11 +3653,17 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb); bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb); +static inline bool netif_is_veth(const struct net_device *dev) +{ + return dev->priv_flags & IFF_VETH; +} + static __always_inline int ____dev_forward_skb(struct net_device *dev, struct sk_buff *skb) { if (skb_orphan_frags(skb, GFP_ATOMIC) || - unlikely(!is_skb_forwardable(dev, skb))) { + netif_is_veth(dev) ? false : + unlikely(!is_skb_forwardable(dev, skb))) { atomic_long_inc(&dev->rx_dropped); kfree_skb(skb);
commit affede4a779420bd8510ab937251a3796d3228df Author: Fredrik Gustavsson <gustfred@gmail.com> Date: Tue Jan 8 11:21:39 2019 +0100 veth: Do not drop packets larger then the mtu set on the receiving side Currently veth drops all packets larger then the mtu set on the receiving end of the pair. This is inconsistent with most hardware ethernet drivers that happily receives packets up the the ethernet MTU independent of the configured MTU. There should not be a need for dropping IP packets at receiver with size > configured IP MTU, IP MTU is for fragmentation at sender side. And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer. The drop is done inside is_skb_forwardable() so for this patch a netdev_priv_flags called IFF_VETH is introduced. A function for checking the flag is also created netif_is_veth. This flag is set in veth.c and a control of is is now made in ____dev_forward_skb() in include/linux/netdevice.h To reproduce the behaviour and to compare it to other network drivers these steps can be done: veth: ip link add type veth ip netns add foo ip link set dev veth1 netns foo ip netns exec foo ip addr add 192.168.45.1/24 dev veth1 ip netns exec foo ip link set veth1 up ip netns exec foo ip link set dev veth1 mtu 300 ip addr add 192.168.45.2/24 dev veth0 ip link set dev veth0 up ip link set dev veth0 mtu 500 ping -c 1 -W 1 -s 400 192.168.45.1 Output: ping -c 1 -W 1 -s 400 192.168.45.1 PING 192.168.45.1 (192.168.45.1) 400(428) bytes of data. --- 192.168.45.1 ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms eth: ip addr add 192.168.45.1/24 dev eth1 ip link set dev eth0 mtu 300 up ip addr add 192.168.45.2/24 dev eth0 ip link set dev eth0 mtu 500 up ping -c 1 -W 1 -s 400 192.168.45.1 Output: --- 192.168.45.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms Signed-off-by: Fredrik Gustavsson <gustfred@gmail.com> return NET_RX_DROP;