diff mbox series

[v1,1/1] veth: Do not drop packets larger then the mtu set on the receiving side

Message ID CAMJ5cBHZ4DqjE6Md-0apA8aaLLk9Hpiypfooo7ud-p9XyFyeng@mail.gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [v1,1/1] veth: Do not drop packets larger then the mtu set on the receiving side | expand

Commit Message

Fredrik Gustavsson Jan. 10, 2019, 1:26 p.m. UTC
commit affede4a779420bd8510ab937251a3796d3228df
Author: Fredrik Gustavsson <gustfred@gmail.com>
Date:   Tue Jan 8 11:21:39 2019 +0100

veth: Do not drop packets larger then the mtu set on the receiving side

Currently veth drops all packets larger then the mtu set on the receiving
end of the pair. This is inconsistent with most hardware ethernet drivers
that happily receives packets up the the ethernet MTU independent of the
configured MTU.

There should not be a need for dropping IP packets at receiver with
size > configured IP MTU, IP MTU is for fragmentation at sender side.
And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer.

The drop is done inside is_skb_forwardable() so for this patch
a netdev_priv_flags called IFF_VETH is introduced.
A function for checking the flag is also created netif_is_veth.
This flag is set in veth.c and a control of is is now made in
____dev_forward_skb() in include/linux/netdevice.h

To reproduce the behaviour and to compare it to other
network drivers these steps can be done:

veth:
ip link add type veth
ip netns add foo
ip link set dev veth1 netns foo
ip netns exec foo ip addr add 192.168.45.1/24 dev veth1
ip netns exec foo ip link set veth1 up
ip netns exec foo ip link set dev veth1 mtu 300

ip addr add 192.168.45.2/24 dev veth0
ip link set dev veth0 up
ip link set dev veth0 mtu 500
ping -c 1 -W 1 -s 400 192.168.45.1

Output:
ping -c 1 -W 1 -s 400 192.168.45.1
PING 192.168.45.1 (192.168.45.1) 400(428) bytes of data.

--- 192.168.45.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

eth:
ip addr add 192.168.45.1/24 dev eth1
ip link set dev eth0 mtu 300 up

ip addr add 192.168.45.2/24 dev eth0
ip link set dev eth0 mtu 500 up
ping -c 1 -W 1 -s 400 192.168.45.1

Output:
--- 192.168.45.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms

Signed-off-by: Fredrik Gustavsson <gustfred@gmail.com>

  return NET_RX_DROP;

Comments

Andrew Lunn Jan. 10, 2019, 2:21 p.m. UTC | #1
On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote:
> commit affede4a779420bd8510ab937251a3796d3228df
> Author: Fredrik Gustavsson <gustfred@gmail.com>
> Date:   Tue Jan 8 11:21:39 2019 +0100
> 
> veth: Do not drop packets larger then the mtu set on the receiving side
> 
> Currently veth drops all packets larger then the mtu set on the receiving
> end of the pair. This is inconsistent with most hardware ethernet drivers
> that happily receives packets up the the ethernet MTU independent of the
> configured MTU.

I agree with your argument, but i wonder if there could be a better
implementation.

____dev_forward_skb() is on the hot path, so your additional check is
going to slow down packet forwarding for everybody, not just veth.
is_skb_forwardable() also does some additional checks which you are
now skipping. Is that O.K?

Since we are talking about a veth pair here, you have access to both
ends of the link. Maybe consider if the mtu is changed on one end, you
also change it on the other?

Lets see what others think of that.

     Andrew
Roopa Prabhu Jan. 10, 2019, 3:39 p.m. UTC | #2
On Thu, Jan 10, 2019 at 6:21 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote:
> > commit affede4a779420bd8510ab937251a3796d3228df
> > Author: Fredrik Gustavsson <gustfred@gmail.com>
> > Date:   Tue Jan 8 11:21:39 2019 +0100
> >
> > veth: Do not drop packets larger then the mtu set on the receiving side
> >
> > Currently veth drops all packets larger then the mtu set on the receiving
> > end of the pair. This is inconsistent with most hardware ethernet drivers
> > that happily receives packets up the the ethernet MTU independent of the
> > configured MTU.
>
> I agree with your argument, but i wonder if there could be a better
> implementation.
>
> ____dev_forward_skb() is on the hot path, so your additional check is
> going to slow down packet forwarding for everybody, not just veth.
> is_skb_forwardable() also does some additional checks which you are
> now skipping. Is that O.K?
>
> Since we are talking about a veth pair here, you have access to both
> ends of the link. Maybe consider if the mtu is changed on one end, you
> also change it on the other?
>
> Lets see what others think of that.
>

adding a IFF_VETH just for this seems like an overkill. especially
when there are other ways to indicate a virtual device type like
rtnetlink kind.
Also, its best to keep such checks local to veth, maybe with something
along the lines of what Andrew suggests above.
Toshiaki Makita Jan. 11, 2019, 4:22 a.m. UTC | #3
On 2019/01/10 22:26, Fredrik Gustavsson wrote:
> commit affede4a779420bd8510ab937251a3796d3228df
> Author: Fredrik Gustavsson <gustfred@gmail.com>
> Date:   Tue Jan 8 11:21:39 2019 +0100
> 
> veth: Do not drop packets larger then the mtu set on the receiving side
> 
> Currently veth drops all packets larger then the mtu set on the receiving
> end of the pair. This is inconsistent with most hardware ethernet drivers
> that happily receives packets up the the ethernet MTU independent of the
> configured MTU.
> 
> There should not be a need for dropping IP packets at receiver with
> size > configured IP MTU, IP MTU is for fragmentation at sender side.
> And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer.

But with current veth behavior setting MTU effectively sets MRU as well.
This may be being used to drop unexpectedly long packets e.g. from
containers on container host. If we unconditionally change this behavior
it can cause regression on some environment. This should be an option at
least.

BTW there was a similar precedent attempt. You might want to take a look.

https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html
https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html
Fredrik Gustavsson Jan. 11, 2019, 1:06 p.m. UTC | #4
Den tors 10 jan. 2019 kl 16:39 skrev Roopa Prabhu <roopa@cumulusnetworks.com>:
>
> On Thu, Jan 10, 2019 at 6:21 AM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > On Thu, Jan 10, 2019 at 02:26:55PM +0100, Fredrik Gustavsson wrote:
> > > commit affede4a779420bd8510ab937251a3796d3228df
> > > Author: Fredrik Gustavsson <gustfred@gmail.com>
> > > Date:   Tue Jan 8 11:21:39 2019 +0100
> > >
> > > veth: Do not drop packets larger then the mtu set on the receiving side
> > >
> > > Currently veth drops all packets larger then the mtu set on the receiving
> > > end of the pair. This is inconsistent with most hardware ethernet drivers
> > > that happily receives packets up the the ethernet MTU independent of the
> > > configured MTU.
> >
> > I agree with your argument, but i wonder if there could be a better
> > implementation.
> >
> > ____dev_forward_skb() is on the hot path, so your additional check is
> > going to slow down packet forwarding for everybody, not just veth.
> > is_skb_forwardable() also does some additional checks which you are
> > now skipping. Is that O.K?
> >
> > Since we are talking about a veth pair here, you have access to both
> > ends of the link. Maybe consider if the mtu is changed on one end, you
> > also change it on the other?
> >
> > Lets see what others think of that.
> >
>
> adding a IFF_VETH just for this seems like an overkill. especially
> when there are other ways to indicate a virtual device type like
> rtnetlink kind.
> Also, its best to keep such checks local to veth, maybe with something
> along the lines of what Andrew suggests above.

Andrew: Nice to hear that my arguments were good.
You are totally right about the test in is_skb_forwardable
The one that is missed now is:
if (!(dev->flags & IFF_UP))
So of course moving the check inside is_skb_forwadable instead would
solved that.
The example provide was to show how the behaviour could be provoked.
For the actual use case I can not change to MTU on one of the
interface.

Roopa:
Overkill or not seems like a slim patch this way. Of course
dev_forward_skb version just for veth would of course be possible to
do. Just thought that kind of change would not be accepted with
duplicated code. Or do you mean in some other way to do it?

About the point of slowing down (not sure how much we actaully are
talking about) is a really good one. Since this is not only for veth.
Is it then better to do if for just veth with a veth specific
dev_forward_skb?
Fredrik Gustavsson Jan. 11, 2019, 1:13 p.m. UTC | #5
Den fre 11 jan. 2019 kl 05:23 skrev Toshiaki Makita
<makita.toshiaki@lab.ntt.co.jp>:
>
> On 2019/01/10 22:26, Fredrik Gustavsson wrote:
> > commit affede4a779420bd8510ab937251a3796d3228df
> > Author: Fredrik Gustavsson <gustfred@gmail.com>
> > Date:   Tue Jan 8 11:21:39 2019 +0100
> >
> > veth: Do not drop packets larger then the mtu set on the receiving side
> >
> > Currently veth drops all packets larger then the mtu set on the receiving
> > end of the pair. This is inconsistent with most hardware ethernet drivers
> > that happily receives packets up the the ethernet MTU independent of the
> > configured MTU.
> >
> > There should not be a need for dropping IP packets at receiver with
> > size > configured IP MTU, IP MTU is for fragmentation at sender side.
> > And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer.
>
> But with current veth behavior setting MTU effectively sets MRU as well.
> This may be being used to drop unexpectedly long packets e.g. from
> containers on container host. If we unconditionally change this behavior
> it can cause regression on some environment. This should be an option at
> least.
>
> BTW there was a similar precedent attempt. You might want to take a look.
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html
> https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html
>
> --
> Toshiaki Makita
>
Good argument, but do you agree that it shouldn't be working like it
does? But not sure if such a case would exist but breaking
compatitbility have been discussed in other threads. So your saying
that people have actually used it as a receiving limit for IP packets?

Nice that you found that older thread it is also availble here:
https://lkml.org/lkml/2017/5/12/254
Were I thought the ending argument was that if it shouldn't be dropped
don't drop it. And that introducing a MRU parameter was just making
things more complicated. But for my use case either way would work.
Andrew Lunn Jan. 11, 2019, 1:41 p.m. UTC | #6
> > adding a IFF_VETH just for this seems like an overkill. especially
> > when there are other ways to indicate a virtual device type like
> > rtnetlink kind.
> > Also, its best to keep such checks local to veth, maybe with something
> > along the lines of what Andrew suggests above.
> 
> Andrew: Nice to hear that my arguments were good.

With DSA switches we somewhat rely on this behaviour, because there is
an extra header in the frame. I recently added code to automagically
increase the MTU, but that does not always work.

> For the actual use case I can not change to MTU on one of the
> interface.

Please could you explain why.

       Andrew
Toshiaki Makita Jan. 15, 2019, 6:35 a.m. UTC | #7
On 2019/01/11 22:13, Fredrik Gustavsson wrote:
> Den fre 11 jan. 2019 kl 05:23 skrev Toshiaki Makita
> <makita.toshiaki@lab.ntt.co.jp>:
>>
>> On 2019/01/10 22:26, Fredrik Gustavsson wrote:
>>> commit affede4a779420bd8510ab937251a3796d3228df
>>> Author: Fredrik Gustavsson <gustfred@gmail.com>
>>> Date:   Tue Jan 8 11:21:39 2019 +0100
>>>
>>> veth: Do not drop packets larger then the mtu set on the receiving side
>>>
>>> Currently veth drops all packets larger then the mtu set on the receiving
>>> end of the pair. This is inconsistent with most hardware ethernet drivers
>>> that happily receives packets up the the ethernet MTU independent of the
>>> configured MTU.
>>>
>>> There should not be a need for dropping IP packets at receiver with
>>> size > configured IP MTU, IP MTU is for fragmentation at sender side.
>>> And IP packets with size > receiver L2 MTU will be dropped at sub-IP layer.
>>
>> But with current veth behavior setting MTU effectively sets MRU as well.
>> This may be being used to drop unexpectedly long packets e.g. from
>> containers on container host. If we unconditionally change this behavior
>> it can cause regression on some environment. This should be an option at
>> least.
>>
>> BTW there was a similar precedent attempt. You might want to take a look.
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg167636.html
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg167899.html
>>
> Good argument, but do you agree that it shouldn't be working like it
> does?

Not sure. As discussed in the previous thread I think some hardware can
limit MRU < 1500 by configuring MTU, so to me it is not especially odd.

> But not sure if such a case would exist but breaking
> compatitbility have been discussed in other threads. So your saying
> that people have actually used it as a receiving limit for IP packets?

No, I'm saying it can be used in that way. Even though we did not
confirm if there really are people using features in a certain way, we
usually assume there are, as long as the usage is possible, because so
many people use Linux.
diff mbox series

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 890fa5b905e2..74460278a621 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1161,6 +1161,7 @@  static void veth_setup(struct net_device *dev)
  dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
  dev->priv_flags |= IFF_NO_QUEUE;
  dev->priv_flags |= IFF_PHONY_HEADROOM;
+ dev->priv_flags |= IFF_VETH;

  dev->netdev_ops = &veth_netdev_ops;
  dev->ethtool_ops = &veth_ethtool_ops;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8abf7b91..75465b3e4ecb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1487,6 +1487,7 @@  struct net_device_ops {
  * @IFF_NO_RX_HANDLER: device doesn't support the rx_handler hook
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
+ * @IFF_VETH: Veth device
  */
 enum netdev_priv_flags {
  IFF_802_1Q_VLAN = 1<<0,
@@ -1518,6 +1519,7 @@  enum netdev_priv_flags {
  IFF_NO_RX_HANDLER = 1<<26,
  IFF_FAILOVER = 1<<27,
  IFF_FAILOVER_SLAVE = 1<<28,
+ IFF_VETH = 1<<29,
 };

 #define IFF_802_1Q_VLAN IFF_802_1Q_VLAN
@@ -1548,6 +1550,7 @@  enum netdev_priv_flags {
 #define IFF_NO_RX_HANDLER IFF_NO_RX_HANDLER
 #define IFF_FAILOVER IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE IFF_FAILOVER_SLAVE
+#define IFF_VETH IFF_VETH

 /**
  * struct net_device - The DEVICE structure.
@@ -3650,11 +3653,17 @@  int dev_forward_skb(struct net_device *dev,
struct sk_buff *skb);
 bool is_skb_forwardable(const struct net_device *dev,
  const struct sk_buff *skb);

+static inline bool netif_is_veth(const struct net_device *dev)
+{
+ return dev->priv_flags & IFF_VETH;
+}
+
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
         struct sk_buff *skb)
 {
  if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-     unlikely(!is_skb_forwardable(dev, skb))) {
+     netif_is_veth(dev) ? false :
+      unlikely(!is_skb_forwardable(dev, skb))) {
  atomic_long_inc(&dev->rx_dropped);
  kfree_skb(skb);