
[net-next,0/3] vxlan, geneve: allow to turn off PMTU updates on encap socket

Message ID 20200712200705.9796-1-fw@strlen.de

Message

Florian Westphal July 12, 2020, 8:07 p.m. UTC
There are existing deployments where a vxlan or geneve interface is part
of a bridge.

In this case, MTU may look like this:

bridge mtu: 1450
vxlan (bridge port) mtu: 1450
other bridge ports: 1450

physical link (used by vxlan) mtu: 1500.

This makes sure that vxlan overhead (50 bytes) doesn't bring packets over the
1500 MTU of the physical link.
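
The 50-byte figure can be checked with a minimal sketch. It assumes an IPv4 underlay without IP options and an untagged inner Ethernet frame; the function names are illustrative only and do not come from the patches.

```python
# Header sizes in bytes for VXLAN over an IPv4 underlay.
# Assumptions: no IP options, no VLAN tag on the inner frame.
OUTER_IPV4 = 20      # outer IPv4 header
OUTER_IPV6 = 40      # outer IPv6 header, for comparison
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel


def vxlan_overhead(outer_ip_header=OUTER_IPV4):
    """Bytes added to an inner IP packet before it reaches the physical link."""
    return outer_ip_header + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def max_inner_mtu(underlay_mtu, outer_ip_header=OUTER_IPV4):
    """Largest MTU the vxlan device can use without exceeding the underlay MTU."""
    return underlay_mtu - vxlan_overhead(outer_ip_header)


if __name__ == "__main__":
    print(vxlan_overhead())     # overhead with an IPv4 underlay
    print(max_inner_mtu(1500))  # matches the bridge/vxlan MTU listed above
```

With an IPv6 underlay the same arithmetic gives a larger overhead, so the bridge-side MTU would have to be correspondingly smaller.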

Unfortunately, in some cases, PMTU updates on the encap socket
can bring such setups into a non-working state: no traffic will pass
over the vxlan port (physical link) anymore.
Because of the bridge-based usage of the vxlan interface, the original
sender never learns of the change in path mtu and TCP clients will retransmit
the over-sized packets until timeout.


When this happens, an 'ip route flush cache' in the netns holding
the vxlan interface resolves the problem, which shows that the network
is capable of transporting the packets and that the PMTU update is bogus.
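
The failure and the effect of the flush can be illustrated with a toy model. This is not kernel code: the EncapRoute class, its methods, and the bogus PMTU value of 1400 are all hypothetical, and the model only looks at full-sized inner packets.

```python
VXLAN_OVERHEAD = 50  # bytes, assuming an IPv4 underlay


class EncapRoute:
    """Toy stand-in for the cached route state used by the encap socket."""

    def __init__(self, link_mtu):
        self.link_mtu = link_mtu
        self.pmtu_exception = None  # value learned from a PMTU update, if any

    def effective_mtu(self):
        # A learned PMTU can only lower the usable MTU, never raise it.
        if self.pmtu_exception is None:
            return self.link_mtu
        return min(self.link_mtu, self.pmtu_exception)

    def learn_pmtu(self, mtu):
        # Models a PMTU update arriving on the encap socket.
        self.pmtu_exception = mtu

    def flush_cache(self):
        # Models the effect of 'ip route flush cache' on the learned value.
        self.pmtu_exception = None


def can_forward(route, inner_packet_len):
    """True if the encapsulated packet fits within the route's effective MTU."""
    return inner_packet_len + VXLAN_OVERHEAD <= route.effective_mtu()


if __name__ == "__main__":
    route = EncapRoute(link_mtu=1500)
    print(can_forward(route, 1450))  # full-sized bridge packet, no PMTU update yet
    route.learn_pmtu(1400)           # hypothetical bogus update
    print(can_forward(route, 1450))  # same packet after the update
    route.flush_cache()
    print(can_forward(route, 1450))  # same packet after the flush
```

In the bridged case nothing tells the original sender about the lowered value, so in this model it keeps sending 1450-byte packets that no longer fit until the cached value is cleared.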

Another workaround is to enable 'net.ipv4.tcp_mtu_probing'.

This patch series allows vxlan and geneve interfaces to be configured
to ignore path mtu updates.
This is only useful in cases where the vxlan/geneve interface is
part of a bridge (which prevents clients from receiving pmtu updates)
and where all involved links are known to be able to handle link-mtu
sized packets.

Florian Westphal (3):
      udp_tunnel: allow to turn off path mtu discovery on encap sockets
      vxlan: allow to disable path mtu learning on encap socket
      geneve: allow disabling of pmtu detection on encap sk

 drivers/net/geneve.c         | 59 ++++++++++++++++++++++++++++++++++++----
 drivers/net/vxlan.c          | 65 ++++++++++++++++++++++++++++++++++++++------
 include/net/ipv6.h           |  7 +++++
 include/net/udp_tunnel.h     |  2 ++
 include/net/vxlan.h          |  2 ++
 include/uapi/linux/if_link.h |  2 ++
 net/ipv4/udp_tunnel_core.c   |  2 ++
 net/ipv6/ip6_udp_tunnel.c    |  7 +++++
 8 files changed, 131 insertions(+), 15 deletions(-)

Comments

Stefano Brivio July 12, 2020, 10:39 p.m. UTC | #1
On Sun, 12 Jul 2020 22:07:02 +0200
Florian Westphal <fw@strlen.de> wrote:

> There are existing deployments where a vxlan or geneve interface is part
> of a bridge.
> 
> In this case, MTU may look like this:
> 
> bridge mtu: 1450
> vxlan (bridge port) mtu: 1450
> other bridge ports: 1450
> 
> physical link (used by vxlan) mtu: 1500.
> 
> This makes sure that vxlan overhead (50 bytes) doesn't bring packets over the
> 1500 MTU of the physical link.
> 
> Unfortunately, in some cases, PMTU updates on the encap socket
> can bring such setups into a non-working state: no traffic will pass
> over the vxlan port (physical link) anymore.
> Because of the bridge-based usage of the vxlan interface, the original
> sender never learns of the change in path mtu and TCP clients will retransmit
> the over-sized packets until timeout.
> 
> 
> When this happens, a 'ip route flush cache' in the netns holding
> the vxlan interface resolves the problem, i.e. the network is capable
> of transporting the packets and the PMTU update is bogus.
> 
> Another workaround is to enable 'net.ipv4.tcp_mtu_probing'.
> 
> This patch series allows to configure vxlan and geneve interfaces
> to ignore path mtu updates.

Regardless of the comments to 1/3, I don't have any problem with this
(didn't review yet) if it's the only way to currently work around the
issue (of course :)).

I think we should eventually fix PMTU discovery for bridged setups, but
perhaps it's more complicated than that.

I wonder, though:

- wouldn't setting /proc/sys/net/ipv4/ip_no_pmtu_disc have the same
  effect?

- does it really make sense to have this configurable for IPv6?