[PATCHv5,net,0/8] disable neigh update for tunnels during pmtu update

Message ID 20191222025116.2897-1-liuhangbin@gmail.com

Message

Hangbin Liu Dec. 22, 2019, 2:51 a.m. UTC
When we set up a pair of gretap devices, ping each other to create neighbour
cache entries, and then delete and recreate one side, we will never be able to
ping6 the newly created gretap device.

The reason is that when we ping6 the remote via gretap, the call chain is:

gre_tap_xmit()
 - ip_tunnel_xmit()
   - tnl_update_pmtu()
     - skb_dst_update_pmtu()
       - ip6_rt_update_pmtu()
         - __ip6_rt_update_pmtu()
           - dst_confirm_neigh()
             - ip6_confirm_neigh()
               - __ipv6_confirm_neigh()
                 - n->confirmed = now

Because the confirmed time is updated, the NUD_DELAY confirm-time check in
neigh_timer_handler() passes and the neigh state goes back to
NUD_REACHABLE. So the old/wrong MAC address is used again.

If we do not update the confirmed time, the neigh state goes to
NUD_PROBE, then to NUD_FAILED, and the neigh is re-created later,
which is what IPv4 does.
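
To illustrate the two outcomes described above, here is a minimal userspace
sketch of the NUD_DELAY decision. It is a simplified model, not kernel code:
struct neigh_model, delay_timer_expired() and pmtu_update() are hypothetical
names made up for this example, and the tick values are arbitrary.

```c
#include <stdbool.h>

/* Simplified model of the neighbour states involved (not the kernel enum). */
enum nud_model_state { MODEL_REACHABLE, MODEL_DELAY, MODEL_PROBE };

/* Hypothetical stand-in for the fields of struct neighbour that matter here. */
struct neigh_model {
	enum nud_model_state state;
	unsigned long confirmed;	/* tick of the last confirmation */
};

/*
 * Models the NUD_DELAY branch of the neighbour timer: if the entry was
 * confirmed recently enough, it returns to REACHABLE; otherwise it moves
 * on to PROBE, which is what lets a stale MAC address be re-resolved.
 */
static void delay_timer_expired(struct neigh_model *n, unsigned long now,
				unsigned long delay_probe_time)
{
	if (n->state != MODEL_DELAY)
		return;

	if (now <= n->confirmed + delay_probe_time)
		n->state = MODEL_REACHABLE;
	else
		n->state = MODEL_PROBE;
}

/*
 * Models the side effect of a PMTU update on the neighbour entry.
 * With confirm_neigh == true the confirmed time is refreshed, which is
 * the behaviour that keeps the stale entry alive on the tunnel path.
 */
static void pmtu_update(struct neigh_model *n, unsigned long now,
			bool confirm_neigh)
{
	if (confirm_neigh)
		n->confirmed = now;
}
```

In this model, an entry in the DELAY state that is confirmed by a PMTU update
just before the timer fires goes back to REACHABLE, while the same entry
without the confirmation moves on to PROBE.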

We can't remove ip6_confirm_neigh() directly, as we still need it
for TCP flows. To fix this, we pass a bool parameter to
dst_ops.update_pmtu() and disable the neighbor update only for tunnels.
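
The shape of that approach can be sketched in userspace as follows. This is
an illustrative model only: the real callback takes more arguments than shown
here, and every name ending in _model (or starting with model_) is
hypothetical. Only the idea of a bool confirm_neigh argument and a
"no confirm" wrapper comes from the series itself.

```c
#include <stdbool.h>

struct dst_model;

/* Hypothetical, reduced stand-in for dst_ops: one callback with the new flag. */
struct dst_ops_model {
	void (*update_pmtu)(struct dst_model *dst, unsigned int mtu,
			    bool confirm_neigh);
};

/* Hypothetical, reduced stand-in for a dst entry plus its neighbour state. */
struct dst_model {
	const struct dst_ops_model *ops;
	unsigned int mtu;
	unsigned long neigh_confirmed;	/* tick of the last neigh confirmation */
	unsigned long now;		/* current tick, kept here for simplicity */
};

/* Model of a route-level handler: confirm only when asked, then lower the MTU. */
static void model_rt_update_pmtu(struct dst_model *dst, unsigned int mtu,
				 bool confirm_neigh)
{
	if (confirm_neigh)
		dst->neigh_confirmed = dst->now;

	if (mtu < dst->mtu)
		dst->mtu = mtu;
}

static const struct dst_ops_model model_ops = {
	.update_pmtu = model_rt_update_pmtu,
};

/* Default wrapper: keeps the old behaviour (e.g. for TCP flows). */
static void model_update_pmtu(struct dst_model *dst, unsigned int mtu)
{
	if (dst && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu, true);
}

/* Tunnel wrapper: updates the PMTU without touching the neighbour entry. */
static void model_update_pmtu_no_confirm(struct dst_model *dst,
					 unsigned int mtu)
{
	if (dst && dst->ops->update_pmtu)
		dst->ops->update_pmtu(dst, mtu, false);
}
```

Keeping the decision at the call site means existing callers can pass true
and behave as before, while tunnel transmit paths opt out explicitly.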

v5: No code change, update some commit descriptions
v4: No code change, update some commit descriptions
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
    dst_ops.update_pmtu to control whether we should do neighbor confirm.
    Also split the big patch into small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.

---
Reproducer:

#!/bin/bash
set -x
ip -a netns del
modprobe -r veth
modprobe -r bridge

ip netns add ha
ip netns add hb
ip link add br0 type bridge
ip link set br0 up
ip link add br_ha type veth peer name veth0 netns ha
ip link add br_hb type veth peer name veth0 netns hb
ip link set br_ha up
ip link set br_hb up
ip link set br_ha master br0
ip link set br_hb master br0
ip netns exec ha ip link set veth0 up
ip netns exec hb ip link set veth0 up
ip netns exec ha ip addr add 192.168.0.1/24 dev veth0
ip netns exec hb ip addr add 192.168.0.2/24 dev veth0

ip netns exec ha ip link add gretap1 type gretap local 192.168.0.1 remote 192.168.0.2
ip netns exec ha ip link set gretap1 up
ip netns exec ha ip addr add 1.1.1.1/24 dev gretap1
ip netns exec ha ip addr add 1111::1/64 dev gretap1

ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1

ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
sleep 30

# recreate gretap
ip netns exec hb ip link del gretap1
ip netns exec hb ip link add gretap1 type gretap local 192.168.0.2 remote 192.168.0.1
ip netns exec hb ip link set gretap1 up
ip netns exec hb ip addr add 1.1.1.2/24 dev gretap1
ip netns exec hb ip addr add 1111::2/64 dev gretap1
ip netns exec hb ip link show dev gretap1

ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
sleep 10
ip netns exec ha ip neigh show dev gretap1
ip netns exec ha ping 1.1.1.2 -c 4
ip netns exec ha ping6 1111::2 -c 4
ip netns exec ha ip neigh show dev gretap1
---

Hangbin Liu (8):
  net: add bool confirm_neigh parameter for dst_ops.update_pmtu
  ip6_gre: do not confirm neighbor when do pmtu update
  gtp: do not confirm neighbor when do pmtu update
  net/dst: add new function skb_dst_update_pmtu_no_confirm
  tunnel: do not confirm neighbor when do pmtu update
  vti: do not confirm neighbor when do pmtu update
  sit: do not confirm neighbor when do pmtu update
  net/dst: do not confirm neighbor for vxlan and geneve pmtu update

 drivers/net/gtp.c                |  2 +-
 include/net/dst.h                | 13 +++++++++++--
 include/net/dst_ops.h            |  3 ++-
 net/bridge/br_nf_core.c          |  3 ++-
 net/decnet/dn_route.c            |  6 ++++--
 net/ipv4/inet_connection_sock.c  |  2 +-
 net/ipv4/ip_tunnel.c             |  2 +-
 net/ipv4/ip_vti.c                |  2 +-
 net/ipv4/route.c                 |  9 ++++++---
 net/ipv4/xfrm4_policy.c          |  5 +++--
 net/ipv6/inet6_connection_sock.c |  2 +-
 net/ipv6/ip6_gre.c               |  2 +-
 net/ipv6/ip6_tunnel.c            |  4 ++--
 net/ipv6/ip6_vti.c               |  2 +-
 net/ipv6/route.c                 | 22 +++++++++++++++-------
 net/ipv6/sit.c                   |  2 +-
 net/ipv6/xfrm6_policy.c          |  5 +++--
 net/netfilter/ipvs/ip_vs_xmit.c  |  2 +-
 net/sctp/transport.c             |  2 +-
 19 files changed, 58 insertions(+), 32 deletions(-)

Comments

Guillaume Nault Dec. 22, 2019, 10:10 p.m. UTC | #1
On Sun, Dec 22, 2019 at 10:51:08AM +0800, Hangbin Liu wrote:
> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
> 
No more objection from me (and you already have my Reviewed-by tag).
Thanks for your work Hangbin.

David Miller Dec. 25, 2019, 6:30 a.m. UTC | #2
From: Hangbin Liu <liuhangbin@gmail.com>
Date: Sun, 22 Dec 2019 10:51:08 +0800

> When we setup a pair of gretap, ping each other and create neighbour cache.
> Then delete and recreate one side. We will never be able to ping6 to the new
> created gretap.
> 
> The reason is when we ping6 remote via gretap, we will call like
> 
> gre_tap_xmit()
>  - ip_tunnel_xmit()
>    - tnl_update_pmtu()
>      - skb_dst_update_pmtu()
>        - ip6_rt_update_pmtu()
>          - __ip6_rt_update_pmtu()
>            - dst_confirm_neigh()
>              - ip6_confirm_neigh()
>                - __ipv6_confirm_neigh()
>                  - n->confirmed = now
> 
> As the confirmed time updated, in neigh_timer_handler() the check for
> NUD_DELAY confirm time will pass and the neigh state will back to
> NUD_REACHABLE. So the old/wrong mac address will be used again.
> 
> If we do not update the confirmed time, the neigh state will go to
> neigh->nud_state = NUD_PROBE; then go to NUD_FAILED and re-create the
> neigh later, which is what IPv4 does.
> 
> We couldn't remove the ip6_confirm_neigh() directly as we still need it
> for TCP flows. To fix it, we have to pass a bool parameter to
> dst_ops.update_pmtu() and only disable neighbor update for tunnels.
 ...

Series applied and queued up for -stable, thanks.