Message ID | 20191016190315.151095-1-weiwan@google.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net] ipv4: fix race condition between route lookup and invalidation | expand |
On Wed, Oct 16, 2019 at 12:03:15PM -0700, Wei Wang wrote: > Jesse and Ido reported the following race condition: > <CPU A, t0> - Received packet A is forwarded and cached dst entry is > taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set() > > <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables > from multiple ISPs"), route is added / deleted and rt_cache_flush() is > called > > <CPU B, t2> - Received packet B tries to use the same cached dst entry > from t0, but rt_cache_valid() is no longer true and it is replaced in > rt_cache_route() by the newer one. This calls dst_dev_put() on the > original dst entry which assigns the blackhole netdev to 'dst->dev' > > <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due > to 'dst->dev' being the blackhole netdev > > There are 2 issues in the v4 routing code: > 1. A per-netns counter is used to do the validation of the route. That > means whenever a route is changed in the netns, users of all routes in > the netns needs to redo lookup. v6 has an implementation of only > updating fn_sernum for routes that are affected. > 2. When rt_cache_valid() returns false, rt_cache_route() is called to > throw away the current cache, and create a new one. This seems > unnecessary because as long as this route does not change, the route > cache does not need to be recreated. > > To fully solve the above 2 issues, it probably needs quite some code > changes and requires careful testing, and does not suite for net branch. > > So this patch only tries to add the deleted cached rt into the uncached > list, so user could still be able to use it to receive packets until > it's done. > > Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly") > Signed-off-by: Wei Wang <weiwan@google.com> > Reported-by: Ido Schimmel <idosch@idosch.org> > Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org> > Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org> > Acked-by: Martin KaFai Lau <kafai@fb.com> > Cc: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
From: Wei Wang <weiwan@google.com> Date: Wed, 16 Oct 2019 12:03:15 -0700 > Jesse and Ido reported the following race condition: > <CPU A, t0> - Received packet A is forwarded and cached dst entry is > taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set() > > <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables > from multiple ISPs"), route is added / deleted and rt_cache_flush() is > called > > <CPU B, t2> - Received packet B tries to use the same cached dst entry > from t0, but rt_cache_valid() is no longer true and it is replaced in > rt_cache_route() by the newer one. This calls dst_dev_put() on the > original dst entry which assigns the blackhole netdev to 'dst->dev' > > <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due > to 'dst->dev' being the blackhole netdev > > There are 2 issues in the v4 routing code: > 1. A per-netns counter is used to do the validation of the route. That > means whenever a route is changed in the netns, users of all routes in > the netns needs to redo lookup. v6 has an implementation of only > updating fn_sernum for routes that are affected. > 2. When rt_cache_valid() returns false, rt_cache_route() is called to > throw away the current cache, and create a new one. This seems > unnecessary because as long as this route does not change, the route > cache does not need to be recreated. > > To fully solve the above 2 issues, it probably needs quite some code > changes and requires careful testing, and does not suite for net branch. > > So this patch only tries to add the deleted cached rt into the uncached > list, so user could still be able to use it to receive packets until > it's done. > > Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly") > Signed-off-by: Wei Wang <weiwan@google.com> > Reported-by: Ido Schimmel <idosch@idosch.org> > Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org> > Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org> > Acked-by: Martin KaFai Lau <kafai@fb.com> Applied and queued up for -stable.
diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 14654876127e..9e0c8dff2cd6 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1482,7 +1482,7 @@ static bool rt_cache_route(struct fib_nh_common *nhc, struct rtable *rt) prev = cmpxchg(p, orig, rt); if (prev == orig) { if (orig) { - dst_dev_put(&orig->dst); + rt_add_uncached_list(orig); dst_release(&orig->dst); } } else {