From patchwork Wed Oct 16 00:02:31 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Simon Horman
X-Patchwork-Id: 283801
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id D1EAD2C0361
	for ; Wed, 16 Oct 2013 11:03:05 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759917Ab3JPADA (ORCPT ); Tue, 15 Oct 2013 20:03:00 -0400
Received: from kirsty.vergenet.net ([202.4.237.240]:57144 "EHLO
	kirsty.vergenet.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1759727Ab3JPAC7 (ORCPT );
	Tue, 15 Oct 2013 20:02:59 -0400
Received: from ayumi.isobedori.kobe.vergenet.net
	(p3094-ipbfp1203kobeminato.hyogo.ocn.ne.jp [118.10.152.94])
	by kirsty.vergenet.net (Postfix) with ESMTP id BD92225BE33;
	Wed, 16 Oct 2013 11:02:57 +1100 (EST)
Received: by ayumi.isobedori.kobe.vergenet.net (Postfix, from userid 7100)
	id 2A1526CE045; Wed, 16 Oct 2013 09:02:54 +0900 (JST)
From: Simon Horman
To: =?UTF-8?q?YOSHIFUJI=20Hideaki=20/=20=E5=90=89=E8=97=A4=E8=8B=B1=E6=98=8E?=
Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	Julian Anastasov, Mark Brooks, Simon Horman
Subject: [RFC net-next] ipv6: Use destination address determined by IPVS
Date: Wed, 16 Oct 2013 09:02:31 +0900
Message-Id: <1381881751-6719-1-git-send-email-horms@verge.net.au>
X-Mailer: git-send-email 1.8.4
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently the 'n' field was removed from struct rt6_info.

Unfortunately, my analysis is that this change leads to incorrect
behaviour for IPVS direct routing, because in that case packets may be
output to a destination other than the one the route table would select
for them. In particular, the destination address may actually be a local
address, and empirically a neighbour lookup on it seems to result in it
becoming unreachable.

This patch resolves the problem by providing the destination address
determined by IPVS to ip6_finish_output2() in the skb control buffer
(skb->cb).

Although this seems to work I can see several problems with this
approach:

* It is rather ugly, stuffing an IPVS exception right in the middle of
  IPv6 code. The overhead could be eliminated for many users by using a
  static key. Nonetheless, it is not attractive.

* The use of the skb control buffer may not be valid as it crosses from
  IPVS to IPv6 code. A possible, though unpleasant, alternative is to
  add a new field to struct sk_buff.

* This covers all IPv6 packets output by IPVS, but only those output
  using IPVS Direct-Routing actually need this. One way to resolve this
  would be to add a more fine-grained ipvs_property to struct sk_buff.
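The second concern above, that a pointer parked in the skb control buffer (the `cb` field, which the message calls the skb callback) may not survive the hand-off from IPVS to IPv6 code, can be illustrated with a small userspace sketch. Everything here is a hypothetical stand-in: `mock_skb`, `mock_in6_addr`, `mock_ipvs_cb`, and `other_layer_cb_reset()` are not kernel names, and the sketch uses `memcpy()` rather than the pointer cast in the patch so that it stays valid ISO C outside the kernel's build flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr. */
struct mock_in6_addr {
	unsigned char s6_addr[16];
};

/* Hypothetical stand-in for struct sk_buff: only the 48-byte control
 * buffer and the ipvs_property flag are modelled. */
struct mock_skb {
	char cb[48];
	unsigned int ipvs_property;
};

/* Same shape as the ipvs_skb_cb proposed in the patch. */
struct mock_ipvs_cb {
	const struct mock_in6_addr *daddr;
};

/* Whatever is overlaid on cb must fit inside it. */
_Static_assert(sizeof(struct mock_ipvs_cb) <=
	       sizeof(((struct mock_skb *)0)->cb),
	       "control block overlay larger than cb");

/* IPVS side: stash the chosen destination in the control buffer. */
static void ipvs_cb_store(struct mock_skb *skb,
			  const struct mock_in6_addr *daddr)
{
	struct mock_ipvs_cb cb = { .daddr = daddr };

	memcpy(skb->cb, &cb, sizeof(cb));
}

/* Consumer side: read the destination back out. */
static const struct mock_in6_addr *ipvs_cb_load(const struct mock_skb *skb)
{
	struct mock_ipvs_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb.daddr;
}

/* Assumption for illustration: some other layer treats cb as its own
 * scratch space and clears it between the store and the load. */
static void other_layer_cb_reset(struct mock_skb *skb)
{
	memset(skb->cb, 0, sizeof(skb->cb));
}
```

If nothing touches `cb` between `ipvs_cb_store()` and `ipvs_cb_load()`, the pointer round-trips. Once `other_layer_cb_reset()` runs in between, the loaded value no longer refers to the stored address. That is the ownership question the message raises, and a dedicated sk_buff field would avoid it.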

Reported-by: Mark Brooks
Signed-off-by: Simon Horman
---
 include/net/ip_vs.h             | 6 ++++++
 net/ipv6/ip6_output.c           | 9 +++++++--
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 		atomic_read(&dest->inactconns);
 }
 
+struct ipvs_skb_cb {
+	struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
 #endif /* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
-	struct in6_addr *nexthop;
+	struct in6_addr *nexthop, *daddr;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+		daddr = IP_VS_SKB_CB(skb)->daddr;
+	else
+		daddr = &ipv6_hdr(skb)->daddr;
+	nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 		rt = (struct rt6_info *) dst;
 	}
 
+	IP_VS_SKB_CB(skb)->daddr = daddr;
+
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
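
The destination selection that the ip6_output.c hunk adds can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: `mock_pkt`, `mock_rt`, `select_daddr()`, and `mock_nexthop()` are hypothetical names, the gateway rule in `mock_nexthop()` reflects my understanding of what `rt6_nexthop()` does, and the NULL fallback in `select_daddr()` is a defensive addition that the patch itself does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr. */
struct mock_in6_addr {
	unsigned char s6_addr[16];
};

/* Hypothetical packet model: the header destination (under direct
 * routing this stays the virtual address) plus the destination that
 * IPVS selected and the ipvs_property marker. */
struct mock_pkt {
	struct mock_in6_addr hdr_daddr;
	const struct mock_in6_addr *ipvs_daddr;
	bool ipvs_property;
};

/* Hypothetical route model: an optional gateway. */
struct mock_rt {
	bool has_gateway;
	struct mock_in6_addr gateway;
};

/* Prefer the IPVS-chosen destination for packets IPVS has marked,
 * otherwise fall back to the header destination. The NULL check is an
 * assumption added here for safety. */
static const struct mock_in6_addr *select_daddr(const struct mock_pkt *pkt)
{
	if (pkt->ipvs_property && pkt->ipvs_daddr)
		return pkt->ipvs_daddr;
	return &pkt->hdr_daddr;
}

/* Assumed next-hop rule: use the gateway when the route has one,
 * otherwise send directly to the selected destination. */
static const struct mock_in6_addr *
mock_nexthop(const struct mock_rt *rt, const struct mock_in6_addr *daddr)
{
	if (rt->has_gateway)
		return &rt->gateway;
	return daddr;
}
```

With an on-link route (no gateway), an unmarked packet resolves a neighbour for its header destination, while a packet marked by IPVS resolves one for the address IPVS chose. That is the behaviour the patch is after for direct routing.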