From patchwork Wed Feb 24 19:47:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Ahern
X-Patchwork-Id: 587620
X-Patchwork-Delegate: davem@davemloft.net
From: David Ahern
To: netdev@vger.kernel.org
Cc: David Ahern, David Lamparter
Subject: [PATCH net-next 1/2] net: l3mdev: address selection should only
 consider devices in L3 domain
Date: Wed, 24 Feb 2016 11:47:02 -0800
Message-Id: <1456343223-17622-2-git-send-email-dsa@cumulusnetworks.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1456343223-17622-1-git-send-email-dsa@cumulusnetworks.com>
References: <1456343223-17622-1-git-send-email-dsa@cumulusnetworks.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.

Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

    tcpdump -ni dummy2 &

And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

Note the source address is from lo and is not a VRF local address. With
this patch:

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.

The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.

IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.

Cc: David Lamparter
Signed-off-by: David Ahern
---
 include/net/l3mdev.h |  4 ++--
 net/ipv4/devinet.c   |  5 +++++
 net/l3mdev/l3mdev.c  | 11 +++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 5567d46b3cff..c43a9c73de5e 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -39,7 +39,7 @@ struct l3mdev_ops {
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 
-int l3mdev_master_ifindex_rcu(struct net_device *dev);
+int l3mdev_master_ifindex_rcu(const struct net_device *dev);
 static inline int l3mdev_master_ifindex(struct net_device *dev)
 {
 	int ifindex;
@@ -179,7 +179,7 @@ struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
 
 #else
 
-static inline int l3mdev_master_ifindex_rcu(struct net_device *dev)
+static inline int l3mdev_master_ifindex_rcu(const struct net_device *dev)
 {
 	return 0;
 }
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 29b8d3a7b19b..18d510fa7ee2 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1194,6 +1194,7 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	__be32 addr = 0;
 	struct in_device *in_dev;
 	struct net *net = dev_net(dev);
+	int master_idx;
 
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
@@ -1214,12 +1215,16 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
 	if (addr)
 		goto out_unlock;
 no_in_dev:
+	master_idx = l3mdev_master_ifindex_rcu(dev);
 
 	/* Not loopback addresses on loopback should be preferred
 	   in this case. It is important that lo is the first interface
 	   in dev_base list.
 	 */
 	for_each_netdev_rcu(net, dev) {
+		if (l3mdev_master_ifindex_rcu(dev) != master_idx)
+			continue;
+
 		in_dev = __in_dev_get_rcu(dev);
 		if (!in_dev)
 			continue;
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 8e5ead366e7f..e925037fa0df 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -17,7 +17,7 @@
  *	@dev: targeted interface
  */
 
-int l3mdev_master_ifindex_rcu(struct net_device *dev)
+int l3mdev_master_ifindex_rcu(const struct net_device *dev)
 {
 	int ifindex = 0;
 
@@ -28,8 +28,15 @@ int l3mdev_master_ifindex_rcu(struct net_device *dev)
 		ifindex = dev->ifindex;
 	} else if (netif_is_l3_slave(dev)) {
 		struct net_device *master;
+		struct net_device *_dev = (struct net_device *)dev;
 
-		master = netdev_master_upper_dev_get_rcu(dev);
+		/* netdev_master_upper_dev_get_rcu calls
+		 * list_first_or_null_rcu to walk the upper dev list.
+		 * list_first_or_null_rcu does not handle a const arg. We aren't
+		 * making changes, just want the master device from that list so
+		 * typecast to remove the const
+		 */
+		master = netdev_master_upper_dev_get_rcu(_dev);
 		if (master)
 			ifindex = master->ifindex;
 	}
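
The filtering step added to inet_select_addr can be illustrated outside the
kernel. What follows is a minimal userspace sketch, not kernel code:
struct model_dev, model_master_ifindex, model_select_addr, MODEL_IP and the
model_devs table are all hypothetical names made up for this illustration,
and each device is assumed to carry at most one address. The table mirrors
the lo / vrf0 / dummy2 setup from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net_device; not a kernel structure. */
struct model_dev {
	int ifindex;        /* index of this device */
	int master_ifindex; /* ifindex of the enslaving L3 master, 0 if none */
	int is_l3_master;   /* nonzero for the VRF device itself */
	uint32_t addr;      /* single IPv4 address, 0 means unnumbered */
};

#define MODEL_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Device list in the order the walk sees it: lo comes first. */
static const struct model_dev model_devs[] = {
	{ 1, 0, 0, MODEL_IP(9, 9, 9, 9) },  /* lo */
	{ 2, 0, 1, MODEL_IP(10, 0, 0, 3) }, /* vrf0, the L3 master */
	{ 3, 2, 0, 0 },                     /* dummy2, enslaved to vrf0 */
};

/* Same idea as l3mdev_master_ifindex_rcu: a master reports its own
 * index, a slave reports its master's index, anything else reports 0.
 */
static int model_master_ifindex(const struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->is_l3_master)
		return dev->ifindex;
	return dev->master_ifindex;
}

/* Pick a source address for dev. If dev is unnumbered, walk the list
 * and take the first address found. With filter set, devices outside
 * dev's L3 domain are skipped, which is the check this patch adds.
 */
static uint32_t model_select_addr(const struct model_dev *devs, size_t n,
				  const struct model_dev *dev, int filter)
{
	int master_idx;
	size_t i;

	if (dev->addr)
		return dev->addr;

	master_idx = model_master_ifindex(dev);
	for (i = 0; i < n; i++) {
		if (filter && model_master_ifindex(&devs[i]) != master_idx)
			continue;
		if (devs[i].addr)
			return devs[i].addr;
	}
	return 0;
}
```

In this model, selecting for dummy2 with filter set returns vrf0's 10.0.0.3,
while clearing filter falls through to lo's 9.9.9.9, which corresponds to the
before and after ping output quoted in the commit message.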