diff mbox

veth: replace iflink by a dedicated symlink in sysfs

Message ID 1439966648-26195-2-git-send-email-vincent@bernat.im
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Vincent Bernat Aug. 19, 2015, 6:44 a.m. UTC
While the documentation doesn't say exactly what kind of relationship
iflink should represent, until a45253, only lower devices were
advertised this way. While veth cannot have a lower device, using iflink
to advertise the peer may create infinite loops in programs using iflink
to discover device topology.

Instead of advertising the peer link with iflink, a symbolic link "peer"
is added to each peer.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
---
 drivers/net/veth.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

Comments

Jiri Benc Aug. 19, 2015, 11 a.m. UTC | #1
On Wed, 19 Aug 2015 08:44:08 +0200, Vincent Bernat wrote:
> While the documentation doesn't say exactly what kind of relationship
> iflink should represent, until a45253, only lower devices were
> advertised this way. While veth cannot have a lower device, using iflink
> to advertise the peer may create infinite loops in programs using iflink
> to discover device topology.
> 
> Instead of advertising the peer link with iflink, a symbolic link "peer"
> is added to each peer.

By removing veth_get_iflink, you're also stopping IFLA_LINK being
advertised in netlink messages, which consequently makes impossible to
reliably match veth peers across name spaces again. This would be a
huge step backwards.

 Jiri
Vincent Bernat Aug. 19, 2015, 12:13 p.m. UTC | #2
❦ 19 août 2015 13:00 +0200, Jiri Benc <jbenc@redhat.com> :

>> While the documentation doesn't say exactly what kind of relationship
>> iflink should represent, until a45253, only lower devices were
>> advertised this way. While veth cannot have a lower device, using iflink
>> to advertise the peer may create infinite loops in programs using iflink
>> to discover device topology.
>> 
>> Instead of advertising the peer link with iflink, a symbolic link "peer"
>> is added to each peer.
>
> By removing veth_get_iflink, you're also stopping IFLA_LINK being
> advertised in netlink messages, which consequently makes impossible to
> reliably match veth peers across name spaces again. This would be a
> huge step backwards.

That's the main goal of this patch: advertising the peer link as
IFLA_LINK attribute triggers an infinite loop in userland software when
they follow iflink to discover network devices topology. iflink has
always been the index of a lower device. If a sysfs symbolic link is not
good enough, I can propose a new IFLA_PEER attribute instead.
Jiri Benc Aug. 19, 2015, 12:38 p.m. UTC | #3
On Wed, 19 Aug 2015 14:13:00 +0200, Vincent Bernat wrote:
> That's the main goal of this patch: advertising the peer link as
> IFLA_LINK attribute triggers an infinite loop in userland software when
> they follow iflink to discover network devices topology. iflink has
> always been the index of a lower device. If a sysfs symbolic link is not
> good enough, I can propose a new IFLA_PEER attribute instead.

This would cause regression and break applications for those of us who
started relying on the netnsid feature to match interfaces across net
name spaces.

This is tough. If you're going to do such thing, you would at least
need to also introduce IFLA_PEER_NETNSID.

 Jiri
Vincent Bernat Aug. 19, 2015, 12:48 p.m. UTC | #4
❦ 19 août 2015 14:38 +0200, Jiri Benc <jbenc@redhat.com> :

>> That's the main goal of this patch: advertising the peer link as
>> IFLA_LINK attribute triggers an infinite loop in userland software when
>> they follow iflink to discover network devices topology. iflink has
>> always been the index of a lower device. If a sysfs symbolic link is not
>> good enough, I can propose a new IFLA_PEER attribute instead.
>
> This would cause regression and break applications for those of us who
> started relying on the netnsid feature to match interfaces across net
> name spaces.

Yes. Unfortunately.

> This is tough. If you're going to do such thing, you would at least
> need to also introduce IFLA_PEER_NETNSID.

Yes I can.

In my opinion, the change of semantics of IFLA_LINK is a break of
API. However, I can live with it since it's easy to workaround it. It
just seemed easier to start the discussion with a patch.
Nicolas Dichtel Aug. 19, 2015, 4:33 p.m. UTC | #5
Le 19/08/2015 14:48, Vincent Bernat a écrit :
>   ❦ 19 août 2015 14:38 +0200, Jiri Benc <jbenc@redhat.com> :
>
>>> That's the main goal of this patch: advertising the peer link as
>>> IFLA_LINK attribute triggers an infinite loop in userland software when
>>> they follow iflink to discover network devices topology. iflink has
>>> always been the index of a lower device. If a sysfs symbolic link is not
>>> good enough, I can propose a new IFLA_PEER attribute instead.
>>
>> This would cause regression and break applications for those of us who
>> started relying on the netnsid feature to match interfaces across net
>> name spaces.
>
> Yes. Unfortunately.
>
>> This is tough. If you're going to do such thing, you would at least
>> need to also introduce IFLA_PEER_NETNSID.
Probably better to introduce veth netlink attribute then, something like
IFLA_VETH_PEER and keeps IFLA_LINK_NETNSID.

>
> Yes I can.
>
> In my opinion, the change of semantics of IFLA_LINK is a break of
> API. However, I can live with it since it's easy to workaround it. It
> just seemed easier to start the discussion with a patch.
>
I also don't know what is the best way to handle this. veth advertises
its peer via IFLA_LINK since 4.1, so it's too late to change it for this
release.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Benc Aug. 20, 2015, 11:53 a.m. UTC | #6
On Wed, 19 Aug 2015 18:33:14 +0200, Nicolas Dichtel wrote:
> Probably better to introduce veth netlink attribute then, something like
> IFLA_VETH_PEER and keeps IFLA_LINK_NETNSID.

I'd prefer IFLA_PEER. More generic attribute will be helpful should we
introduce an interface similar to veth in the future.

Also, I'd not combine IFLA_LINK_NETNSID with IFLA_PEER. There might
very well be an interface in the future that will need both IFLA_LINK and
IFLA_PEER and this would just create a confusion. It may be unlikely
but the attributes are cheap and it doesn't make sense to design uAPI
in a way that might bring problems in the future.

> I also don't know what is the best way to handle this. veth advertises
> its peer via IFLA_LINK since 4.1, so it's too late to change it for this
> release.

Apparently we need to pick our poison. Either way, we break something.

 Jiri
Nicolas Dichtel Aug. 20, 2015, 2:31 p.m. UTC | #7
Le 20/08/2015 13:53, Jiri Benc a écrit :
> On Wed, 19 Aug 2015 18:33:14 +0200, Nicolas Dichtel wrote:
>> Probably better to introduce veth netlink attribute then, something like
>> IFLA_VETH_PEER and keeps IFLA_LINK_NETNSID.
>
> I'd prefer IFLA_PEER. More generic attribute will be helpful should we
> introduce an interface similar to veth in the future.s
Ok.

>
> Also, I'd not combine IFLA_LINK_NETNSID with IFLA_PEER. There might
> very well be an interface in the future that will need both IFLA_LINK and
> IFLA_PEER and this would just create a confusion. It may be unlikely
> but the attributes are cheap and it doesn't make sense to design uAPI
> in a way that might bring problems in the future.
Ok, but then this IFLA_PEER can include the ifindex and the nsid. No need
to have two new attributes.

>
>> I also don't know what is the best way to handle this. veth advertises
>> its peer via IFLA_LINK since 4.1, so it's too late to change it for this
>> release.
>
> Apparently we need to pick our poison. Either way, we break something.
Sure. I would prefer to have the same mechanism in all version, but I can
live with the other solution.

David, any thoughts about this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Aug. 20, 2015, 9:07 p.m. UTC | #8
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 20 Aug 2015 16:31:11 +0200

> Le 20/08/2015 13:53, Jiri Benc a écrit :
>> On Wed, 19 Aug 2015 18:33:14 +0200, Nicolas Dichtel wrote:
>>> Probably better to introduce veth netlink attribute then, something
>>> like
>>> IFLA_VETH_PEER and keeps IFLA_LINK_NETNSID.
>>
>> I'd prefer IFLA_PEER. More generic attribute will be helpful should we
>> introduce an interface similar to veth in the future.s
> Ok.
> 
>>
>> Also, I'd not combine IFLA_LINK_NETNSID with IFLA_PEER. There might
>> very well be an interface in the future that will need both IFLA_LINK
>> and
>> IFLA_PEER and this would just create a confusion. It may be unlikely
>> but the attributes are cheap and it doesn't make sense to design uAPI
>> in a way that might bring problems in the future.
> Ok, but then this IFLA_PEER can include the ifindex and the nsid. No
> need
> to have two new attributes.
> 
>>
>>> I also don't know what is the best way to handle this. veth advertises
>>> its peer via IFLA_LINK since 4.1, so it's too late to change it for
>>> this
>>> release.
>>
>> Apparently we need to pick our poison. Either way, we break something.
> Sure. I would prefer to have the same mechanism in all version, but I
> can
> live with the other solution.
> 
> David, any thoughts about this?

You can't break the 4.1 semantics, it's in a released kernel and people
_ARE_ using it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vincent Bernat Aug. 22, 2015, 8:51 p.m. UTC | #9
❦ 20 août 2015 14:07 -0700, David Miller <davem@davemloft.net> :

>>>> I also don't know what is the best way to handle this. veth advertises
>>>> its peer via IFLA_LINK since 4.1, so it's too late to change it for
>>>> this
>>>> release.
>>>
>>> Apparently we need to pick our poison. Either way, we break something.
>> Sure. I would prefer to have the same mechanism in all version, but I
>> can live with the other solution.
>> 
>> David, any thoughts about this?
>
> You can't break the 4.1 semantics, it's in a released kernel and people
> _ARE_ using it.

I had a look at what other kind of daemons may exploit the pre-4.1
semantics (of not having an infinite loop when following iflink) and
failed to find any other users than "lldpd". Other LLDP daemons (lldpad,
ladvd, openlldpd) have other ways to find the lower interface. I would
also have thought that NetSNMP would use it to implement ifStackTable
but it doesn't in fact implement this table.
diff mbox

Patch

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index c8186ffda1a3..47f165bc3107 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -105,6 +105,17 @@  static const struct ethtool_ops veth_ethtool_ops = {
 	.get_ethtool_stats	= veth_get_ethtool_stats,
 };
 
+static int veth_peer_sysfs_add(struct net_device *dev,
+			      struct net_device *peer_dev)
+{
+	return sysfs_create_link(&(dev->dev.kobj), &(peer_dev->dev.kobj),
+				 "peer");
+}
+static void veth_peer_sysfs_del(struct net_device *dev)
+{
+	sysfs_remove_link(&(dev->dev.kobj), "peer");
+}
+
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -263,20 +274,6 @@  static void veth_poll_controller(struct net_device *dev)
 }
 #endif	/* CONFIG_NET_POLL_CONTROLLER */
 
-static int veth_get_iflink(const struct net_device *dev)
-{
-	struct veth_priv *priv = netdev_priv(dev);
-	struct net_device *peer;
-	int iflink;
-
-	rcu_read_lock();
-	peer = rcu_dereference(priv->peer);
-	iflink = peer ? peer->ifindex : 0;
-	rcu_read_unlock();
-
-	return iflink;
-}
-
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -289,7 +286,6 @@  static const struct net_device_ops veth_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= veth_poll_controller,
 #endif
-	.ndo_get_iflink		= veth_get_iflink,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |    \
@@ -436,6 +432,13 @@  static int veth_newlink(struct net *src_net, struct net_device *dev,
 
 	netif_carrier_off(dev);
 
+	err = veth_peer_sysfs_add(dev, peer);
+	if (err < 0)
+		goto err_configure_dev;
+	err = veth_peer_sysfs_add(peer, dev);
+	if (err < 0)
+		goto err_configure_dev;
+
 	/*
 	 * tie the deviced together
 	 */
@@ -447,9 +450,13 @@  static int veth_newlink(struct net *src_net, struct net_device *dev,
 	rcu_assign_pointer(priv->peer, dev);
 	return 0;
 
+err_configure_dev:
+	veth_peer_sysfs_del(dev);
+	veth_peer_sysfs_del(peer);
 err_register_dev:
 	/* nothing to do */
 err_configure_peer:
+	veth_peer_sysfs_del(dev);
 	unregister_netdevice(peer);
 	return err;
 
@@ -466,6 +473,9 @@  static void veth_dellink(struct net_device *dev, struct list_head *head)
 	priv = netdev_priv(dev);
 	peer = rtnl_dereference(priv->peer);
 
+	veth_peer_sysfs_del(dev);
+	if (peer) veth_peer_sysfs_del(dev);
+
 	/* Note : dellink() is called from default_device_exit_batch(),
 	 * before a rcu_synchronize() point. The devices are guaranteed
 	 * not being freed before one RCU grace period.