[net-next] net: netlink messages for HW addr programming

Message ID: 1473958082-30982-1-git-send-email-pruddy@brocade.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Patrick Ruddy Sept. 15, 2016, 4:48 p.m. UTC
Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
AF_UNSPEC to indicate interest in specific unicast and multicast
hardware addresses. These messages are sent when addresses are
added or deleted from the appropriate interface driver.
Added AF_UNSPEC GETADDR function to allow the netlink notifications
to be replayed to avoid loss of state due to application start
ordering or restart.

Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
---
 include/linux/netdevice.h |   1 +
 net/core/dev_addr_lists.c | 157 ++++++++++++++++++++++++++++++++++++++++++++--
 net/core/rtnetlink.c      |   8 ++-
 3 files changed, 161 insertions(+), 5 deletions(-)
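
As a reading aid, here is a rough userspace sketch of how a listener might decode the notifications described in the commit message. It assumes the message layout produced by fill_addr() in this patch: an ifaddrmsg with ifa_family set to AF_UNSPEC, followed by a single IFA_ADDRESS (unicast) or IFA_MULTICAST attribute holding the hardware address. The names hw_addr_event and parse_hw_addr_msg are hypothetical and exist only for this illustration; the patch was not merged in this form, so treat this as a sketch of the proposal rather than a description of a shipped kernel API.

```c
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result type; not part of the patch or of any kernel header. */
struct hw_addr_event {
	int is_add;             /* 1 for RTM_NEWADDR, 0 for RTM_DELADDR */
	int is_multicast;       /* 1 if the address came in IFA_MULTICAST */
	unsigned int ifindex;   /* interface the address belongs to */
	unsigned char addr[32]; /* the kernel's MAX_ADDR_LEN is 32 */
	unsigned int addr_len;
};

/*
 * Decode one netlink message of the shape proposed by this patch.
 * Returns 0 on success, -1 if the message is not an AF_UNSPEC
 * hardware address event.
 */
static int parse_hw_addr_msg(struct nlmsghdr *nlh, struct hw_addr_event *ev)
{
	struct ifaddrmsg *ifm;
	struct rtattr *rta;
	int len;

	if (nlh->nlmsg_type != RTM_NEWADDR && nlh->nlmsg_type != RTM_DELADDR)
		return -1;
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)))
		return -1;

	ifm = NLMSG_DATA(nlh);
	if (ifm->ifa_family != AF_UNSPEC)
		return -1;

	/* Walk the attributes that follow the fixed ifaddrmsg header. */
	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifm));
	for (rta = IFA_RTA(ifm); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type != IFA_ADDRESS &&
		    rta->rta_type != IFA_MULTICAST)
			continue;
		if ((size_t)RTA_PAYLOAD(rta) > sizeof(ev->addr))
			return -1;
		ev->is_add = nlh->nlmsg_type == RTM_NEWADDR;
		ev->is_multicast = rta->rta_type == IFA_MULTICAST;
		ev->ifindex = ifm->ifa_index;
		ev->addr_len = RTA_PAYLOAD(rta);
		memcpy(ev->addr, RTA_DATA(rta), ev->addr_len);
		return 0;
	}
	return -1;
}
```

Note that addr_notify() in the patch sends on RTNLGRP_LINK, so a listener would presumably bind its NETLINK_ROUTE socket to that group rather than to the IPv4 or IPv6 address groups.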

Comments

Roopa Prabhu Sept. 18, 2016, 2:51 p.m. UTC | #1
On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
> Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
> AF_UNSPEC to indicate interest in specific unicast and multicast
> hardware addresses. These messages are sent when addresses are
> added or deleted from the appropriate interface driver.
> Added AF_UNSPEC GETADDR function to allow the netlink notifications
> to be replayed to avoid loss of state due to application start
> ordering or restart.
>
> Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
> ---

RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
userspace and also to request a special dump of these addresses.

This could just be a new nested netlink attribute in the existing link dump ?
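
To make the nested-attribute alternative concrete, the following is a minimal userspace sketch of walking a container attribute whose payload is a list of hardware addresses. The attribute numbers HYP_IFLA_HW_ADDR_LIST and HYP_HW_ADDR_ENTRY and the function walk_hw_addr_list are invented for this illustration; nothing in the thread defines the actual attribute names or layout.

```c
#include <linux/rtnetlink.h>
#include <string.h>

/* Hypothetical attribute numbers, chosen only for this sketch. */
enum {
	HYP_IFLA_HW_ADDR_LIST = 1000, /* container carried in a link message */
	HYP_HW_ADDR_ENTRY = 1         /* one hardware address inside it */
};

typedef void (*hw_addr_cb)(const unsigned char *addr, int len, void *arg);

/*
 * Walk the entries nested inside one container attribute and call cb
 * (if non-NULL) for each address. Returns the number of entries seen.
 */
static int walk_hw_addr_list(struct rtattr *nest, hw_addr_cb cb, void *arg)
{
	struct rtattr *rta = RTA_DATA(nest);
	int len = RTA_PAYLOAD(nest);
	int count = 0;

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type != HYP_HW_ADDR_ENTRY)
			continue;
		if (cb)
			cb(RTA_DATA(rta), RTA_PAYLOAD(rta), arg);
		count++;
	}
	return count;
}
```

With this shape, every link message carries the complete list, so a client sees a snapshot rather than individual add and delete events.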
Patrick Ruddy Sept. 19, 2016, 2:46 p.m. UTC | #2
On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
> > Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
> > AF_UNSPEC to indicate interest in specific unicast and multicast
> > hardware addresses. These messages are sent when addresses are
> > added or deleted from the appropriate interface driver.
> > Added AF_UNSPEC GETADDR function to allow the netlink notifications
> > to be replayed to avoid loss of state due to application start
> > ordering or restart.
> >
> > Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
> > ---
>
> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
> userspace and also to request a special dump of these addresses.
>
> This could just be a new nested netlink attribute in the existing link dump ?

Hi Roopa

Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
messages but was asked to change to ADDR messages by Stephen
Hemminger (cc'd). 

However I agree that these addresses fall between the LINK and ADDR
areas so I'm happy to change this if we can reach some consensus on the
format.

thanks

-pr
Roopa Prabhu Sept. 20, 2016, 5:31 a.m. UTC | #3
On 9/19/16, 7:46 AM, Patrick Ruddy wrote:
> On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
>> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
>>> Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
>>> AF_UNSPEC to indicate interest in specific unicast and multicast
>>> hardware addresses. These messages are sent when addresses are
>>> added or deleted from the appropriate interface driver.
>>> Added AF_UNSPEC GETADDR function to allow the netlink notifications
>>> to be replayed to avoid loss of state due to application start
>>> ordering or restart.
>>>
>>> Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
>>> ---
>> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
>> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
>> userspace and also to request a special dump of these addresses.
>>
>> This could just be a new nested netlink attribute in the existing link dump ?
> Hi Roopa
>
> Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
> messages but was asked to change to ADDR messages by Stephen
> Hemminger (cc'd). 
>
> However I agree that these addresses fall between the LINK and ADDR
> areas so I'm happy to change this if we can reach some consensus on the
> format.
>
ok, thanks for the history. yes, they do lie in a weird spot.
the general convention for other rtnl registrations seems to be
AF_UNSPEC family means include all supported families. that's where this seems a bit odd.

On the other hand, one reason I see where using RTM_*ADDR will be useful for this is if we wanted
to provide a way to add these uc and mc address via ip addr add in the future.
ip addr add <lladdr> dev eth0

Does this patch allow that in the future ?

also, will these l2 addresses now show up in 'ip addr show' output ?.

thanks,
Roopa
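
For context on the AF_UNSPEC convention mentioned above, this is a minimal sketch of the dump request a client would build. An RTM_GETADDR dump with ifa_family set to AF_UNSPEC conventionally asks for every registered family (rtnl_dump_all() in the patch is the handler that loops over them); under this patch the same request would first replay the hardware addresses. The struct and function names addr_dump_req and build_addr_dump_req are hypothetical.

```c
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* Request buffer: netlink header followed by the ifaddrmsg body. */
struct addr_dump_req {
	struct nlmsghdr nlh;
	struct ifaddrmsg ifm;
};

/* Fill in an RTM_GETADDR dump request for the given address family. */
static void build_addr_dump_req(struct addr_dump_req *req,
				unsigned char family, unsigned int seq)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ifm));
	req->nlh.nlmsg_type = RTM_GETADDR;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifa_family = family; /* AF_UNSPEC: all families */
}
```

A client would send the filled-in request on a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) and then read NLM_F_MULTI replies until NLMSG_DONE arrives.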
Jiri Pirko Sept. 20, 2016, 5:49 a.m. UTC | #4
Tue, Sep 20, 2016 at 07:31:27AM CEST, roopa@cumulusnetworks.com wrote:
>On 9/19/16, 7:46 AM, Patrick Ruddy wrote:
>> On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
>>> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
>>>> Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
>>>> AF_UNSPEC to indicate interest in specific unicast and multicast
>>>> hardware addresses. These messages are sent when addresses are
>>>> added or deleted from the appropriate interface driver.
>>>> Added AF_UNSPEC GETADDR function to allow the netlink notifications
>>>> to be replayed to avoid loss of state due to application start
>>>> ordering or restart.
>>>>
>>>> Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
>>>> ---
>>> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
>>> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
>>> userspace and also to request a special dump of these addresses.
>>>
>>> This could just be a new nested netlink attribute in the existing link dump ?
>> Hi Roopa
>>
>> Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
>> messages but was asked to change to ADDR messages by Stephen
>> Hemminger (cc'd). 
>>
>> However I agree that these addresses fall between the LINK and ADDR
>> areas so I'm happy to change this if we can reach some consensus on the
>> format.
>>
>ok, thanks for the history. yes, they do lie in a weird spot.

They are l2 addresses, they should be treated accordingly. Am I missing
something?


>the general convention for other rtnl registrations seems to be
>AF_UNSPEC family means include all supported families. that's where this seems a bit odd.
>
>On the other hand, one reason I see where using RTM_*ADDR will be useful for this is if we wanted
>to provide a way to add these uc and mc address via ip addr add in the future.
>ip addr add <lladdr> dev eth0
>
>Does this patch allow that in the future ?

This should go under ip link I believe. "ip addr" is for l3.


>
>also, will these l2 addresses now show up in 'ip addr show' output ?.
>
>thanks,
>Roopa
>
Roopa Prabhu Sept. 20, 2016, 6:36 a.m. UTC | #5
On 9/19/16, 10:49 PM, Jiri Pirko wrote:
> Tue, Sep 20, 2016 at 07:31:27AM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/19/16, 7:46 AM, Patrick Ruddy wrote:
>>> On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
>>>> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
>>>>> Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
>>>>> AF_UNSPEC to indicate interest in specific unicast and multicast
>>>>> hardware addresses. These messages are sent when addresses are
>>>>> added or deleted from the appropriate interface driver.
>>>>> Added AF_UNSPEC GETADDR function to allow the netlink notifications
>>>>> to be replayed to avoid loss of state due to application start
>>>>> ordering or restart.
>>>>>
>>>>> Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
>>>>> ---
>>>> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
>>>> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
>>>> userspace and also to request a special dump of these addresses.
>>>>
>>>> This could just be a new nested netlink attribute in the existing link dump ?
>>> Hi Roopa
>>>
>>> Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
>>> messages but was asked to change to ADDR messages by Stephen
>>> Hemminger (cc'd). 
>>>
>>> However I agree that these addresses fall between the LINK and ADDR
>>> areas so I'm happy to change this if we can reach some consensus on the
>>> format.
>>>
>> ok, thanks for the history. yes, they do lie in a weird spot.
> They are l2 addresses, they should be treated accordingly. Am I missing
> something?
>
>
>> the general convention for other rtnl registrations seems to be
>> AF_UNSPEC family means include all supported families. that's where this seems a bit odd.
>>
>> On the other hand, one reason I see where using RTM_*ADDR will be useful for this is if we wanted
>> to provide a way to add these uc and mc address via ip addr add in the future.
>> ip addr add <lladdr> dev eth0
>>
>> Does this patch allow that in the future ?
> This should go under ip link I believe. "ip addr" is for l3.
>
>
yes, my initial comment was the same (two new attributes to cover UC and MC addresses).
Patrick had it in link first, and there were some suggestions on doing it in addr. He is ok with either.

My questions were to make sure we don't lose anything by adding it under link.
There is no external way to add addrs to the uc and mc lists today, hence it would be nice
to cover that case as well when we are exposing the dev uc and mc lists to userspace.
And of course it does not have to be RTM_NEWADDR;
RTM_NEWLINK can cover it both ways also.

So, if Stephen has no major objections, we can still go with attributes in RTM_*LINK.
Patrick Ruddy Oct. 3, 2016, 10:42 a.m. UTC | #6
On Tue, 2016-09-20 at 07:49 +0200, Jiri Pirko wrote:
> Tue, Sep 20, 2016 at 07:31:27AM CEST, roopa@cumulusnetworks.com wrote:
> >On 9/19/16, 7:46 AM, Patrick Ruddy wrote:
> >> On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
> >>> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
> >>>> Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
> >>>> AF_UNSPEC to indicate interest in specific unicast and multicast
> >>>> hardware addresses. These messages are sent when addresses are
> >>>> added or deleted from the appropriate interface driver.
> >>>> Added AF_UNSPEC GETADDR function to allow the netlink notifications
> >>>> to be replayed to avoid loss of state due to application start
> >>>> ordering or restart.
> >>>>
> >>>> Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
> >>>> ---
> >>> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
> >>> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
> >>> userspace and also to request a special dump of these addresses.
> >>>
> >>> This could just be a new nested netlink attribute in the existing link dump ?
> >> Hi Roopa
> >>
> >> Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
> >> messages but was asked to change to ADDR messages by Stephen
> >> Hemminger (cc'd).
> >>
> >> However I agree that these addresses fall between the LINK and ADDR
> >> areas so I'm happy to change this if we can reach some consensus on the
> >> format.
> >>
> >ok, thanks for the history. yes, they do lie in a weird spot.
>
> They are l2 addresses, they should be treated accordingly. Am I missing
> something?

In looking to rework this I remembered something. One of the plus sides
of using the ADDR messages to associate addresses with the device is
that we can use the NEW/DELADDR messages to signify the addition and
deletion of these addresses individually as they come and go, which they
do all the time, e.g. with IGMP joins and leaves. If we embed these
addresses within the LINK message we cannot use NEWLINK and DELLINK to
add and remove individual addresses as that would signify the link
coming and going which is not the case. SETLINK might work but then the
onus is on the netlink client to be able to work out the address
additions/deletions based on its own state.
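
To illustrate the burden this places on the client, here is a minimal sketch of the bookkeeping needed if every update carried the full address list: the client keeps its previous snapshot and diffs it against the new one to recover individual additions and deletions. The names hw_set_contains, hw_set_diff and hw_diff_cb are hypothetical, and the fixed 6-byte address length is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <string.h>

#define HW_ADDR_LEN 6 /* Ethernet-sized addresses, assumed for this sketch */

typedef void (*hw_diff_cb)(const unsigned char *addr, int added, void *arg);

/* Return 1 if addr appears among the n packed addresses in set. */
static int hw_set_contains(const unsigned char *set, size_t n,
			   const unsigned char *addr)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(set + i * HW_ADDR_LEN, addr, HW_ADDR_LEN) == 0)
			return 1;
	return 0;
}

/*
 * Compare two snapshots of packed addresses. Calls cb (if non-NULL)
 * with added=1 for addresses only in new_set and added=0 for addresses
 * only in old_set. Returns the total number of changes.
 */
static size_t hw_set_diff(const unsigned char *old_set, size_t n_old,
			  const unsigned char *new_set, size_t n_new,
			  hw_diff_cb cb, void *arg)
{
	size_t changes = 0;

	for (size_t i = 0; i < n_new; i++) {
		const unsigned char *a = new_set + i * HW_ADDR_LEN;

		if (!hw_set_contains(old_set, n_old, a)) {
			if (cb)
				cb(a, 1, arg);
			changes++;
		}
	}
	for (size_t i = 0; i < n_old; i++) {
		const unsigned char *a = old_set + i * HW_ADDR_LEN;

		if (!hw_set_contains(new_set, n_new, a)) {
			if (cb)
				cb(a, 0, arg);
			changes++;
		}
	}
	return changes;
}
```

The quadratic comparison is fine for the handful of addresses a typical interface holds; per-address NEWADDR/DELADDR events avoid this step entirely, which is the point being made above.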

>
> >the general convention for other rtnl registrations seems to be
> >AF_UNSPEC family means include all supported families. that's where this seems a bit odd.
> >
> >On the other hand, one reason I see where using RTM_*ADDR will be useful for this is if we wanted
> >to provide a way to add these uc and mc address via ip addr add in the future.
> >ip addr add <lladdr> dev eth0
> >
> >Does this patch allow that in the future ?
>
> This should go under ip link I believe. "ip addr" is for l3.
>
>
> >
> >also, will these l2 addresses now show up in 'ip addr show' output ?.
> >

I think the uc ones will display in ip addr show and the multicast ones
in ip maddr show - this is the case today (i.e. without my changes).

thanks

-pr

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2095b6a..2029618 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3751,6 +3751,7 @@  int dev_mc_sync_multiple(struct net_device *to, struct net_device *from);
 void dev_mc_unsync(struct net_device *to, struct net_device *from);
 void dev_mc_flush(struct net_device *dev);
 void dev_mc_init(struct net_device *dev);
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb);
 
 /**
  *  __dev_mc_sync - Synchonize device's multicast list
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index c0548d2..70343e6 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -12,9 +12,17 @@ 
  */
 
 #include <linux/netdevice.h>
+#include <net/netlink.h>
 #include <linux/rtnetlink.h>
 #include <linux/export.h>
 #include <linux/list.h>
+#include <net/sock.h>
+
+enum unspec_addr_idx {
+	UNSPEC_UCAST = 0,
+	UNSPEC_MCAST,
+	UNSPEC_MAX
+};
 
 /*
  * General list handling functions
@@ -477,6 +485,139 @@  out:
 }
 EXPORT_SYMBOL(dev_uc_add_excl);
 
+static int fill_addr(struct sk_buff *skb, struct net_device *dev,
+		     const unsigned char *addr, u32 seq, int type,
+		     int addr_type, int ifa_flags, unsigned int flags)
+{
+	struct nlmsghdr *nlh;
+	struct ifaddrmsg *ifm;
+
+	nlh = nlmsg_put(skb, 0, seq, type, sizeof(*ifm), flags);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	ifm = nlmsg_data(nlh);
+	ifm->ifa_family = AF_UNSPEC;
+	ifm->ifa_prefixlen = 0;
+	ifm->ifa_flags = ifa_flags;
+	ifm->ifa_scope = RT_SCOPE_LINK;
+	ifm->ifa_index = dev->ifindex;
+	if (nla_put(skb, addr_type, dev->addr_len, addr))
+		goto nla_put_failure;
+	nlmsg_end(skb, nlh);
+	return 0;
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
+static inline size_t addr_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
+		+ nla_total_size(MAX_ADDR_LEN);
+}
+
+static void addr_notify(struct net_device *dev, const unsigned char *addr,
+			int type, int addr_type)
+{
+	struct net *net = dev_net(dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
+	if (!skb)
+		goto errout;
+
+	err = fill_addr(skb, dev, addr, 0, type, addr_type, IFA_F_SECONDARY,
+			0);
+	if (err < 0) {
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(skb);
+		goto errout;
+	}
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	return;
+errout:
+	if (err < 0)
+		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+}
+
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	struct hlist_head *head;
+	struct netdev_hw_addr_list *list;
+	struct netdev_hw_addr *ha;
+	int h, s_h;
+	int idx = 0, s_idx;
+	int mac_idx = 0, s_mac_idx;
+	enum unspec_addr_idx addr_idx = 0, s_addr_idx;
+	int err = 0;
+
+	s_h = cb->args[0];
+	s_idx = cb->args[1];
+	s_addr_idx = cb->args[2];
+	s_mac_idx = cb->args[3];
+
+	rcu_read_lock();
+	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+		idx = 0;
+		head = &net->dev_index_head[h];
+		cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
+			  net->dev_base_seq;
+		hlist_for_each_entry_rcu(dev, head, index_hlist) {
+			if (idx < s_idx)
+				goto cont;
+			if (h > s_h || idx > s_idx)
+				s_mac_idx = 0;
+			for (addr_idx = 0; addr_idx < UNSPEC_MAX;
+			     addr_idx++, s_addr_idx = 0) {
+				if (addr_idx < s_addr_idx)
+					continue;
+				list = (addr_idx == UNSPEC_UCAST) ? &dev->uc :
+					&dev->mc;
+				if (netdev_hw_addr_list_empty(list))
+					continue;
+				mac_idx = 0;
+				list_for_each_entry(ha, &list->list, list) {
+					if (mac_idx < s_mac_idx) {
+						mac_idx++;
+						continue;
+					}
+					err = fill_addr(skb, dev, ha->addr,
+							cb->nlh->nlmsg_seq,
+							RTM_NEWADDR,
+							(addr_idx ==
+							 UNSPEC_UCAST) ?
+							IFA_ADDRESS :
+							IFA_MULTICAST,
+							IFA_F_SECONDARY,
+							NLM_F_MULTI);
+					if (err < 0)
+						goto done;
+					nl_dump_check_consistent(cb,
+								 nlmsg_hdr(skb)
+								 );
+					mac_idx++;
+				}
+				s_mac_idx = 0;
+			}
+cont:
+			idx++;
+		}
+	}
+done:
+	rcu_read_unlock();
+	cb->args[0] = h;
+	cb->args[1] = idx;
+	cb->args[2] = addr_idx;
+	cb->args[3] = mac_idx;
+
+	return skb->len;
+}
+
 /**
  *	dev_uc_add - Add a secondary unicast address
  *	@dev: device
@@ -492,8 +633,10 @@  int dev_uc_add(struct net_device *dev, const unsigned char *addr)
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_add(&dev->uc, addr, dev->addr_len,
 			    NETDEV_HW_ADDR_T_UNICAST);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_NEWADDR, IFA_ADDRESS);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -514,8 +657,10 @@  int dev_uc_del(struct net_device *dev, const unsigned char *addr)
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_del(&dev->uc, addr, dev->addr_len,
 			    NETDEV_HW_ADDR_T_UNICAST);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_DELADDR, IFA_ADDRESS);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -669,8 +814,10 @@  static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_NEWADDR, IFA_MULTICAST);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
@@ -709,8 +856,10 @@  static int __dev_mc_del(struct net_device *dev, const unsigned char *addr,
 	netif_addr_lock_bh(dev);
 	err = __hw_addr_del_ex(&dev->mc, addr, dev->addr_len,
 			       NETDEV_HW_ADDR_T_MULTICAST, global, false);
-	if (!err)
+	if (!err) {
 		__dev_set_rx_mode(dev);
+		addr_notify(dev, addr, RTM_DELADDR, IFA_MULTICAST);
+	}
 	netif_addr_unlock_bh(dev);
 	return err;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 937e459..e6292bb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2686,8 +2686,14 @@  static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 	int idx;
 	int s_idx = cb->family;
 
-	if (s_idx == 0)
+	if (s_idx == 0) {
+		if (unspec_dump_ifaddr(skb, cb))
+			return skb->len;
+		memset(&cb->args[0], 0, sizeof(cb->args));
+		cb->prev_seq = 0;
+		cb->seq = 0;
 		s_idx = 1;
+	}
 	for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
 		int type = cb->nlh->nlmsg_type-RTM_BASE;
 		if (idx < s_idx || idx == PF_PACKET)