diff mbox

[net-next] net: Add phys_port identifier to struct net_device and export it to sysfs

Message ID 20130617181004.GA1364@fedora-17-guest.dell.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Narendra K June 17, 2013, 6:10 p.m. UTC
It is useful to know if network interfaces from NIC partitions
'map to/use the' same physical port. For example,  when creating
bonding in fault tolerance mode, if  two network interfaces map to/use
the same physical port, it might not have the desired result. This
information is not available today in a standard format or it is not
present.  If this information can be made available in a generic  way
to user space, tools such as NetworkManager or Libteam or Wicked can
make smarter bonding decisions (such as warn users when setting up
configurations which will not have desired effect).

The requirement is to have a generic interface using which
kernel/drivers can provide information/hints to user space about the
physical port number used by a network interface.

The following options were explored -

1. 'dev_id' sysfs attribute:

In addition to being used to differentiate between devices that share
the same link layer address, it is being used to indicate the physical
port number used by a network interface.

As dev_id exists to differentiate between devices sharing the same
link layer address, dev_id option is not selected.

2. Re-using 'if_port' field in 'struct net_device':

if_port field exists to indicate the media type(please refer to
netdevice.h). It seemed like it was also used to indicate the physical
port number.

As re-using 'if_port' might possibly break user space, this option is
not selected.

3. Add a new field 'phys_port' to 'struct net_device' and export it
to sysfs:

The 'phys_port' will be a universally unique identifier, which
would be a MAC-48 or EUI-64 or a 128 bit UUID value, but not
restricted to these spaces. It will uniquely identify the physical
port used by a network interface. The 'length' of the identifier will
be zero if the field is not set for a network interface.

This patch implements option 3. It creates a new sysfs attribute
'phys_port' -

/sys/class/net/<interface name>/phys_port

References: http://marc.info/?l=linux-netdev&m=136920998009209&w=2
References: http://marc.info/?l=linux-netdev&m=136992041432498&w=2

Signed-off-by: Narendra K <narendra_k@dell.com>
---
Changes from RFC version:

Suggestions from Ben Hutchings -
1. 'struct port_identifier' is changed to be generic instead of
restricting it to MAC-48 or EUI-64 or 128 bit UUID.
2. Commit message updated to indicate point 1.
3. 'show_phys_port' function modified to handle zero length
instead of returning -EINVAL
4. 'show_phys_port' function made generic to handle all
lengths instead 6, 8 or 16 bytes.

Hi Ben, I have retained the commit message to indicate that 'dev_id'
is being used to indicate the physical port number also.

Thank you.

 include/linux/netdevice.h | 13 +++++++++++++
 net/core/net-sysfs.c      | 17 +++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

John Fastabend June 17, 2013, 6:47 p.m. UTC | #1
On 06/17/2013 11:10 AM, Narendra_K@Dell.com wrote:
> It is useful to know if network interfaces from NIC partitions
> 'map to/use the' same physical port. For example,  when creating
> bonding in fault tolerance mode, if  two network interfaces map to/use
> the same physical port, it might not have the desired result. This
> information is not available today in a standard format or it is not
> present.  If this information can be made available in a generic  way
> to user space, tools such as NetworkManager or Libteam or Wicked can
> make smarter bonding decisions (such as warn users when setting up
> configurations which will not have desired effect).
>
> The requirement is to have a generic interface using which
> kernel/drivers can provide information/hints to user space about the
> physical port number used by a network interface.
>
> The following options were explored -
>
> 1. 'dev_id' sysfs attribute:
>
> In addition to being used to differentiate between devices that share
> the same link layer address, it is being used to indicate the physical
> port number used by a network interface.
>
> As dev_id exists to differentiate between devices sharing the same
> link layer address, dev_id option is not selected.
>
> 2. Re-using 'if_port' field in 'struct net_device':
>
> if_port field exists to indicate the media type(please refer to
> netdevice.h). It seemed like it was also used to indicate the physical
> port number.
>
> As re-using 'if_port' might possibly break user space, this option is
> not selected.
>
> 3. Add a new field 'phys_port' to 'struct net_device' and export it
> to sysfs:
>
> The 'phys_port' will be a universally unique identifier, which
> would be a MAC-48 or EUI-64 or a 128 bit UUID value, but not
> restricted to these spaces. It will uniquely identify the physical
> port used by a network interface. The 'length' of the identifier will
> be zero if the field is not set for a network interface.
>
> This patch implements option 3. It creates a new sysfs attribute
> 'phys_port' -
>
> /sys/class/net/<interface name>/phys_port
>
> References: http://marc.info/?l=linux-netdev&m=136920998009209&w=2
> References: http://marc.info/?l=linux-netdev&m=136992041432498&w=2
>
> Signed-off-by: Narendra K <narendra_k@dell.com>
> ---
> Changes from RFC version:
>
> Suggestions from Ben Hutchings -
> 1. 'struct port_identifier' is changed to be generic instead of
> restricting it to MAC-48 or EUI-64 or 128 bit UUID.
> 2. Commit message updated to indicate point 1.
> 3. 'show_phys_port' function modified to handle zero length
> instead of returning -EINVAL
> 4. 'show_phys_port' function made generic to handle all
> lengths instead 6, 8 or 16 bytes.
>
> Hi Ben, I have retained the commit message to indicate that 'dev_id'
> is being used to indicate the physical port number also.
>
> Thank you.
>
>   include/linux/netdevice.h | 13 +++++++++++++
>   net/core/net-sysfs.c      | 17 +++++++++++++++++
>   2 files changed, 30 insertions(+)

[...]

> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -334,6 +334,22 @@ static ssize_t store_group(struct device *dev, struct device_attribute *attr,
>   	return netdev_store(dev, attr, buf, len, change_group);
>   }
>

Is there some missing locking here?

> +static ssize_t show_phys_port(struct device *dev,
> +			      struct device_attribute *attr, char *buf)
> +{
> +	struct net_device *net = to_net_dev(dev);
> +	unsigned char len;
> +

         read_lock(&dev_base_lock);
> +	if (!dev_isalive(net))
> +		return -EINVAL;
> +
> +	len = net->phys_port.port_id_len;
> +	if (!len)
> +		return 0;

	ret = sysfs_format_mac(buf, net->phys_port.port_id, len);
	read_unlock(&dev_base_lock);

	return ret;
}

Please take a look maybe I missed something.

Thanks,
John
Narendra K June 19, 2013, 2:29 p.m. UTC | #2
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Tuesday, June 18, 2013 12:18 AM
> To: K, Narendra
> Cc: netdev@vger.kernel.org; bhutchings@solarflare.com;
> john.r.fastabend@intel.com
> Subject: Re: [PATCH net-next] net: Add phys_port identifier to struct
> net_device and export it to sysfs
> 
> On 06/17/2013 11:10 AM, Narendra_K@Dell.com wrote:
[...] 

> > 3. Add a new field 'phys_port' to 'struct net_device' and export it to
> > sysfs:
> >
> > The 'phys_port' will be a universally unique identifier, which would
> > be a MAC-48 or EUI-64 or a 128 bit UUID value, but not restricted to
> > these spaces. It will uniquely identify the physical port used by a
> > network interface. The 'length' of the identifier will be zero if the
> > field is not set for a network interface.
> >
> > This patch implements option 3. It creates a new sysfs attribute
> > 'phys_port' -
> >
> > /sys/class/net/<interface name>/phys_port
> >
> > References: http://marc.info/?l=linux-netdev&m=136920998009209&w=2
> > References: http://marc.info/?l=linux-netdev&m=136992041432498&w=2
> >
> > Signed-off-by: Narendra K <narendra_k@dell.com>
> > ---
> > Changes from RFC version:
> >
> > Suggestions from Ben Hutchings -
> > 1. 'struct port_identifier' is changed to be generic instead of
> > restricting it to MAC-48 or EUI-64 or 128 bit UUID.
> > 2. Commit message updated to indicate point 1.
> > 3. 'show_phys_port' function modified to handle zero length instead of
> > returning -EINVAL 4. 'show_phys_port' function made generic to handle
> > all lengths instead 6, 8 or 16 bytes.
> >
> > Hi Ben, I have retained the commit message to indicate that 'dev_id'
> > is being used to indicate the physical port number also.
> >
> > Thank you.
> >
> >   include/linux/netdevice.h | 13 +++++++++++++
> >   net/core/net-sysfs.c      | 17 +++++++++++++++++
> >   2 files changed, 30 insertions(+)
> 
> [...]
> 
> > --- a/net/core/net-sysfs.c
> > +++ b/net/core/net-sysfs.c
> > @@ -334,6 +334,22 @@ static ssize_t store_group(struct device *dev,
> struct device_attribute *attr,
> >   	return netdev_store(dev, attr, buf, len, change_group);
> >   }
> >
> 
> Is there some missing locking here?
> 
> > +static ssize_t show_phys_port(struct device *dev,
> > +			      struct device_attribute *attr, char *buf) {
> > +	struct net_device *net = to_net_dev(dev);
> > +	unsigned char len;
> > +
> 
>          read_lock(&dev_base_lock);
> > +	if (!dev_isalive(net))
> > +		return -EINVAL;
> > +
> > +	len = net->phys_port.port_id_len;
> > +	if (!len)
> > +		return 0;
> 
> 	ret = sysfs_format_mac(buf, net->phys_port.port_id, len);
> 	read_unlock(&dev_base_lock);
> 
> 	return ret;
> }
> 
> Please take a look maybe I missed something.
> 

Hi John, thanks for the pointer. It seems like we need  to hold the ' dev_base_lock' here.  I missed this initially as I was looking at ' show_broadcast' function . But looks like the 'show_broadcast' function is also missing the lock.  Attributes such as 'dev_id' are read with read_lock(&dev_base_lock) generically in netdev_show function. 

While looking at the use of ' dev_base_lock',  the 'write_lock' is being held when the 'netdev' is being added to and removed from 'dev_base_head'.  It is also being held when the 'dev->operstate'  and 'dev->link_mode' are being changed. 

The 'read_lock(&dev_base_lock)' needs to be held  before the 'dev_isalive(net) ' call because

1. netdev is not removed from 'dev_base_head' when 'show_phys_port'   accesses 'netdev->phys_port.port_id' (and port_id_len)
2. show_phys_port  function sees a consistent value of  'netdev->phys_port.port_id and netdev->phys_port.port_id_len '  if another execution path changes the value of 'netdev->phys_port.port_id and netdev->phys_port.port_id_len '  with write_lock(&dev_base_lock) held (similar to how dev->operstate is being changed).

Is the above understanding correct ? Sorry, if I missed some detail here. 

With regards,
Narendra K
Linux Engineering
Dell Inc.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings June 19, 2013, 3:36 p.m. UTC | #3
On Wed, 2013-06-19 at 07:29 -0700, Narendra_K@Dell.com wrote:
[...]
> 2. show_phys_port  function sees a consistent value of
> 'netdev->phys_port.port_id and netdev->phys_port.port_id_len '  if
> another execution path changes the value of 'netdev->phys_port.port_id
> and netdev->phys_port.port_id_len '  with write_lock(&dev_base_lock)
> held (similar to how dev->operstate is being changed).
[...]

If the physical port ID can change dynamically (I hadn't thought of
that, but an embedded switch could support such reconfiguration) then
any such change also needs to be announced through rtnetlink.  Actually,
I think the value needs to be included in rtnetlink information anyway.

Ben.
Ben Hutchings June 19, 2013, 7:34 p.m. UTC | #4
On Thu, 2013-06-20 at 00:23 +0530, Narendra_K@Dell.com wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > Sent: Wednesday, June 19, 2013 9:07 PM
> > To: K, Narendra
> > Cc: john.fastabend@gmail.com; netdev@vger.kernel.org;
> > john.r.fastabend@intel.com
> > Subject: Re: [PATCH net-next] net: Add phys_port identifier to struct
> > net_device and export it to sysfs
> > 
> > On Wed, 2013-06-19 at 07:29 -0700, Narendra_K@Dell.com wrote:
> > [...]
> > > 2. show_phys_port  function sees a consistent value of
> > > 'netdev->phys_port.port_id and netdev->phys_port.port_id_len '  if
> > > another execution path changes the value of 'netdev->phys_port.port_id
> > > and netdev->phys_port.port_id_len '  with write_lock(&dev_base_lock)
> > > held (similar to how dev->operstate is being changed).
> > [...]
> > 
> > If the physical port ID can change dynamically (I hadn't thought of that, but an
> > embedded switch could support such reconfiguration) then any such change
> > also needs to be announced through rtnetlink.  Actually, I think the value
> > needs to be included in rtnetlink information anyway.
> > 
> 
> Ok. Thank you Ben. I had not thought about this scenario.  I was
> thinking about the reason to hold the dev_base_lock.  Do you think
> points 1 and 2 are correct reason to hold the dev_base_lock ?

I think so.

> If correct,  I think the 'show_broadcast' function also needs to be
> fixed as it is not holding the lock.

I think the broadcast address should never change during the lifetime of
a device, so it doesn't need the lock.  That might not be true for all
layer 2 protocols though.

Ben.
Praveen_Paladugu@Dell.com June 19, 2013, 9:37 p.m. UTC | #5
Having an unique identifier per port, the userspace tools might be able to check if the VFs/partitions are of the same physical port. Just being paranoid, wouldn't it be better to bond VFs/Partitions across NICs instead of across ports of the same NIC?


To be able to check if the VFs/partitions are of the same port & NIC, the unique identifier should probably have two identifiers  in it like ABC_XYZ. The ABC uniquely identifying the NIC and XYZ uniquely identifying the port.   

Any thoughts?


Thank you 
Praveen K Paladugu
John Fastabend June 21, 2013, 5:11 p.m. UTC | #6
On 06/19/2013 12:34 PM, Ben Hutchings wrote:
> On Thu, 2013-06-20 at 00:23 +0530, Narendra_K@Dell.com wrote:
>>> -----Original Message-----
>>> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>>> Sent: Wednesday, June 19, 2013 9:07 PM
>>> To: K, Narendra
>>> Cc: john.fastabend@gmail.com; netdev@vger.kernel.org;
>>> john.r.fastabend@intel.com
>>> Subject: Re: [PATCH net-next] net: Add phys_port identifier to struct
>>> net_device and export it to sysfs
>>>
>>> On Wed, 2013-06-19 at 07:29 -0700, Narendra_K@Dell.com wrote:
>>> [...]
>>>> 2. show_phys_port  function sees a consistent value of
>>>> 'netdev->phys_port.port_id and netdev->phys_port.port_id_len '  if
>>>> another execution path changes the value of 'netdev->phys_port.port_id
>>>> and netdev->phys_port.port_id_len '  with write_lock(&dev_base_lock)
>>>> held (similar to how dev->operstate is being changed).
>>> [...]
>>>
>>> If the physical port ID can change dynamically (I hadn't thought of that, but an
>>> embedded switch could support such reconfiguration) then any such change
>>> also needs to be announced through rtnetlink.  Actually, I think the value
>>> needs to be included in rtnetlink information anyway.
>>>
>>
>> Ok. Thank you Ben. I had not thought about this scenario.  I was
>> thinking about the reason to hold the dev_base_lock.  Do you think
>> points 1 and 2 are correct reason to hold the dev_base_lock ?
>
> I think so.
>
>> If correct,  I think the 'show_broadcast' function also needs to be
>> fixed as it is not holding the lock.
>
> I think the broadcast address should never change during the lifetime of
> a device, so it doesn't need the lock.  That might not be true for all
> layer 2 protocols though.
>
> Ben.
>

Also, do you think this will be primarily useful for partitioning
devices that expose multiple physical functions? Or do you see
a use case for SR-IOV with virtual functions as well. The pyhs_port
attribute provides a common interface for both cases which is good I
suppose in the VF case however the host can already learn this. I
gather from your original post here that you are aware of all this.

quoting:
> I was thinking about the scenario of VF0 and VF1 coming from PF0 in the host
> Network Controller 1 being direct assigned to a KVM guest via VTD and netdevices
> from VF0 and VF1 being bonded in the guest. Assuming that physical port number used
> by VF0 and VF1 is 1, additional information is needed to identify if port number 1
> is on Network controller 1 or Network controller 2. (In the host we could use
> PCI b/d/f to differentiate two Network Controllers). I think it is similar to
> hybrid guest acceleration on the VF assignment aspect.

I'm curious though why would the host/libvirt assign two VFs from the
same PF to a guest like this? Is this really a host mis-configuration
that you want a way to detect in the guest?

Thanks,
John
Narendra K June 25, 2013, 5:33 p.m. UTC | #7
> -----Original Message-----

> From: John Fastabend [mailto:john.fastabend@gmail.com]

> Sent: Friday, June 21, 2013 10:41 PM

> To: K, Narendra

> Cc: Ben Hutchings; netdev@vger.kernel.org; john.r.fastabend@intel.com

> Subject: Re: [PATCH net-next] net: Add phys_port identifier to struct

> net_device and export it to sysfs

> 

> On 06/19/2013 12:34 PM, Ben Hutchings wrote:

[...]

> > I think so.

> >

> >> If correct,  I think the 'show_broadcast' function also needs to be

> >> fixed as it is not holding the lock.

> >

> > I think the broadcast address should never change during the lifetime

> > of a device, so it doesn't need the lock.  That might not be true for

> > all layer 2 protocols though.

> >

> > Ben.

> >

> 

> Also, do you think this will be primarily useful for partitioning devices that

> expose multiple physical functions? Or do you see a use case for SR-IOV with

> virtual functions as well. The pyhs_port attribute provides a common

> interface for both cases which is good I suppose in the VF case however the

> host can already learn this. I gather from your original post here that you are

> aware of all this.

> 


John,   I think it will be useful in the SRIOV scenario also when more than one VF from two NICs are assigned to the guest. phys_port would be helpful in choosing the correct slave interfaces when host details are not available.
 
With regards,
Narendra K
Linux Engineering
Dell Inc.
John Fastabend June 28, 2013, 4:33 p.m. UTC | #8
[...]

>>
>> Also, do you think this will be primarily useful for partitioning devices that
>> expose multiple physical functions? Or do you see a use case for SR-IOV with
>> virtual functions as well. The pyhs_port attribute provides a common
>> interface for both cases which is good I suppose in the VF case however the
>> host can already learn this. I gather from your original post here that you are
>> aware of all this.
>>
>
> John, I think it will be useful in the SRIOV scenario also when more
than one VF from two NICs are assigned to the guest. phys_port would be
helpful in choosing the correct slave interfaces when host details are
not available.

OK. But I'm not sure why you would assign two VFs from the same NIC
to a guest? This doesn't seem like a good configuration for failover
because if one VF fails it seems likely both will fail. Maybe there
are some benefits for load balancing? Or my assumption both VFs will
fail is wrong.

Anyways it does seem useful in the partitioning case with multiple
physical functions.

.John
Ben Hutchings June 28, 2013, 5:09 p.m. UTC | #9
On Fri, 2013-06-28 at 09:33 -0700, John Fastabend wrote:
> [...]
> 
> >>
> >> Also, do you think this will be primarily useful for partitioning devices that
> >> expose multiple physical functions? Or do you see a use case for SR-IOV with
> >> virtual functions as well. The pyhs_port attribute provides a common
> >> interface for both cases which is good I suppose in the VF case however the
> >> host can already learn this. I gather from your original post here that you are
> >> aware of all this.
> >>
> >
> > John, I think it will be useful in the SRIOV scenario also when more
> than one VF from two NICs are assigned to the guest. phys_port would be
> helpful in choosing the correct slave interfaces when host details are
> not available.
> 
> OK. But I'm not sure why you would assign two VFs from the same NIC
> to a guest? This doesn't seem like a good configuration for failover
> because if one VF fails it seems likely both will fail. Maybe there
> are some benefits for load balancing? Or my assumption both VFs will
> fail is wrong.

I believe Narendra is trying to provide hints to the guest that would
allow it to avoid such broken bonding configurations.  But it is
certainly a good question why there would be two VFs assigned in the
first place.

I could imagine passing through two VFs for the same physical port that
have been assigned to different VLANs.  But then you wouldn't want to
bond two devices that are on different VLANs, whether or not they're
using the same port!

> Anyways it does seem useful in the partitioning case with multiple
> physical functions.

I was thinking it could also help to support the hybrid guest networking
mode.  In this mode, the guest gets a PV (e.g. virtio_net) device and a
VF bridged to the same physical port, and the VF can be removed before
the guest is migrated (and maybe reinserted if there's a VF available on
the new host) without a major disruption to the guest.  In that case the
guest *should* bond together the two net devices that have the same
physical port ID but different drivers.  This would require the physical
port ID to be propagated through macvtap/macvlan and virtio.

Ben.
Narendra K July 2, 2013, 2:40 p.m. UTC | #10
On Fri, Jun 28, 2013 at 10:39:13PM +0530, Ben Hutchings wrote:
> 
> On Fri, 2013-06-28 at 09:33 -0700, John Fastabend wrote:
[...]
> > > John, I think it will be useful in the SRIOV scenario also when more
> > than one VF from two NICs are assigned to the guest. phys_port would be
> > helpful in choosing the correct slave interfaces when host details are
> > not available.
> >
> > OK. But I'm not sure why you would assign two VFs from the same NIC
> > to a guest? This doesn't seem like a good configuration for failover
> > because if one VF fails it seems likely both will fail. Maybe there
> > are some benefits for load balancing? Or my assumption both VFs will
> > fail is wrong.
> 
> I believe Narendra is trying to provide hints to the guest that would
> allow it to avoid such broken bonding configurations.  But it is
> certainly a good question why there would be two VFs assigned in the
> first place.
> 
> I could imagine passing through two VFs for the same physical port that
> have been assigned to different VLANs.  But then you wouldn't want to
> bond two devices that are on different VLANs, whether or not they're
> using the same port!

I was thinking of the following scenario in the guest. 

bond0 = NIC1 VF0 + NIC2 VF0
bond1 = NIC1 VF1 + NIC2 VF1

bond0 and bond1 are on different VLANs. The phys_port identifier hint
would be helpful to the guest in selecting the correct slaves for the
above configuration. Sorry if I missed any detail here.

> 
> > Anyways it does seem useful in the partitioning case with multiple
> > physical functions.

Yes, I agree. 

> 
> I was thinking it could also help to support the hybrid guest networking
> mode.  In this mode, the guest gets a PV (e.g. virtio_net) device and a
> VF bridged to the same physical port, and the VF can be removed before
> the guest is migrated (and maybe reinserted if there's a VF available on
> the new host) without a major disruption to the guest.  In that case the
> guest *should* bond together the two net devices that have the same
> physical port ID but different drivers.  This would require the physical
> port ID to be propagated through macvtap/macvlan and virtio.
> 
> Ben.
> 
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Jiri Pirko July 11, 2013, 8:39 p.m. UTC | #11
Mon, Jun 17, 2013 at 08:10:32PM CEST, Narendra_K@Dell.com wrote:
>It is useful to know if network interfaces from NIC partitions
>'map to/use the' same physical port. For example,  when creating
>bonding in fault tolerance mode, if  two network interfaces map to/use
>the same physical port, it might not have the desired result. This
>information is not available today in a standard format or it is not
>present.  If this information can be made available in a generic  way
>to user space, tools such as NetworkManager or Libteam or Wicked can
>make smarter bonding decisions (such as warn users when setting up
>configurations which will not have desired effect).
>
>The requirement is to have a generic interface using which
>kernel/drivers can provide information/hints to user space about the
>physical port number used by a network interface.
>
>The following options were explored -
>
>1. 'dev_id' sysfs attribute:
>
>In addition to being used to differentiate between devices that share
>the same link layer address, it is being used to indicate the physical
>port number used by a network interface.
>
>As dev_id exists to differentiate between devices sharing the same
>link layer address, dev_id option is not selected.
>
>2. Re-using 'if_port' field in 'struct net_device':
>
>if_port field exists to indicate the media type(please refer to
>netdevice.h). It seemed like it was also used to indicate the physical
>port number.
>
>As re-using 'if_port' might possibly break user space, this option is
>not selected.
>
>3. Add a new field 'phys_port' to 'struct net_device' and export it
>to sysfs:
>
>The 'phys_port' will be a universally unique identifier, which
>would be a MAC-48 or EUI-64 or a 128 bit UUID value, but not
>restricted to these spaces. It will uniquely identify the physical
>port used by a network interface. The 'length' of the identifier will
>be zero if the field is not set for a network interface.
>
>This patch implements option 3. It creates a new sysfs attribute
>'phys_port' -


I think that correct way is to (Ben mentioned already part of it):
1) introduce ndo_phys_port_id() which would be used by core to get the
struct port_identifier filled by the driver (struct port_identifier is
not really a good name (namespace prefix should be there))

2) add netdev nofitier event type which would allow driver to propagate
changes of phys to to rtnetlink code and drivers which might be
interested (like bond/bridge/whatever) as well.

3) export phys port id through rtnetlink api to userspace.

I can cook up a patch like this after I return from my weekend trip if
you are interested :)

Jiri

>
>/sys/class/net/<interface name>/phys_port
>
>References: http://marc.info/?l=linux-netdev&m=136920998009209&w=2
>References: http://marc.info/?l=linux-netdev&m=136992041432498&w=2
>
>Signed-off-by: Narendra K <narendra_k@dell.com>
>---
>Changes from RFC version:
>
>Suggestions from Ben Hutchings -
>1. 'struct port_identifier' is changed to be generic instead of
>restricting it to MAC-48 or EUI-64 or 128 bit UUID.
>2. Commit message updated to indicate point 1.
>3. 'show_phys_port' function modified to handle zero length
>instead of returning -EINVAL
>4. 'show_phys_port' function made generic to handle all
>lengths instead 6, 8 or 16 bytes.
>
>Hi Ben, I have retained the commit message to indicate that 'dev_id'
>is being used to indicate the physical port number also.
>
>Thank you.
>
> include/linux/netdevice.h | 13 +++++++++++++
> net/core/net-sysfs.c      | 17 +++++++++++++++++
> 2 files changed, 30 insertions(+)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 09b4188..ddb14ef 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1062,6 +1062,14 @@ struct net_device_ops {
> 						      bool new_carrier);
> };
> 
>+/* This structure holds a universally unique identifier to
>+ * identify the physical port used by a netdevice
>+ */
>+struct port_identifier {
>+	unsigned char port_id[MAX_ADDR_LEN];
>+	unsigned port_id_len;
>+};
>+
> /*
>  *	The DEVICE structure.
>  *	Actually, this whole structure is a big mistake.  It mixes I/O
>@@ -1181,6 +1189,11 @@ struct net_device {
> 						 * that share the same link
> 						 * layer address
> 						 */
>+	struct port_identifier	phys_port;	/* Universally unique physical
>+						 * port identifier, MAC-48 or
>+						 * EUI-64 or 128 bit UUID,
>+						 * length is zero if not set
>+						 */
> 	spinlock_t		addr_list_lock;
> 	struct netdev_hw_addr_list	uc;	/* Unicast mac addresses */
> 	struct netdev_hw_addr_list	mc;	/* Multicast mac addresses */
>diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>index 981fed3..3245e90 100644
>--- a/net/core/net-sysfs.c
>+++ b/net/core/net-sysfs.c
>@@ -334,6 +334,22 @@ static ssize_t store_group(struct device *dev, struct device_attribute *attr,
> 	return netdev_store(dev, attr, buf, len, change_group);
> }
> 
>+static ssize_t show_phys_port(struct device *dev,
>+			      struct device_attribute *attr, char *buf)
>+{
>+	struct net_device *net = to_net_dev(dev);
>+	unsigned char len;
>+
>+	if (!dev_isalive(net))
>+		return -EINVAL;
>+
>+	len = net->phys_port.port_id_len;
>+	if (!len)
>+		return 0;
>+
>+	return sysfs_format_mac(buf, net->phys_port.port_id, len);
>+}
>+
> static struct device_attribute net_class_attributes[] = {
> 	__ATTR(addr_assign_type, S_IRUGO, show_addr_assign_type, NULL),
> 	__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
>@@ -355,6 +371,7 @@ static struct device_attribute net_class_attributes[] = {
> 	__ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
> 	       store_tx_queue_len),
> 	__ATTR(netdev_group, S_IRUGO | S_IWUSR, show_group, store_group),
>+	__ATTR(phys_port, S_IRUGO, show_phys_port, NULL),
> 	{}
> };
> 
>-- 
>1.8.0.1
>
>-- 
>With regards,
>Narendra K
>Linux Engineering
>Dell Inc.
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 09b4188..ddb14ef 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1062,6 +1062,14 @@  struct net_device_ops {
 						      bool new_carrier);
 };
 
+/* This structure holds a universally unique identifier to
+ * identify the physical port used by a netdevice
+ */
+struct port_identifier {
+	unsigned char port_id[MAX_ADDR_LEN];
+	unsigned port_id_len;
+};
+
 /*
  *	The DEVICE structure.
  *	Actually, this whole structure is a big mistake.  It mixes I/O
@@ -1181,6 +1189,11 @@  struct net_device {
 						 * that share the same link
 						 * layer address
 						 */
+	struct port_identifier	phys_port;	/* Universally unique physical
+						 * port identifier, MAC-48 or
+						 * EUI-64 or 128 bit UUID,
+						 * length is zero if not set
+						 */
 	spinlock_t		addr_list_lock;
 	struct netdev_hw_addr_list	uc;	/* Unicast mac addresses */
 	struct netdev_hw_addr_list	mc;	/* Multicast mac addresses */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 981fed3..3245e90 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -334,6 +334,22 @@  static ssize_t store_group(struct device *dev, struct device_attribute *attr,
 	return netdev_store(dev, attr, buf, len, change_group);
 }
 
+static ssize_t show_phys_port(struct device *dev,
+			      struct device_attribute *attr, char *buf)
+{
+	struct net_device *net = to_net_dev(dev);
+	unsigned char len;
+
+	if (!dev_isalive(net))
+		return -EINVAL;
+
+	len = net->phys_port.port_id_len;
+	if (!len)
+		return 0;
+
+	return sysfs_format_mac(buf, net->phys_port.port_id, len);
+}
+
 static struct device_attribute net_class_attributes[] = {
 	__ATTR(addr_assign_type, S_IRUGO, show_addr_assign_type, NULL),
 	__ATTR(addr_len, S_IRUGO, show_addr_len, NULL),
@@ -355,6 +371,7 @@  static struct device_attribute net_class_attributes[] = {
 	__ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
 	       store_tx_queue_len),
 	__ATTR(netdev_group, S_IRUGO | S_IWUSR, show_group, store_group),
+	__ATTR(phys_port, S_IRUGO, show_phys_port, NULL),
 	{}
 };