udev: create empty regular files to represent net interfaces

Message ID 20091022063619.GB6321@ldl.fc.hp.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

dann frazier Oct. 22, 2009, 6:36 a.m. UTC
Here's a proof of concept to further the discussion..

The default filename uses the format:
  /dev/netdev/by-ifindex/$ifindex

This provides the infrastructure to permit udev rules to create aliases for
network devices using symlinks, for example:

  /dev/netdev/by-name/eth0 -> ../by-ifindex/1
  /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3

A library (such as the proposed libnetdevname) could use this information
to provide an alias->realname mapping for network utilities.

Tested with the following rule:

SUBSYSTEM=="net", PROGRAM=="/usr/local/bin/ifindex2name $attr{ifindex}", SYMLINK+="netdev/by-name/%c"

$ cat /usr/local/bin/ifindex2name
#!/bin/sh

set -e

ifindex="$1"

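for d in /sys/class/net/*; do
    # Skip entries without an ifindex (e.g. bonding_masters), which would
    # otherwise abort the script under "set -e".
    [ -r "$d/ifindex" ] || continue
    testindex="$(cat "$d/ifindex")"
    if [ "$ifindex" = "$testindex" ]; then
        basename "$d"
        exit 0
    fi
done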

exit 1
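
An editorial sketch of the lookup such a library might perform, assuming the /dev/netdev layout proposed above: follow the alias symlink to its by-ifindex entry, then ask sysfs which interface currently owns that index. The function names and the SYSFS_NET override are hypothetical and are not taken from libnetdevname or from the patch.

```shell
#!/bin/sh
# Sketch only: resolve an alias path such as /dev/netdev/by-name/eth0 to the
# kernel's current interface name. All function names are hypothetical.

# Assumption: SYSFS_NET can be overridden so the functions can be exercised
# against a scratch directory instead of the real sysfs.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the ifindex an alias path refers to, or return 1.
# Note: "readlink -f" is a GNU/busybox extension, assumed to be available.
alias_to_ifindex() {
    target=$(readlink -f "$1") || return 1
    case "$target" in
        */by-ifindex/*) ;;
        *) return 1 ;;
    esac
    idx=${target##*/}
    case "$idx" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$idx"
}

# Print the name of the interface that currently owns the given ifindex,
# or return 1 if no interface has it.
ifindex_to_name() {
    for d in "$SYSFS_NET"/*; do
        [ -r "$d/ifindex" ] || continue
        testindex=$(cat "$d/ifindex") || continue
        if [ "$testindex" = "$1" ]; then
            printf '%s\n' "${d##*/}"
            return 0
        fi
    done
    return 1
}

# Resolve an alias path to a kernel interface name.
# Example: netdev_path_to_name /dev/netdev/by-biosname/LOM0
netdev_path_to_name() {
    idx=$(alias_to_ifindex "$1") || return 1
    ifindex_to_name "$idx"
}
```

Because the alias resolves to an ifindex rather than to a name, the lookup should keep working after an interface is renamed, provided sysfs is read at the time of use.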

---
 libudev/exported_symbols |    1 +
 libudev/libudev.c        |   29 ++++++++++++++++
 libudev/libudev.h        |    1 +
 udev/udev-event.c        |   82 ++++++++++++++++++++--------------------------
 udev/udev-node.c         |   41 ++++++++++++++++++++---
 udev/udev-rules.c        |    3 +-
 6 files changed, 105 insertions(+), 52 deletions(-)

Comments

Matt Domsch Oct. 27, 2009, 8:55 p.m. UTC | #1
On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> Here's a proof of concept to further the discussion..
> 
> The default filename uses the format:
>   /dev/netdev/by-ifindex/$ifindex
> 
> This provides the infrastructure to permit udev rules to create aliases for
> network devices using symlinks, for example:
> 
>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> 
> A library (such as the proposed libnetdevname) could use this information
> to provide an alias->realname mapping for network utilities.

Yes, this could work, as IFINDEX is already exported in the uevents,
and that's the primary value udev needs to set up the mapping.

While I like the little ifindex2name script you've got, I think udev
could simply call if_indextoname() to get this rather than calling an
external program.  I suppose it could be a really, really simple
external program too.

I'd be fine with this approach.  It has the advantages of not
requiring a kernel change at all, and not creating a whole character
device which would be useless.  And it doesn't preclude someone in the
future from creating a char device for network devices should they so
choose.

Kay, what say you as udev owner?

Thanks,
Matt
Kay Sievers Oct. 28, 2009, 8:23 a.m. UTC | #2
On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
>> Here's a proof of concept to further the discussion..
>>
>> The default filename uses the format:
>>   /dev/netdev/by-ifindex/$ifindex
>>
>> This provides the infrastructure to permit udev rules to create aliases for
>> network devices using symlinks, for example:
>>
>>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
>>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
>>
>> A library (such as the proposed libnetdevname) could use this information
>> to provide an alias->realname mapping for network utilities.
>
> yes, this could work, as IFINDEX is already exported in the uevents,
> and that's the primary value udev needs to set up the mapping.
>
> While I like the little ifindex2name script you've got, I think udev
> could simply call if_indextoname() to get this, and not call an
> external program?  I suppose it could be a really really simple
> external program too.

What's the point of all this? Why would udev ever need to find the
name of a device by the ifindex? The device name is the primary value
for the kernel events udev acts on.

> I'd be fine with this approach.  It has the advantages of not
> requiring a kernel change at all, and not creating a whole character
> device which would be useless.  And it doesn't preclude someone in the
> future from creating a char device for network devices should they so
> choose.
>
> Kay, what say you as udev owner?

That all sounds very much like something which will come back to bite
us some day. I'm not sure udev should publish such dead text files in
/dev; it does not seem to fit the usual APIs/assumptions where /sys
and /dev match, and libudev provides access to both. It all sounds
more like a database for a possible netdevname library, which does not
need to be public in /dev, right?

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Domsch Oct. 28, 2009, 1:03 p.m. UTC | #3
On Wed, Oct 28, 2009 at 09:23:57AM +0100, Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> > On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> >> Here's a proof of concept to further the discussion..
> >>
> >> The default filename uses the format:
> >>   /dev/netdev/by-ifindex/$ifindex
> >>
> >> This provides the infrastructure to permit udev rules to create aliases for
> >> network devices using symlinks, for example:
> >>
> >>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
> >>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> >>
> >> A library (such as the proposed libnetdevname) could use this information
> >> to provide an alias->realname mapping for network utilities.
> >
> > yes, this could work, as IFINDEX is already exported in the uevents,
> > and that's the primary value udev needs to set up the mapping.
> >
> > While I like the little ifindex2name script you've got, I think udev
> > could simply call if_indextoname() to get this, and not call an
> > external program?  I suppose it could be a really really simple
> > external program too.
> 
> What's the point of all this? Why would udev ever need to find the
> name of a device by the ifindex? The device name is the primary value
> for the kernel events udev acts on.

Ultimately, udev doesn't care.  I just want to use udev to keep track
of the pathname to device connections, like it does for all other
types of devices.

Applications such as net-tools, iproute, ethtool, etc.  take a kernel
device name.  I want to extend them to also take a path, and resolve
that path to a kernel device name.  libnetdevname currently is _one
small function_ which does this.  It need not even be in a library.
But whatever the mechanism, the path names need to be anchored
somewhere, so the library or all apps doing this kind of lookup know
where to look.

> That all sounds very much like something which will hit us back some
> day. I'm not sure, if udev should publish such dead text files in
> /dev, it does not seem to fit the usual APIs/assumptions where /sys
> and /dev match, and libudev provides access to both. It all sounds
> more like a database for a possible netdevname library, which does not
> need to be public in /dev, right?

Right, it doesn't need to be in /dev.  We could have udev rules that
simply call yet another program to maintain that database, in yet
another way.  But I really like how udev maintains the database of
symlinks for other device types, using symlinks in /dev/, and which
people are quite familiar with.  Why can't it be extended to do
likewise for network device names too?

There is a completely different approach possible here, if people
don't want to use something like /dev to track device name aliases.
We could put the whole name alias mechanism in the kernel, with new
netlink commands to add/remove/list aliases (and now we've overloaded that
term, as the old eth0:1 "alias" and dmz -> eth1 "alias" wouldn't be
the same thing).  But that idea hasn't met with a lot of interest
either.
Ben Hutchings Oct. 28, 2009, 1:06 p.m. UTC | #4
On Wed, 2009-10-28 at 09:23 +0100, Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> > On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> >> Here's a proof of concept to further the discussion..
> >>
> >> The default filename uses the format:
> >>   /dev/netdev/by-ifindex/$ifindex
> >>
> >> This provides the infrastructure to permit udev rules to create aliases for
> >> network devices using symlinks, for example:
> >>
> >>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
> >>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> >>
> >> A library (such as the proposed libnetdevname) could use this information
> >> to provide an alias->realname mapping for network utilities.
> >
> > yes, this could work, as IFINDEX is already exported in the uevents,
> > and that's the primary value udev needs to set up the mapping.
> >
> > While I like the little ifindex2name script you've got, I think udev
> > could simply call if_indextoname() to get this, and not call an
> > external program?  I suppose it could be a really really simple
> > external program too.
> 
> What's the point of all this? Why would udev ever need to find the
> name of a device by the ifindex? The device name is the primary value
> for the kernel events udev acts on.
[...]

Since net devices can be renamed, unlike other devices, the ifindex is
the proper stable identifier.  Using the name as an identifier opens up
race conditions.  If there are events that don't include the ifindex,
this should be fixed.

Ben.
dann frazier Oct. 28, 2009, 3:09 p.m. UTC | #5
On Wed, Oct 28, 2009 at 08:03:08AM -0500, Matt Domsch wrote:
> On Wed, Oct 28, 2009 at 09:23:57AM +0100, Kay Sievers wrote:
[...]
> > That all sounds very much like something which will hit us back some
> > day. I'm not sure, if udev should publish such dead text files in
> > /dev, it does not seem to fit the usual APIs/assumptions where /sys
> > and /dev match, and libudev provides access to both. It all sounds
> > more like a database for a possible netdevname library, which does not
> > need to be public in /dev, right?
> 
> Right, it doesn't need to be in /dev.  We could have udev rules that
> simply call yet another program to maintain that database, in yet
> another way.

Or have udev maintain them in a private directory (e.g.,
/var/lib/udev/netalias). Personally, I like the approach of having
udev manage them as files - it's an abstraction our users already get,
and they don't have to learn two mechanisms when aliasing disks and
NICs (SYMLINK ftw). Plus there's obviously a lot of code reuse to be
had (most of my patch was moving code into a common section).

If we want to hide the file implementation - we could invent another
udev construct that basically aliases SYMLINK (e.g. NETALIAS) that
works iff the device is a netdevice. That would let us switch out
implementations in the future, but would obviously be much more
invasive.
Jordan_Hargrave@Dell.com Oct. 28, 2009, 4:09 p.m. UTC | #6
I was thinking: if we're not planning on using the chardev/kernel route, there already exists an ifindex file at /sys/class/net/XXX/ifindex.
Should udev/helper be creating links to this, or is it better to keep everything under the /dev/ tree?
Using this method would require the patch to udev to handle renaming events.

--jordan hargrave
Dell Enterprise Linux Engineering
Greg KH Oct. 28, 2009, 4:11 p.m. UTC | #8
A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Wed, Oct 28, 2009 at 11:09:36AM -0500, Jordan_Hargrave@Dell.com wrote:
> I was thinking, if we're not planning on use the chardev/kernel route.
> There already exists an ifindex file in /sys/class/net/XXX/ifindex.
> Should udev/helper be creating links to this, or is it better to keep
> everything under the /dev/ tree?  Using this method would require the
> patch to udev to handle renaming events.

Please never create symlinks out of /dev/ to /sys; that doesn't make
sense at all and probably violates part of the LSB somewhere...

thanks,

greg k-h
Narendra K Oct. 28, 2009, 7:15 p.m. UTC | #9
>That all sounds very much like something which will hit us 
>back some day. I'm not sure, if udev should publish such dead 
>text files in /dev, it does not seem to fit the usual 
>APIs/assumptions where /sys and /dev match, and libudev 
>provides access to both. It all sounds more like a database 
>for a possible netdevname library, which does not need to be 
>public in /dev, right?

The char device nodes under /dev/netdev/ do seem to adhere to the
assumption that what is under /sys and /dev matches.

With regards,
Narendra K
Matt Domsch Oct. 29, 2009, 1:11 p.m. UTC | #10
On Wed, Oct 28, 2009 at 09:23:57AM +0100, Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
> > On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
> >> Here's a proof of concept to further the discussion..
> >>
> >> The default filename uses the format:
> >>   /dev/netdev/by-ifindex/$ifindex
> >>
> >> This provides the infrastructure to permit udev rules to create aliases for
> >> network devices using symlinks, for example:
> >>
> >>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
> >>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
> >>
> >> A library (such as the proposed libnetdevname) could use this information
> >> to provide an alias->realname mapping for network utilities.
>
> That all sounds very much like something which will hit us back some
> day. I'm not sure, if udev should publish such dead text files in
> /dev, it does not seem to fit the usual APIs/assumptions where /sys
> and /dev match, and libudev provides access to both.

While we could do this without any kernel changes at all, it does
still leave unresolved the concern of the in-kernel users of
netdevice-by-name - everything that uses dev_get_by_name(), and any
userspace tool that doesn't get converted to use the netdevice-by-path
concept.

Which brings us back to still looking for options.

Multiple names for the same device give us a way out.
Users of the ethX naming convention continue unchanged, and live
with the nondeterminism; users of other naming conventions can get
names that work for them, with the determinism they need.

Netdev team - are you in agreement that having multiple names to
address the same netdevice is a worthwhile thing to add, to allow a
variety of naming schemes to exist simultaneously?  If not, this whole
discussion will be moot, and my basic problem, that the ethX naming
convention is nondeterministic, but we need determinism, remains
unresolved.

Assuming we agree it's worthwhile, I'm open to ideas for ways to solve
it.  We've proposed char devices for udev to manage, and fixing up
userspace programs, but that doesn't solve in-kernel users by name.
Dann proposed another method (above) which Kay isn't fond of, and
which has the same drawback.  I need another option then.

Thanks,
Matt
Greg KH Oct. 29, 2009, 2:25 p.m. UTC | #11
On Thu, Oct 29, 2009 at 08:11:25AM -0500, Matt Domsch wrote:
> Netdev team - are you in agreement that having multiple names to
> address the same netdevice is a worthwhile thing to add, to allow a
> variety of naming schemes to exist simultaneously?  If not, this whole
> discussion will be moot, and my basic problem, that the ethX naming
> convention is nondeterministic, but we need determinism, remains
> unresolved.

I'm still totally confused as to why you think this.  What is wrong with
what we do today, which is name network devices in a deterministic
manner by their MAC in userspace?  That name goes into the kernel, and
everyone uses the same name and is happy.

If you don't like naming by MAC, then pick some other deterministic
naming scheme that works for your hardware and write udev rules for it.

You could easily name them in a way that could keep the lowest number
(eth0) for the lowest PCI id if you so desired and your BIOS guaranteed
it.
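
As a rough illustration of the udev-rule approach described here, the following sketch turns a list of PCI addresses into one rename rule per port, numbered in ascending address order. It assumes one network port per PCI function and fixed-width lowercase addresses as printed by sysfs; the function name gen_pci_order_rules and the choice of prefix are hypothetical. Renaming into the kernel's own ethN namespace can collide with interfaces that have not been renamed yet, so a different prefix may be the safer choice.

```shell
#!/bin/sh
# Sketch only: read PCI addresses (one per line) on stdin and print one udev
# rule per address, numbering the ports in ascending PCI address order.
# gen_pci_order_rules is a hypothetical helper, not part of udev.
#
# Example: printf '%s\n' 0000:01:00.1 0000:01:00.0 | gen_pci_order_rules lom

gen_pci_order_rules() {
    prefix=${1:-eth}
    n=0
    # Fixed-width lowercase hex addresses sort correctly as plain bytes.
    LC_ALL=C sort | while read -r addr; do
        [ -n "$addr" ] || continue
        # KERNELS matches the kernel name of a parent device, here the PCI
        # function that the net device sits on.
        printf 'SUBSYSTEM=="net", ACTION=="add", KERNELS=="%s", NAME="%s%d"\n' \
            "$addr" "$prefix" "$n"
        n=$((n + 1))
    done
}
```

The generated lines would go into a rules file that is read before any default persistent-net rules. Whether the lowest PCI address really corresponds to the first port on the chassis depends on the platform, as noted above.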

This way the kernel has only one name, and so does userspace, and
everyone is happy.

thanks,

greg k-h
Narendra K Oct. 29, 2009, 4:44 p.m. UTC | #12
>> Netdev team - are you in agreement that having multiple names to 
>> address the same netdevice is a worthwhile thing to add, to allow a 
>> variety of naming schemes to exist simultaneously?  If not, 
>this whole 
>> discussion will be moot, and my basic problem, that the ethX naming 
>> convention is nondeterministic, but we need determinism, remains 
>> unresolved.
>
>I'm still totally confused as to why you think this.  What is 
>wrong with what we do today, which is name network devices in 
>a deterministic manner by their MAC in userspace?  That name 
>goes into the kernel, and everyone uses the same name and is happy.

The interface name as assigned by the OS is determined by how the
interface is first named during OS installation. This name is made
persistent by associating it with its MAC address in userspace,
either by udev or by ifcfg-eth files. In cases where there are one or
more add-in cards along with one or more integrated cards (LAN on
Motherboard), integrated port 1, which is designated Gb1 on the
chassis, may or may not get the name "eth0", even though that is what
customers expect most of the time.
Unattended installs and large-scale image-based installs are the most
affected scenarios.

>If you don't like naming by MAC, then pick some other 
>deterministic naming scheme that works for your hardware and 
>write udev rules for it.
>
>You could easily name them in a way that could keep the lowest number
>(eth0) for the lowest PCI id if you so desired and your BIOS 
>guaranteed it.
>

This is what the lspci tree view looks like on a PER710 (PowerEdge R710)
server with four integrated BCM5709 NIC ports and one add-in Intel NIC
port. The BIOS consistently finds the integrated ports before the add-in
NIC (or NICs) and guarantees this across every reboot. If the OS also
found and named the network ports in the same order, there would be no
issue, as integrated NIC port 1, designated Gb1 on the chassis, would
always be named "eth0". But we observe that this is not always the
case.

-[0000:00]-+-00.0  Intel Corporation 5520 I/O Hub to ESI Port
           +-01.0-[0000:01]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
           |                 \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
           +-03.0-[0000:02]--+-00.0  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
           |                 \-00.1  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet
           +-04.0-[0000:03]----00.0  LSI Logic / Symbios Logic MegaRAID SAS 1078
           +-05.0-[0000:04]--
           +-06.0-[0000:05]--
           +-07.0-[0000:06]--
           +-09.0-[0000:07]----00.0  Intel Corporation 82598EB 10-Gigabit AT Network Connection

In such cases, pathnames like Embedded_NIC_1 -> eth[01..] point to the
right interface and communicate a more meaningful name without any
state embedded in them.

With regards,
Narendra K
Ben Hutchings Oct. 29, 2009, 4:49 p.m. UTC | #13
On Thu, 2009-10-29 at 07:25 -0700, Greg KH wrote:
> On Thu, Oct 29, 2009 at 08:11:25AM -0500, Matt Domsch wrote:
> > Netdev team - are you in agreement that having multiple names to
> > address the same netdevice is a worthwhile thing to add, to allow a
> > variety of naming schemes to exist simultaneously?  If not, this whole
> > discussion will be moot, and my basic problem, that the ethX naming
> > convention is nondeterministic, but we need determinism, remains
> > unresolved.
> 
> I'm still totally confused as to why you think this.  What is wrong with
> what we do today, which is name network devices in a deterministic
> manner by their MAC in userspace?  That name goes into the kernel, and
> everyone uses the same name and is happy.
> 
> If you don't like naming by MAC, then pick some other deterministic
> naming scheme that works for your hardware and write udev rules for it.
> 
> You could easily name them in a way that could keep the lowest number
> (eth0) for the lowest PCI id if you so desired and your BIOS guaranteed
> it.
> 
> This way the kernel has only one name, and so does userspace, and
> everyone is happy.

I thought there was a general trend in udev development to provide
default rules that work for almost everyone, so few users/administrators
need to override or add to them.  Compare disks and net devices:

1. Stable kernel device id
Disks: block device number
Net devices: ifindex

2. Unique identifier (across reboot)
Disks: label or UUID (each with limitations)
Net devices: (MAC address, subtype)

3. Name assignment mechanism
Disks: kernel suggests a name; udev can assign any number
Net devices: kernel assigns a single name; udev can override it

4. Default name assignment policy
Disks: names disk by device path (id), label and UUID
Net devices: assigns arbitrary stable names per (MAC address, subtype)

5. Naming by users
Disks: user can identify by any method without having to choose on a
system-wide basis
Net devices: user must identify by single name; policy can be overridden
on a system-wide basis

I fully understand the technical reasons for differences 3-5, but why
should users have to put up with it?

Ben.
Greg KH Oct. 29, 2009, 4:52 p.m. UTC | #14
On Thu, Oct 29, 2009 at 10:14:08PM +0530, Narendra_K@Dell.com wrote:
> 
> >> Netdev team - are you in agreement that having multiple names to 
> >> address the same netdevice is a worthwhile thing to add, to allow a 
> >> variety of naming schemes to exist simultaneously?  If not, 
> >this whole 
> >> discussion will be moot, and my basic problem, that the ethX naming 
> >> convention is nondeterministic, but we need determinism, remains 
> >> unresolved.
> >
> >I'm still totally confused as to why you think this.  What is 
> >wrong with what we do today, which is name network devices in 
> >a deterministic manner by their MAC in userspace?  That name 
> >goes into the kernel, and everyone uses the same name and is happy.
> 
> The interface name as assigned by the OS is determined by how the
> interface is named first during the OS installation.

That sounds like a distro install issue to me, why not fix it there?

> This name is made persistent by associating the name with it's MAC
> address in userspace, either by udev or ifcfg-eth files. In cases
> where there are one or more add-in cards along with one or more
> integrated cards (Lan on Motherboard), the integrated port 1, which is
> designated as Gb1 on the chassis may or may not get the name "eth0".

Exactly, who cares about "eth0" as a name?

> And that is the customer expectation, most of the times.

Then again, fix the installer to allow you to either pick the name, or
specify some rule to use to pick the name.

> Unattended installs and large scale image based installs are the most
> affected scenarios. 

Then fix the installer.

> >If you don't like naming by MAC, then pick some other 
> >deterministic naming scheme that works for your hardware and 
> >write udev rules for it.
> >
> >You could easily name them in a way that could keep the lowest number
> >(eth0) for the lowest PCI id if you so desired and your BIOS 
> >guaranteed it.
> >
> 
> This is how the lspci tree view on a PER710 (PowerEdge R710) server with
> Four BCM5709 integrated NIC ports and One add-in Intel NIC port looks
> like. The integrated ports are always found before the add-in nic (or
> nics) by the BIOS consistently and BIOS guarantees it across every
> reboot.

Great, then you are set to write a udev rule for this, right?

> If the OS also found and named the network ports in the same manner,
> then there is no issue as integrated NIC port 1, designated Gb1 on the
> chassis, is always named as "eth0". But the observation is that, it is
> not the case always.

Sure, it's never guaranteed by the kernel that this will happen,
especially as we speed up the boot process by doing things async.

So again, just fix your installer, or write a new udev rule for your
hardware platforms, or both.  But I still fail to see why multiple names
for network devices _in the kernel_ is a solution for your issue.

> In such cases, pathnames like Embedded_NIC_1 -> eth[01..], point to the
> right interface, and communicate a more meaningful name without any
> state embedded in them.

Yes, it would be nice if pathnames worked for network devices, but
unfortunately, that's just not how network devices work :)

thanks,

greg k-h
Greg KH Oct. 29, 2009, 4:55 p.m. UTC | #15
On Thu, Oct 29, 2009 at 04:49:35PM +0000, Ben Hutchings wrote:
> 3. Name assignment mechanism
> Disks: kernel suggests a name; udev can assign any number
> Net devices: kernel assigns a single name; udev can override it
> 
> 4. Default name assignment policy
> Disks: names disk by device path (id), label and UUID
> Net devices: assigns arbitrary stable names per (MAC address, subtype)
> 
> 5. Naming by users
> Disks: user can identify by any method without having to choose on a
> system-wide basis
> Net devices: user must identify by single name; policy can be overridden
> on a system-wide basis
> 
> I fully understand the technical reasons for differences 3-5, but why
> should users have to put up with it?

That is because network devices are not referred to by /dev/ nodes where
multiple symlinks would solve the naming problem.

thanks,

greg k-h
Ben Hutchings Oct. 29, 2009, 5:12 p.m. UTC | #16
On Thu, 2009-10-29 at 09:55 -0700, Greg KH wrote:
> On Thu, Oct 29, 2009 at 04:49:35PM +0000, Ben Hutchings wrote:
> > 3. Name assignment mechanism
> > Disks: kernel suggests a name; udev can assign any number
> > Net devices: kernel assigns a single name; udev can override it
> > 
> > 4. Default name assignment policy
> > Disks: names disk by device path (id), label and UUID
> > Net devices: assigns arbitrary stable names per (MAC address, subtype)
> > 
> > 5. Naming by users
> > Disks: user can identify by any method without having to choose on a
> > system-wide basis
> > Net devices: user must identify by single name; policy can be overridden
> > on a system-wide basis
> > 
> > I fully understand the technical reasons for differences 3-5, but why
> > should users have to put up with it?
> 
> That is because network devices are not referred to by /dev/ nodes where
> multiple symlinks would solve the naming problem.

Did you even read the last sentence?

Ben.
Greg KH Oct. 29, 2009, 5:20 p.m. UTC | #17
On Thu, Oct 29, 2009 at 05:12:13PM +0000, Ben Hutchings wrote:
> On Thu, 2009-10-29 at 09:55 -0700, Greg KH wrote:
> > On Thu, Oct 29, 2009 at 04:49:35PM +0000, Ben Hutchings wrote:
> > > 3. Name assignment mechanism
> > > Disks: kernel suggests a name; udev can assign any number
> > > Net devices: kernel assigns a single name; udev can override it
> > > 
> > > 4. Default name assignment policy
> > > Disks: names disk by device path (id), label and UUID
> > > Net devices: assigns arbitrary stable names per (MAC address, subtype)
> > > 
> > > 5. Naming by users
> > > Disks: user can identify by any method without having to choose on a
> > > system-wide basis
> > > Net devices: user must identify by single name; policy can be overridden
> > > on a system-wide basis
> > > 
> > > I fully understand the technical reasons for differences 3-5, but why
> > > should users have to put up with it?
> > 
> > That is because network devices are not referred to by /dev/ nodes where
> > multiple symlinks would solve the naming problem.
> 
> Did you even read the last sentence?

Yes, the technical reason is the reason why users have to put up
with it :)
Narendra K Oct. 29, 2009, 5:22 p.m. UTC | #18
>Sure, it's never guaranteed by the kernel that this will 
>happen, especially as we speed up the boot process by doing 
>things async.

>So again, just fix your installer, or write a new udev rule 
>for your hardware platforms, or both.  But I still fail to see 
>why multiple names for network devices _in the kernel_ is a 
>solution for your issue.
>

The char device nodes solution does not propose having multiple names
for the network interfaces _in the kernel_. It is suggesting that we
have alternate names for kernel assigned names in user space and user
space utilities refer to the interface by these alternate names. The
userspace utilities would have to map the pathnames to kernel names
before issuing the ioctls. That way there is determinism without
embedding MAC or any other attribute. An Embedded_NIC_1 interface would
always refer to the Gb1 without having to depend on any attribute.

With regards,
Narendra K 

dann frazier Oct. 29, 2009, 5:46 p.m. UTC | #19
On Thu, Oct 29, 2009 at 07:25:54AM -0700, Greg KH wrote:
> On Thu, Oct 29, 2009 at 08:11:25AM -0500, Matt Domsch wrote:
> > Netdev team - are you in agreement that having multiple names to
> > address the same netdevice is a worthwhile thing to add, to allow a
> > variety of naming schemes to exist simultaneously?  If not, this whole
> > discussion will be moot, and my basic problem, that the ethX naming
> > convention is nondeterministic, but we need determinism, remains
> > unresolved.
> 
> I'm still totally confused as to why you think this.  What is wrong with
> what we do today, which is name network devices in a deterministic
> manner by their MAC in userspace?  That name goes into the kernel, and
> everyone uses the same name and is happy.
> 
> If you don't like naming by MAC, then pick some other deterministic
> naming scheme that works for your hardware and write udev rules for it.
> 
> You could easily name them in a way that could keep the lowest number
> (eth0) for the lowest PCI id if you so desired and your BIOS guaranteed
> it.
> 
> This way the kernel has only one name, and so does userspace, and
> everyone is happy.

There are two issues, which really seem distinct to me.

Users expect eth0 to map to first-onboard-nic. That's an installer
issue (since the BIOS can already export this info) and I agree that
if we want to "fix" that, we should fix it there.

Users also want to have a name that matches the way they think of
their hardware - pci slot, bios-exposed-name, mac address,
whatever. This can be done today w/ custom udev rules, and I can
visualize an installer that would generate these rules for you:

Configure a NIC
    \-> Choose NIC by: MAC/CHASSIS-NAME/PCI-SLOT
      [ Present list of unconfigured NICs by selected property ]
      \-> What name would you like to use for this interface [eth3]?
          How do you want this configured (DHCP/STATIC/..)
          ...

That would make a lot of users much happier (myself included), but it
does restrict us into one view. At different times, admins think of
their NICs by different properties. I may want to do IP assignment by
the chassis name, but then run ethereal on a specific mac address. Or
I may want to see the routes assigned to all NICs in a given PCI
slot. Sure, I can lookup all of these properties and map them back to
an interface name by hand, but aliasing provides a nice way to
short-circuit that. And, by providing a library that translates the
aliases for us, we can help ensure that all apps that want to provide
aliasing can do so in a common way.
dann frazier Oct. 29, 2009, 5:50 p.m. UTC | #20
On Thu, Oct 29, 2009 at 10:52:52PM +0530, Narendra_K@Dell.com wrote:
> 
> >Sure, it's never guaranteed by the kernel that this will 
> >happen, especially as we speed up the boot process by doing 
> >things async.
> 
> >So again, just fix your installer, or write a new udev rule 
> >for your hardware platforms, or both.  But I still fail to see 
> >why multiple names for network devices _in the kernel_ is a 
> >solution for your issue.
> >
> 
> The char device nodes solution does not propose having multiple names
> for the network interfaces _in the kernel_.

Nor does the udev-only implementation I posted which doesn't rely on
new character devices.
Marco d'Itri Oct. 30, 2009, 3:30 a.m. UTC | #21
On Oct 29, dann frazier <dannf@dannf.org> wrote:

> That would make a lot of users much happier (myself included), but it
> does restrict us into one view. At different times, admins think of
> their NICs by different properties. I may want to do IP assignment by
[Citation needed]

(An admin.)
dann frazier Oct. 30, 2009, 5:38 a.m. UTC | #22
On Fri, Oct 30, 2009 at 04:30:29AM +0100, Marco d'Itri wrote:
> On Oct 29, dann frazier <dannf@dannf.org> wrote:
> 
> > That would make a lot of users much happier (myself included), but it
> > does restrict us into one view. At different times, admins think of
> > their NICs by different properties. I may want to do IP assignment by
> [Citation needed]

Is "I" not clear, or do you just not believe I admin servers?
Marco d'Itri Oct. 30, 2009, 6:22 a.m. UTC | #23
On Oct 30, dann frazier <dannf@hp.com> wrote:

> On Fri, Oct 30, 2009 at 04:30:29AM +0100, Marco d'Itri wrote:
> > On Oct 29, dann frazier <dannf@dannf.org> wrote:
> > 
> > > That would make a lot of users much happier (myself included), but it
> > > does restrict us into one view. At different times, admins think of
> > > their NICs by different properties. I may want to do IP assignment by
> > [Citation needed]
> Is "I" not clear, or do you just not believe I admin servers?
I was referring to "At different times, admins think of their NICs by
different properties", which I am not sure is a valid generalization.
Hannes Reinecke Oct. 30, 2009, 7:45 a.m. UTC | #24
Kay Sievers wrote:
> On Tue, Oct 27, 2009 at 21:55, Matt Domsch <Matt_Domsch@dell.com> wrote:
>> On Thu, Oct 22, 2009 at 12:36:20AM -0600, dann frazier wrote:
>>> Here's a proof of concept to further the discussion..
>>>
>>> The default filename uses the format:
>>>   /dev/netdev/by-ifindex/$ifindex
>>>
>>> This provides the infrastructure to permit udev rules to create aliases for
>>> network devices using symlinks, for example:
>>>
>>>   /dev/netdev/by-name/eth0 -> ../by-ifindex/1
>>>   /dev/netdev/by-biosname/LOM0 -> ../by-ifindex/3
>>>
>>> A library (such as the proposed libnetdevname) could use this information
>>> to provide an alias->realname mapping for network utilities.
>> yes, this could work, as IFINDEX is already exported in the uevents,
>> and that's the primary value udev needs to set up the mapping.
>>
>> While I like the little ifindex2name script you've got, I think udev
>> could simply call if_indextoname() to get this, and not call an
>> external program?  I suppose it could be a really really simple
>> external program too.
> 
> What's the point of all this? Why would udev ever need to find the
> name of a device by the ifindex? The device name is the primary value
> for the kernel events udev acts on.
> 
>> I'd be fine with this approach.  It has the advantages of not
>> requiring a kernel change at all, and not creating a whole character
>> device which would be useless.  And it doesn't preclude someone in the
>> future from creating a char device for network devices should they so
>> choose.
>>
>> Kay, what say you as udev owner?
> 
> That all sounds very much like something which will hit us back some
> day. I'm not sure, if udev should publish such dead text files in
> /dev, it does not seem to fit the usual APIs/assumptions where /sys
> and /dev match, and libudev provides access to both. It all sounds
> more like a database for a possible netdevname library, which does not
> need to be public in /dev, right?
> 
And to throw in some bit of useless information;

I've pondered the idea of persistent device names for network interfaces
a while back when designing the original layout of the persistent
device naming scheme.

The one reason I didn't do this was that a network interface is
_not_ a file, but rather an abstract type which is known only internally
in the kernel (ie the one exemption from the 'everything is a file'
UNIX rule).
And as udev is primarily concerned with the _files_, using it for
network interfaces would be a workaround at best.

If I were to design this, I would implement network interface
_aliases_, so that a network interface could be accessed either by
name or by alias. This mechanism can then be managed by udev, much
like we (i.e. SUSE) are using it nowadays to assign the network interface
numbers.

But the mechanism for the aliases has to live in the same instance which
also handles the network interface nowadays, ie the net subsystem of
the kernel. Implementing this in udev is _not_ the way to go.

Cheers,

Hannes
dann frazier Oct. 30, 2009, 3 p.m. UTC | #25
On Fri, Oct 30, 2009 at 07:22:49AM +0100, Marco d'Itri wrote:
> On Oct 30, dann frazier <dannf@hp.com> wrote:
> 
> > On Fri, Oct 30, 2009 at 04:30:29AM +0100, Marco d'Itri wrote:
> > > On Oct 29, dann frazier <dannf@dannf.org> wrote:
> > > 
> > > > That would make a lot of users much happier (myself included), but it
> > > > does restrict us into one view. At different times, admins think of
> > > > their NICs by different properties. I may want to do IP assignment by
> > > [Citation needed]
> > Is "I" not clear, or do you just not believe I admin servers?
> I was referring to "At different times, admins think of their NICs by
> different properties", which I am not sure is a valid generalization.

Fair enough, I can't speak for others.
Narendra K Oct. 30, 2009, 3:13 p.m. UTC | #26
>> This way the kernel has only one name, and so does userspace, and 
>> everyone is happy.
>
>There are two issues, which really seem distinct to me.
>
>Users expect eth0 to map to first-onboard-nic. That's an 
>installer issue (since the BIOS can already export this info) 
>and I agree that if we want to "fix" that, we should fix it there.
>

I agree that installers have to be fixed in the sense that they can be
told to find the right interface. But, they expect determinism and
depend on "eth0 to map to first-onboard-nic". Installer is one of the
applications that is affected by this and needs user intervention, if it
is not told about the right interface. I discussed installer as it is so
much part of a user experience.

But the real issue is that eth0 does not always map to first-onboard-nic,
and applications expecting this would break in data center environments.
Both the solutions proposed provide a way to overcome it without
introducing state.

With regards,
Narendra K
dann frazier Oct. 30, 2009, 4:08 p.m. UTC | #27
On Fri, Oct 30, 2009 at 08:43:44PM +0530, Narendra_K@Dell.com wrote:
>  
> >> This way the kernel has only one name, and so does userspace, and 
> >> everyone is happy.
> >
> >There are two issues, which really seem distinct to me.
> >
> >Users expect eth0 to map to first-onboard-nic. That's an 
> >installer issue (since the BIOS can already export this info) 
> >and I agree that if we want to "fix" that, we should fix it there.
> >
> 
> I agree that installers have to be fixed in the sense that they can be
> told to find the right interface. But, they expect determinism and
> depend on "eth0 to map to first-onboard-nic". Installer is one of the
> applications that is affected by this and needs user intervention, if it
> is not told about the right interface. I discussed installer as it is so
> much part of a user experience.

Right, but couldn't the installer do the work of scanning the SMBIOS
to figure out which nics are onboard, and reorder the 'eth*' names
such that these are first? This state could then be written out as
udev rules so that they persist across reboots.

> But the real issue is "eth0 does not map to first-onboard-nic" always
> and applications expecting this would break in data center environments.
> Both the solutions proposed provide a way to overcome it without
> introducing state.
> 
> With regards,
> Narendra K
>
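
For illustration, the "write the chosen order out as udev rules" idea from the reply above could be sketched as below. The rule keys used (SUBSYSTEM, ACTION, ATTR{address}, NAME) are standard udev rule syntax; the helper name format_netdev_rule() and the idea that an installer calls it once per NIC are assumptions made for this example, not something the thread specifies.

```c
#include <stdio.h>

/* Format one udev rule that pins a kernel interface name to a MAC address.
 * Hypothetical helper: an installer would call it once per NIC after it
 * has decided on an order (e.g. onboard NICs first).
 * Returns 0 on success, -1 if buf is too small. */
int format_netdev_rule(char *buf, size_t len, const char *mac, const char *name)
{
	int n = snprintf(buf, len,
			 "SUBSYSTEM==\"net\", ACTION==\"add\", "
			 "ATTR{address}==\"%s\", NAME=\"%s\"\n",
			 mac, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Rules of this form only record a decision that was made once; they do not by themselves decide where a NIC that is hotplugged later should go in the order.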
Bryan Kadzban Oct. 30, 2009, 4:22 p.m. UTC | #28
Hannes Reinecke wrote:
> And to throw in some bit of useless information;

Stirring the pot a bit myself with this message...

> The one reason I didn't do this was that a network interface is _not_
> a file, but rather an abstract type which is known only internally in
> the kernel (ie the one exemption from the 'everything is a file' UNIX
> rule).

Why?  Why not make it a file?  I've heard rumors of other Unix-like
systems that do exactly that, FWIW.

(Yes, I'm joking.  Well, maybe half-joking...  It'd be nice, but I don't
expect it to happen.)

> If I were to design this, I would implement network interface
> _aliases_, so that a network interface could be accessed either by
> name or by alias. This mechanism can then be managed by udev, much
> like we (i.e. SUSE) are using it nowadays to assign the network
> interface numbers.

The problem with that, if I understand what you're suggesting, is the
value of IFNAMSIZ, and the fact that it can't be made any bigger.  All
your aliases have to be IFNAMSIZ characters or less.  And that's too
short to be able to embed the same level of information as we get for
e.g. disks.  It's *barely* long enough to fit "mac-" plus 12 hex digits
(for the MAC address), but is completely incapable of holding a USB bus
path, for instance.

(Not that you'd want to use path persistence for USB devices.  But it is
possible that you'd want it for some other setup, at which point it
becomes impossible to use the same rules for USB.)
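
To make the length limit above concrete: on Linux, IFNAMSIZ is 16 and that count includes the terminating NUL, so a name can carry at most 15 visible characters. By that count "mac-" plus 12 hex digits comes to 16 characters, one more than fits. The check below is a small sketch of that arithmetic; fits_ifname() is a hypothetical helper, and it uses IF_NAMESIZE from <net/if.h>, which has the same value as IFNAMSIZ.

```c
#include <net/if.h>
#include <string.h>

/* A name is usable as an interface name only if it is non-empty and
 * leaves room for the terminating NUL within IF_NAMESIZE bytes. */
int fits_ifname(const char *name)
{
	size_t len = strlen(name);

	return len > 0 && len < IF_NAMESIZE;
}
```

With the Linux value of 16, fits_ifname("eth0") is true and fits_ifname("mac-001122334455") is false, which is why richer identifiers such as bus paths need somewhere other than the kernel name to live.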
stephen hemminger Oct. 30, 2009, 4:34 p.m. UTC | #29
On Fri, 30 Oct 2009 09:22:36 -0700
Bryan Kadzban <bryan@kadzban.is-a-geek.net> wrote:

> Hannes Reinecke wrote:
> > And to throw in some bit of useless information;
> 
> Stirring the pot a bit myself with this message...
> 
> > The one reason I didn't do this was that a network interface is _not_
> > a file, but rather an abstract type which is known only internally in
> > the kernel (ie the one exemption from the 'everything is a file' UNIX
> > rule).
> 
> Why?  Why not make it a file?  I've heard rumors of other Unix-like
> systems that do exactly that, FWIW.
> 
> (Yes, I'm joking.  Well, maybe half-joking...  It'd be nice, but I don't
> expect it to happen.)
> 
> > If I were to design this, I would implement network interface
> > _aliases_, so that a network interface could be accessed either by
> > name or by alias. This mechanism can then be managed by udev, much
> > like we (i.e. SUSE) are using it nowadays to assign the network
> > interface numbers.
> 
> The problem with that, if I understand what you're suggesting, is the
> value of IFNAMSIZ, and the fact that it can't be made any bigger.  All
> your aliases have to be IFNAMSIZ characters or less.  And that's too
> short to be able to embed the same level of information as we get for
> e.g. disks.  It's *barely* long enough to fit "mac-" plus 12 hex digits
> (for the MAC address), but is completely incapable of holding a USB bus
> path, for instance.
> 
> (Not that you'd want to use path persistence for USB devices.  But it is
> possible that you'd want it for some other setup, at which point it
> becomes impossible to use the same rules for USB.)

Not a big fan of multiple names, it is the wrong solution to the human
question of "what is eth0 really?". Router o/s use description field
for that kind of information.

I added ifalias to provide place to put user visible description information.
It is in latest kernels via sysfs and iproute utilities.  Also plan to add
support for it in net-snmp.
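
As a rough sketch of what reading that description from userspace looks like, the following reads the ifalias attribute through sysfs. The path /sys/class/net/<ifname>/ifalias is the sysfs location of the attribute; the helper names ifalias_path() and read_ifalias() are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of the ifalias attribute for an interface.
 * Returns 0 on success, -1 if buf is too small. */
int ifalias_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/ifalias", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the description of an interface into alias (NUL-terminated,
 * trailing newline stripped). An interface without an alias yields an
 * empty string. Returns 0 on success, -1 on error. */
int read_ifalias(const char *ifname, char *alias, size_t len)
{
	char path[256];
	FILE *f;

	if (len == 0 || ifalias_path(path, sizeof(path), ifname) != 0)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(alias, (int)len, f) == NULL)
		alias[0] = '\0';	/* attribute is empty: no alias set */
	fclose(f);
	alias[strcspn(alias, "\n")] = '\0';
	return 0;
}
```

Note the direction of the lookup: this goes from a kernel name to its description. Going the other way (finding the interface whose description matches) means scanning every interface, which is the limitation raised later in the thread.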
Narendra K Oct. 30, 2009, 4:53 p.m. UTC | #30
>> >There are two issues, which really seem distinct to me.
>> >
>> >Users expect eth0 to map to first-onboard-nic. That's an installer
>> >issue (since the BIOS can already export this info) and I agree that
>> >if we want to "fix" that, we should fix it there.
>> >
>>
>> I agree that installers have to be fixed in the sense that they can be
>> told to find the right interface. But, they expect determinism and
>> depend on "eth0 to map to first-onboard-nic". Installer is one of the
>> applications that is affected by this and needs user intervention, if
>> it is not told about the right interface. I discussed installer as it
>> is so much part of a user experience.
>
>Right, but couldn't the installer do the work of scanning the 
>SMBIOS to figure out which nics are onboard, and reorder the 
>'eth*' names such that these are first? This state could then 
>be written out as udev rules so that they persist across reboots.
>

I suppose, with udev loading modules, the rules generated at runtime
could run into the problem of duplicate names, if names are reordered in
the kernel namespace (i.e. the eth* namespace). Hence the idea of an
alternate namespace.

With regards,
Narendra K
dann frazier Oct. 30, 2009, 5:05 p.m. UTC | #31
On Fri, Oct 30, 2009 at 10:23:57PM +0530, Narendra_K@Dell.com wrote:
> 
> >> >There are two issues, which really seem distinct to me.
> >> >
> >> >Users expect eth0 to map to first-onboard-nic. That's an installer
> >> >issue (since the BIOS can already export this info) and I agree that
> >> >if we want to "fix" that, we should fix it there.
> >> >
> >>
> >> I agree that installers have to be fixed in the sense that they can be
> >> told to find the right interface. But, they expect determinism and
> >> depend on "eth0 to map to first-onboard-nic". Installer is one of the
> >> applications that is affected by this and needs user intervention, if
> >> it is not told about the right interface. I discussed installer as it
> >> is so much part of a user experience.
> >
> >Right, but couldn't the installer do the work of scanning the 
> >SMBIOS to figure out which nics are onboard, and reorder the 
> >'eth*' names such that these are first? This state could then 
> >be written out as udev rules so that they persist across reboots.
> >
> 
> I suppose, with udev loading modules, the rules generated at runtime
> could run into the problem of duplicate names, if names are reordered in
> the kernel namespace. (I.e the eth* namespace). Hence idea of an
> alternate namespace.

I don't see a risk of duplicate names - after all drivers are loaded,
the installer can take the names enumerated by the kernel, figure out
what it thinks a preferable order is (i.e. by querying SMBIOS), then
change the kernel names/mac mapping appropriately. Where can a
duplicate name become an issue using this method?
Matt Domsch Oct. 30, 2009, 5:10 p.m. UTC | #32
On Fri, Oct 30, 2009 at 10:08:45AM -0600, dann frazier wrote:
> On Fri, Oct 30, 2009 at 08:43:44PM +0530, Narendra_K@Dell.com wrote:
> >  
> > >> This way the kernel has only one name, and so does userspace, and 
> > >> everyone is happy.
> > >
> > >There are two issues, which really seem distinct to me.
> > >
> > >Users expect eth0 to map to first-onboard-nic. That's an 
> > >installer issue (since the BIOS can already export this info) 
> > >and I agree that if we want to "fix" that, we should fix it there.
> > >
> > 
> > I agree that installers have to be fixed in the sense that they can be
> > told to find the right interface. But, they expect determinism and
> > depend on "eth0 to map to first-onboard-nic". Installer is one of the
> > applications that is affected by this and needs user intervention, if it
> > is not told about the right interface. I discussed installer as it is so
> > much part of a user experience.
> 
> Right, but couldn't the installer do the work of scanning the SMBIOS
> to figure out which nics are onboard, and reorder the 'eth*' names
> such that these are first? This state could then be written out as
> udev rules so that they persist across reboots.

No, there is a catch-22.  To be sure you know the "proper" ethN name
to assign a device based on an ordering, you have to know about all
the devices.  When udev runs, one device at a time, it can only see
the current device and all those that have come before it, but it can't know
when all the drivers for all the NICs have been loaded.  And if you
hotplug a device in later, it should presumably just go at the end of
the list, but after a reboot, it'll most likely show up somewhere in
the middle of the list.  SMBIOS is static from boottime, not hotplug
aware. If I add a 4-port NIC in slot 3 after boot, it becomes
ethN..N+3.  After reboot, it may well show up at completely different
places, and even the N..N+3 ordering of individual ports on the card
isn't guaranteed to be consistent.

ethN is fundamentally a nondeterministic namespace, and trying to
enforce determinism on it is, from all my attempts, impossible.  Hence
the desire to change the namespace.  But there can be many
different naming policies one might want (including the
nondeterministic ethN policy), and for all other types of devices this
isn't a problem - we can have all the policies we want, in parallel.
Only for network devices we can't.

Stephen, I hadn't seen the ifalias field you added.  I can see that
being helpful to a user (some tool can write a more meaningful string
to it), but I can't see it being useful programmatically.  It still
doesn't get me to "ifconfig the NIC in slot 3 port 2" or "ifconfig the
NIC I booted from".

Thanks,
Matt
Greg KH Oct. 30, 2009, 5:13 p.m. UTC | #33
On Fri, Oct 30, 2009 at 12:10:03PM -0500, Matt Domsch wrote:
> ethN is fundamentally a nondeterministic namespace, and trying to
> enforce determinism on it is, from all my attempts, impossible.  Hence
> the desire to change the namespace.  But there can be many
> different naming policies one might want (including the
> nondeterministic ethN policy), and for all other types of devices this
> isn't a problem - we can have all the policies we want, in parallel.
> Only for network devices we can't.

So pick one in your installer and stick with it.  Doesn't seem that
complicated to me...

greg k-h

Patch

diff --git a/libudev/exported_symbols b/libudev/exported_symbols
index 018463d..31c616a 100644
--- a/libudev/exported_symbols
+++ b/libudev/exported_symbols
@@ -8,6 +8,7 @@  udev_get_userdata
 udev_set_userdata
 udev_get_sys_path
 udev_get_dev_path
+udev_get_netdev_path
 udev_list_entry_get_next
 udev_list_entry_get_by_name
 udev_list_entry_get_name
diff --git a/libudev/libudev.c b/libudev/libudev.c
index 1909138..2a83417 100644
--- a/libudev/libudev.c
+++ b/libudev/libudev.c
@@ -42,6 +42,7 @@  struct udev {
 	void *userdata;
 	char *sys_path;
 	char *dev_path;
+	char *netdev_path;
 	char *rules_path;
 	struct udev_list_node properties_list;
 	int log_priority;
@@ -125,8 +126,10 @@  struct udev *udev_new(void)
 	udev->run = 1;
 	udev->dev_path = strdup("/dev");
 	udev->sys_path = strdup("/sys");
+	udev->netdev_path = strdup("/dev/netdev/by-ifindex");
 	config_file = strdup(SYSCONFDIR "/udev/udev.conf");
 	if (udev->dev_path == NULL ||
+	    udev->netdev_path == NULL ||
 	    udev->sys_path == NULL ||
 	    config_file == NULL)
 		goto err;
@@ -243,6 +246,14 @@  struct udev *udev_new(void)
 		udev_add_property(udev, "UDEV_ROOT", udev->dev_path);
 	}
 
+	env = getenv("NETDEV_ROOT");
+	if (env != NULL) {
+		free(udev->netdev_path);
+		udev->netdev_path = strdup(env);
+		util_remove_trailing_chars(udev->netdev_path, '/');
+		udev_add_property(udev, "NETDEV_ROOT", udev->netdev_path);
+	}
+
 	env = getenv("UDEV_LOG");
 	if (env != NULL)
 		udev_set_log_priority(udev, util_log_priority(env));
@@ -253,6 +264,7 @@  struct udev *udev_new(void)
 	dbg(udev, "log_priority=%d\n", udev->log_priority);
 	dbg(udev, "config_file='%s'\n", config_file);
 	dbg(udev, "dev_path='%s'\n", udev->dev_path);
+	dbg(udev, "netdev_path='%s'\n", udev->netdev_path);
 	dbg(udev, "sys_path='%s'\n", udev->sys_path);
 	if (udev->rules_path != NULL)
 		dbg(udev, "rules_path='%s'\n", udev->rules_path);
@@ -398,6 +410,23 @@  const char *udev_get_dev_path(struct udev *udev)
 	return udev->dev_path;
 }
 
+/**
+ * udev_get_netdev_path:
+ * @udev: udev library context
+ *
+ * Retrieve the netdev directory path. The default value is
+ * "/dev/netdev/by-ifindex"; the actual value may be overridden by the
+ * NETDEV_ROOT environment variable.
+ *
+ * Returns: the netdev directory path
+ **/
+const char *udev_get_netdev_path(struct udev *udev)
+{
+	if (udev == NULL)
+		return NULL;
+	return udev->netdev_path;
+}
+
 struct udev_list_entry *udev_add_property(struct udev *udev, const char *key, const char *value)
 {
 	if (value == NULL) {
diff --git a/libudev/libudev.h b/libudev/libudev.h
index 4bcf442..5834781 100644
--- a/libudev/libudev.h
+++ b/libudev/libudev.h
@@ -77,6 +77,7 @@  struct udev_device *udev_device_get_parent_with_subsystem_devtype(struct udev_de
 								  const char *subsystem, const char *devtype);
 /* retrieve device properties */
 const char *udev_device_get_devpath(struct udev_device *udev_device);
+const char *udev_get_netdev_path(struct udev *udev);
 const char *udev_device_get_subsystem(struct udev_device *udev_device);
 const char *udev_device_get_devtype(struct udev_device *udev_device);
 const char *udev_device_get_syspath(struct udev_device *udev_device);
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d5b4d09..953f87a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -542,7 +542,7 @@  int udev_event_execute_rules(struct udev_event *event, struct udev_rules *rules)
 	}
 
 	/* add device node */
-	if (major(udev_device_get_devnum(dev)) != 0 &&
+	if ((major(udev_device_get_devnum(dev)) != 0 || strcmp(udev_device_get_subsystem(dev), "net") == 0) &&
 	    (strcmp(udev_device_get_action(dev), "add") == 0 || strcmp(udev_device_get_action(dev), "change") == 0)) {
 		char filename[UTIL_PATH_SIZE];
 		struct udev_device *dev_old;
@@ -603,10 +603,38 @@  int udev_event_execute_rules(struct udev_event *event, struct udev_rules *rules)
 			goto exit_add;
 		}
 
-		/* set device node name */
-		util_strscpyl(filename, sizeof(filename), udev_get_dev_path(event->udev), "/", event->name, NULL);
-		udev_device_set_devnode(dev, filename);
-
+		/* add netif */
+		if (strcmp(udev_device_get_subsystem(dev), "net") == 0 &&
+		    strcmp(udev_device_get_action(dev), "add") == 0) {
+			char syspath[UTIL_PATH_SIZE];
+			info(event->udev, "netif add '%s'\n", udev_device_get_devpath(dev));
+			/* look if we want to change the name of the netif */
+			if (strcmp(event->name, udev_device_get_sysname(dev)) != 0) {
+				char *pos;
+				err = rename_netif(event);
+				if (err != 0)
+					goto exit;
+				info(event->udev, "renamed netif to '%s'\n", event->name);
+				
+				/* remember old name */
+				udev_device_add_property(dev, "INTERFACE_OLD", udev_device_get_sysname(dev));
+				
+				/* now change the devpath, because the kernel device name has changed */
+				util_strscpy(syspath, sizeof(syspath), udev_device_get_syspath(dev));
+				pos = strrchr(syspath, '/');
+				if (pos != NULL) {
+					pos++;
+					util_strscpy(pos, sizeof(syspath) - (pos - syspath), event->name);
+					udev_device_set_syspath(event->dev, syspath);
+					udev_device_add_property(dev, "INTERFACE", udev_device_get_sysname(dev));
+					info(event->udev, "changed devpath to '%s'\n", udev_device_get_devpath(dev));
+				}
+			}
+			snprintf(syspath, sizeof(syspath), "%s/%s", udev_get_netdev_path(event->udev),
+				 udev_device_get_property_value(event->dev, "IFINDEX"));
+			udev_device_set_devnode(dev, syspath);
+		}
+		    
 		/* write current database entry */
 		udev_device_update_db(dev);
 
@@ -632,49 +660,11 @@  exit_add:
 		goto exit;
 	}
 
-	/* add netif */
-	if (strcmp(udev_device_get_subsystem(dev), "net") == 0 && strcmp(udev_device_get_action(dev), "add") == 0) {
-		dbg(event->udev, "netif add '%s'\n", udev_device_get_devpath(dev));
-		udev_device_delete_db(dev);
-
-		udev_rules_apply_to_event(rules, event);
-		if (event->ignore_device) {
-			info(event->udev, "device event will be ignored\n");
-			goto exit;
-		}
-		if (event->name == NULL)
-			goto exit;
-
-		/* look if we want to change the name of the netif */
-		if (strcmp(event->name, udev_device_get_sysname(dev)) != 0) {
-			char syspath[UTIL_PATH_SIZE];
-			char *pos;
-
-			err = rename_netif(event);
-			if (err != 0)
-				goto exit;
-			info(event->udev, "renamed netif to '%s'\n", event->name);
-
-			/* remember old name */
-			udev_device_add_property(dev, "INTERFACE_OLD", udev_device_get_sysname(dev));
-
-			/* now change the devpath, because the kernel device name has changed */
-			util_strscpy(syspath, sizeof(syspath), udev_device_get_syspath(dev));
-			pos = strrchr(syspath, '/');
-			if (pos != NULL) {
-				pos++;
-				util_strscpy(pos, sizeof(syspath) - (pos - syspath), event->name);
-				udev_device_set_syspath(event->dev, syspath);
-				udev_device_add_property(dev, "INTERFACE", udev_device_get_sysname(dev));
-				info(event->udev, "changed devpath to '%s'\n", udev_device_get_devpath(dev));
-			}
-		}
-		udev_device_update_db(dev);
-		goto exit;
-	}
 
 	/* remove device node */
-	if (major(udev_device_get_devnum(dev)) != 0 && strcmp(udev_device_get_action(dev), "remove") == 0) {
+	if ((major(udev_device_get_devnum(dev)) != 0 ||
+	     strcmp(udev_device_get_subsystem(dev), "net") == 0) &&
+	    strcmp(udev_device_get_action(dev), "remove") == 0) {
 		/* import database entry and delete it */
 		udev_device_read_db(dev);
 		udev_device_set_info_loaded(dev);
diff --git a/udev/udev-node.c b/udev/udev-node.c
index 39bec3e..da96a4a 100644
--- a/udev/udev-node.c
+++ b/udev/udev-node.c
@@ -32,6 +32,34 @@ 
 
 #define TMP_FILE_EXT		".udev-tmp"
 
+static bool udev_node_mode_matches(struct stat *stats, dev_t devnum, mode_t mode)
+{
+	if ((stats->st_mode & S_IFMT) != (mode & S_IFMT))
+		return false;
+
+	if ((S_ISCHR(mode) || S_ISBLK(mode)) && (stats->st_rdev != devnum))
+		return false;
+
+	return true;
+}
+
+static int udev_node_create_file(struct udev *udev, const char *path, dev_t devnum, mode_t mode)
+{
+	int fd, ret = 0;
+
+	if (S_ISCHR(mode) || S_ISBLK(mode))
+		ret = mknod(path, mode, devnum);
+	else {
+		fd = creat(path, mode);
+		if (fd < 0)
+			ret = fd;
+		else
+			close(fd);
+	}
+
+	return ret;
+}
+
 int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mode_t mode, uid_t uid, gid_t gid)
 {
 	struct udev *udev = udev_device_get_udev(dev);
@@ -47,12 +75,15 @@  int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
 	else
 		mode |= S_IFCHR;
 
+	if (strcmp(udev_device_get_subsystem(dev), "net") == 0)
+		mode = S_IFREG | S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
+
 	if (file == NULL)
 		file = udev_device_get_devnode(dev);
 
 	if (lstat(file, &stats) == 0) {
-		if (((stats.st_mode & S_IFMT) == (mode & S_IFMT)) && (stats.st_rdev == devnum)) {
-			info(udev, "preserve file '%s', because it has correct dev_t\n", file);
+		if (udev_node_mode_matches(&stats, devnum, mode)) {
+			info(udev, "preserve file '%s', because it has correct type\n", file);
 			preserve = 1;
 			udev_selinux_lsetfilecon(udev, file, mode);
 		} else {
@@ -62,10 +93,10 @@  int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
 			util_strscpyl(file_tmp, sizeof(file_tmp), file, TMP_FILE_EXT, NULL);
 			unlink(file_tmp);
 			udev_selinux_setfscreatecon(udev, file_tmp, mode);
-			err = mknod(file_tmp, mode, devnum);
+			err = udev_node_create_file(udev, file_tmp, devnum, mode);
 			udev_selinux_resetfscreatecon(udev);
 			if (err != 0) {
-				err(udev, "mknod(%s, %#o, %u, %u) failed: %m\n",
+				err(udev, "udev_node_create_file(%s, %#o, %u, %u) failed: %m\n",
 				    file_tmp, mode, major(devnum), minor(devnum));
 				goto exit;
 			}
@@ -80,7 +111,7 @@  int udev_node_mknod(struct udev_device *dev, const char *file, dev_t devnum, mod
 		do {
 			util_create_path(udev, file);
 			udev_selinux_setfscreatecon(udev, file, mode);
-			err = mknod(file, mode, devnum);
+			err = udev_node_create_file(udev, file, devnum, mode);
 			if (err != 0)
 				err = errno;
 			udev_selinux_resetfscreatecon(udev);
diff --git a/udev/udev-rules.c b/udev/udev-rules.c
index ddb51de..a1fe991 100644
--- a/udev/udev-rules.c
+++ b/udev/udev-rules.c
@@ -2435,7 +2435,8 @@  int udev_rules_apply_to_event(struct udev_rules *rules, struct udev_event *event
 
 				if (event->devlink_final)
 					break;
-				if (major(udev_device_get_devnum(event->dev)) == 0)
+				if ((major(udev_device_get_devnum(event->dev)) == 0) &&
+				    (strcmp(udev_device_get_subsystem(event->dev), "net") != 0))
 					break;
 				if (cur->key.op == OP_ASSIGN_FINAL)
 					event->devlink_final = 1;