mbox series

[RFC,0/5] pc: support user provided NIC naming/indexing

Message ID 20201222233934.451578-1-imammedo@redhat.com
Headers show
Series pc: support user provided NIC naming/indexing | expand

Message

Igor Mammedov Dec. 22, 2020, 11:39 p.m. UTC
Series implements support for 'onboard' naming scheme for network
interfaces (1), which is based on PCI firmware spec and lets user
to explicitly specify index that will be used by guest to name
network interface, ex:
    -device e1000,acpi-index=33
should make guest rename NIC name to 'eno33' where 'eno' is default
prefix for this scheme.

Hope is that it will allow to simplify launching VMs from
template disk images with different set VM configurations
without need to reconfigure guest network intrfaces or
risk of loosing network connectivity.

For more detailed description/examples see patches [3-4/5]

1)
 https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ 

Git repo for testing:
   https://github.com/imammedo/qemu/branches acpi-index-rfc

Igor Mammedov (5):
  acpi: add aml_to_decimalstring() and aml_call6() helpers
  tests: acpi: temporary whitelist DSDT changes
  pci: introduce apci-index property for PCI device
  pci: acpi: add _DSM method to PCI devices
  tests: acpi: update expected data files

 include/hw/acpi/aml-build.h                 |   3 +
 include/hw/acpi/pci.h                       |   1 +
 include/hw/acpi/pcihp.h                     |   7 +-
 include/hw/pci/pci.h                        |   1 +
 tests/qtest/bios-tables-test-allowed-diff.h |  21 +++++
 hw/acpi/aml-build.c                         |  28 +++++++
 hw/acpi/pci.c                               |  84 ++++++++++++++++++++
 hw/acpi/pcihp.c                             |  25 +++++-
 hw/i386/acpi-build.c                        |  31 +++++++-
 hw/pci/pci.c                                |   1 +
 tests/data/acpi/pc/DSDT                     | Bin 5065 -> 6023 bytes
 tests/data/acpi/pc/DSDT.acpihmat            | Bin 6390 -> 7348 bytes
 tests/data/acpi/pc/DSDT.bridge              | Bin 6924 -> 8689 bytes
 tests/data/acpi/pc/DSDT.cphp                | Bin 5529 -> 6487 bytes
 tests/data/acpi/pc/DSDT.dimmpxm             | Bin 6719 -> 7677 bytes
 tests/data/acpi/pc/DSDT.hpbridge            | Bin 5026 -> 5990 bytes
 tests/data/acpi/pc/DSDT.hpbrroot            | Bin 3084 -> 3177 bytes
 tests/data/acpi/pc/DSDT.ipmikcs             | Bin 5137 -> 6095 bytes
 tests/data/acpi/pc/DSDT.memhp               | Bin 6424 -> 7382 bytes
 tests/data/acpi/pc/DSDT.numamem             | Bin 5071 -> 6029 bytes
 tests/data/acpi/pc/DSDT.roothp              | Bin 5261 -> 6324 bytes
 tests/data/acpi/q35/DSDT                    | Bin 7801 -> 7863 bytes
 tests/data/acpi/q35/DSDT.acpihmat           | Bin 9126 -> 9188 bytes
 tests/data/acpi/q35/DSDT.bridge             | Bin 7819 -> 7911 bytes
 tests/data/acpi/q35/DSDT.cphp               | Bin 8265 -> 8327 bytes
 tests/data/acpi/q35/DSDT.dimmpxm            | Bin 9455 -> 9517 bytes
 tests/data/acpi/q35/DSDT.ipmibt             | Bin 7876 -> 7938 bytes
 tests/data/acpi/q35/DSDT.memhp              | Bin 9160 -> 9222 bytes
 tests/data/acpi/q35/DSDT.mmio64             | Bin 8932 -> 9024 bytes
 tests/data/acpi/q35/DSDT.numamem            | Bin 7807 -> 7869 bytes
 tests/data/acpi/q35/DSDT.tis                | Bin 8407 -> 8468 bytes
 31 files changed, 197 insertions(+), 5 deletions(-)

Comments

Michael S. Tsirkin Jan. 13, 2021, 12:09 p.m. UTC | #1
On Tue, Dec 22, 2020 at 06:39:29PM -0500, Igor Mammedov wrote:
> 
> Series implements support for 'onboard' naming scheme for network
> interfaces (1), which is based on PCI firmware spec and lets user
> to explicitly specify index that will be used by guest to name
> network interface, ex:
>     -device e1000,acpi-index=33
> should make guest rename NIC name to 'eno33' where 'eno' is default
> prefix for this scheme.
> 
> Hope is that it will allow to simplify launching VMs from
> template disk images with different set VM configurations
> without need to reconfigure guest network intrfaces or
> risk of loosing network connectivity.

Questions:
the spec says:
Assignment of specific device names to multi-function devices installed in expansion
slots, and/or PCI or PCI Express devices that are hot-added to expansion slots in operating system-
environment would be handled in operating system-specific manner, and is not specified via this
specification.

Accordingly, link below says:
" Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)"

to what extend does guest assume the index is for on-board devices?
it seems things work for fedora but how confident are we that this
will keep working.

Further, code seems to only look at the slot level.
According to this, and according to the spec, this does not work with
multifunction devices, does it?


The link you supplied lists another option:
"Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)"
these are under management control already ... 

Also if we ask users to supply the property on the slot then it seems
that the property can be baked into ACPI when it's created instead of
being loaded from host - we can avoid adding new registers, this seems
preferable.  Could someone from management side chime in on whether that
is sufficient?

More questions:

does all this affect windows guests at all?

where does the "acpi index" terminology come from?
the pci firmware spec talks about "instance number", right?



> For more detailed description/examples see patches [3-4/5]
> 
> 1)
>  https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ 
> 
> Git repo for testing:
>    https://github.com/imammedo/qemu/branches acpi-index-rfc
> 
> Igor Mammedov (5):
>   acpi: add aml_to_decimalstring() and aml_call6() helpers
>   tests: acpi: temporary whitelist DSDT changes
>   pci: introduce apci-index property for PCI device
>   pci: acpi: add _DSM method to PCI devices
>   tests: acpi: update expected data files
> 
>  include/hw/acpi/aml-build.h                 |   3 +
>  include/hw/acpi/pci.h                       |   1 +
>  include/hw/acpi/pcihp.h                     |   7 +-
>  include/hw/pci/pci.h                        |   1 +
>  tests/qtest/bios-tables-test-allowed-diff.h |  21 +++++
>  hw/acpi/aml-build.c                         |  28 +++++++
>  hw/acpi/pci.c                               |  84 ++++++++++++++++++++
>  hw/acpi/pcihp.c                             |  25 +++++-
>  hw/i386/acpi-build.c                        |  31 +++++++-
>  hw/pci/pci.c                                |   1 +
>  tests/data/acpi/pc/DSDT                     | Bin 5065 -> 6023 bytes
>  tests/data/acpi/pc/DSDT.acpihmat            | Bin 6390 -> 7348 bytes
>  tests/data/acpi/pc/DSDT.bridge              | Bin 6924 -> 8689 bytes
>  tests/data/acpi/pc/DSDT.cphp                | Bin 5529 -> 6487 bytes
>  tests/data/acpi/pc/DSDT.dimmpxm             | Bin 6719 -> 7677 bytes
>  tests/data/acpi/pc/DSDT.hpbridge            | Bin 5026 -> 5990 bytes
>  tests/data/acpi/pc/DSDT.hpbrroot            | Bin 3084 -> 3177 bytes
>  tests/data/acpi/pc/DSDT.ipmikcs             | Bin 5137 -> 6095 bytes
>  tests/data/acpi/pc/DSDT.memhp               | Bin 6424 -> 7382 bytes
>  tests/data/acpi/pc/DSDT.numamem             | Bin 5071 -> 6029 bytes
>  tests/data/acpi/pc/DSDT.roothp              | Bin 5261 -> 6324 bytes
>  tests/data/acpi/q35/DSDT                    | Bin 7801 -> 7863 bytes
>  tests/data/acpi/q35/DSDT.acpihmat           | Bin 9126 -> 9188 bytes
>  tests/data/acpi/q35/DSDT.bridge             | Bin 7819 -> 7911 bytes
>  tests/data/acpi/q35/DSDT.cphp               | Bin 8265 -> 8327 bytes
>  tests/data/acpi/q35/DSDT.dimmpxm            | Bin 9455 -> 9517 bytes
>  tests/data/acpi/q35/DSDT.ipmibt             | Bin 7876 -> 7938 bytes
>  tests/data/acpi/q35/DSDT.memhp              | Bin 9160 -> 9222 bytes
>  tests/data/acpi/q35/DSDT.mmio64             | Bin 8932 -> 9024 bytes
>  tests/data/acpi/q35/DSDT.numamem            | Bin 7807 -> 7869 bytes
>  tests/data/acpi/q35/DSDT.tis                | Bin 8407 -> 8468 bytes
>  31 files changed, 197 insertions(+), 5 deletions(-)
> 
> -- 
> 2.27.0
Igor Mammedov Jan. 15, 2021, 1:59 a.m. UTC | #2
On Wed, 13 Jan 2021 07:09:56 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Dec 22, 2020 at 06:39:29PM -0500, Igor Mammedov wrote:
> > 
> > Series implements support for 'onboard' naming scheme for network
> > interfaces (1), which is based on PCI firmware spec and lets user
> > to explicitly specify index that will be used by guest to name
> > network interface, ex:
> >     -device e1000,acpi-index=33
> > should make guest rename NIC name to 'eno33' where 'eno' is default
> > prefix for this scheme.
> > 
> > Hope is that it will allow to simplify launching VMs from
> > template disk images with different set VM configurations
> > without need to reconfigure guest network intrfaces or
> > risk of loosing network connectivity.  
> 
> Questions:
> the spec says:
> Assignment of specific device names to multi-function devices installed in expansion
> slots, and/or PCI or PCI Express devices that are hot-added to expansion slots in operating system-
> environment would be handled in operating system-specific manner, and is not specified via this
> specification.
> 
> Accordingly, link below says:
> " Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)"
> 
> to what extend does guest assume the index is for on-board devices?
> it seems things work for fedora but how confident are we that this
> will keep working.

code itself is not limiting it to onboard devices in any way.
(I can only speculate here, reason for calling it onboard is that
on real hardware ACPI is mostly static tables and firmware provides
an option to set index for only built-in NICs).
Technically there is no reason to call in onboad though.

I'd believe it should work with any distribution that uses
recent enough systemd/udev (released starting from 2003).

> Further, code seems to only look at the slot level.
> According to this, and according to the spec, this does not work with
> multifunction devices, does it?

we probably should disable it for multifunction devices,
any suggestions how to detect those in QEMU?


> The link you supplied lists another option:
> "Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)"
> these are under management control already ... 

with it interface name continues to depend on PCI topology (and theoretically
limited to PCI expess). That's becomes harder to consume as complexity grows
(i.e. mgmt needs to keep NIC in the same place for which guest image was configured for).
acpi-index doesn't impose such limitation.

In case of 1 NIC, it could be moved anywhere within PCI hierarchy and guest
doesn't have to be reconfigured to account for new interface name
(i.e without loosing network connectivity - that's the actual issue coming from
upper layers that made me look into acpi-index approach).

In another words acpi index is easier to consume for users above libvirt
and frees mgmt hands in a way it could distribute PCI devices.
Even better would be if guest image could carry index as metadata

> Also if we ask users to supply the property on the slot then it seems
> that the property can be baked into ACPI when it's created instead of
> being loaded from host - we can avoid adding new registers, this seems
> preferable.  Could someone from management side chime in on whether that
> is sufficient?

I did consider it (it would be simpler, but not much), however unless we disable
PCI hotplug for affected slots it won't work. (unplug device from such slot and
plug another in that place will still return boot time index.
That's why I ended up with hotplug variant. 

I chose abusing existing PCI hotplug registers for it, 
but we can use a new range if that's preferred. 

> More questions:
> 
> does all this affect windows guests at all?
I don't know (spec shows examples that reminded
me about NIC naming which Windows use(s|d)).
But I won't bet on it.
If I recall correctly, for e1000 NIC, it didn't made any
difference in naming (pre-existing guest image).

> where does the "acpi index" terminology come from?
> the pci firmware spec talks about "instance number", right?
it comes from linux kernel (that's how it's named in sysfs)
and systemd/udev uses it. So I tried to avoid making up another
one.


> > For more detailed description/examples see patches [3-4/5]
> > 
> > 1)
> >  https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ 
> > 
> > Git repo for testing:
> >    https://github.com/imammedo/qemu/branches acpi-index-rfc
> > 
> > Igor Mammedov (5):
> >   acpi: add aml_to_decimalstring() and aml_call6() helpers
> >   tests: acpi: temporary whitelist DSDT changes
> >   pci: introduce apci-index property for PCI device
> >   pci: acpi: add _DSM method to PCI devices
> >   tests: acpi: update expected data files
> > 
> >  include/hw/acpi/aml-build.h                 |   3 +
> >  include/hw/acpi/pci.h                       |   1 +
> >  include/hw/acpi/pcihp.h                     |   7 +-
> >  include/hw/pci/pci.h                        |   1 +
> >  tests/qtest/bios-tables-test-allowed-diff.h |  21 +++++
> >  hw/acpi/aml-build.c                         |  28 +++++++
> >  hw/acpi/pci.c                               |  84 ++++++++++++++++++++
> >  hw/acpi/pcihp.c                             |  25 +++++-
> >  hw/i386/acpi-build.c                        |  31 +++++++-
> >  hw/pci/pci.c                                |   1 +
> >  tests/data/acpi/pc/DSDT                     | Bin 5065 -> 6023 bytes
> >  tests/data/acpi/pc/DSDT.acpihmat            | Bin 6390 -> 7348 bytes
> >  tests/data/acpi/pc/DSDT.bridge              | Bin 6924 -> 8689 bytes
> >  tests/data/acpi/pc/DSDT.cphp                | Bin 5529 -> 6487 bytes
> >  tests/data/acpi/pc/DSDT.dimmpxm             | Bin 6719 -> 7677 bytes
> >  tests/data/acpi/pc/DSDT.hpbridge            | Bin 5026 -> 5990 bytes
> >  tests/data/acpi/pc/DSDT.hpbrroot            | Bin 3084 -> 3177 bytes
> >  tests/data/acpi/pc/DSDT.ipmikcs             | Bin 5137 -> 6095 bytes
> >  tests/data/acpi/pc/DSDT.memhp               | Bin 6424 -> 7382 bytes
> >  tests/data/acpi/pc/DSDT.numamem             | Bin 5071 -> 6029 bytes
> >  tests/data/acpi/pc/DSDT.roothp              | Bin 5261 -> 6324 bytes
> >  tests/data/acpi/q35/DSDT                    | Bin 7801 -> 7863 bytes
> >  tests/data/acpi/q35/DSDT.acpihmat           | Bin 9126 -> 9188 bytes
> >  tests/data/acpi/q35/DSDT.bridge             | Bin 7819 -> 7911 bytes
> >  tests/data/acpi/q35/DSDT.cphp               | Bin 8265 -> 8327 bytes
> >  tests/data/acpi/q35/DSDT.dimmpxm            | Bin 9455 -> 9517 bytes
> >  tests/data/acpi/q35/DSDT.ipmibt             | Bin 7876 -> 7938 bytes
> >  tests/data/acpi/q35/DSDT.memhp              | Bin 9160 -> 9222 bytes
> >  tests/data/acpi/q35/DSDT.mmio64             | Bin 8932 -> 9024 bytes
> >  tests/data/acpi/q35/DSDT.numamem            | Bin 7807 -> 7869 bytes
> >  tests/data/acpi/q35/DSDT.tis                | Bin 8407 -> 8468 bytes
> >  31 files changed, 197 insertions(+), 5 deletions(-)
> > 
> > -- 
> > 2.27.0  
> 
>
Michael S. Tsirkin Jan. 17, 2021, 10:59 a.m. UTC | #3
On Fri, Jan 15, 2021 at 02:59:02AM +0100, Igor Mammedov wrote:
> On Wed, 13 Jan 2021 07:09:56 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Dec 22, 2020 at 06:39:29PM -0500, Igor Mammedov wrote:
> > > 
> > > Series implements support for 'onboard' naming scheme for network
> > > interfaces (1), which is based on PCI firmware spec and lets user
> > > to explicitly specify index that will be used by guest to name
> > > network interface, ex:
> > >     -device e1000,acpi-index=33
> > > should make guest rename NIC name to 'eno33' where 'eno' is default
> > > prefix for this scheme.
> > > 
> > > Hope is that it will allow to simplify launching VMs from
> > > template disk images with different set VM configurations
> > > without need to reconfigure guest network intrfaces or
> > > risk of loosing network connectivity.  
> > 
> > Questions:
> > the spec says:
> > Assignment of specific device names to multi-function devices installed in expansion
> > slots, and/or PCI or PCI Express devices that are hot-added to expansion slots in operating system-
> > environment would be handled in operating system-specific manner, and is not specified via this
> > specification.
> > 
> > Accordingly, link below says:
> > " Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)"
> > 
> > to what extend does guest assume the index is for on-board devices?
> > it seems things work for fedora but how confident are we that this
> > will keep working.
> 
> code itself is not limiting it to onboard devices in any way.
> (I can only speculate here, reason for calling it onboard is that
> on real hardware ACPI is mostly static tables and firmware provides
> an option to set index for only built-in NICs).
> Technically there is no reason to call in onboad though.

Well there's an smbios thing that I believe says onboard
in the spec. Come to think of it that one actually
applies to multifunction devices too.


> 
> I'd believe it should work with any distribution that uses
> recent enough systemd/udev (released starting from 2003).
> 
> > Further, code seems to only look at the slot level.
> > According to this, and according to the spec, this does not work with
> > multifunction devices, does it?
> 
> we probably should disable it for multifunction devices,
> any suggestions how to detect those in QEMU?

QEMU_PCI_CAP_MULTIFUNCTION_BITNR ?

Enforcing this will help but can we make this self-documenting somehow?

> 
> > The link you supplied lists another option:
> > "Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)"
> > these are under management control already ... 
> 
> with it interface name continues to depend on PCI topology (and theoretically
> limited to PCI expess). That's becomes harder to consume as complexity grows
> (i.e. mgmt needs to keep NIC in the same place for which guest image was configured for).
> acpi-index doesn't impose such limitation.
>
> In case of 1 NIC, it could be moved anywhere within PCI hierarchy and guest
> doesn't have to be reconfigured to account for new interface name
> (i.e without loosing network connectivity - that's the actual issue coming from
> upper layers that made me look into acpi-index approach).

Could you describe the issue in a bit more detail in the commit
log pls?

> In another words acpi index is easier to consume for users above libvirt
> and frees mgmt hands in a way it could distribute PCI devices.
> Even better would be if guest image could carry index as metadata
> 
> > Also if we ask users to supply the property on the slot then it seems
> > that the property can be baked into ACPI when it's created instead of
> > being loaded from host - we can avoid adding new registers, this seems
> > preferable.  Could someone from management side chime in on whether that
> > is sufficient?
> 
> I did consider it (it would be simpler, but not much), however unless we disable
> PCI hotplug for affected slots it won't work. (unplug device from such slot and
> plug another in that place will still return boot time index.

Well it's interesting that you mention it. Let's say you remove
a device then plug in a different index immediately ...
Guest gets a wrong name temporarily, right?
Should we worry about that?


> That's why I ended up with hotplug variant. 
> 
> I chose abusing existing PCI hotplug registers for it, 
> but we can use a new range if that's preferred. 

Would it work for multi-function devices though?
If yes that's a big advantage ...


> > More questions:
> > 
> > does all this affect windows guests at all?
> I don't know (spec shows examples that reminded
> me about NIC naming which Windows use(s|d)).
> But I won't bet on it.
> If I recall correctly, for e1000 NIC, it didn't made any
> difference in naming (pre-existing guest image).
> 
> > where does the "acpi index" terminology come from?
> > the pci firmware spec talks about "instance number", right?
> it comes from linux kernel (that's how it's named in sysfs)
> and systemd/udev uses it. So I tried to avoid making up another
> one.

OTOH that's guest specific ... and does not seem to imply
the limitations unless you look at the code ...

> 
> > > For more detailed description/examples see patches [3-4/5]
> > > 
> > > 1)
> > >  https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ 
> > > 
> > > Git repo for testing:
> > >    https://github.com/imammedo/qemu/branches acpi-index-rfc
> > > 
> > > Igor Mammedov (5):
> > >   acpi: add aml_to_decimalstring() and aml_call6() helpers
> > >   tests: acpi: temporary whitelist DSDT changes
> > >   pci: introduce apci-index property for PCI device
> > >   pci: acpi: add _DSM method to PCI devices
> > >   tests: acpi: update expected data files
> > > 
> > >  include/hw/acpi/aml-build.h                 |   3 +
> > >  include/hw/acpi/pci.h                       |   1 +
> > >  include/hw/acpi/pcihp.h                     |   7 +-
> > >  include/hw/pci/pci.h                        |   1 +
> > >  tests/qtest/bios-tables-test-allowed-diff.h |  21 +++++
> > >  hw/acpi/aml-build.c                         |  28 +++++++
> > >  hw/acpi/pci.c                               |  84 ++++++++++++++++++++
> > >  hw/acpi/pcihp.c                             |  25 +++++-
> > >  hw/i386/acpi-build.c                        |  31 +++++++-
> > >  hw/pci/pci.c                                |   1 +
> > >  tests/data/acpi/pc/DSDT                     | Bin 5065 -> 6023 bytes
> > >  tests/data/acpi/pc/DSDT.acpihmat            | Bin 6390 -> 7348 bytes
> > >  tests/data/acpi/pc/DSDT.bridge              | Bin 6924 -> 8689 bytes
> > >  tests/data/acpi/pc/DSDT.cphp                | Bin 5529 -> 6487 bytes
> > >  tests/data/acpi/pc/DSDT.dimmpxm             | Bin 6719 -> 7677 bytes
> > >  tests/data/acpi/pc/DSDT.hpbridge            | Bin 5026 -> 5990 bytes
> > >  tests/data/acpi/pc/DSDT.hpbrroot            | Bin 3084 -> 3177 bytes
> > >  tests/data/acpi/pc/DSDT.ipmikcs             | Bin 5137 -> 6095 bytes
> > >  tests/data/acpi/pc/DSDT.memhp               | Bin 6424 -> 7382 bytes
> > >  tests/data/acpi/pc/DSDT.numamem             | Bin 5071 -> 6029 bytes
> > >  tests/data/acpi/pc/DSDT.roothp              | Bin 5261 -> 6324 bytes
> > >  tests/data/acpi/q35/DSDT                    | Bin 7801 -> 7863 bytes
> > >  tests/data/acpi/q35/DSDT.acpihmat           | Bin 9126 -> 9188 bytes
> > >  tests/data/acpi/q35/DSDT.bridge             | Bin 7819 -> 7911 bytes
> > >  tests/data/acpi/q35/DSDT.cphp               | Bin 8265 -> 8327 bytes
> > >  tests/data/acpi/q35/DSDT.dimmpxm            | Bin 9455 -> 9517 bytes
> > >  tests/data/acpi/q35/DSDT.ipmibt             | Bin 7876 -> 7938 bytes
> > >  tests/data/acpi/q35/DSDT.memhp              | Bin 9160 -> 9222 bytes
> > >  tests/data/acpi/q35/DSDT.mmio64             | Bin 8932 -> 9024 bytes
> > >  tests/data/acpi/q35/DSDT.numamem            | Bin 7807 -> 7869 bytes
> > >  tests/data/acpi/q35/DSDT.tis                | Bin 8407 -> 8468 bytes
> > >  31 files changed, 197 insertions(+), 5 deletions(-)
> > > 
> > > -- 
> > > 2.27.0  
> > 
> >
Igor Mammedov Jan. 20, 2021, 12:07 p.m. UTC | #4
On Sun, 17 Jan 2021 05:59:18 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jan 15, 2021 at 02:59:02AM +0100, Igor Mammedov wrote:
> > On Wed, 13 Jan 2021 07:09:56 -0500
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Tue, Dec 22, 2020 at 06:39:29PM -0500, Igor Mammedov wrote:  
> > > > 
> > > > Series implements support for 'onboard' naming scheme for network
> > > > interfaces (1), which is based on PCI firmware spec and lets user
> > > > to explicitly specify index that will be used by guest to name
> > > > network interface, ex:
> > > >     -device e1000,acpi-index=33
> > > > should make guest rename NIC name to 'eno33' where 'eno' is default
> > > > prefix for this scheme.
> > > > 
> > > > Hope is that it will allow to simplify launching VMs from
> > > > template disk images with different set VM configurations
> > > > without need to reconfigure guest network intrfaces or
> > > > risk of loosing network connectivity.    
> > > 
> > > Questions:
> > > the spec says:
> > > Assignment of specific device names to multi-function devices installed in expansion
> > > slots, and/or PCI or PCI Express devices that are hot-added to expansion slots in operating system-
> > > environment would be handled in operating system-specific manner, and is not specified via this
> > > specification.
> > > 
> > > Accordingly, link below says:
> > > " Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: eno1)"
> > > 
> > > to what extend does guest assume the index is for on-board devices?
> > > it seems things work for fedora but how confident are we that this
> > > will keep working.  
> > 
> > code itself is not limiting it to onboard devices in any way.
> > (I can only speculate here, reason for calling it onboard is that
> > on real hardware ACPI is mostly static tables and firmware provides
> > an option to set index for only built-in NICs).
> > Technically there is no reason to call in onboad though.  
> 
> Well there's an smbios thing that I believe says onboard
> in the spec. Come to think of it that one actually
> applies to multifunction devices too.
Can you point to the specific place in the spec, pls?

> 
> 
> > 
> > I'd believe it should work with any distribution that uses
> > recent enough systemd/udev (released starting from 2003).
> >   
> > > Further, code seems to only look at the slot level.
> > > According to this, and according to the spec, this does not work with
> > > multifunction devices, does it?  
> > 
> > we probably should disable it for multifunction devices,
> > any suggestions how to detect those in QEMU?  
> 
> QEMU_PCI_CAP_MULTIFUNCTION_BITNR ?
> 
> Enforcing this will help but can we make this self-documenting somehow?
what do you mean saying "to make it self-documenting somehow"?

> > > The link you supplied lists another option:
> > > "Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: ens1)"
> > > these are under management control already ...   
> > 
> > with it interface name continues to depend on PCI topology (and theoretically
> > limited to PCI expess). That's becomes harder to consume as complexity grows
> > (i.e. mgmt needs to keep NIC in the same place for which guest image was configured for).
> > acpi-index doesn't impose such limitation.
> >
> > In case of 1 NIC, it could be moved anywhere within PCI hierarchy and guest
> > doesn't have to be reconfigured to account for new interface name
> > (i.e without loosing network connectivity - that's the actual issue coming from
> > upper layers that made me look into acpi-index approach).  
> 
> Could you describe the issue in a bit more detail in the commit
> log pls?
sure

> > In another words acpi index is easier to consume for users above libvirt
> > and frees mgmt hands in a way it could distribute PCI devices.
> > Even better would be if guest image could carry index as metadata
> >   
> > > Also if we ask users to supply the property on the slot then it seems
> > > that the property can be baked into ACPI when it's created instead of
> > > being loaded from host - we can avoid adding new registers, this seems
> > > preferable.  Could someone from management side chime in on whether that
> > > is sufficient?  
> > 
> > I did consider it (it would be simpler, but not much), however unless we disable
> > PCI hotplug for affected slots it won't work. (unplug device from such slot and
> > plug another in that place will still return boot time index.  
> 
> Well it's interesting that you mention it. Let's say you remove
> a device then plug in a different index immediately ...
> Guest gets a wrong name temporarily, right?
> Should we worry about that?
guest will get not a wrong name but rather name that user asked for.
(i.e. NIC with old index could be moved to other slot and still keep its name)


> > That's why I ended up with hotplug variant. 
> > 
> > I chose abusing existing PCI hotplug registers for it, 
> > but we can use a new range if that's preferred.   
> 
> Would it work for multi-function devices though?
> If yes that's a big advantage ...
Does ACPI hotplug work with multi-function devices (I thought not)?
If not, I'd prefer explicitly disable new feature
for multi-function devices (the code is heavily based on
ACPI hotplug infrastructure we already have).
(
in other words, I don't see immediate benefit in grabbing new registers
)


> > > More questions:
> > > 
> > > does all this affect windows guests at all?  
> > I don't know (spec shows examples that reminded
> > me about NIC naming which Windows use(s|d)).
> > But I won't bet on it.
> > If I recall correctly, for e1000 NIC, it didn't made any
> > difference in naming (pre-existing guest image).
> >   
> > > where does the "acpi index" terminology come from?
> > > the pci firmware spec talks about "instance number", right?  
> > it comes from linux kernel (that's how it's named in sysfs)
> > and systemd/udev uses it. So I tried to avoid making up another
> > one.  
> 
> OTOH that's guest specific ... and does not seem to imply
> the limitations unless you look at the code ...
"instance number" looks to me too generic (it's fine in context of
specific chapter in spec). While linux kernel's variant looks to me as
as better option that fits the bill.

If you don't object, I'd prefer 'acpi-index'.

> >   
> > > > For more detailed description/examples see patches [3-4/5]
> > > > 
> > > > 1)
> > > >  https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ 
> > > > 
> > > > Git repo for testing:
> > > >    https://github.com/imammedo/qemu/branches acpi-index-rfc
> > > > 
> > > > Igor Mammedov (5):
> > > >   acpi: add aml_to_decimalstring() and aml_call6() helpers
> > > >   tests: acpi: temporary whitelist DSDT changes
> > > >   pci: introduce apci-index property for PCI device
> > > >   pci: acpi: add _DSM method to PCI devices
> > > >   tests: acpi: update expected data files
> > > > 
> > > >  include/hw/acpi/aml-build.h                 |   3 +
> > > >  include/hw/acpi/pci.h                       |   1 +
> > > >  include/hw/acpi/pcihp.h                     |   7 +-
> > > >  include/hw/pci/pci.h                        |   1 +
> > > >  tests/qtest/bios-tables-test-allowed-diff.h |  21 +++++
> > > >  hw/acpi/aml-build.c                         |  28 +++++++
> > > >  hw/acpi/pci.c                               |  84 ++++++++++++++++++++
> > > >  hw/acpi/pcihp.c                             |  25 +++++-
> > > >  hw/i386/acpi-build.c                        |  31 +++++++-
> > > >  hw/pci/pci.c                                |   1 +
> > > >  tests/data/acpi/pc/DSDT                     | Bin 5065 -> 6023 bytes
> > > >  tests/data/acpi/pc/DSDT.acpihmat            | Bin 6390 -> 7348 bytes
> > > >  tests/data/acpi/pc/DSDT.bridge              | Bin 6924 -> 8689 bytes
> > > >  tests/data/acpi/pc/DSDT.cphp                | Bin 5529 -> 6487 bytes
> > > >  tests/data/acpi/pc/DSDT.dimmpxm             | Bin 6719 -> 7677 bytes
> > > >  tests/data/acpi/pc/DSDT.hpbridge            | Bin 5026 -> 5990 bytes
> > > >  tests/data/acpi/pc/DSDT.hpbrroot            | Bin 3084 -> 3177 bytes
> > > >  tests/data/acpi/pc/DSDT.ipmikcs             | Bin 5137 -> 6095 bytes
> > > >  tests/data/acpi/pc/DSDT.memhp               | Bin 6424 -> 7382 bytes
> > > >  tests/data/acpi/pc/DSDT.numamem             | Bin 5071 -> 6029 bytes
> > > >  tests/data/acpi/pc/DSDT.roothp              | Bin 5261 -> 6324 bytes
> > > >  tests/data/acpi/q35/DSDT                    | Bin 7801 -> 7863 bytes
> > > >  tests/data/acpi/q35/DSDT.acpihmat           | Bin 9126 -> 9188 bytes
> > > >  tests/data/acpi/q35/DSDT.bridge             | Bin 7819 -> 7911 bytes
> > > >  tests/data/acpi/q35/DSDT.cphp               | Bin 8265 -> 8327 bytes
> > > >  tests/data/acpi/q35/DSDT.dimmpxm            | Bin 9455 -> 9517 bytes
> > > >  tests/data/acpi/q35/DSDT.ipmibt             | Bin 7876 -> 7938 bytes
> > > >  tests/data/acpi/q35/DSDT.memhp              | Bin 9160 -> 9222 bytes
> > > >  tests/data/acpi/q35/DSDT.mmio64             | Bin 8932 -> 9024 bytes
> > > >  tests/data/acpi/q35/DSDT.numamem            | Bin 7807 -> 7869 bytes
> > > >  tests/data/acpi/q35/DSDT.tis                | Bin 8407 -> 8468 bytes
> > > >  31 files changed, 197 insertions(+), 5 deletions(-)
> > > > 
> > > > -- 
> > > > 2.27.0    
> > > 
> > >   
>