mbox series

[net-next,v2,0/7] devlink: expose PF and VF representors as ports

Message ID 20190301180453.17778-1-jakub.kicinski@netronome.com
Headers show
Series devlink: expose PF and VF representors as ports | expand

Message

Jakub Kicinski March 1, 2019, 6:04 p.m. UTC
Hi!

This series is a long overdue follow up to Jiri's work on providing
a common .ndo_phys_port_name implementation based on devlink ports.

First devlink port flavours for PF and VF ports are added, and
registered by the NFP. Port numbers and split info are reserved
for physical and DSA ports. For PCIe ports we add pf/vf identifiers.
Note that devices may have more than one PF, including multi host
scenarios where not all pfs are connected to the same host.

The port_index is slightly tricky to figure out, we use a bit of
math to create unique IDs for ports.

Next subports for PCIe ports are introduced. This is in case device
has more than one netdev on a physical function (e.g. multi port
SmartNIC).

With the above features in place we can remove the ndo_phys_port_name
implementation from the NFP and use the standard devlink one for
port netdevs.

Last but not least a concept of peer netdevs is added. In multi-host
scenarios its currently not possible to figure out which PF is
associated with the local host. Peer device is "the other side
of the wire" for PCIe ports. In case of NFP we only link the PF
netdevs, but it should be possible to also link VF peers after
VF driver probes, if users request it.

This is the conceptual image of devlink instances:

                    HOST A             ||          HOST B
                                       ||
        PF A       | V | V | V | V     ||       PF B        | V | V | V
                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
                                       ||
             PCI Express link          ||        PCI Express link
        \      \      \  |   |   |          |       |      /   /   /
         \      \      \ |   |   |          |       |     /   /   /
      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
d n H ||  |               <<==========                              |   || d n H
e s O ||  |                            ==========>>                 |   || e s O
v t S ||  |                    SR-IOV e-switch                      |   || v t S
l a T ||  |               <<==========                              |   || l a T
i n   ||  |                            ==========>>                 |   || i n
n c A ||  |               ________ _________ ________               |   || n c B
k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
      ||  \---------------------------------------------------------/   ||
      \/                      |        |         |                      \/
                              |        |         |
                                 ||         ||
                          MAC 0  ||  MAC 1  || MAC 2
                                 ||         ||

 '+' marks the devlink ports and port (-representor-) netdevs
 '*' marks host netdevs (actual VF/PF netdev)

v2 (Jiri):
 - update devlink user space output in a commit message;
 - split the pci attribute setting helper into pf and vf versions;
 - add peer IBDEV_NAME;
 - add nest for peer attributes.

Jakub Kicinski (7):
  nfp: split devlink port init from registration
  devlink: add PF and VF port flavours
  nfp: register devlink ports of all reprs
  devlink: allow subports on devlink PCI ports
  nfp: switch to devlink_port_get_phys_port_name()
  devlink: introduce port's peer netdevs
  nfp: expose PF peer netdevs

 drivers/net/ethernet/netronome/nfp/abm/main.c |   1 +
 .../net/ethernet/netronome/nfp/flower/main.c  |   1 +
 .../net/ethernet/netronome/nfp/nfp_devlink.c  |  54 +++++-
 .../net/ethernet/netronome/nfp/nfp_net_main.c |  17 +-
 .../net/ethernet/netronome/nfp/nfp_net_repr.c |  16 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c |  33 +---
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   4 +
 include/net/devlink.h                         |  48 ++++-
 include/uapi/linux/devlink.h                  |  12 ++
 net/core/devlink.c                            | 178 ++++++++++++++++--
 10 files changed, 310 insertions(+), 54 deletions(-)

Comments

Jiri Pirko March 2, 2019, 10:13 a.m. UTC | #1
Fri, Mar 01, 2019 at 07:04:46PM CET, jakub.kicinski@netronome.com wrote:
>Hi!
>
>This series is a long overdue follow up to Jiri's work on providing
>a common .ndo_phys_port_name implementation based on devlink ports.
>
>First devlink port flavours for PF and VF ports are added, and
>registered by the NFP. Port numbers and split info are reserved
>for physical and DSA ports. For PCIe ports we add pf/vf identifiers.
>Note that devices may have more than one PF, including multi host
>scenarios where not all pfs are connected to the same host.
>
>The port_index is slightly tricky to figure out, we use a bit of
>math to create unique IDs for ports.
>
>Next subports for PCIe ports are introduced. This is in case device
>has more than one netdev on a physical function (e.g. multi port
>SmartNIC).
>
>With the above features in place we can remove the ndo_phys_port_name
>implementation from the NFP and use the standard devlink one for
>port netdevs.
>
>Last but not least a concept of peer netdevs is added. In multi-host
>scenarios its currently not possible to figure out which PF is
>associated with the local host. Peer device is "the other side
>of the wire" for PCIe ports. In case of NFP we only link the PF
>netdevs, but it should be possible to also link VF peers after
>VF driver probes, if users request it.
>
>This is the conceptual image of devlink instances:
>
>                    HOST A             ||          HOST B
>                                       ||
>        PF A       | V | V | V | V     ||       PF B        | V | V | V
>                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
>*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
>                                       ||
>             PCI Express link          ||        PCI Express link
>        \      \      \  |   |   |          |       |      /   /   /
>         \      \      \ |   |   |          |       |     /   /   /
>      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
>      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
>  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
>d n H ||  |               <<==========                              |   || d n H
>e s O ||  |                            ==========>>                 |   || e s O
>v t S ||  |                    SR-IOV e-switch                      |   || v t S
>l a T ||  |               <<==========                              |   || l a T
>i n   ||  |                            ==========>>                 |   || i n
>n c A ||  |               ________ _________ ________               |   || n c B
>k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
>      ||  \---------------------------------------------------------/   ||
>      \/                      |        |         |                      \/
>                              |        |         |
>                                 ||         ||
>                          MAC 0  ||  MAC 1  || MAC 2
>                                 ||         ||
>
> '+' marks the devlink ports and port (-representor-) netdevs
> '*' marks host netdevs (actual VF/PF netdev)

As I wrote in the reply to patch 4, I think we need to figure out if we
want to model all entities that belong under specific devlink
instance/pci address - which I prefer, or we want to have only eswitch
ports there.

One way or another, I think that it is not good idea to merge this
patchset this late, I would prefer to wait for next net-next opening...
In the meantime we can sync and make this whole thing crystal clear, for
everyone.

Thanks!
Jakub Kicinski March 2, 2019, 7:49 p.m. UTC | #2
On Sat, 2 Mar 2019 11:13:37 +0100, Jiri Pirko wrote:
> >This is the conceptual image of devlink instances:
> >
> >                    HOST A             ||          HOST B
> >                                       ||
> >        PF A       | V | V | V | V     ||       PF B        | V | V | V
> >                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
> >*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
> >                                       ||
> >             PCI Express link          ||        PCI Express link
> >        \      \      \  |   |   |          |       |      /   /   /
> >         \      \      \ |   |   |          |       |     /   /   /
> >      /\  \______\______\'___|___|__________|_______'____/___/___/__    /\
> >      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
> >  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> >d n H ||  |               <<==========                              |   || d n H
> >e s O ||  |                            ==========>>                 |   || e s O
> >v t S ||  |                    SR-IOV e-switch                      |   || v t S
> >l a T ||  |               <<==========                              |   || l a T
> >i n   ||  |                            ==========>>                 |   || i n
> >n c A ||  |               ________ _________ ________               |   || n c B
> >k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
> >      ||  \---------------------------------------------------------/   ||
> >      \/                      |        |         |                      \/
> >                              |        |         |
> >                                 ||         ||
> >                          MAC 0  ||  MAC 1  || MAC 2
> >                                 ||         ||
> >
> > '+' marks the devlink ports and port (-representor-) netdevs
> > '*' marks host netdevs (actual VF/PF netdev)  
> 
> As I wrote in the reply to patch 4, I think we need to figure out if we
> want to model all entities that belong under specific devlink
> instance/pci address - which I prefer, or we want to have only eswitch
> ports there.
> 
> One way or another, I think that it is not good idea to merge this
> patchset this late, I would prefer to wait for next net-next opening...
> In the meantime we can sync and make this whole thing crystal clear, for
> everyone.

Makes sense, let's keep talking.
Parav Pandit March 4, 2019, 5:12 a.m. UTC | #3
> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Jiri Pirko
> Sent: Saturday, March 2, 2019 4:14 AM
> To: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; oss-
> drivers@netronome.com
> Subject: Re: [PATCH net-next v2 0/7] devlink: expose PF and VF representors
> as ports
> 
> Fri, Mar 01, 2019 at 07:04:46PM CET, jakub.kicinski@netronome.com wrote:
> >Hi!
> >
> >This series is a long overdue follow up to Jiri's work on providing a
> >common .ndo_phys_port_name implementation based on devlink ports.
> >
> >First devlink port flavours for PF and VF ports are added, and
> >registered by the NFP. Port numbers and split info are reserved for
> >physical and DSA ports. For PCIe ports we add pf/vf identifiers.
> >Note that devices may have more than one PF, including multi host
> >scenarios where not all pfs are connected to the same host.
> >
> >The port_index is slightly tricky to figure out, we use a bit of math
> >to create unique IDs for ports.
> >
> >Next subports for PCIe ports are introduced. This is in case device has
> >more than one netdev on a physical function (e.g. multi port SmartNIC).
> >
> >With the above features in place we can remove the ndo_phys_port_name
> >implementation from the NFP and use the standard devlink one for port
> >netdevs.
> >
> >Last but not least a concept of peer netdevs is added. In multi-host
> >scenarios its currently not possible to figure out which PF is
> >associated with the local host. Peer device is "the other side of the
> >wire" for PCIe ports. In case of NFP we only link the PF netdevs, but
> >it should be possible to also link VF peers after VF driver probes, if
> >users request it.
> >
> >This is the conceptual image of devlink instances:
> >
> >                    HOST A             ||          HOST B
> >                                       ||
> >        PF A       | V | V | V | V     ||       PF B        | V | V | V
> >                   |*F |*F |*F |*F ... ||                   |*F |*F |*F
> >*port A0 |*port A1 | 0 | 1 | 2 | 3     ||*port B0 |*port B1 | 0 | 1 | 2
> >                                       ||
> >             PCI Express link          ||        PCI Express link
> >        \      \      \  |   |   |          |       |      /   /   /
> >         \      \      \ |   |   |          |       |     /   /   /
> >      /\  \______\______\'___|___|__________|_______'____/___/___/__
> /\
> >      ||  |+PF0s0|+PF0s1 |+VF0|+VF1| ...|   |+PF1s0|+PF1s1|+VF0|+VF1|   ||
> >  i   ||  |------ ------ ----- ---- ----|--- ------ ------ ---- ----|   ||   i
> >d n H ||  |               <<==========                              |   || d n H
> >e s O ||  |                            ==========>>                 |   || e s O
> >v t S ||  |                    SR-IOV e-switch                      |   || v t S
> >l a T ||  |               <<==========                              |   || l a T
> >i n   ||  |                            ==========>>                 |   || i n
> >n c A ||  |               ________ _________ ________               |   || n c B
> >k e   ||  |              |+Phys 0 |+Phys 1  |+Phys 2 |              |   || k e
> >      ||  \---------------------------------------------------------/   ||
> >      \/                      |        |         |                      \/
> >                              |        |         |
> >                                 ||         ||
> >                          MAC 0  ||  MAC 1  || MAC 2
> >                                 ||         ||
> >
> > '+' marks the devlink ports and port (-representor-) netdevs '*' marks
> > host netdevs (actual VF/PF netdev)
> 
> As I wrote in the reply to patch 4, I think we need to figure out if we want to
> model all entities that belong under specific devlink instance/pci address -
> which I prefer, or we want to have only eswitch ports there.
> 
> One way or another, I think that it is not good idea to merge this patchset
> this late, I would prefer to wait for next net-next opening...
> In the meantime we can sync and make this whole thing crystal clear, for
> everyone.
> 
Yes, please.
I replied in other thread, we should not bring peer port concept. It is convoluted.
We just need hostport and switchport representation (likely as port flavours) to configure each separately.
Whether a given devlink device is PF or VF is devlink devie attribute anyway.
David Miller March 4, 2019, 6:22 p.m. UTC | #4
From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri,  1 Mar 2019 10:04:46 -0800

> This series is a long overdue follow up to Jiri's work on providing
> a common .ndo_phys_port_name implementation based on devlink ports.

I think this needs enough discussion still that it will have to wait
until the next merge window.
Jakub Kicinski March 20, 2019, 8:25 p.m. UTC | #5
On Fri,  1 Mar 2019 10:04:46 -0800, Jakub Kicinski wrote:
> Hi!
> 
> This series is a long overdue follow up to Jiri's work on providing
> a common .ndo_phys_port_name implementation based on devlink ports.

Hi Jiri,

unfortunately I need to focus on some urgent work, so I won't have time
to work on this.
Jiri Pirko March 21, 2019, 9:11 a.m. UTC | #6
Wed, Mar 20, 2019 at 09:25:53PM CET, jakub.kicinski@netronome.com wrote:
>On Fri,  1 Mar 2019 10:04:46 -0800, Jakub Kicinski wrote:
>> Hi!
>> 
>> This series is a long overdue follow up to Jiri's work on providing
>> a common .ndo_phys_port_name implementation based on devlink ports.
>
>Hi Jiri,
>
>unfortunately I need to focus on some urgent work, so I won't have time
>to work on this.

Okay. I'll start pushing my devlink patches I have in qeueu and we'll
get back to this later, no worries.