diff mbox

[RFC] switchdev: clarify ndo_get_phys_port_name() formats

Message ID 20170725051344.2040-1-jakub.kicinski@netronome.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Jakub Kicinski July 25, 2017, 5:13 a.m. UTC
We are still in position where we can suggest uniform naming
convention for ndo_get_phys_port_name().  switchdev.txt file
already contained a suggestion of how to name external ports.
Since the use of switchdev for SR-IOV NIC's eswitches is growing,
establish a format for ports of those devices as well.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 Documentation/networking/switchdev.txt | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Andy Gospodarek July 25, 2017, 3:22 p.m. UTC | #1
On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:
> We are still in position where we can suggest uniform naming
> convention for ndo_get_phys_port_name().  switchdev.txt file
> already contained a suggestion of how to name external ports.
> Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> establish a format for ports of those devices as well.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

This is a nice addition and I suspect there could be even more done to
update this file to cover the VF rep usage.

> ---
>  Documentation/networking/switchdev.txt | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> index 3e7b946dea27..7c4b6025fb4b 100644
> --- a/Documentation/networking/switchdev.txt
> +++ b/Documentation/networking/switchdev.txt
> @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
>  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
>  	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
>  
> -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> -is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> -would be sub-port 0 on port 1 on switch 1.
> +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> + - pA     for external ports;
> + - pAsB   for split external ports;
> + - pfC    for PF ports (so called PF representors);
> + - pfCvfD for VF ports (so called VF representors).

I hate to clutter this up, but might be also need to add:

 - pfCsB    for split PF ports (so called PF representors);
 - pfCsBvfD for split VF ports (so called VF representors).

or are we comfortable that these additions to the name for split ports
are implied?

> +Where A is the port name or ID; B is the sub-port name or ID; C is PCIe
> +Physical Function name or ID and D is PCIe Virtual Function name or ID.
> +
> +Suggested naming convention for switches is "swX", where X is the switch name
> +or ID, plus the port name.  For example, sw1p1s0 would be sub-port 0 on port 1
> +on switch 1.
>  
>  Port Features
>  ^^^^^^^^^^^^^
Jakub Kicinski July 25, 2017, 10:26 p.m. UTC | #2
On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:
> On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:
> > We are still in position where we can suggest uniform naming
> > convention for ndo_get_phys_port_name().  switchdev.txt file
> > already contained a suggestion of how to name external ports.
> > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > establish a format for ports of those devices as well.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> 
> This is a nice addition and I suspect there could be even more done to
> update this file to cover the VF rep usage.
> 
> > ---
> >  Documentation/networking/switchdev.txt | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> > index 3e7b946dea27..7c4b6025fb4b 100644
> > --- a/Documentation/networking/switchdev.txt
> > +++ b/Documentation/networking/switchdev.txt
> > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> >  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> >  	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> >  
> > -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> > -is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> > -would be sub-port 0 on port 1 on switch 1.
> > +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> > + - pA     for external ports;
> > + - pAsB   for split external ports;
> > + - pfC    for PF ports (so called PF representors);
> > + - pfCvfD for VF ports (so called VF representors).  
> 
> I hate to clutter this up, but might be also need to add:
> 
>  - pfCsB    for split PF ports (so called PF representors);
>  - pfCsBvfD for split VF ports (so called VF representors).
> 
> or are we comfortable that these additions to the name for split ports
> are implied?

Hm..  What is a split PF port?  Splits happen on the physical port - see
my rant on the thread this is a reply to ;)  PFs are PCIe functions,
on the opposite side of the eswitch from the wires.
Andy Gospodarek July 26, 2017, 1:48 a.m. UTC | #3
On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:
> > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:
> > > We are still in position where we can suggest uniform naming
> > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > already contained a suggestion of how to name external ports.
> > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > establish a format for ports of those devices as well.
> > > 
> > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> > 
> > This is a nice addition and I suspect there could be even more done to
> > update this file to cover the VF rep usage.
> > 
> > > ---
> > >  Documentation/networking/switchdev.txt | 14 +++++++++++---
> > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > --- a/Documentation/networking/switchdev.txt
> > > +++ b/Documentation/networking/switchdev.txt
> > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> > >  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> > >  	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > >  
> > > -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> > > -is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> > > -would be sub-port 0 on port 1 on switch 1.
> > > +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> > > + - pA     for external ports;
> > > + - pAsB   for split external ports;
> > > + - pfC    for PF ports (so called PF representors);
> > > + - pfCvfD for VF ports (so called VF representors).  
> > 
> > I hate to clutter this up, but might be also need to add:
> > 
> >  - pfCsB    for split PF ports (so called PF representors);
> >  - pfCsBvfD for split VF ports (so called VF representors).
> > 
> > or are we comfortable that these additions to the name for split ports
> > are implied?
> 
> Hm..  What is a split PF port?  Splits happen on the physical port - see
> my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> on the opposite side of the eswitch from the wires.

I'm with you that I think there is value in separate netdevs to
represent "PFs, VFs and external ports/MACs" -- particularly for the
use-case you to create rules to control PF<->VF traffic.

So while I'm not saying it is a _great_ idea to support such a thing as
port-splitting of PFs, I suggested this addition as I'm not willing to restrict
such a design/implementation if a vendor or customer desired.  It seemed
useful to provde some guidance on how to name them -- even if we do not
like them.  :-)
Jakub Kicinski July 26, 2017, 2:34 a.m. UTC | #4
On Tue, 25 Jul 2017 21:48:15 -0400, Andy Gospodarek wrote:
> On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> > On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:  
> > > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:  
> > > > We are still in position where we can suggest uniform naming
> > > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > > already contained a suggestion of how to name external ports.
> > > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > > establish a format for ports of those devices as well.
> > > > 
> > > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>    
> > > 
> > > This is a nice addition and I suspect there could be even more done to
> > > update this file to cover the VF rep usage.
> > >   
> > > > ---
> > > >  Documentation/networking/switchdev.txt | 14 +++++++++++---
> > > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> > > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > > --- a/Documentation/networking/switchdev.txt
> > > > +++ b/Documentation/networking/switchdev.txt
> > > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> > > >  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> > > >  	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > > >  
> > > > -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> > > > -is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> > > > -would be sub-port 0 on port 1 on switch 1.
> > > > +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> > > > + - pA     for external ports;
> > > > + - pAsB   for split external ports;
> > > > + - pfC    for PF ports (so called PF representors);
> > > > + - pfCvfD for VF ports (so called VF representors).    
> > > 
> > > I hate to clutter this up, but might be also need to add:
> > > 
> > >  - pfCsB    for split PF ports (so called PF representors);
> > >  - pfCsBvfD for split VF ports (so called VF representors).
> > > 
> > > or are we comfortable that these additions to the name for split ports
> > > are implied?  
> > 
> > Hm..  What is a split PF port?  Splits happen on the physical port - see
> > my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> > on the opposite side of the eswitch from the wires.  
> 
> I'm with you that I think there is value in separate netdevs to
> represent "PFs, VFs and external ports/MACs" -- particularly for the
> use-case you to create rules to control PF<->VF traffic.
> 
> So while I'm not saying it is a _great_ idea to support such a thing as
> port-splitting of PFs, I suggested this addition as I'm not willing to restrict
> such a design/implementation if a vendor or customer desired.  It seemed
> useful to provde some guidance on how to name them -- even if we do not
> like them.  :-)

If I understand you correctly split PF would be a situation where
device has multiple port instances on the PCIe PF side?  IOW switch sees
multiple endpoints on the PF side?  Let me attempt an ASCII diagram :)

                                                                           
                    HOST A             ||          HOST B                  
                                       ||                                  
        PF A       | V | V | V | V     ||       PF B        | V | V | V               
                   | F | F | F | F ... ||                   | F | F | F ...           
 port A0 | port A1 | 0 | 1 | 2 | 3     || port B0 | port B1 | 0 | 1 | 2            
                                       ||                                  
             PCI Express link          ||        PCI Express link          
        \      \      \  |   |   |          |        |      /   /   /         
         \      \      \ |   |   |          |        |     /   /   /          
          \______\______\'   |   |          |        '____/___/___/
                         /---------------------------\                         
                         |<<==========               |
                         |             ==========>>  |
                         |     SR-IOV e-switch       |                         
                         |<<==========               |                         
                         |             ==========>>  |                          
                         \---------------------------/                     
                              |        |         |                          
                              |        |         |                          
                                 ||         ||                               
                          MAC 0  ||  MAC 1  || MAC 2                   
                                 ||         ||                              


Seems to be a valid configuration, perhaps this would actually be of
some use in container workloads, especially if the ports could be
instantiated at runtime in high numbers.  I would be cautious though
with calling the instances splits.  The more different PFs look from
MACs the better IMHO.  Do you actually have that problem today?  Is
there any HW supported upstream which would benefit from this?  Could we
decide on naming when we have an example implementation?  In theory
nothing stops us from splitting VFs the same way.

Another note on PF netdevs, perhaps the most awkward thing about them,
is that they result in two netdevs being visible to the host.  This is
not incorrect, since VFs if unassigned to VMs will end up creating an
"actual" netdev and the switchdev port representor too, but it rubs
some people the wrong way.  Which in turn makes those people try to not
spawn separate netdevs, which is incorrect IMHO, and breaks down e.g.
when the real netdev gets assigned to a namespace.

I'm not sure if this clarifies my thinking, I have, however, seem to
have drawn a moose :)
Jiri Pirko July 26, 2017, 5:48 a.m. UTC | #5
Tue, Jul 25, 2017 at 07:13:44AM CEST, jakub.kicinski@netronome.com wrote:
>We are still in position where we can suggest uniform naming
>convention for ndo_get_phys_port_name().  switchdev.txt file
>already contained a suggestion of how to name external ports.
>Since the use of switchdev for SR-IOV NIC's eswitches is growing,
>establish a format for ports of those devices as well.
>
>Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>---
> Documentation/networking/switchdev.txt | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>index 3e7b946dea27..7c4b6025fb4b 100644
>--- a/Documentation/networking/switchdev.txt
>+++ b/Documentation/networking/switchdev.txt
>@@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> 	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> 
>-Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
>-is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
>-would be sub-port 0 on port 1 on switch 1.
>+Suggested formats of the port name returned by ndo_get_phys_port_name are:
>+ - pA     for external ports;
>+ - pAsB   for split external ports;
>+ - pfC    for PF ports (so called PF representors);
>+ - pfCvfD for VF ports (so called VF representors).

I think it would make sense if the driver would just fill-up a struct in
the ndo call and core would generate the string.


>+Where A is the port name or ID; B is the sub-port name or ID; C is PCIe
>+Physical Function name or ID and D is PCIe Virtual Function name or ID.
>+
>+Suggested naming convention for switches is "swX", where X is the switch name
>+or ID, plus the port name.  For example, sw1p1s0 would be sub-port 0 on port 1
>+on switch 1.
> 
> Port Features
> ^^^^^^^^^^^^^
>-- 
>2.11.0
>
Jakub Kicinski July 26, 2017, 8:13 a.m. UTC | #6
On Wed, 26 Jul 2017 07:48:40 +0200, Jiri Pirko wrote:
> Tue, Jul 25, 2017 at 07:13:44AM CEST, jakub.kicinski@netronome.com wrote:
> >We are still in position where we can suggest uniform naming
> >convention for ndo_get_phys_port_name().  switchdev.txt file
> >already contained a suggestion of how to name external ports.
> >Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> >establish a format for ports of those devices as well.
> >
> >Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >---
> > Documentation/networking/switchdev.txt | 14 +++++++++++---
> > 1 file changed, 11 insertions(+), 3 deletions(-)
> >
> >diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> >index 3e7b946dea27..7c4b6025fb4b 100644
> >--- a/Documentation/networking/switchdev.txt
> >+++ b/Documentation/networking/switchdev.txt
> >@@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> > SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> > 	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > 
> >-Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> >-is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> >-would be sub-port 0 on port 1 on switch 1.
> >+Suggested formats of the port name returned by ndo_get_phys_port_name are:
> >+ - pA     for external ports;
> >+ - pAsB   for split external ports;
> >+ - pfC    for PF ports (so called PF representors);
> >+ - pfCvfD for VF ports (so called VF representors).  
> 
> I think it would make sense if the driver would just fill-up a struct in
> the ndo call and core would generate the string.

I do like the idea of core generating the string.  I have a temptation
to try to involve devlink in this somehow, since port id and split info
is already available there.  Perhaps that would only overcomplicate
things.

The other question is: can we remove the option to generate an arbitrary
string completely?  I think Or and mlx5 folks may object since it would
mean mlx5 VF repr names would change.
Or Gerlitz July 26, 2017, 9:23 a.m. UTC | #7
On Wed, Jul 26, 2017 at 11:13 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 26 Jul 2017 07:48:40 +0200, Jiri Pirko wrote:

>> I think it would make sense if the driver would just fill-up a struct in
>> the ndo call and core would generate the string.

> I do like the idea of core generating the string.  I have a temptation
> to try to involve devlink in this somehow, since port id and split info
> is already available there.  Perhaps that would only overcomplicate
> things.

> The other question is: can we remove the option to generate an arbitrary
> string completely?  I think Or and mlx5 folks may object since it would
> mean mlx5 VF repr names would change.

What we have today is the representor driver instance setting the VF index as
the of phys port name and we're telling users to have this sort of udev rule:

SUBSYSTEM=="net", ACTION=="add",
ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="",
NAME="$PF_NIC$attr{phys_port_name}"

that has affiliation to  the PF where this VF belongs. AFAIK this
model/assumption is now under push to
higher level (open stack), so lets see if/what we want to change here
w.r.t to sriov offloading drivers.

I would opt for an approach where the value returned by the kernel is
the minimal possible and
user-space has flexibility to further orchestrate that with udev
rules. I wasn't fully clear on Jakub's suggestion
which parts must come from the kernel. Do we have any length
limitation for the phys port name?

Or.
Andy Gospodarek July 26, 2017, 5:54 p.m. UTC | #8
On Tue, Jul 25, 2017 at 07:34:47PM -0700, Jakub Kicinski wrote:
> On Tue, 25 Jul 2017 21:48:15 -0400, Andy Gospodarek wrote:
> > On Tue, Jul 25, 2017 at 03:26:47PM -0700, Jakub Kicinski wrote:
> > > On Tue, 25 Jul 2017 11:22:41 -0400, Andy Gospodarek wrote:  
> > > > On Mon, Jul 24, 2017 at 10:13:44PM -0700, Jakub Kicinski wrote:  
> > > > > We are still in position where we can suggest uniform naming
> > > > > convention for ndo_get_phys_port_name().  switchdev.txt file
> > > > > already contained a suggestion of how to name external ports.
> > > > > Since the use of switchdev for SR-IOV NIC's eswitches is growing,
> > > > > establish a format for ports of those devices as well.
> > > > > 
> > > > > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>    
> > > > 
> > > > This is a nice addition and I suspect there could be even more done to
> > > > update this file to cover the VF rep usage.
> > > >   
> > > > > ---
> > > > >  Documentation/networking/switchdev.txt | 14 +++++++++++---
> > > > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> > > > > index 3e7b946dea27..7c4b6025fb4b 100644
> > > > > --- a/Documentation/networking/switchdev.txt
> > > > > +++ b/Documentation/networking/switchdev.txt
> > > > > @@ -119,9 +119,17 @@ into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
> > > > >  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
> > > > >  	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
> > > > >  
> > > > > -Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
> > > > > -is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
> > > > > -would be sub-port 0 on port 1 on switch 1.
> > > > > +Suggested formats of the port name returned by ndo_get_phys_port_name are:
> > > > > + - pA     for external ports;
> > > > > + - pAsB   for split external ports;
> > > > > + - pfC    for PF ports (so called PF representors);
> > > > > + - pfCvfD for VF ports (so called VF representors).    
> > > > 
> > > > I hate to clutter this up, but might be also need to add:
> > > > 
> > > >  - pfCsB    for split PF ports (so called PF representors);
> > > >  - pfCsBvfD for split VF ports (so called VF representors).
> > > > 
> > > > or are we comfortable that these additions to the name for split ports
> > > > are implied?  
> > > 
> > > Hm..  What is a split PF port?  Splits happen on the physical port - see
> > > my rant on the thread this is a reply to ;)  PFs are PCIe functions,
> > > on the opposite side of the eswitch from the wires.  
> > 
> > I'm with you that I think there is value in separate netdevs to
> > represent "PFs, VFs and external ports/MACs" -- particularly for the
> > use-case you to create rules to control PF<->VF traffic.
> > 
> > So while I'm not saying it is a _great_ idea to support such a thing as
> > port-splitting of PFs, I suggested this addition as I'm not willing to restrict
> > such a design/implementation if a vendor or customer desired.  It seemed
> > useful to provde some guidance on how to name them -- even if we do not
> > like them.  :-)
> 
> If I understand you correctly split PF would be a situation where
> device has multiple port instances on the PCIe PF side?  IOW switch sees
> multiple endpoints on the PF side?  Let me attempt an ASCII diagram :)
> 
>                                                                            
>                     HOST A             ||          HOST B                  
>                                        ||                                  
>         PF A       | V | V | V | V     ||       PF B        | V | V | V               
>                    | F | F | F | F ... ||                   | F | F | F ...           
>  port A0 | port A1 | 0 | 1 | 2 | 3     || port B0 | port B1 | 0 | 1 | 2            
>                                        ||                                  
>              PCI Express link          ||        PCI Express link          
>         \      \      \  |   |   |          |        |      /   /   /         
>          \      \      \ |   |   |          |        |     /   /   /          
>           \______\______\'   |   |          |        '____/___/___/
>                          /---------------------------\                         
>                          |<<==========               |
>                          |             ==========>>  |
>                          |     SR-IOV e-switch       |                         
>                          |<<==========               |                         
>                          |             ==========>>  |                          
>                          \---------------------------/                     
>                               |        |         |                          
>                               |        |         |                          
>                                  ||         ||                               
>                           MAC 0  ||  MAC 1  || MAC 2                   
>                                  ||         ||                              
> 
> 
> Seems to be a valid configuration, perhaps this would actually be of
> some use in container workloads, especially if the ports could be
> instantiated at runtime in high numbers.  I would be cautious though
> with calling the instances splits.  The more different PFs look from
> MACs the better IMHO.  Do you actually have that problem today?
> 
> Is there any HW supported upstream which would benefit from this?  Could we
> decide on naming when we have an example implementation?  In theory
> nothing stops us from splitting VFs the same way.

Not that I know about right now.

> Another note on PF netdevs, perhaps the most awkward thing about them,
> is that they result in two netdevs being visible to the host.  This is
> not incorrect, since VFs if unassigned to VMs will end up creating an
> "actual" netdev and the switchdev port representor too, but it rubs
> some people the wrong way.  Which in turn makes those people try to not
> spawn separate netdevs, which is incorrect IMHO, and breaks down e.g.
> when the real netdev gets assigned to a namespace.

For me, the most awkward part of having a separate netdev for the PF and
the MAC is that is really not how things were thought about in the
nominal switching case (the non e-switch case).

Since idea behind switchdev when it was created was to make sure that
each front-panel port on a switch was represented by a netdev in the
kernel (and the 'CPU interface' was abstracted away by the driver) I was
always a bit uneasy about having a separate netdev allocated for the CPU
port when in the switching case it really wasn't necessary.

> I'm not sure if this clarifies my thinking, I have, however, seem to
> have drawn a moose :)

Which looks great, BTW.  The moose may turn out to be one of the major
benefits from this thread!
Jakub Kicinski July 26, 2017, 8:11 p.m. UTC | #9
On Wed, 26 Jul 2017 12:23:17 +0300, Or Gerlitz wrote:
> On Wed, Jul 26, 2017 at 11:13 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
> > On Wed, 26 Jul 2017 07:48:40 +0200, Jiri Pirko wrote:  
> 
> >> I think it would make sense if the driver would just fill-up a struct in
> >> the ndo call and core would generate the string.  
> 
> > I do like the idea of core generating the string.  I have a temptation
> > to try to involve devlink in this somehow, since port id and split info
> > is already available there.  Perhaps that would only overcomplicate
> > things.  
> 
> > The other question is: can we remove the option to generate an arbitrary
> > string completely?  I think Or and mlx5 folks may object since it would
> > mean mlx5 VF repr names would change.  
> 
> What we have today is the representor driver instance setting the VF index as
> the of phys port name and we're telling users to have this sort of udev rule:
> 
> SUBSYSTEM=="net", ACTION=="add",
> ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="",
> NAME="$PF_NIC$attr{phys_port_name}"

Example names generated by this rule would be pfnic_0, pfnic_1 for vf
representors 0 and 1?

> that has affiliation to  the PF where this VF belongs. AFAIK this
> model/assumption is now under push to
> higher level (open stack), so lets see if/what we want to change here
> w.r.t to sriov offloading drivers.
> 
> I would opt for an approach where the value returned by the kernel is
> the minimal possible and
> user-space has flexibility to further orchestrate that with udev
> rules. I wasn't fully clear on Jakub's suggestion
> which parts must come from the kernel. Do we have any length
> limitation for the phys port name?

Looks like the limit today is IFNAMSIZ.  I'm in favor of leaving the
flexibility to the userspace, why I suggested adding the pf%d or
pf%dvf%d to the name is that I don't think we have any other way today
of telling whether representor is for a physical port, VF or PF.

If I understand mlx5 code, you're not reporting port ids for physical
ports so presence of the name already implies it's a VF but in case you
want to add port splitting support, for example, reporting the name on
physical ports will become more of a necessity.

If we adopt Jiri's suggestion of returning structured data it will be
very easy to give user space type and indexes separately, but we should
probably still return the string for backwards compatibility.
Or Gerlitz July 27, 2017, 10:30 a.m. UTC | #10
On Wed, Jul 26, 2017 at 11:11 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
> On Wed, 26 Jul 2017 12:23:17 +0300, Or Gerlitz wrote:
>> On Wed, Jul 26, 2017 at 11:13 AM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> > On Wed, 26 Jul 2017 07:48:40 +0200, Jiri Pirko wrote:
>>
>> >> I think it would make sense if the driver would just fill-up a struct in
>> >> the ndo call and core would generate the string.
>>
>> > I do like the idea of core generating the string.  I have a temptation
>> > to try to involve devlink in this somehow, since port id and split info
>> > is already available there.  Perhaps that would only overcomplicate
>> > things.
>>
>> > The other question is: can we remove the option to generate an arbitrary
>> > string completely?  I think Or and mlx5 folks may object since it would
>> > mean mlx5 VF repr names would change.
>>
>> What we have today is the representor driver instance setting the VF index as
>> the of phys port name and we're telling users to have this sort of udev rule:
>>
>> SUBSYSTEM=="net", ACTION=="add",
>> ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="",
>> NAME="$PF_NIC$attr{phys_port_name}"
>
> Example names generated by this rule would be pfnic_0, pfnic_1 for vf
> representors 0 and 1?

YES

>> that has affiliation to  the PF where this VF belongs. AFAIK this
>> model/assumption is now under push to
>> higher level (open stack), so lets see if/what we want to change here
>> w.r.t to sriov offloading drivers.
>>
>> I would opt for an approach where the value returned by the kernel is
>> the minimal possible and
>> user-space has flexibility to further orchestrate that with udev
>> rules. I wasn't fully clear on Jakub's suggestion
>> which parts must come from the kernel. Do we have any length
>> limitation for the phys port name?
>
> Looks like the limit today is IFNAMSIZ.  I'm in favor of leaving the
> flexibility to the userspace, why I suggested adding the pf%d or
> pf%dvf%d to the name is that I don't think we have any other way today
> of telling whether representor is for a physical port, VF or PF.

> If I understand mlx5 code, you're not reporting port ids for physical
> ports so presence of the name already implies it's a VF but in case you

yes, currently in our model the PF serves as the uplink representor
when running in offloads mode.

> want to add port splitting support, for example, reporting the name on
> physical ports will become more of a necessity.

> If we adopt Jiri's suggestion of returning structured data it will be
> very easy to give user space type and indexes separately, but we should
> probably still return the string for backwards compatibility.

I am not still clear how the structured data would look like
diff mbox

Patch

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 3e7b946dea27..7c4b6025fb4b 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -119,9 +119,17 @@  into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
 SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
 	ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"
 
-Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
-is the port name or ID, and Z is the sub-port name or ID.  For example, sw1p1s0
-would be sub-port 0 on port 1 on switch 1.
+Suggested formats of the port name returned by ndo_get_phys_port_name are:
+ - pA     for external ports;
+ - pAsB   for split external ports;
+ - pfC    for PF ports (so called PF representors);
+ - pfCvfD for VF ports (so called VF representors).
+Where A is the port name or ID; B is the sub-port name or ID; C is PCIe
+Physical Function name or ID and D is PCIe Virtual Function name or ID.
+
+Suggested naming convention for switches is "swX", where X is the switch name
+or ID, plus the port name.  For example, sw1p1s0 would be sub-port 0 on port 1
+on switch 1.
 
 Port Features
 ^^^^^^^^^^^^^