diff mbox

[iproute2,1/6] iproute2: ipa: show switch id

Message ID 1417683438-10935-2-git-send-email-jiri@resnulli.us
State Changes Requested, archived
Delegated to: stephen hemminger
Headers show

Commit Message

Jiri Pirko Dec. 4, 2014, 8:57 a.m. UTC
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/if_link.h | 1 +
 ip/ipaddress.c          | 8 ++++++++
 2 files changed, 9 insertions(+)

Comments

Jamal Hadi Salim Dec. 4, 2014, 1:17 p.m. UTC | #1
On 12/04/14 03:57, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/if_link.h | 1 +
>   ip/ipaddress.c          | 8 ++++++++
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>   	__IFLA_MAX
>   };
>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Dec. 4, 2014, 2:20 p.m. UTC | #2
On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  ip/ipaddress.c          | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>  	IFLA_CARRIER,
>  	IFLA_PHYS_PORT_ID,
>  	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,

Serious question for Stephen et al, once we take this to iproute2 are we
going to be able to change the name but the string diplayed if needed?

I had a patch that called this IFLA_PARENT_ID that I was using with the
older github tree used by Jiri and Scott before these were in net-next.
I wanted to submit that as a change to what became
82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
be accepted and the dust settled.

I like the parent/child/sibling nomenclature better for 4 reasons:

- Most did not seem to like the term 'offload' since that term would be
  confusing with, GRO, GSO, etc.
- A *significant* use case for many of the high-end ASICs in datacenters
  is routing.
- switchid does not make sense in the OVS/flow case because it is all
  about flows, not switches or routers, and parent made sense there.
- I wanted to combine this for use with SR-IOV use case, so one can more
  easily map PF->VF using this.

Can you give me a bit (a day) to clean-up that patch and submit an
alternative proposal to these?

>  	__IFLA_MAX
>  };
>  
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 4d99324..bd36a07 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>  				      b1, sizeof(b1)));
>  	}
>  
> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> +		SPRINT_BUF(b1);
> +		fprintf(fp, "switchid %s ",
> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> +				      b1, sizeof(b1)));
> +	}
> +
>  	if (tb[IFLA_OPERSTATE])
>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>  
> -- 
> 1.9.3
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roopa Prabhu Dec. 4, 2014, 2:29 p.m. UTC | #3
On 12/4/14, 6:20 AM, Andy Gospodarek wrote:
> On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/if_link.h | 1 +
>>   ip/ipaddress.c          | 8 ++++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>   	IFLA_CARRIER,
>>   	IFLA_PHYS_PORT_ID,
>>   	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
> Serious question for Stephen et al, once we take this to iproute2 are we
> going to be able to change the name but the string diplayed if needed?
>
> I had a patch that called this IFLA_PARENT_ID that I was using with the
> older github tree used by Jiri and Scott before these were in net-next.
> I wanted to submit that as a change to what became
> 82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> be accepted and the dust settled.
>
> I like the parent/child/sibling nomenclature better for 4 reasons:
>
> - Most did not seem to like the term 'offload' since that term would be
>    confusing with, GRO, GSO, etc.
> - A *significant* use case for many of the high-end ASICs in datacenters
>    is routing.
> - switchid does not make sense in the OVS/flow case because it is all
>    about flows, not switches or routers, and parent made sense there.
> - I wanted to combine this for use with SR-IOV use case, so one can more
>    easily map PF->VF using this.

ack.., I have voted for calling it a generic parent id in the past.
>
> Can you give me a bit (a day) to clean-up that patch and submit an
> alternative proposal to these?
>
>>   	__IFLA_MAX
>>   };
>>   
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>   				      b1, sizeof(b1)));
>>   	}
>>   
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>   	if (tb[IFLA_OPERSTATE])
>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>   
>> -- 
>> 1.9.3
>>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 2:33 p.m. UTC | #4
Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/if_link.h | 1 +
>>  ip/ipaddress.c          | 8 ++++++++
>>  2 files changed, 9 insertions(+)
>> 
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
>
>Serious question for Stephen et al, once we take this to iproute2 are we
>going to be able to change the name but the string diplayed if needed?
>
>I had a patch that called this IFLA_PARENT_ID that I was using with the
>older github tree used by Jiri and Scott before these were in net-next.
>I wanted to submit that as a change to what became
>82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>be accepted and the dust settled.
>
>I like the parent/child/sibling nomenclature better for 4 reasons:
>
>- Most did not seem to like the term 'offload' since that term would be
>  confusing with, GRO, GSO, etc.
>- A *significant* use case for many of the high-end ASICs in datacenters
>  is routing.
>- switchid does not make sense in the OVS/flow case because it is all
>  about flows, not switches or routers, and parent made sense there.

well ovs is all about flows and has the "switch" word in the name. I
believe that people are talking about "switches" in case of these "flow
devices" as well. I see nothing wrong in that. I think that "switch"
became generic name for "packet forwarding machines".

"parent" is very generic and may mean 100 things...


>- I wanted to combine this for use with SR-IOV use case, so one can more
>  easily map PF->VF using this.

Ugh, please don't mix this up with pf, vf. That is completely different
thing.

pf vf mapping is done in sysfs. In netlink, physportid is used for that
purpose. We can expose this phys port id for pf as well (as a different
attr) and we are done.

>
>Can you give me a bit (a day) to clean-up that patch and submit an
>alternative proposal to these?
>
>>  	__IFLA_MAX
>>  };
>>  
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>  				      b1, sizeof(b1)));
>>  	}
>>  
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>  	if (tb[IFLA_OPERSTATE])
>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>  
>> -- 
>> 1.9.3
>> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Dec. 4, 2014, 2:57 p.m. UTC | #5
On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> ---
> >>  include/linux/if_link.h | 1 +
> >>  ip/ipaddress.c          | 8 ++++++++
> >>  2 files changed, 9 insertions(+)
> >> 
> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> index 4732063..a6e2594 100644
> >> --- a/include/linux/if_link.h
> >> +++ b/include/linux/if_link.h
> >> @@ -145,6 +145,7 @@ enum {
> >>  	IFLA_CARRIER,
> >>  	IFLA_PHYS_PORT_ID,
> >>  	IFLA_CARRIER_CHANGES,
> >> +	IFLA_PHYS_SWITCH_ID,
> >
> >Serious question for Stephen et al, once we take this to iproute2 are we
> >going to be able to change the name but the string diplayed if needed?
> >
> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >older github tree used by Jiri and Scott before these were in net-next.
> >I wanted to submit that as a change to what became
> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >be accepted and the dust settled.
> >
> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >
> >- Most did not seem to like the term 'offload' since that term would be
> >  confusing with, GRO, GSO, etc.
> >- A *significant* use case for many of the high-end ASICs in datacenters
> >  is routing.
> >- switchid does not make sense in the OVS/flow case because it is all
> >  about flows, not switches or routers, and parent made sense there.
> 
> well ovs is all about flows and has the "switch" word in the name. I
> believe that people are talking about "switches" in case of these "flow
> devices" as well. I see nothing wrong in that. I think that "switch"
> became generic name for "packet forwarding machines".

Just because one chose to use it that way does not mean I agree with it
and that we should copy their bad decision.  :-)

> "parent" is very generic and may mean 100 things...

But in this case, it means 1 thing.  The netdev you are using is
connected to another device that controls it rather than being just a
NIC.

> 
> 
> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >  easily map PF->VF using this.
> 
> Ugh, please don't mix this up with pf, vf. That is completely different
> thing.
> 
> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> purpose. We can expose this phys port id for pf as well (as a different
> attr) and we are done.

I know that attribute is there, but I find it more valuable for
solutions like nPAR than for PF/VF use-case.  Parent/child relationship
makes more sense to me since for forwarding will be controlled by the
embedded switch on those devices.  (Notice I specifically chose not to
use 'master' since that is already overloaded by bridge, bonding,
teaming, etc.)

> 
> >
> >Can you give me a bit (a day) to clean-up that patch and submit an
> >alternative proposal to these?
> >
> >>  	__IFLA_MAX
> >>  };
> >>  
> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> index 4d99324..bd36a07 100644
> >> --- a/ip/ipaddress.c
> >> +++ b/ip/ipaddress.c
> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >>  				      b1, sizeof(b1)));
> >>  	}
> >>  
> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> +		SPRINT_BUF(b1);
> >> +		fprintf(fp, "switchid %s ",
> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      b1, sizeof(b1)));
> >> +	}
> >> +
> >>  	if (tb[IFLA_OPERSTATE])
> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >>  
> >> -- 
> >> 1.9.3
> >> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 3:12 p.m. UTC | #6
Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string diplayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.

when we discussed linkage now called upper/lower devices, there was a
proposal to use "parent" as well. And then it ment only 1 thing as well.
Why for example pf is not parent of vf? Using "parent" word here and
assume that it just makes sense for this case is IMHO wrong. Too
generic. "switch_parent" - that I am okay with.


>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.

Why it is not good enough for pf/vf? There are SR-IOV drivers using
this:
$ git grep ndo_get_phys_port_id
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:       .ndo_get_phys_port_id   = bnx2x_get_phys_port_id,
drivers/net/ethernet/intel/i40e/i40e_main.c:    .ndo_get_phys_port_id   = i40e_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:       .ndo_get_phys_port_id   = qlcnic_get_phys_port_id,


>Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to
>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)

But is that embedded switch the same thing as PF? I doubt that. Let the
aspect of PF/VF aside and look at it as on ordinary switch. The eswitch
driver needs te be implemented and add netdevs for internal ports (they
do not exist now).

You have to look at PF/VF and eSwitch as on separate things.

>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 3:15 p.m. UTC | #7
Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string diplayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.
>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.  Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to

see, "embedded switch", not "embedded parent" or "embedded offload". And
yes, the "embedded switch" also may (and some of them do today) work
with flows.


>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)
>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Dec. 4, 2014, 3:28 p.m. UTC | #8
On Thu, Dec 04, 2014 at 04:15:12PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> >> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> >> ---
> >> >>  include/linux/if_link.h | 1 +
> >> >>  ip/ipaddress.c          | 8 ++++++++
> >> >>  2 files changed, 9 insertions(+)
> >> >> 
> >> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> >> index 4732063..a6e2594 100644
> >> >> --- a/include/linux/if_link.h
> >> >> +++ b/include/linux/if_link.h
> >> >> @@ -145,6 +145,7 @@ enum {
> >> >>  	IFLA_CARRIER,
> >> >>  	IFLA_PHYS_PORT_ID,
> >> >>  	IFLA_CARRIER_CHANGES,
> >> >> +	IFLA_PHYS_SWITCH_ID,
> >> >
> >> >Serious question for Stephen et al, once we take this to iproute2 are we
> >> >going to be able to change the name but the string diplayed if needed?
> >> >
> >> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >> >older github tree used by Jiri and Scott before these were in net-next.
> >> >I wanted to submit that as a change to what became
> >> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >> >be accepted and the dust settled.
> >> >
> >> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >> >
> >> >- Most did not seem to like the term 'offload' since that term would be
> >> >  confusing with, GRO, GSO, etc.
> >> >- A *significant* use case for many of the high-end ASICs in datacenters
> >> >  is routing.
> >> >- switchid does not make sense in the OVS/flow case because it is all
> >> >  about flows, not switches or routers, and parent made sense there.
> >> 
> >> well ovs is all about flows and has the "switch" word in the name. I
> >> believe that people are talking about "switches" in case of these "flow
> >> devices" as well. I see nothing wrong in that. I think that "switch"
> >> became generic name for "packet forwarding machines".
> >
> >Just because one chose to use it that way does not mean I agree with it
> >and that we should copy their bad decision.  :-)
> >
> >> "parent" is very generic and may mean 100 things...
> >
> >But in this case, it means 1 thing.  The netdev you are using is
> >connected to another device that controls it rather than being just a
> >NIC.
> >
> >> 
> >> 
> >> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >> >  easily map PF->VF using this.
> >> 
> >> Ugh, please don't mix this up with pf, vf. That is completely different
> >> thing.
> >> 
> >> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> >> purpose. We can expose this phys port id for pf as well (as a different
> >> attr) and we are done.
> >
> >I know that attribute is there, but I find it more valuable for
> >solutions like nPAR than for PF/VF use-case.  Parent/child relationship
> >makes more sense to me since for forwarding will be controlled by the
> >embedded switch on those devices.  (Notice I specifically chose not to
> 
> see, "embedded switch", not "embedded parent" or "embedded offload". And
> yes, the "embedded switch" also may (and some of them do today) work
> with flows.

It's fine to exclude the PF/VF part for now since you seem disagree with
that as a valid reason. The other 3 reasons are in my mind, so that was
why I asked to Stephen would mind waiting for me to post my set.  I'm
not asking for you to do anything here but to be patient and let me get
the set out.

> 
> 
> >use 'master' since that is already overloaded by bridge, bonding,
> >teaming, etc.)
> >
> >> 
> >> >
> >> >Can you give me a bit (a day) to clean-up that patch and submit an
> >> >alternative proposal to these?
> >> >
> >> >>  	__IFLA_MAX
> >> >>  };
> >> >>  
> >> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> >> index 4d99324..bd36a07 100644
> >> >> --- a/ip/ipaddress.c
> >> >> +++ b/ip/ipaddress.c
> >> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >> >>  				      b1, sizeof(b1)));
> >> >>  	}
> >> >>  
> >> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> >> +		SPRINT_BUF(b1);
> >> >> +		fprintf(fp, "switchid %s ",
> >> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      b1, sizeof(b1)));
> >> >> +	}
> >> >> +
> >> >>  	if (tb[IFLA_OPERSTATE])
> >> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >> >>  
> >> >> -- 
> >> >> 1.9.3
> >> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf Dec. 4, 2014, 3:37 p.m. UTC | #9
On 12/04/14 at 09:57am, Andy Gospodarek wrote:
> I know that attribute is there, but I find it more valuable for
> solutions like nPAR than for PF/VF use-case.  Parent/child relationship
> makes more sense to me since for forwarding will be controlled by the
> embedded switch on those devices.  (Notice I specifically chose not to
> use 'master' since that is already overloaded by bridge, bonding,
> teaming, etc.)

So that would make it link, master, and parent. Not sure we can make
it any more confusing ;-)

TBH, I'm really struggling on how to best name this. Trying to reach
for 100% correctness leads to generic, amgbigous names. 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 4:15 p.m. UTC | #10
Jiri Pirko <jiri@resnulli.us> writes:

Would someone please explain to me what a switch id is?

I looked in the kernel source, and I looked here and while I know
switches I don't have a clue what a switch id is.

My primary concern at this point is that you have introduced a global
identifier that is isn't a hardware property (it certainly does not look
like a mac address) and that is unique across network namespaces and
thus breaks checkpoint/restart (aka CRIU).

Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
mean we can't have a purely software implementation of this interface?
Given that we will want a software implementation at some point
including PHYS in the name seems completely wrong.

> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  ip/ipaddress.c          | 8 ++++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>  	IFLA_CARRIER,
>  	IFLA_PHYS_PORT_ID,
>  	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>  	__IFLA_MAX
>  };
>  
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 4d99324..bd36a07 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>  				      b1, sizeof(b1)));
>  	}
>  
> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> +		SPRINT_BUF(b1);
> +		fprintf(fp, "switchid %s ",
> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> +				      b1, sizeof(b1)));
> +	}
> +
>  	if (tb[IFLA_OPERSTATE])
>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 4:30 p.m. UTC | #11
Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>Would someone please explain to me what a switch id is?
>
>I looked in the kernel source, and I looked here and while I know
>switches I don't have a clue what a switch id is.
>
>My primary concern at this point is that you have introduced a global
>identifier that is isn't a hardware property (it certainly does not look
>like a mac address) and that is unique across network namespaces and
>thus breaks checkpoint/restart (aka CRIU).

IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
generated by the driver and ensures that there is the same switch id for
all ports belonging to the same switch chip/asic. It is up to the driver
how to implement the id. I would like to point you to driver
implementing ndo_get_phys_port_id

>
>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>mean we can't have a purely software implementation of this interface?
>Given that we will want a software implementation at some point
>including PHYS in the name seems completely wrong.

We can remove the "PHYS", no problem. I do not understand what you say
about "software implementation". The point is to allow hw switch/ish
chips to be supported.


>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/if_link.h | 1 +
>>  ip/ipaddress.c          | 8 ++++++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
>>  	__IFLA_MAX
>>  };
>>  
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>  				      b1, sizeof(b1)));
>>  	}
>>  
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>  	if (tb[IFLA_OPERSTATE])
>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 5:52 p.m. UTC | #12
Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>Would someone please explain to me what a switch id is?
>>
>>I looked in the kernel source, and I looked here and while I know
>>switches I don't have a clue what a switch id is.
>>
>>My primary concern at this point is that you have introduced a global
>>identifier that is isn't a hardware property (it certainly does not look
>>like a mac address) and that is unique across network namespaces and
>>thus breaks checkpoint/restart (aka CRIU).
>
> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
> generated by the driver and ensures that there is the same switch id for
> all ports belonging to the same switch chip/asic. It is up to the driver
> how to implement the id. I would like to point you to driver
> implementing ndo_get_phys_port_id

Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
guid in the case of infiniband.  Which really makes me wonder why we
didn't use the same abstractions in the code for address types that we
do for hardware addresses.

Using mac address or other hardware addresses that are used for layer 2
addressing makes sense to me.  There is a long tradition of that and as
I recall protocols like STP actually requiring having a different mac
address per port.

When I asked the question I thought the switch id was going to be
something like the ifindex, the software index of a network device.


Finally having tracked down the rocker implementation of 
rocker_port_switch_parent_id_get I see it you are reading some 64bit
hardware register.

Which leads me to ask what are the semantics of switch_id?

Is the switch id an identifier with a prefix from IEEE and assigned by
the manufacture so that it is guaranteed to the tolerances of the
manufacturing process to be globally unique?

Is the switch id a random number that is statistically likely to be
globally unique because you have enough bits?   As I recall you need
at least 128 bits to have a reasonable chance of a random number
avoiding the birthday paradox.

Do we need some kind of manufacturer id to tell one switch id from
another?

Is the switch id persistent across reboots?

>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>mean we can't have a purely software implementation of this interface?
>>Given that we will want a software implementation at some point
>>including PHYS in the name seems completely wrong.
>
> We can remove the "PHYS", no problem. I do not understand what you say
> about "software implementation". The point is to allow hw switch/ish
> chips to be supported.

If we are talking about something typically stored in a eeprom like a
mac address phys seems appropriate.

Still having a definition of this switch id clean clear enough that
net/bridge and drivers/net/macvlan can implement it seems important.

Even more important is having a definition of switch id clear enough
that userspace can use the switch id to do something useful.

Right now switch id looks like one of those weird one manufacturer
properties that is fine to expose as a driver specific property
but I don't yet see it being a generic property I that can be used
usefully in userspace.

So can we please get some clear semantics or failing that can we please
not expose this to userspace as generic property.

Thanks,
Eric



>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>> ---
>>>  include/linux/if_link.h | 1 +
>>>  ip/ipaddress.c          | 8 ++++++++
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>> index 4732063..a6e2594 100644
>>> --- a/include/linux/if_link.h
>>> +++ b/include/linux/if_link.h
>>> @@ -145,6 +145,7 @@ enum {
>>>  	IFLA_CARRIER,
>>>  	IFLA_PHYS_PORT_ID,
>>>  	IFLA_CARRIER_CHANGES,
>>> +	IFLA_PHYS_SWITCH_ID,
>>>  	__IFLA_MAX
>>>  };
>>>  
>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>> index 4d99324..bd36a07 100644
>>> --- a/ip/ipaddress.c
>>> +++ b/ip/ipaddress.c
>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>  				      b1, sizeof(b1)));
>>>  	}
>>>  
>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>> +		SPRINT_BUF(b1);
>>> +		fprintf(fp, "switchid %s ",
>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>> +				      b1, sizeof(b1)));
>>> +	}
>>> +
>>>  	if (tb[IFLA_OPERSTATE])
>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Roopa Prabhu Dec. 4, 2014, 5:59 p.m. UTC | #13
On 12/4/14, 9:52 AM, Eric W. Biederman wrote:
> Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>> Would someone please explain to me what a switch id is?
>>>
>>> I looked in the kernel source, and I looked here and while I know
>>> switches I don't have a clue what a switch id is.
>>>
>>> My primary concern at this point is that you have introduced a global
>>> identifier that is isn't a hardware property (it certainly does not look
>>> like a mac address) and that is unique across network namespaces and
>>> thus breaks checkpoint/restart (aka CRIU).
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
> Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
> guid in the case of infiniband.  Which really makes me wonder why we
> didn't use the same abstractions in the code for address types that we
> do for hardware addresses.
>
> Using mac address or other hardware addresses that are used for layer 2
> addressing makes sense to me.  There is a long tradition of that and as
> I recall protocols like STP actually requiring having a different mac
> address per port.
>
> When I asked the question I thought the switch id was going to be
> something like the ifindex, the software index of a network device.
>
>
> Finally having tracked down the rocker implementation of
> rocker_port_switch_parent_id_get I see it you are reading some 64bit
> hardware register.
>
> Which leads me to ask what are the semantics of switch_id?
>
> Is the switch id an identifier with a prefix from IEEE and assigned by
> the manufacture so that it is guaranteed to the tolerances of the
> manufacturing process to be globally unique?
>
> Is the switch id a random number that is statistically likely to be
> globally unique because you have enough bits?   As I recall you need
> at least 128 bits to have a reasonable chance of a random number
> avoiding the birthday paradox.
>
> Do we need some kind of manufacturer id to tell one switch id from
> another?
>
> Is the switch id persistent across reboots?
>
>>> Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>> mean we can't have a purely software implementation of this interface?
>>> Given that we will want a software implementation at some point
>>> including PHYS in the name seems completely wrong.
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
> If we are talking about something typically stored in a eeprom like a
> mac address phys seems appropriate.
>
> Still having a definition of this switch id clean clear enough that
> net/bridge and drivers/net/macvlan can implement it seems important.
>
> Even more important is having a definition of switch id clear enough
> that userspace can use the switch id to do something useful.
>
> Right now switch id looks like one of those weird one manufacturer
> properties that is fine to expose as a driver specific property
> but I don't yet see it being a generic property I that can be used
> usefully in userspace.
>
> So can we please get some clear semantics or failing that can we please
> not expose this to userspace as generic property.

Agree..100%. This was my original concern as well and i have raised it 
earlier.
  Don't expose it to userspace if the semantics are still not clear.

Thanks.

>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>   include/linux/if_link.h | 1 +
>>>>   ip/ipaddress.c          | 8 ++++++++
>>>>   2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>   	IFLA_CARRIER,
>>>>   	IFLA_PHYS_PORT_ID,
>>>>   	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>   	__IFLA_MAX
>>>>   };
>>>>   
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>   				      b1, sizeof(b1)));
>>>>   	}
>>>>   
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>   	if (tb[IFLA_OPERSTATE])
>>>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 6:24 p.m. UTC | #14
Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>Would someone please explain to me what a switch id is?
>>>
>>>I looked in the kernel source, and I looked here and while I know
>>>switches I don't have a clue what a switch id is.
>>>
>>>My primary concern at this point is that you have introduced a global
>>>identifier that is isn't a hardware property (it certainly does not look
>>>like a mac address) and that is unique across network namespaces and
>>>thus breaks checkpoint/restart (aka CRIU).
>>
>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>> generated by the driver and ensures that there is the same switch id for
>> all ports belonging to the same switch chip/asic. It is up to the driver
>> how to implement the id. I would like to point you to driver
>> implementing ndo_get_phys_port_id
>
>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>guid in the case of infiniband.  Which really makes me wonder why we
>didn't use the same abstractions in the code for address types that we
>do for hardware addresses.
>
>Using mac address or other hardware addresses that are used for layer 2
>addressing makes sense to me.  There is a long tradition of that and as
>I recall protocols like STP actually requiring having a different mac
>address per port.
>
>When I asked the question I thought the switch id was going to be
>something like the ifindex, the software index of a network device.
>
>
>Finally having tracked down the rocker implementation of 
>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>hardware register.
>
>Which leads me to ask what are the semantics of switch_id?
>
>Is the switch id an identifier with a prefix from IEEE and assigned by
>the manufacture so that it is guaranteed to the tolerances of the
>manufacturing process to be globally unique?

It is up to the driver what to use. It can use mac addr. This is same as
for phys port id.


>
>Is the switch id a random number that is statistically likely to be
>globally unique because you have enough bits?   As I recall you need
>at least 128 bits to have a reasonable chance of a random number
>avoiding the birthday paradox.
>
>Do we need some kind of manufacturer id to tell one switch id from
>another?
>
>Is the switch id persistent across reboots?

Yes it is (as for phys port id).

>
>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>mean we can't have a purely software implementation of this interface?
>>>Given that we will want a software implementation at some point
>>>including PHYS in the name seems completely wrong.
>>
>> We can remove the "PHYS", no problem. I do not understand what you say
>> about "software implementation". The point is to allow hw switch/ish
>> chips to be supported.
>
>If we are talking about something typically stored in a eeprom like a
>mac address phys seems appropriate.

Yes, we are.

>
>Still having a definition of this switch id clean clear enough that
>net/bridge and drivers/net/macvlan can implement it seems important.

I don't understand why would net/bridge or driver/net/macvlan want to
implement this. The purpose is to group ports/netdevs which are part of
the same hw switch.

>
>Even more important is having a definition of switch id clear enough
>that userspace can use the switch id to do something useful.

Userspace threats this the same it treats phys port id.

if two ports/netdevs has same switch id, they belong under same hw
switch. That's all.


>
>Right now switch id looks like one of those weird one manufacturer
>properties that is fine to expose as a driver specific property
>but I don't yet see it being a generic property I that can be used
>usefully in userspace.
>
>So can we please get some clear semantics or failing that can we please
>not expose this to userspace as generic property.

I thought that the semantics is clean. Looks like I will have to update
Documentation/networking/switchdev.txt adding some more info about this.

>
>Thanks,
>Eric
>
>
>
>>>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>>> ---
>>>>  include/linux/if_link.h | 1 +
>>>>  ip/ipaddress.c          | 8 ++++++++
>>>>  2 files changed, 9 insertions(+)
>>>>
>>>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>>>> index 4732063..a6e2594 100644
>>>> --- a/include/linux/if_link.h
>>>> +++ b/include/linux/if_link.h
>>>> @@ -145,6 +145,7 @@ enum {
>>>>  	IFLA_CARRIER,
>>>>  	IFLA_PHYS_PORT_ID,
>>>>  	IFLA_CARRIER_CHANGES,
>>>> +	IFLA_PHYS_SWITCH_ID,
>>>>  	__IFLA_MAX
>>>>  };
>>>>  
>>>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>>>> index 4d99324..bd36a07 100644
>>>> --- a/ip/ipaddress.c
>>>> +++ b/ip/ipaddress.c
>>>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>>>  				      b1, sizeof(b1)));
>>>>  	}
>>>>  
>>>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>>>> +		SPRINT_BUF(b1);
>>>> +		fprintf(fp, "switchid %s ",
>>>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>>>> +				      b1, sizeof(b1)));
>>>> +	}
>>>> +
>>>>  	if (tb[IFLA_OPERSTATE])
>>>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 6:57 p.m. UTC | #15
Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>>Jiri Pirko <jiri@resnulli.us> writes:
>>
>>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>>
>>>>Would someone please explain to me what a switch id is?
>>>>
>>>>I looked in the kernel source, and I looked here and while I know
>>>>switches I don't have a clue what a switch id is.
>>>>
>>>>My primary concern at this point is that you have introduced a global
>>>>identifier that is isn't a hardware property (it certainly does not look
>>>>like a mac address) and that is unique across network namespaces and
>>>>thus breaks checkpoint/restart (aka CRIU).
>>>
>>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>>> generated by the driver and ensures that there is the same switch id for
>>> all ports belonging to the same switch chip/asic. It is up to the driver
>>> how to implement the id. I would like to point you to driver
>>> implementing ndo_get_phys_port_id
>>
>>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>>guid in the case of infiniband.  Which really makes me wonder why we
>>didn't use the same abstractions in the code for address types that we
>>do for hardware addresses.
>>
>>Using mac address or other hardware addresses that are used for layer 2
>>addressing makes sense to me.  There is a long tradition of that and as
>>I recall protocols like STP actually requiring having a different mac
>>address per port.
>>
>>When I asked the question I thought the switch id was going to be
>>something like the ifindex, the software index of a network device.
>>
>>
>>Finally having tracked down the rocker implementation of 
>>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>>hardware register.
>>
>>Which leads me to ask what are the semantics of switch_id?
>>
>>Is the switch id an identifier with a prefix from IEEE and assigned by
>>the manufacture so that it is guaranteed to the tolerances of the
>>manufacturing process to be globally unique?
>
> It is up to the driver what to use. It can use mac addr. This is same as
> for phys port id.

My reading of the code says the phys port id is the layer 2 hardware
address of the port.  Certainly that is the case for all of the
implementations now and it would be insane for it to be anything else.

>>Is the switch id a random number that is statistically likely to be
>>globally unique because you have enough bits?   As I recall you need
>>at least 128 bits to have a reasonable chance of a random number
>>avoiding the birthday paradox.
>>
>>Do we need some kind of manufacturer id to tell one switch id from
>>another?
>>
>>Is the switch id persistent across reboots?
>
> Yes it is (as for phys port id).

>>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>>mean we can't have a purely software implementation of this interface?
>>>>Given that we will want a software implementation at some point
>>>>including PHYS in the name seems completely wrong.
>>>
>>> We can remove the "PHYS", no problem. I do not understand what you say
>>> about "software implementation". The point is to allow hw switch/ish
>>> chips to be supported.
>>
>>If we are talking about something typically stored in a eeprom like a
>>mac address phys seems appropriate.
>
> Yes, we are.
>
>>
>>Still having a definition of this switch id clean clear enough that
>>net/bridge and drivers/net/macvlan can implement it seems important.
>
> I don't understand why would net/bridge or driver/net/macvlan want to
> implement this. The purpose is to group ports/netdevs which are part of
> the same hw switch.

Certainly all of the macvlan devices are part of the same logical
switch, and are all the same hardware.

>>Even more important is having a definition of switch id clear enough
>>that userspace can use the switch id to do something useful.
>
> Userspace threats this the same it treats phys port id.
>
> if two ports/netdevs has same switch id, they belong under same hw
> switch. That's all.

So this id needs to be globally unique?

How does the rocket driver achieve global uniquness?  Is the rocket
driver using an IEEE assigned prefix based id like the ethernet mac
address?

This is an important detail as ensuring global uniqueness can take a lot
of work so there needs to be a general agreement on how global
uniqueness is achieved.

Saying it is up to the driver how to achieve a globally unique id across
every different switch driver is really insufficient.  If I have two
drivers that each implement an id that is 64bit long and they use two
completely different methods there is a real chance of collision.

So either I need a driver id that I also use in the comparison between
switch ids or how a driver generates a globally unique id need to be
specified.

>>Right now switch id looks like one of those weird one manufacturer
>>properties that is fine to expose as a driver specific property
>>but I don't yet see it being a generic property I that can be used
>>usefully in userspace.
>>
>>So can we please get some clear semantics or failing that can we please
>>not expose this to userspace as generic property.
>
> I thought that the semantics is clean. Looks like I will have to update
> Documentation/networking/switchdev.txt adding some more info about
> this.

The semantics as described so far will not achieve a useful id, which
makes them rubbish.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 7:19 p.m. UTC | #16
Thu, Dec 04, 2014 at 07:57:41PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 06:52:49PM CET, ebiederm@xmission.com wrote:
>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>> Thu, Dec 04, 2014 at 05:15:04PM CET, ebiederm@xmission.com wrote:
>>>>>Jiri Pirko <jiri@resnulli.us> writes:
>>>>>
>>>>>Would someone please explain to me what a switch id is?
>>>>>
>>>>>I looked in the kernel source, and I looked here and while I know
>>>>>switches I don't have a clue what a switch id is.
>>>>>
>>>>>My primary concern at this point is that you have introduced a global
>>>>>identifier that is isn't a hardware property (it certainly does not look
>>>>>like a mac address) and that is unique across network namespaces and
>>>>>thus breaks checkpoint/restart (aka CRIU).
>>>>
>>>> IFLA_PHYS_SWITCH_ID is very similar to IFLA_PHYS_PORT_ID. It is
>>>> generated by the driver and ensures that there is the same switch id for
>>>> all ports belonging to the same switch chip/asic. It is up to the driver
>>>> how to implement the id. I would like to point you to driver
>>>> implementing ndo_get_phys_port_id
>>>
>>>Looking at ndo_get_phys_port_id it is just the per port mac address.  Or
>>>guid in the case of infiniband.  Which really makes me wonder why we
>>>didn't use the same abstractions in the code for address types that we
>>>do for hardware addresses.
>>>
>>>Using mac address or other hardware addresses that are used for layer 2
>>>addressing makes sense to me.  There is a long tradition of that and as
>>>I recall protocols like STP actually requiring having a different mac
>>>address per port.
>>>
>>>When I asked the question I thought the switch id was going to be
>>>something like the ifindex, the software index of a network device.
>>>
>>>
>>>Finally having tracked down the rocker implementation of 
>>>rocker_port_switch_parent_id_get I see it you are reading some 64bit
>>>hardware register.
>>>
>>>Which leads me to ask what are the semantics of switch_id?
>>>
>>>Is the switch id an identifier with a prefix from IEEE and assigned by
>>>the manufacture so that it is guaranteed to the tolerances of the
>>>manufacturing process to be globally unique?
>>
>> It is up to the driver what to use. It can use mac addr. This is same as
>> for phys port id.
>
>My reading of the code says the phys port id is the layer 2 hardware
>address of the port.  Certainly that is the case for all of the
>implementations now and it would be insane for it to be anything else.
>
>>>Is the switch id a random number that is statistically likely to be
>>>globally unique because you have enough bits?   As I recall you need
>>>at least 128 bits to have a reasonable chance of a random number
>>>avoiding the birthday paradox.
>>>
>>>Do we need some kind of manufacturer id to tell one switch id from
>>>another?
>>>
>>>Is the switch id persistent across reboots?
>>
>> Yes it is (as for phys port id).
>
>>>>>Also what in the world does PHYS mean in IFLA_PHYS_SWITCH_ID?  Does that
>>>>>mean we can't have a purely software implementation of this interface?
>>>>>Given that we will want a software implementation at some point
>>>>>including PHYS in the name seems completely wrong.
>>>>
>>>> We can remove the "PHYS", no problem. I do not understand what you say
>>>> about "software implementation". The point is to allow hw switch/ish
>>>> chips to be supported.
>>>
>>>If we are talking about something typically stored in a eeprom like a
>>>mac address phys seems appropriate.
>>
>> Yes, we are.
>>
>>>
>>>Still having a definition of this switch id clean clear enough that
>>>net/bridge and drivers/net/macvlan can implement it seems important.
>>
>> I don't understand why would net/bridge or driver/net/macvlan want to
>> implement this. The purpose is to group ports/netdevs which are part of
>> the same hw switch.
>
>Certainly all of the macvlan devices are part of the same logical
>switch, and are all the same hardware.
>
>>>Even more important is having a definition of switch id clear enough
>>>that userspace can use the switch id to do something useful.
>>
>> Userspace threats this the same it treats phys port id.
>>
>> if two ports/netdevs has same switch id, they belong under same hw
>> switch. That's all.
>
>So this id needs to be globally unique?

No. It is enough to be unique within a single system. It serves for no
more than to find out 2 ids are same or not, no other info value.

So when the drivers uses sane ids (like mac for example, or in case of
rocker an id which is passed by qemu command line), the chances of
collision are very very close to none (never say never).

>
>How does the rocket driver achieve global uniquness?  Is the rocket
>driver using an IEEE assigned prefix based id like the ethernet mac
>address?
>
>This is an important detail as ensuring global uniqueness can take a lot
>of work so there needs to be a general agreement on how global
>uniqueness is achieved.
>
>Saying it is up to the driver how to achieve a globally unique id across
>every different switch driver is really insufficient.  If I have two
>drivers that each implement an id that is 64bit long and they use two
>completely different methods there is a real chance of collision.
>
>So either I need a driver id that I also use in the comparison between
>switch ids or how a driver generates a globally unique id need to be
>specified.
>
>>>Right now switch id looks like one of those weird one manufacturer
>>>properties that is fine to expose as a driver specific property
>>>but I don't yet see it being a generic property I that can be used
>>>usefully in userspace.
>>>
>>>So can we please get some clear semantics or failing that can we please
>>>not expose this to userspace as generic property.
>>
>> I thought that the semantics is clean. Looks like I will have to update
>> Documentation/networking/switchdev.txt adding some more info about
>> this.
>
>The semantics as described so far will not achieve a useful id, which
>makes them rubbish.
>
>Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 7:26 p.m. UTC | #17
Jiri Pirko <jiri@resnulli.us> writes:

>>So this id needs to be globally unique?
>
> No. It is enough to be unique within a single system. It serves for no
> more than to find out 2 ids are same or not, no other info value.
>
> So when the drivers uses sane ids (like mac for example, or in case of
> rocker an id which is passed by qemu command line), the chances of
> collision are very very close to none (never say never).

So the switch id isn't necessarily even as good as a manufacturers
serial number that changes between different models, and anyone who uses
this id must look at the driver to get uniqueness.

Yes what is needed in the comparison to get uniqueness in a single
system badly needs to be documented because that switch id is very much
not enough on it's own.

phys_port_id on the other hand is globally unique.  So please stop
saying that this is anything like phys_port_id.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 7:54 p.m. UTC | #18
Thu, Dec 04, 2014 at 08:26:14PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>>>So this id needs to be globally unique?
>>
>> No. It is enough to be unique within a single system. It serves for no
>> more than to find out 2 ids are same or not, no other info value.
>>
>> So when the drivers uses sane ids (like mac for example, or in case of
>> rocker an id which is passed by qemu command line), the chances of
>> collision are very very close to none (never say never).
>
>So the switch id isn't necessarily even as good as a manufacturers
>serial number that changes between different models, and anyone who uses
>this id must look at the driver to get uniqueness.
>
>Yes what is needed in the comparison to get uniqueness in a single
>system badly needs to be documented because that switch id is very much
>not enough on it's own.
>
>phys_port_id on the other hand is globally unique.  So please stop
>saying that this is anything like phys_port_id.

I'm sorry. But is is the same. And the purpose is very similar. In case
of phys port id to find out if 2 netdevices share the same physical port.
In case of phys switch id to find out if 2 netdevices share the same
switch parent.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 8:06 p.m. UTC | #19
ebiederm@xmission.com (Eric W. Biederman) writes:

> Jiri Pirko <jiri@resnulli.us> writes:
>
>>>So this id needs to be globally unique?
>>
>> No. It is enough to be unique within a single system. It serves for no
>> more than to find out 2 ids are same or not, no other info value.
>>
>> So when the drivers uses sane ids (like mac for example, or in case of
>> rocker an id which is passed by qemu command line), the chances of
>> collision are very very close to none (never say never).

Thinking about what you said a little more.

Two different sources of persistent numbers picking numbers by
completely different algorithms can give you no assurance that you don't
produce conflicts.

The switch id as desisgned can not work.

There are expected to be between 2**36 to 2**40 devices in this world.
Your first switch id is a 64it number.  At the very best by the birthday
pardox predicts there will be a conflict ever 2**32 devices or between
2**4 and 2**8 devices in the world with conflicts.  If the ids are not
randomly distributed (which they won't be) things could easily be much
much worse.

That is just good enough the code could get out there and run for years
before you have the nightmare of having to fix all of userspace.   That
is a nightmare no one needs.

So please remove this broken code, and this broken concept from the
kernel and go back to the drawing board.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 8:27 p.m. UTC | #20
Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Jiri Pirko <jiri@resnulli.us> writes:
>>
>>>>So this id needs to be globally unique?
>>>
>>> No. It is enough to be unique within a single system. It serves for no
>>> more than to find out 2 ids are same or not, no other info value.
>>>
>>> So when the drivers uses sane ids (like mac for example, or in case of
>>> rocker an id which is passed by qemu command line), the chances of
>>> collision are very very close to none (never say never).
>
>Thinking about what you said a little more.
>
>Two different sources of persistent numbers picking numbers by
>completely different algorithms can give you no assurance that you don't
>produce conflicts.
>
>The switch id as desisgned can not work.
>
>There are expected to be between 2**36 to 2**40 devices in this world.
>Your first switch id is a 64it number.  At the very best by the birthday
>pardox predicts there will be a conflict ever 2**32 devices or between
>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>randomly distributed (which they won't be) things could easily be much
>much worse.
>
>That is just good enough the code could get out there and run for years
>before you have the nightmare of having to fix all of userspace.   That
>is a nightmare no one needs.
>
>So please remove this broken code, and this broken concept from the
>kernel and go back to the drawing board.

In that case the phys port id is broken in the same way. Let's rather
think about how to avoid conflicts for both. Given the fact the
conflicts should be avoided only on a single baremetal, that should be
doable (for (bad) example using driver name mixed with driver created id).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 8:55 p.m. UTC | #21
Jiri Pirko <jiri@resnulli.us> writes:

> Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>>ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>>>So this id needs to be globally unique?
>>>>
>>>> No. It is enough to be unique within a single system. It serves for no
>>>> more than to find out 2 ids are same or not, no other info value.
>>>>
>>>> So when the drivers uses sane ids (like mac for example, or in case of
>>>> rocker an id which is passed by qemu command line), the chances of
>>>> collision are very very close to none (never say never).
>>
>>Thinking about what you said a little more.
>>
>>Two different sources of persistent numbers picking numbers by
>>completely different algorithms can give you no assurance that you don't
>>produce conflicts.
>>
>>The switch id as desisgned can not work.
>>
>>There are expected to be between 2**36 to 2**40 devices in this world.
>>Your first switch id is a 64it number.  At the very best by the birthday
>>pardox predicts there will be a conflict ever 2**32 devices or between
>>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>>randomly distributed (which they won't be) things could easily be much
>>much worse.
>>
>>That is just good enough the code could get out there and run for years
>>before you have the nightmare of having to fix all of userspace.   That
>>is a nightmare no one needs.
>>
>>So please remove this broken code, and this broken concept from the
>>kernel and go back to the drawing board.
>
> In that case the phys port id is broken in the same way. Let's rather
> think about how to avoid conflicts for both. Given the fact the
> conflicts should be avoided only on a single baremetal, that should be
> doable (for (bad) example using driver name mixed with driver created
> id).

No.  phys_port_id is not broken in the same way, and phys_port_id does
not have the same set of properties.

phys_port_id's in practice all have an IEEE prefix that identifies the
manufacturer and a manufacture assigned serial number.  Aka a mac
address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
a 64bit EUID-64 I don't know.  If there are problems in the worst
case issues with phys_port_id are fixable by simple driver tweaks,
because fundamentally we are working with globally uniuqe identifiers.
Well globally unique baring manufacturing bugs in eeproms.



I agree with you that the switch id concept can be saved.  But I think
we should fix switch id before we export it to userspace so we don't
have to break userspace later.

My intuition says we want something like ifindex, but I am not at all
certain how switch id is planned to be used.  Given that it is single
box I don't expect you are sending it out over the wire.

*shrug*

Why does switch id need to be persistent?  Why can't switch id be
property like ifindex?

What are the actual requirements.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Dec. 4, 2014, 9:10 p.m. UTC | #22
Thu, Dec 04, 2014 at 09:55:07PM CET, ebiederm@xmission.com wrote:
>Jiri Pirko <jiri@resnulli.us> writes:
>
>> Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>>>ebiederm@xmission.com (Eric W. Biederman) writes:
>>>
>>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>>
>>>>>>So this id needs to be globally unique?
>>>>>
>>>>> No. It is enough to be unique within a single system. It serves for no
>>>>> more than to find out 2 ids are same or not, no other info value.
>>>>>
>>>>> So when the drivers uses sane ids (like mac for example, or in case of
>>>>> rocker an id which is passed by qemu command line), the chances of
>>>>> collision are very very close to none (never say never).
>>>
>>>Thinking about what you said a little more.
>>>
>>>Two different sources of persistent numbers picking numbers by
>>>completely different algorithms can give you no assurance that you don't
>>>produce conflicts.
>>>
>>>The switch id as desisgned can not work.
>>>
>>>There are expected to be between 2**36 to 2**40 devices in this world.
>>>Your first switch id is a 64it number.  At the very best by the birthday
>>>pardox predicts there will be a conflict ever 2**32 devices or between
>>>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>>>randomly distributed (which they won't be) things could easily be much
>>>much worse.
>>>
>>>That is just good enough the code could get out there and run for years
>>>before you have the nightmare of having to fix all of userspace.   That
>>>is a nightmare no one needs.
>>>
>>>So please remove this broken code, and this broken concept from the
>>>kernel and go back to the drawing board.
>>
>> In that case the phys port id is broken in the same way. Let's rather
>> think about how to avoid conflicts for both. Given the fact the
>> conflicts should be avoided only on a single baremetal, that should be
>> doable (for (bad) example using driver name mixed with driver created
>> id).
>
>No.  phys_port_id is not broken in the same way, and phys_port_id does
>not have the same set of properties.
>
>phys_port_id's in practice all have an IEEE prefix that identifies the
>manufacturer and a manufacture assigned serial number.  Aka a mac
>address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>a 64bit EUID-64 I don't know.  If there are problems in the worst
>case issues with phys_port_id are fixable by simple driver tweaks,
>because fundamentally we are working with globally uniuqe identifiers.
>Well globally unique baring manufacturing bugs in eeproms.

Well the fact that phys_post_id's are now implemented mostly by putting
mac into it does not mean that other drivers cannot do it differently.
So once again, phys_port_id and phys_switch_id are the same in this
matter.

>
>
>
>I agree with you that the switch id concept can be saved.  But I think
>we should fix switch id before we export it to userspace so we don't
>have to break userspace later.
>
>My intuition says we want something like ifindex, but I am not at all
>certain how switch id is planned to be used.  Given that it is single
>box I don't expect you are sending it out over the wire.

No, it is not to be send out.

>
>*shrug*
>
>Why does switch id need to be persistent?  Why can't switch id be
>property like ifindex?

Well I can imagine that multiple ports of the same switch chip could be
passed through to the virtual machines (similar to SR-IOV pf/vf).

>
>What are the actual requirements.


They are actually very similar to phys_port_id. Therefore I made that
the same.



>
>Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 4, 2014, 9:24 p.m. UTC | #23
Jiri Pirko <jiri@resnulli.us> writes:

>>No.  phys_port_id is not broken in the same way, and phys_port_id does
>>not have the same set of properties.
>>
>>phys_port_id's in practice all have an IEEE prefix that identifies the
>>manufacturer and a manufacture assigned serial number.  Aka a mac
>>address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>>a 64bit EUID-64 I don't know.  If there are problems in the worst
>>case issues with phys_port_id are fixable by simple driver tweaks,
>>because fundamentally we are working with globally uniuqe identifiers.
>>Well globally unique baring manufacturing bugs in eeproms.
>
> Well the fact that phys_post_id's are now implemented mostly by putting
> mac into it does not mean that other drivers cannot do it differently.
> So once again, phys_port_id and phys_switch_id are the same in this
> matter.

No exclusively implemented that way.

And yes other drivers can implement bugs, that doesn't mean those bugs
will be toleraged and won't break userspace.

>>I agree with you that the switch id concept can be saved.  But I think
>>we should fix switch id before we export it to userspace so we don't
>>have to break userspace later.
>>
>>My intuition says we want something like ifindex, but I am not at all
>>certain how switch id is planned to be used.  Given that it is single
>>box I don't expect you are sending it out over the wire.
>
> No, it is not to be send out.
>
>>
>>*shrug*
>>
>>Why does switch id need to be persistent?  Why can't switch id be
>>property like ifindex?
>
> Well I can imagine that multiple ports of the same switch chip could be
> passed through to the virtual machines (similar to SR-IOV pf/vf).

In that case you do need something that is globally unique.  Because
if I migrate your virtual machine onto a different physical machine 
seeing the same switch id and I might make the mistaken assumption
that I am remain on the same switch if the ids are not global.

>>What are the actual requirements.
>
>
> They are actually very similar to phys_port_id. Therefore I made that
> the same.

Then please for rocket use a non-buggy implementation with a globally
unique id.

Given your descriptions of the requirements I can't see how any other
implementation isn't buggy.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf Dec. 4, 2014, 10:07 p.m. UTC | #24
On 12/04/14 at 10:10pm, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 09:55:07PM CET, ebiederm@xmission.com wrote:
> >My intuition says we want something like ifindex, but I am not at all
> >certain how switch id is planned to be used.  Given that it is single
> >box I don't expect you are sending it out over the wire.
> 
> No, it is not to be send out.

We discussed this before and stated we would fix this as soon as it's
merged. Let's just define a simple and quick switch id prefix generator
ala ifindex which serves as a unique prefix to all switch ids.

I have time early next week if Jiri doesn't.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 5, 2014, 9:54 a.m. UTC | #25
From: Eric W. Biederman
> Jiri Pirko <jiri@resnulli.us> writes:
> 
> > Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
> >>ebiederm@xmission.com (Eric W. Biederman) writes:
> >>
> >>> Jiri Pirko <jiri@resnulli.us> writes:
> >>>
> >>>>>So this id needs to be globally unique?
> >>>>
> >>>> No. It is enough to be unique within a single system. It serves for no
> >>>> more than to find out 2 ids are same or not, no other info value.
> >>>>
> >>>> So when the drivers uses sane ids (like mac for example, or in case of
> >>>> rocker an id which is passed by qemu command line), the chances of
> >>>> collision are very very close to none (never say never).
> >>
> >>Thinking about what you said a little more.
> >>
> >>Two different sources of persistent numbers picking numbers by
> >>completely different algorithms can give you no assurance that you don't
> >>produce conflicts.
> >>
> >>The switch id as desisgned can not work.
> >>
> >>There are expected to be between 2**36 to 2**40 devices in this world.
> >>Your first switch id is a 64it number.  At the very best by the birthday
> >>pardox predicts there will be a conflict ever 2**32 devices or between
> >>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
> >>randomly distributed (which they won't be) things could easily be much
> >>much worse.
> >>
> >>That is just good enough the code could get out there and run for years
> >>before you have the nightmare of having to fix all of userspace.   That
> >>is a nightmare no one needs.
> >>
> >>So please remove this broken code, and this broken concept from the
> >>kernel and go back to the drawing board.
> >
> > In that case the phys port id is broken in the same way. Let's rather
> > think about how to avoid conflicts for both. Given the fact the
> > conflicts should be avoided only on a single baremetal, that should be
> > doable (for (bad) example using driver name mixed with driver created
> > id).
> 
> No.  phys_port_id is not broken in the same way, and phys_port_id does
> not have the same set of properties.
> 
> phys_port_id's in practice all have an IEEE prefix that identifies the
> manufacturer and a manufacture assigned serial number.  Aka a mac
> address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
> a 64bit EUID-64 I don't know.  If there are problems in the worst
> case issues with phys_port_id are fixable by simple driver tweaks,
> because fundamentally we are working with globally uniuqe identifiers.
> Well globally unique baring manufacturing bugs in eeproms.

Manufacturers have to generate unique MAC addresses - otherwise people complain.
But can't be assumed to put different 'serial numbers' in other devices.
If you look at USB memory sticks you are likely to find that the serial
number in the (equivalent of the) ATA identify response isn't unique.
So I doubt you can use the value to distinguish between devices.

You also get the situation where ethernet MAC addresses only have to be
unique within a collision domain. Many old sun systems used a single MAC
address - valid because they assumed/required that multiple interfaces
be connected to different networks.
So even MAC addresses aren't per-interface.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Dec. 8, 2014, 9:56 p.m. UTC | #26
David Laight <David.Laight@ACULAB.COM> writes:

> From: Eric W. Biederman
>> Jiri Pirko <jiri@resnulli.us> writes:
>> 
>> > Thu, Dec 04, 2014 at 09:06:14PM CET, ebiederm@xmission.com wrote:
>> >>ebiederm@xmission.com (Eric W. Biederman) writes:
>> >>
>> >>> Jiri Pirko <jiri@resnulli.us> writes:
>> >>>
>> >>>>>So this id needs to be globally unique?
>> >>>>
>> >>>> No. It is enough to be unique within a single system. It serves for no
>> >>>> more than to find out 2 ids are same or not, no other info value.
>> >>>>
>> >>>> So when the drivers uses sane ids (like mac for example, or in case of
>> >>>> rocker an id which is passed by qemu command line), the chances of
>> >>>> collision are very very close to none (never say never).
>> >>
>> >>Thinking about what you said a little more.
>> >>
>> >>Two different sources of persistent numbers picking numbers by
>> >>completely different algorithms can give you no assurance that you don't
>> >>produce conflicts.
>> >>
>> >>The switch id as desisgned can not work.
>> >>
>> >>There are expected to be between 2**36 to 2**40 devices in this world.
>> >>Your first switch id is a 64it number.  At the very best by the birthday
>> >>pardox predicts there will be a conflict ever 2**32 devices or between
>> >>2**4 and 2**8 devices in the world with conflicts.  If the ids are not
>> >>randomly distributed (which they won't be) things could easily be much
>> >>much worse.
>> >>
>> >>That is just good enough the code could get out there and run for years
>> >>before you have the nightmare of having to fix all of userspace.   That
>> >>is a nightmare no one needs.
>> >>
>> >>So please remove this broken code, and this broken concept from the
>> >>kernel and go back to the drawing board.
>> >
>> > In that case the phys port id is broken in the same way. Let's rather
>> > think about how to avoid conflicts for both. Given the fact the
>> > conflicts should be avoided only on a single baremetal, that should be
>> > doable (for (bad) example using driver name mixed with driver created
>> > id).
>> 
>> No.  phys_port_id is not broken in the same way, and phys_port_id does
>> not have the same set of properties.
>> 
>> phys_port_id's in practice all have an IEEE prefix that identifies the
>> manufacturer and a manufacture assigned serial number.  Aka a mac
>> address or a EUID-64.  What the mlx4 ethernet driver is doing retunring
>> a 64bit EUID-64 I don't know.  If there are problems in the worst
>> case issues with phys_port_id are fixable by simple driver tweaks,
>> because fundamentally we are working with globally uniuqe identifiers.
>> Well globally unique baring manufacturing bugs in eeproms.
>
> Manufacturers have to generate unique MAC addresses - otherwise people complain.
> But can't be assumed to put different 'serial numbers' in other devices.
> If you look at USB memory sticks you are likely to find that the serial
> number in the (equivalent of the) ATA identify response isn't unique.
> So I doubt you can use the value to distinguish between devices.

When I said serial number I was talking about MAC addresses.

> You also get the situation where ethernet MAC addresses only have to be
> unique within a collision domain. Many old sun systems used a single MAC
> address - valid because they assumed/required that multiple interfaces
> be connected to different networks.
> So even MAC addresses aren't per-interface.

The old sun system issue does not apply to switches and switch like
devices.

There are common layer two protocls (Spanning Tree? LACP?) that require
each switch port to have a unique mac address.  So people will complain
and software will break if there is not a mac address per port.  So for
switches and things that strongly resemble switches assuming a unique
mac address per port is a reasonable assumption.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 4732063..a6e2594 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -145,6 +145,7 @@  enum {
 	IFLA_CARRIER,
 	IFLA_PHYS_PORT_ID,
 	IFLA_CARRIER_CHANGES,
+	IFLA_PHYS_SWITCH_ID,
 	__IFLA_MAX
 };
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 4d99324..bd36a07 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -589,6 +589,14 @@  int print_linkinfo(const struct sockaddr_nl *who,
 				      b1, sizeof(b1)));
 	}
 
+	if (tb[IFLA_PHYS_SWITCH_ID]) {
+		SPRINT_BUF(b1);
+		fprintf(fp, "switchid %s ",
+			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
+				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
+				      b1, sizeof(b1)));
+	}
+
 	if (tb[IFLA_OPERSTATE])
 		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));