
[net-next,v3,04/17] net: introduce generic switch devices support

Message ID 1416911328-10979-5-git-send-email-jiri@resnulli.us
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jiri Pirko Nov. 25, 2014, 10:28 a.m. UTC
This patch makes it possible to support various switch chips. Drivers
should implement the relevant ndos to do so. For now, only one ndo is
defined:
- ndo_switch_parent_id_get, for getting the physical switch id.

Note that the user can use an arbitrary port netdevice to access the switch.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
---
v2->v3:
-fixed documentation typo pointed out by M. Braun
-changed "sw" string to "switch" to avoid confusion
v1->v2:
-no change
---
 Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
 MAINTAINERS                            |  7 ++++
 include/linux/netdevice.h              | 10 ++++++
 include/net/switchdev.h                | 30 +++++++++++++++++
 net/Kconfig                            |  1 +
 net/Makefile                           |  3 ++
 net/switchdev/Kconfig                  | 13 ++++++++
 net/switchdev/Makefile                 |  5 +++
 net/switchdev/switchdev.c              | 33 +++++++++++++++++++
 9 files changed, 161 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/net/switchdev.h
 create mode 100644 net/switchdev/Kconfig
 create mode 100644 net/switchdev/Makefile
 create mode 100644 net/switchdev/switchdev.c
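
The property the commit message relies on (every port of one chip reports
the same ID, so any port can stand in for the switch) can be sketched in
userspace C. This is only an illustration, not code from the patch:
struct phys_item_id, struct port, port_parent_id_get() and
ports_share_switch() are hypothetical stand-ins for
struct netdev_phys_item_id, a port netdevice and the new ndo.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ID_LEN 32	/* assumed bound, chosen for this sketch */

/* Userspace stand-in for struct netdev_phys_item_id. */
struct phys_item_id {
	unsigned char id[MAX_ID_LEN];
	unsigned char id_len;
};

/* Stand-in for a port netdevice; parent == NULL means "not a switch port". */
struct port {
	const char *name;
	const struct phys_item_id *parent;
};

/* Mimics the ndo: fill in the parent switch ID or report "not supported". */
static int port_parent_id_get(const struct port *p, struct phys_item_id *psid)
{
	if (!p->parent)
		return -EOPNOTSUPP;
	*psid = *p->parent;
	return 0;
}

/* Two ports belong to the same switch iff both report an identical ID. */
static bool ports_share_switch(const struct port *a, const struct port *b)
{
	struct phys_item_id ia, ib;

	if (port_parent_id_get(a, &ia) || port_parent_id_get(b, &ib))
		return false;
	return ia.id_len == ib.id_len && !memcmp(ia.id, ib.id, ia.id_len);
}
```

A caller holding sw0p0 and sw0p1 would get true from ports_share_switch(),
while comparing a switch port with an ordinary NIC such as eth0 yields false
because the NIC reports -EOPNOTSUPP.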

Comments

Andy Gospodarek Nov. 25, 2014, 3:02 p.m. UTC | #1
On Tue, Nov 25, 2014 at 11:28:35AM +0100, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
> 
> Note that user can use random port netdevice to access the switch.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Reviewed-by: Thomas Graf <tgraf@suug.ch>

Looks good -- thanks for replacing 'sw' with 'switch'

Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>

> ---
> v2->v3:
> -fixed documentation typo pointed out by M. Braun
> -changed "sw" string to "switch" to avoid confusion
> v1->v2:
> -no change
> ---
>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>  MAINTAINERS                            |  7 ++++
>  include/linux/netdevice.h              | 10 ++++++
>  include/net/switchdev.h                | 30 +++++++++++++++++
>  net/Kconfig                            |  1 +
>  net/Makefile                           |  3 ++
>  net/switchdev/Kconfig                  | 13 ++++++++
>  net/switchdev/Makefile                 |  5 +++
>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>  9 files changed, 161 insertions(+)
>  create mode 100644 Documentation/networking/switchdev.txt
>  create mode 100644 include/net/switchdev.h
>  create mode 100644 net/switchdev/Kconfig
>  create mode 100644 net/switchdev/Makefile
>  create mode 100644 net/switchdev/switchdev.c
> 
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..f981a92
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +============================================
> +
> +Please note that the word "switch" is used here in a very generic sense.
> +This includes devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Let's describe a topology a bit. Imagine the following example:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  NIC0 NIC1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example, here is the representation in the kernel:
> +
> +       +----------------------------+    +---------------+
> +       |     SOME switch chip       |    |      CPU      |
> +       +----------------------------+    +---------------+
> +       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> +         |     |     |     |     |       +---------------+
> +        PHY   PHY    |     |     |         |  eth0 eth1
> +                     |     |     |         |   |    |
> +                     |     |     +- PCI-E -+   |    |
> +                     |     +------- MII -------+    |
> +                     +------------- MII ------------+
> +
> +Let's call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference between a switch-port netdevice and an ordinary netdevice
> +is that it implements a couple more NDOs:
> +
> +  ndo_switch_parent_id_get - This returns the same ID for two port netdevices
> +			     of the same physical switch chip. This is
> +			     mandatory to be implemented by all switch drivers
> +			     and serves the caller for recognition of a port
> +			     netdevice.
> +  ndo_switch_parent_* - Functions that serve for a manipulation of the switch
> +			chip itself (it can be thought of as a "parent" of the
> +			port, therefore the name). They are not port-specific.
> +			Caller might use arbitrary port netdevice of the same
> +			switch and it will make no difference.
> +  ndo_switch_port_* - Functions that serve for a port-specific manipulation.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a545d68..05addb6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9058,6 +9058,13 @@ F:	lib/swiotlb.c
>  F:	arch/*/kernel/pci-swiotlb.c
>  F:	include/linux/swiotlb.h
>  
> +SWITCHDEV
> +M:	Jiri Pirko <jiri@resnulli.us>
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	net/switchdev/
> +F:	include/net/switchdev.h
> +
>  SYNOPSYS ARC ARCHITECTURE
>  M:	Vineet Gupta <vgupta@synopsys.com>
>  S:	Supported
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5b491b3..ce096dc 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1018,6 +1018,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   *	performing GSO on a packet. The device returns true if it is
>   *	able to GSO the packet, false otherwise. If the return value is
>   *	false the stack will do software GSO.
> + *
> + * int (*ndo_switch_parent_id_get)(struct net_device *dev,
> + *				   struct netdev_phys_item_id *psid);
> + *	Called to get an ID of the switch chip this port is part of.
> + *	If driver implements this, it indicates that it represents a port
> + *	of a switch chip.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1171,6 +1177,10 @@ struct net_device_ops {
>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
>  						  struct net_device *dev);
> +#ifdef CONFIG_NET_SWITCHDEV
> +	int			(*ndo_switch_parent_id_get)(struct net_device *dev,
> +							    struct netdev_phys_item_id *psid);
> +#endif
>  };
>  
>  /**
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> new file mode 100644
> index 0000000..7a52360
> --- /dev/null
> +++ b/include/net/switchdev.h
> @@ -0,0 +1,30 @@
> +/*
> + * include/net/switchdev.h - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#ifndef _LINUX_SWITCHDEV_H_
> +#define _LINUX_SWITCHDEV_H_
> +
> +#include <linux/netdevice.h>
> +
> +#ifdef CONFIG_NET_SWITCHDEV
> +
> +int netdev_switch_parent_id_get(struct net_device *dev,
> +				struct netdev_phys_item_id *psid);
> +
> +#else
> +
> +static inline int netdev_switch_parent_id_get(struct net_device *dev,
> +					      struct netdev_phys_item_id *psid)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/Kconfig b/net/Kconfig
> index 99815b5..ff9ffc1 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>  source "net/netlink/Kconfig"
>  source "net/mpls/Kconfig"
>  source "net/hsr/Kconfig"
> +source "net/switchdev/Kconfig"
>  
>  config RPS
>  	boolean
> diff --git a/net/Makefile b/net/Makefile
> index 7ed1970..95fc694 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>  obj-$(CONFIG_HSR)		+= hsr/
> +ifneq ($(CONFIG_NET_SWITCHDEV),)
> +obj-y				+= switchdev/
> +endif
> diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
> new file mode 100644
> index 0000000..1557545
> --- /dev/null
> +++ b/net/switchdev/Kconfig
> @@ -0,0 +1,13 @@
> +#
> +# Configuration for Switch device support
> +#
> +
> +config NET_SWITCHDEV
> +	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
> +	depends on INET
> +	---help---
> +	  This module provides glue between core networking code and device
> +	  drivers in order to support hardware switch chips in a very generic
> +	  sense of the word "switch". This includes devices supporting L2/L3 but
> +	  also various flow offloading chips, including switches embedded into
> +	  SR-IOV NICs.
> diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
> new file mode 100644
> index 0000000..5ed63ed
> --- /dev/null
> +++ b/net/switchdev/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the Switch device API
> +#
> +
> +obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> new file mode 100644
> index 0000000..66973de
> --- /dev/null
> +++ b/net/switchdev/switchdev.c
> @@ -0,0 +1,33 @@
> +/*
> + * net/switchdev/switchdev.c - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/netdevice.h>
> +#include <net/switchdev.h>
> +
> +/**
> + *	netdev_switch_parent_id_get - Get ID of a switch
> + *	@dev: port device
> + *	@psid: switch ID
> + *
> + *	Get ID of a switch this port is part of.
> + */
> +int netdev_switch_parent_id_get(struct net_device *dev,
> +				struct netdev_phys_item_id *psid)
> +{
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +
> +	if (!ops->ndo_switch_parent_id_get)
> +		return -EOPNOTSUPP;
> +	return ops->ndo_switch_parent_id_get(dev, psid);
> +}
> +EXPORT_SYMBOL(netdev_switch_parent_id_get);
> -- 
> 1.9.3
> 
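
The dispatch pattern used by netdev_switch_parent_id_get() in the patch
above (an optional callback in the ops table, with -EOPNOTSUPP returned when
a driver does not provide it) can be tried out in userspace. The sketch
below is an assumption-laden mock, not kernel code: the structures are
cut-down stand-ins for the kernel ones, and someswitch_parent_id_get(),
someswitch_ops and plain_nic_ops are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel structures. */
struct phys_item_id {
	unsigned char id[32];
	unsigned char id_len;
};

struct net_device;

struct net_device_ops {
	/* Optional: only drivers representing a switch port set this. */
	int (*ndo_switch_parent_id_get)(struct net_device *dev,
					struct phys_item_id *psid);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
};

/* Same shape as the wrapper in the patch: check the op, then call it. */
static int switch_parent_id_get(struct net_device *dev,
				struct phys_item_id *psid)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	if (!ops->ndo_switch_parent_id_get)
		return -EOPNOTSUPP;
	return ops->ndo_switch_parent_id_get(dev, psid);
}

/* Hypothetical driver callback: reports a fixed one-byte switch ID. */
static int someswitch_parent_id_get(struct net_device *dev,
				    struct phys_item_id *psid)
{
	(void)dev;
	psid->id[0] = 0x2a;
	psid->id_len = 1;
	return 0;
}

/* A switch-port driver fills in the op; an ordinary NIC leaves it NULL. */
static const struct net_device_ops someswitch_ops = {
	.ndo_switch_parent_id_get = someswitch_parent_id_get,
};

static const struct net_device_ops plain_nic_ops = {
	.ndo_switch_parent_id_get = NULL,
};
```

With these definitions, a device using someswitch_ops yields 0 and a filled
ID, while a device using plain_nic_ops yields -EOPNOTSUPP, which is also
what the static inline stub in include/net/switchdev.h returns when
CONFIG_NET_SWITCHDEV is disabled.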
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Nov. 25, 2014, 3:51 p.m. UTC | #2
On 11/25/14 05:28, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>

I am not sure "switch id" is the right term. I have a network processor
that *does not* do switching. I am not sure if "chip" or "ASIC" or
"offload_id" would be the right term. "switch" doesn't sound right.

cheers,
jamal

Roopa Prabhu Nov. 25, 2014, 4:07 p.m. UTC | #3
On 11/25/14, 2:28 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Reviewed-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2->v3:
> -fixed documentation typo pointed out by M. Braun
> -changed "sw" string to "switch" to avoid confusion

Still voting for something generic like "hw" or "offload" or "hw_offload"

Jiri Pirko Nov. 25, 2014, 4:49 p.m. UTC | #4
Tue, Nov 25, 2014 at 04:51:03PM CET, jhs@mojatatu.com wrote:
>On 11/25/14 05:28, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>
>I am not sure switch id is the right term. I have a network processor
>that *does not* do switching. I am not sure if "chip" or "ASIC" or

What does it do? "L3 switching"?

>"offload_id" would be the right term. switch doesnt sound right.

When we talk about this area, we use the word "switch". I know it is not
accurate, but in my opinion it is the closest we can get. "chip" and
"ASIC" are too generic, I believe. I would not use "offload" because it can
be easily mistaken for NIC offloads, and it is also not accurate.



>
>cheers,
>jamal
>
Jiri Pirko Nov. 25, 2014, 4:50 p.m. UTC | #5
Tue, Nov 25, 2014 at 05:07:02PM CET, roopa@cumulusnetworks.com wrote:
>On 11/25/14, 2:28 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>Reviewed-by: Thomas Graf <tgraf@suug.ch>
>>---
>>v2->v3:
>>-fixed documentation typo pointed out by M. Braun
>>-changed "sw" string to "switch" to avoid confusion
>
>Still voting for something generic like "hw" or "offload" or "hw_offload"

See my previous reply to Jamal.

>>v1->v2:
>>-no change
>>---
>>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>>  MAINTAINERS                            |  7 ++++
>>  include/linux/netdevice.h              | 10 ++++++
>>  include/net/switchdev.h                | 30 +++++++++++++++++
>>  net/Kconfig                            |  1 +
>>  net/Makefile                           |  3 ++
>>  net/switchdev/Kconfig                  | 13 ++++++++
>>  net/switchdev/Makefile                 |  5 +++
>>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>>  9 files changed, 161 insertions(+)
>>  create mode 100644 Documentation/networking/switchdev.txt
>>  create mode 100644 include/net/switchdev.h
>>  create mode 100644 net/switchdev/Kconfig
>>  create mode 100644 net/switchdev/Makefile
>>  create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..f981a92
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+============================================
>>+
>>+Please note that the word "switch" is used here in a very generic sense.
>>+This includes devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Let's describe a topology a bit. Imagine the following example:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  NIC0 NIC1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is managed by a driver
>>+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example, here is the representation in the kernel:
>>+
>>+       +----------------------------+    +---------------+
>>+       |     SOME switch chip       |    |      CPU      |
>>+       +----------------------------+    +---------------+
>>+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
>>+         |     |     |     |     |       +---------------+
>>+        PHY   PHY    |     |     |         |  eth0 eth1
>>+                     |     |     |         |   |    |
>>+                     |     |     +- PCI-E -+   |    |
>>+                     |     +------- MII -------+    |
>>+                     +------------- MII ------------+
>>+
>>+Let's call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference between a switch-port netdevice and an ordinary netdevice
>>+is that it implements a couple more NDOs:
>>+
>>+  ndo_switch_parent_id_get - This returns the same ID for two port netdevices
>>+			     of the same physical switch chip. This is
>>+			     mandatory to be implemented by all switch drivers
>>+			     and serves the caller for recognition of a port
>>+			     netdevice.
>>+  ndo_switch_parent_* - Functions that serve for a manipulation of the switch
>>+			chip itself (it can be thought of as a "parent" of the
>>+			port, therefore the name). They are not port-specific.
>>+			Caller might use arbitrary port netdevice of the same
>>+			switch and it will make no difference.
>>+  ndo_switch_port_* - Functions that serve for a port-specific manipulation.
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index a545d68..05addb6 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -9058,6 +9058,13 @@ F:	lib/swiotlb.c
>>  F:	arch/*/kernel/pci-swiotlb.c
>>  F:	include/linux/swiotlb.h
>>+SWITCHDEV
>>+M:	Jiri Pirko <jiri@resnulli.us>
>>+L:	netdev@vger.kernel.org
>>+S:	Supported
>>+F:	net/switchdev/
>>+F:	include/net/switchdev.h
>>+
>>  SYNOPSYS ARC ARCHITECTURE
>>  M:	Vineet Gupta <vgupta@synopsys.com>
>>  S:	Supported
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index 5b491b3..ce096dc 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -1018,6 +1018,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>>   *	performing GSO on a packet. The device returns true if it is
>>   *	able to GSO the packet, false otherwise. If the return value is
>>   *	false the stack will do software GSO.
>>+ *
>>+ * int (*ndo_switch_parent_id_get)(struct net_device *dev,
>>+ *				   struct netdev_phys_item_id *psid);
>>+ *	Called to get an ID of the switch chip this port is part of.
>>+ *	If driver implements this, it indicates that it represents a port
>>+ *	of a switch chip.
>>   */
>>  struct net_device_ops {
>>  	int			(*ndo_init)(struct net_device *dev);
>>@@ -1171,6 +1177,10 @@ struct net_device_ops {
>>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
>>  						  struct net_device *dev);
>>+#ifdef CONFIG_NET_SWITCHDEV
>>+	int			(*ndo_switch_parent_id_get)(struct net_device *dev,
>>+							    struct netdev_phys_item_id *psid);
>>+#endif
>>  };
>>  /**
>>diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>new file mode 100644
>>index 0000000..7a52360
>>--- /dev/null
>>+++ b/include/net/switchdev.h
>>@@ -0,0 +1,30 @@
>>+/*
>>+ * include/net/switchdev.h - Switch device API
>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+#ifndef _LINUX_SWITCHDEV_H_
>>+#define _LINUX_SWITCHDEV_H_
>>+
>>+#include <linux/netdevice.h>
>>+
>>+#ifdef CONFIG_NET_SWITCHDEV
>>+
>>+int netdev_switch_parent_id_get(struct net_device *dev,
>>+				struct netdev_phys_item_id *psid);
>>+
>>+#else
>>+
>>+static inline int netdev_switch_parent_id_get(struct net_device *dev,
>>+					      struct netdev_phys_item_id *psid)
>>+{
>>+	return -EOPNOTSUPP;
>>+}
>>+
>>+#endif
>>+
>>+#endif /* _LINUX_SWITCHDEV_H_ */
>>diff --git a/net/Kconfig b/net/Kconfig
>>index 99815b5..ff9ffc1 100644
>>--- a/net/Kconfig
>>+++ b/net/Kconfig
>>@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>>  source "net/netlink/Kconfig"
>>  source "net/mpls/Kconfig"
>>  source "net/hsr/Kconfig"
>>+source "net/switchdev/Kconfig"
>>  config RPS
>>  	boolean
>>diff --git a/net/Makefile b/net/Makefile
>>index 7ed1970..95fc694 100644
>>--- a/net/Makefile
>>+++ b/net/Makefile
>>@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>>  obj-$(CONFIG_HSR)		+= hsr/
>>+ifneq ($(CONFIG_NET_SWITCHDEV),)
>>+obj-y				+= switchdev/
>>+endif
>>diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
>>new file mode 100644
>>index 0000000..1557545
>>--- /dev/null
>>+++ b/net/switchdev/Kconfig
>>@@ -0,0 +1,13 @@
>>+#
>>+# Configuration for Switch device support
>>+#
>>+
>>+config NET_SWITCHDEV
>>+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
>>+	depends on INET
>>+	---help---
>>+	  This module provides glue between core networking code and device
>>+	  drivers in order to support hardware switch chips in very generic
>>+	  meaning of the word "switch". This includes devices supporting L2/L3 but
>>+	  also various flow offloading chips, including switches embedded into
>>+	  SR-IOV NICs.
>>diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
>>new file mode 100644
>>index 0000000..5ed63ed
>>--- /dev/null
>>+++ b/net/switchdev/Makefile
>>@@ -0,0 +1,5 @@
>>+#
>>+# Makefile for the Switch device API
>>+#
>>+
>>+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
>>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>new file mode 100644
>>index 0000000..66973de
>>--- /dev/null
>>+++ b/net/switchdev/switchdev.c
>>@@ -0,0 +1,33 @@
>>+/*
>>+ * net/switchdev/switchdev.c - Switch device API
>>+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include <linux/kernel.h>
>>+#include <linux/types.h>
>>+#include <linux/init.h>
>>+#include <linux/netdevice.h>
>>+#include <net/switchdev.h>
>>+
>>+/**
>>+ *	netdev_switch_parent_id_get - Get ID of a switch
>>+ *	@dev: port device
>>+ *	@psid: switch ID
>>+ *
>>+ *	Get ID of a switch this port is part of.
>>+ */
>>+int netdev_switch_parent_id_get(struct net_device *dev,
>>+				struct netdev_phys_item_id *psid)
>>+{
>>+	const struct net_device_ops *ops = dev->netdev_ops;
>>+
>>+	if (!ops->ndo_switch_parent_id_get)
>>+		return -EOPNOTSUPP;
>>+	return ops->ndo_switch_parent_id_get(dev, psid);
>>+}
>>+EXPORT_SYMBOL(netdev_switch_parent_id_get);
>
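For readers following the quoted patch: the wrapper above returns -EOPNOTSUPP when a driver leaves ndo_switch_parent_id_get unset, so a caller could use it both to detect a switch port and to check whether two ports sit on the same chip. Below is a minimal userspace sketch of that idea, not kernel code. The structs are cut-down stand-ins for the kernel types, and someswitch_parent_id_get, chip_serial, ports_share_switch and switch_parent_id_demo are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PHYS_ITEM_ID_LEN 32

/* Cut-down userspace stand-ins for the kernel types (assumed layout). */
struct netdev_phys_item_id {
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_switch_parent_id_get)(struct net_device *dev,
					struct netdev_phys_item_id *psid);
};

struct net_device {
	const char *name;
	const struct net_device_ops *netdev_ops;
	const char *chip_serial;	/* hypothetical per-chip identifier */
};

/* Same logic as the wrapper in the patch. */
static int netdev_switch_parent_id_get(struct net_device *dev,
				       struct netdev_phys_item_id *psid)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	if (!ops->ndo_switch_parent_id_get)
		return -EOPNOTSUPP;
	return ops->ndo_switch_parent_id_get(dev, psid);
}

/* Hypothetical driver callback: every port reports its chip's serial. */
static int someswitch_parent_id_get(struct net_device *dev,
				    struct netdev_phys_item_id *psid)
{
	size_t len = strlen(dev->chip_serial);

	if (len > MAX_PHYS_ITEM_ID_LEN)
		return -EINVAL;
	memcpy(psid->id, dev->chip_serial, len);
	psid->id_len = (unsigned char)len;
	return 0;
}

/* Hypothetical helper: true only if both ports report the same switch ID. */
static bool ports_share_switch(struct net_device *a, struct net_device *b)
{
	struct netdev_phys_item_id ida, idb;

	if (netdev_switch_parent_id_get(a, &ida) ||
	    netdev_switch_parent_id_get(b, &idb))
		return false;	/* at least one is not a switch port */
	return ida.id_len == idb.id_len &&
	       memcmp(ida.id, idb.id, ida.id_len) == 0;
}

static const struct net_device_ops someswitch_ops = {
	.ndo_switch_parent_id_get = someswitch_parent_id_get,
};
static const struct net_device_ops plain_nic_ops = { NULL };

void switch_parent_id_demo(void)
{
	struct net_device sw0p0 = { "sw0p0", &someswitch_ops, "SOME-0001" };
	struct net_device sw0p1 = { "sw0p1", &someswitch_ops, "SOME-0001" };
	struct net_device sw1p0 = { "sw1p0", &someswitch_ops, "SOME-0002" };
	struct net_device eth0  = { "eth0",  &plain_nic_ops,  NULL };
	struct netdev_phys_item_id psid;

	assert(ports_share_switch(&sw0p0, &sw0p1));	/* same chip */
	assert(!ports_share_switch(&sw0p0, &sw1p0));	/* different chips */
	assert(netdev_switch_parent_id_get(&eth0, &psid) == -EOPNOTSUPP);
	assert(!ports_share_switch(&sw0p0, &eth0));	/* eth0 is a plain NIC */
}
```

Calling switch_parent_id_demo() from a main() exercises all three cases. Whether an in-kernel caller would compare IDs this way is not settled by the patch; the sketch only shows what "same ID for ports of the same chip" buys the caller.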
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamal Hadi Salim Nov. 25, 2014, 5:08 p.m. UTC | #6
On 11/25/14 11:49, Jiri Pirko wrote:

>
> What does it do? "L3 switching"?
>

Absolutely not - that is too easy;-> Why not just a mellanox
chip for that? (Testing if Aviad is awake). But flows and associated
constructs apply.


>> "offload_id" would be the right term. switch doesnt sound right.
>
> When we talk about this area, we use word "switch". I know it is not
> accurate, but in my opinion it is the closest we can get. "chip" and
> "ASIC" are too generic I believe. I would not use "offload" cause it can
> be easily mistaken with NIC offloads + it is also not accurate.

I think this interface is usable for example to offload to user space
ala DPDK and friends just as it would be for ASICs or standard NIC
offload (which we already have with fdb offload).
I dont know what a good name is - but switch looks incorrect.

cheers,
jamal



Thomas Graf Nov. 25, 2014, 9:54 p.m. UTC | #7
On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
> On 11/25/14 11:49, Jiri Pirko wrote:
> 
> >
> >What does it do? "L3 switching"?
> >
> 
> Absolutely not - that is too easy;-> Why not just a mellanox
> chip for that? (Testing if Aviad is awake). But flows and associated
> constructs apply.

It would definitely help if you could expose some more details on the
"some network processor" you have. We're all very eager ;-)

> I think this interface is usable for example to offload to user space
> ala DPDK and friends just as it would be for ASICs or standard NIC
> offload (which we already have with fdb offload).
> I dont know what a good name is - but switch looks incorrect.

I'm with Jiri but I agree it's not a perfect fit. I doubt there is but
if you can come up with something that fits better I'm open to it.

I considered "dataplane" or "dp" for a bit but it's quite generic as
well.
Jamal Hadi Salim Nov. 26, 2014, 3:33 a.m. UTC | #8
On 11/25/14 16:54, Thomas Graf wrote:
> On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:

> It would definitely help if you could expose some more details on the
> "some network processor" you have. We're all very eager ;-)
>

Well, this thing doesnt run ovs ;-> (/me runs). If you come
to netdev i may let you play with it ;-> Its a humongous device
(think multi 100G ports).

On a serious note: Even if you took what Simon/Netronome has
(yes, I know they use ovs;->) - there is really no need for a switch
abstraction *at all* if all you want to do is hang a packet
processing graph that ingresses at a port and egresses at another port.
As you know, Linux supports it just fine with tc.

> I'm with Jiri but I agree it's not a perfect fit. I doubt there is but
> if you can come up with something that fits better I'm open to it.
>
> I considered "dataplane" or "dp" for a bit but it's quite generic as
> well.
>

The purpose is to offload. I think any name would be better than
mapping it to a specific abstraction called "switch". Especially
if it is hanging off a port and there is no switch in the pipeline.

cheers,
jamal
Scott Feldman Nov. 26, 2014, 4:18 a.m. UTC | #9
On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/25/14 16:54, Thomas Graf wrote:
>>
>> On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
>
>
>> It would definitely help if you could expose some more details on the
>> "some network processor" you have. We're all very eager ;-)
>>
>
> Well, this thing doesnt run ovs ;-> (/me runs). If you come
> to netdev i may let you play with it ;-> Its a humongous device
> (think multi 100G ports).
>
> On a serious note: Even if you took what Simon/Netronome has
> (yes, I know they use ovs;->) - there is really no need for a switch
> abstraction *at all* if all you want to do is hang a packet
> processing graph that ingresses at a port and egresses at another port.
> As you know, Linux supports it just fine with tc.

You have a pointer to the kernel driver for that HW?  Can you show how
you're using Linux tc netlink msg in kernel to program HW?  I'd like
to see the in-kernel API.
Jamal Hadi Salim Nov. 26, 2014, 11:36 a.m. UTC | #10
On 11/25/14 23:18, Scott Feldman wrote:
> On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>
> You have a pointer to the kernel driver for that HW?

I wasnt sure if that was a passive aggressive move there to
question what i am claiming?(Only Canadians are allowed to be
passive aggressive Scott). To answer your question, no
code currently littered with vendor SDK unfortunately (as you
would know!).
But hopefully if we get these changes in correctly it would
not be hard to show the driver working fully in the kernel.
There are definitely a few other pieces of hardware that are
making me come back here and invest time and effort in these
long discussions.

> Can you show how
> you're using Linux tc netlink msg in kernel to program HW?  I'd like
> to see the in-kernel API.
>

Lets do the L2/port thing first. But yes, I am using Linux tc in
kernel.

cheers,
jamal
Thomas Graf Nov. 26, 2014, 4:08 p.m. UTC | #11
On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
> On 11/25/14 23:18, Scott Feldman wrote:
> >On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> 
> >
> >You have a pointer to the kernel driver for that HW?
> 
> I wasnt sure if that was a passive aggressive move there to
> question what i am claiming?(Only Canadians are allowed to be
> passive aggressive Scott). To answer your question, no
> code currently littered with vendor SDK unfortunately (as you
> would know!).
> But hopefully if we get these changes in correctly it would
> not be hard to show the driver working fully in the kernel.
> There are definitely a few other pieces of hardware that are
> making me come back here and invest time and effort in these
> long discussions.
> 
> >Can you show how
> >you're using Linux tc netlink msg in kernel to program HW?  I'd like
> >to see the in-kernel API.
> >
> 
> Lets do the L2/port thing first. But yes, I am using Linux tc in
> kernel.

Jamal,

What is irritating in this context is that you are pushing back on
Jiri and others while referring to proprietary and closed code which
you are unwilling or unable to share. I don't see this as being
passive aggressive, everybody is treated the same way in this regard.

It is exactly the point of this API and related discussions to
decouple the control plane (tc) from any vendor specifics while
allowing them to innovate, compete, and solve different use cases.

I think it's absolutely the right thing to write the API against
code that is public, which in this case is rocker and the existing
in-kernel NIC drivers.
Jamal Hadi Salim Nov. 26, 2014, 5:09 p.m. UTC | #12
On 11/26/14 11:08, Thomas Graf wrote:
> On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
>
>
> Jamal,
>
> What is irritating in this context is that you are pushing back on
> Jiri and others while referring to proprietary and closed code which
> you are unwilling or unable to share. I don't see this as being
> passive aggressive, everybody is treated the same way in this regard.
>

WTF? I said i have hardware that is not a switch because it doesnt
do switching. This all started with the name being "switch" which
I objected to. You ask me to describe hardware and then you come
back and say I am using that to stop progress?
Where the hell did i push back on Jiri? Stop going around
telling people i do. I invest my time and effort reviewing code,
proposing ideas, posting etc calling meetings. In fact i initiated
this whole effort to begin with.

There is no point to responding to any of your other comments.

cheers,
jamal
Jiri Pirko Nov. 26, 2014, 5:59 p.m. UTC | #13
Wed, Nov 26, 2014 at 06:09:13PM CET, jhs@mojatatu.com wrote:
>On 11/26/14 11:08, Thomas Graf wrote:
>>On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
>>
>>
>>Jamal,
>>
>>What is irritating in this context is that you are pushing back on
>>Jiri and others while referring to proprietary and closed code which
>>you are unwilling or unable to share. I don't see this as being
>>passive aggressive, everybody is treated the same way in this regard.
>>
>
>WTF? I said i have hardware that is not a switch because it doesnt
>do switching. This all started with the name being "switch" which
>I objected to. You ask me to describe hardware and then you come
>back and say I am using that to stop progress?

Stay calm, I'm sure that this is just a misunderstanding.

>Where the hell did i push back on Jiri? Stop going around
>telling people i do. I invest my time and effort reviewing code,
>proposing ideas, posting etc calling meetings. In fact i initiated
>this whole effort to begin with.

I thought I started this :) Anyway, I much appreciate your involvement
in this Jamal with putting the meetings together and stuff, that's for sure.

We need to join forces, not to fight with each other.


>
>There is no point to responding to any of your other comments.
>
>cheers,
>jamal
Thomas Graf Nov. 26, 2014, 9:50 p.m. UTC | #14
On 11/26/14 at 06:59pm, Jiri Pirko wrote:
> Wed, Nov 26, 2014 at 06:09:13PM CET, jhs@mojatatu.com wrote:
> >On 11/26/14 11:08, Thomas Graf wrote:
> >>On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
> >>What is irritating in this context is that you are pushing back on
> >>Jiri and others while referring to proprietary and closed code which
> >>you are unwilling or unable to share. I don't see this as being
> >>passive aggressive, everybody is treated the same way in this regard.
> >
> >WTF? I said i have hardware that is not a switch because it doesnt
> >do switching. This all started with the name being "switch" which
> >I objected to. You ask me to describe hardware and then you come
> >back and say I am using that to stop progress?
> 
> Stay calm, I'm sure that this is just a misunderstanding.
> 
> >Where the hell did i push back on Jiri? Stop going around
> >telling people i do.

You are requesting a name change for a proprietary driver after
confirming that you can't publish the code. We don't even know what
the piece of hardware you refer to is capable of.

We've always written driver facing APIs for the drivers that are
*in* the kernel which in this case is rocker, modelled after OF-DPA,
existing NIC drivers, and DSA drivers.

I can live with the term switch, but if somebody can come up with a
better name, cool. "Chip" or "ASIC" are probably not better choices
though.
Jamal Hadi Salim Nov. 26, 2014, 11:32 p.m. UTC | #15
On 11/26/14 16:50, Thomas Graf wrote:

> You are requesting a name change for a proprietary driver after
> confirming that you can't publish the code. We don't even know what
> the piece of hardware you refer to is capable of.
>

I am not sure why there is such a misunderstanding. Here's the
sequence of events.

Jiri/Scott: We'll call this offload thing hanging off a port_ops
a "switch". It does one or more of L2, L3 and flows.
Jamal: I am not fond of that name because not everything that offloads
off a port is a switch (some mention of fitting even with dpdk)
Jiri: What do you have - an L3 "switch"?
Jamal: No, it is something that does offloading of packet processing off
a port with flows and action. Example a netronome would be a good fit 
(if you are to ignore Simon going for OVS).

And then things get out of control. This has *nothing* to do with any
driver or any code or anything specialized.
Not every packet processing offload hanging off ports is a switch (I
dont think even the patch was claiming that although by now ive lost
track of where it started).

Yes, i cannot publish this code. You know that; Scott knows that and
Jiri knows. (and thats why i thought it passive aggressive when Scott
asked about the code when we are discussing a name change).
The reason i am even involved in all this is so we can actually
publish code and i can stop using proprietary SDK stuff.
While i cant release the current code I want to share my experiences
in trying to help make that API sane. Because i want to use it.
I have been doing this offload shit for at least 15 years on Linux.
I have something to say about it. Just throwing in some gauntlet
when it serves some convenience and treating me like some guy who
showed off the street making claims is bordering on the ridiculous.

cheers,
jamal
Simon Horman Nov. 27, 2014, 3:13 a.m. UTC | #16
On Tue, Nov 25, 2014 at 10:33:36PM -0500, Jamal Hadi Salim wrote:
> On 11/25/14 16:54, Thomas Graf wrote:
> >On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
> 
> >It would definitely help if you could expose some more details on the
> >"some network processor" you have. We're all very eager ;-)
> >
> 
> Well, this thing doesnt run ovs ;-> (/me runs). If you come
> to netdev i may let you play with it ;-> Its a humongous device
> (think multi 100G ports).
> 
> On a serious note: Even if you took what Simon/Netronome has
> (yes, I know they use ovs;->)

FWIW, we are also interested in non-OVS use cases.

> - there is really no need for a switch
> abstraction *at all* if all you want to do is hang a packet
> processing graph that ingresses at a port and egresses at another port.
> As you know, Linux supports it just fine with tc.

I may be missing the point but I see two problems that are solved by
the switch abstraction.

- Cases where no ports are configured.

  Perhaps no such use cases exist for the API in question.
  But it does seem plausible to me that non-physical ports could
  be added at run-time and that thus a "switch" could initially
  exist with no configured port. Something like how bridges
  initially have no ports (IIRC).

- Discovering the association between ports and "switches".

My recollection from the double round table discussion on the last day of
the Düsseldorf sessions was that these were reasons that simply accessing
any port belonging to the "switch" was not entirely satisfactory.

> >I'm with Jiri but I agree it's not a perfect fit. I doubt there is but
> >if you can come up with something that fits better I'm open to it.
> >
> >I considered "dataplane" or "dp" for a bit but it's quite generic as
> >well.
> >
> 
> The purpose is to offload. I think any name would be better than
> mapping it to a specific abstraction called "switch". Especially
> if it is hanging off a port and there is no switch in the pipeline.
> 
> cheers,
> jamal
Scott Feldman Nov. 27, 2014, 5:58 a.m. UTC | #17
On Wed, Nov 26, 2014 at 1:36 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/25/14 23:18, Scott Feldman wrote:
>>
>> On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com>
>> wrote:
>
>
>>
>> You have a pointer to the kernel driver for that HW?
>
>
> I wasnt sure if that was a passive aggressive move there to
> question what i am claiming?(Only Canadians are allowed to be
> passive aggressive Scott). To answer your question, no
> code currently littered with vendor SDK unfortunately (as you
> would know!).

Drats, I was hoping there might be Open Source here.  I'm actually not
familiar with Netronome offerings.  I went to their web page and all
their Docs downloads require registration, so I should have guessed
same-old-same-old.  But you teased us with it, so I thought I would
ask.  Sorry for the trouble XOXOXOXO.  I'm not Canadian, as far as I
know.

> But hopefully if we get these changes in correctly it would
> not be hard to show the driver working fully in the kernel.
> There are definitely a few other pieces of hardware that are
> making me come back here and invest time and effort in these
> long discussions.

You have access to the inside scoop.  We don't.  Ok, I don't.  We
(think we) know what the traditional L2/L3 and OVS-style flow stuff
looks like, but you know more, but you can't show us in code so it's
frustrating.  Not your fault.  Just continue to guide us and give some
disclaimer when we're close to some proprietary knowledge, but it
is relevant to the discussion.


>> Can you show how
>> you're using Linux tc netlink msg in kernel to program HW?  I'd like
>> to see the in-kernel API.
>>
>
> Lets do the L2/port thing first. But yes, I am using Linux tc in
> kernel.
>
> cheers,
> jamal
Jamal Hadi Salim Nov. 27, 2014, 12:35 p.m. UTC | #18
On 11/26/14 22:13, Simon Horman wrote:
> On Tue, Nov 25, 2014 at 10:33:36PM -0500, Jamal Hadi Salim wrote:

[..]
> I may be missing the point but I see two problems that are solved by
> the switch abstraction.
>
> - Cases where no ports are configured.
>
>    Perhaps no such use cases exist for the API in question.
>    But it does seem plausible to me that non-physical ports could
>    be added at run-time and that thus a "switch" could initially
>    exist with no configured port. Something like how bridges
>    initially have no ports (IIRC).
>
> - Discovering the association between ports and "switches".
>
> My recollection from the double round table discussion on the last day of
> the Düsseldorf sessions was that these were reasons that simply accessing
> any port belonging to the "switch" was not entirely satisfactory.
>

So in Du I illustrated in a slide the internals of the Realtek that
Ben had patches on. Ben first exposes the realtek ports and when
you wish you can build a bridge and attach the exposed ports
and then hardware switching functionality is used. What is interesting
about it is in fact you didnt need to use the switching on it. You
could attach a filter to any of the exposed ports, then specify an
action to do a redirect to another port for example.
(Scott i know you were not there, but i cant find where those slides
are posted; will send them when i do - or ask Thomas).
This is very easy to map to port/ingress classifier/actions in Linux.
I was hoping i could produce a patch to do this - but waiting on Ben
to complete the reverse engineering.
In any case the realtek is a toy example but there's millions deployed
and producing a patch for tc (if Jiri doesnt beat me to it) is a useful
exercise.
My devices (as would a netronome) would apply the same concept.
Essentially, you take an ingress packet arriving on a port,
you apply a classifier to it, apply actions to it and eventually
egress it to a port. i.e.

Ingress packet-->port->classifier-->...actions..->egress port

I can model the above with tc.
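
As a rough illustration of the pipeline drawn above, here is a toy userspace model in C. It is a sketch only, not tc's data structures or any in-kernel API. Every name in it (struct filter, port_ingress, the VERDICT_* constants, pipeline_demo) is hypothetical, and the classifier is reduced to an exact match on a destination address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for packets that do not leave through a port. */
#define VERDICT_TO_HOST	(-1)	/* no filter matched: hand to the host stack */
#define VERDICT_DROPPED	(-2)

enum action_kind { ACT_PASS, ACT_DROP, ACT_REDIRECT };

/* One classifier entry plus the action attached to it. */
struct filter {
	uint32_t match_dst_ip;	/* classifier: exact destination match */
	enum action_kind action;
	int redirect_port;	/* egress port, used by ACT_REDIRECT only */
};

/* A port with an ordered list of ingress filters hanging off it. */
struct port {
	int id;
	const struct filter *filters;
	size_t nfilters;
};

struct packet {
	uint32_t dst_ip;
};

/*
 * Ingress packet --> port --> classifier --> action --> egress port.
 * Returns the egress port id, or one of the VERDICT_* values.
 */
static int port_ingress(const struct port *p, const struct packet *pkt)
{
	for (size_t i = 0; i < p->nfilters; i++) {
		const struct filter *f = &p->filters[i];

		if (f->match_dst_ip != pkt->dst_ip)
			continue;	/* classifier miss, try next filter */
		switch (f->action) {
		case ACT_DROP:
			return VERDICT_DROPPED;
		case ACT_REDIRECT:
			return f->redirect_port;
		case ACT_PASS:
			return VERDICT_TO_HOST;
		}
	}
	return VERDICT_TO_HOST;
}

void pipeline_demo(void)
{
	static const struct filter port1_filters[] = {
		{ 0x0a000001, ACT_REDIRECT, 2 },	/* 10.0.0.1 -> port 2 */
		{ 0x0a000002, ACT_DROP, 0 },		/* 10.0.0.2 -> drop */
	};
	const struct port port1 = { 1, port1_filters, 2 };
	const struct packet to_redirect = { 0x0a000001 };
	const struct packet to_drop = { 0x0a000002 };
	const struct packet unmatched = { 0x0a000003 };

	assert(port_ingress(&port1, &to_redirect) == 2);
	assert(port_ingress(&port1, &to_drop) == VERDICT_DROPPED);
	assert(port_ingress(&port1, &unmatched) == VERDICT_TO_HOST);
}
```

Nothing in the model refers to a switch object: the filters and actions hang off the port alone, which is the point being argued here.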

cheers,
jamal


Jamal Hadi Salim Nov. 27, 2014, 12:46 p.m. UTC | #19
On 11/27/14 00:58, Scott Feldman wrote:


> You have access to the inside scoop.  We don't.  Ok, I don't.  We
> (think we) know what the traditional L2/L3 and OVS-style flow stuff
> looks like, but you know more, but you can't show us in code so it's
> frustrating.  Not your fault.  Just continue to guide us and give some
> disclaimer when we're close to some proprietary knowledge, but it
> is relevant to the discussion.
>


Scott, I am asking to offload basic functionality that Linux supports.
I may be blind-sided and getting frustrated thinking it is obvious
because i live through this stuff everyday; but I am trying all i can
to share what you call proprietary knowledge whenever i can. If you
think of this as "we need to offload all packet processing linux
supports" you'll see where i am coming from.

cheers,
jamal

Thomas Graf Nov. 27, 2014, 1:03 p.m. UTC | #20
On 11/26/14 at 06:32pm, Jamal Hadi Salim wrote:
> Jiri/Scott: We'll call this offload thing hanging off a port_ops
> a "switch". It does one or more of L2, L3 and flows.
> Jamal: I am not fond of that name because not everything that offloads
> off a port is a switch (some mention of fitting even with dpdk)
> Jiri: What do you have - an L3 "switch"?
> Jamal: No, it is something that does offloading of packet processing off
> a port with flows and action. Example a netronome would be a good fit (if
> you are to ignore Simon going for OVS).

So what is your name suggestion?
Jamal Hadi Salim Nov. 27, 2014, 1:32 p.m. UTC | #21
On 11/27/14 08:03, Thomas Graf wrote:

> So what is your name suggestion?
>

I would have gone for _offload_ either as a prefix or suffix
somewhere.

cheers,
jamal
Jiri Pirko Nov. 27, 2014, 1:50 p.m. UTC | #22
Thu, Nov 27, 2014 at 02:32:32PM CET, jhs@mojatatu.com wrote:
>On 11/27/14 08:03, Thomas Graf wrote:
>
>>So what is your name suggestion?
>>
>
>I would have gone for _offload_ either as a prefix or suffix
>somewhere.

$ git grep offload net
Wouldn't it be confusing to add yet another, different "offload"? That's
just confusing.

I still like "switch" the best. If it passes packets around, it's a
"switch", +-. Everybody understands what's going on if you use "switch".
If you use "offload", everybody is confused...


>
>cheers,
>jamal
Jamal Hadi Salim Nov. 28, 2014, 1:13 p.m. UTC | #23
On 11/27/14 08:50, Jiri Pirko wrote:

> $ git grep offload net
> Wouldn't it be confusing to add yet another, different "offload"? That's
> just confusing.
>
> I still like "switch" the best. If it passes packets around, it's a
> "switch", +-. Everybody understands what's going on if you use "switch".
> If you use "offload", everybody is confused...
>

Those are all *legitimate offloads* ;->
The macvlan one looks a little creepy. Perhaps we could eventually
merge all that stuff together with this effort.

cheers,
jamal

Patch

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..f981a92
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@ 
+Switch (and switch-ish) device drivers HOWTO
+============================================
+
+Please note that the word "switch" is used here in a very generic sense.
+This includes devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Let's describe the topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and the CPU. The NIC0 and NIC1 drivers are not aware of the switch's presence.
+They are separate from the switch driver. SOME switch chip is managed by a
+driver via PCI-E device MNGMNT. Note that the MNGMNT device, NIC0 and NIC1 may
+be connected to some other type of bus.
+
+Now, here is how the previous example is represented in the kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Let's call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of the switch. These netdevices are instances
+of the "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference between a switch-port netdevice and an ordinary netdevice
+is that it implements a couple more NDOs:
+
+  ndo_switch_parent_id_get - This returns the same ID for two port netdevices
+			     of the same physical switch chip. This is
+			     mandatory to be implemented by all switch drivers
+			     and serves the caller for recognition of a port
+			     netdevice.
+  ndo_switch_parent_* - Functions that serve for a manipulation of the switch
+			chip itself (it can be thought of as a "parent" of the
+			port, therefore the name). They are not port-specific.
+			The caller may use an arbitrary port netdevice of the
+			same switch and it will make no difference.
+  ndo_switch_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index a545d68..05addb6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9058,6 +9058,13 @@  F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCHDEV
+M:	Jiri Pirko <jiri@resnulli.us>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	net/switchdev/
+F:	include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b491b3..ce096dc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1018,6 +1018,12 @@  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_switch_parent_id_get)(struct net_device *dev,
+ *				   struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1171,6 +1177,10 @@  struct net_device_ops {
 	int			(*ndo_get_lock_subclass)(struct net_device *dev);
 	bool			(*ndo_gso_check) (struct sk_buff *skb,
 						  struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int			(*ndo_switch_parent_id_get)(struct net_device *dev,
+							    struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..7a52360
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@ 
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_switch_parent_id_get(struct net_device *dev,
+					      struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5..ff9ffc1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@  source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@  obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
 obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
 obj-$(CONFIG_HSR)		+= hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y				+= switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..1557545
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@ 
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in the very generic
+	  meaning of the word "switch". This includes devices supporting L2/L3
+	  but also various flow offloading chips, including switches embedded
+	  into SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@ 
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..66973de
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@ 
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_switch_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_switch_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_switch_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_switch_parent_id_get);
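
For illustration only, here is a userspace sketch of how a caller might use
the wrapper above to decide whether two netdevices are ports of the same
switch chip. It is not part of the patch. The struct definitions are
simplified stand-ins for the kernel types, and someswitch_parent_id_get(),
the switch_id field, ports_share_switch() and the sw0p1/sw0p2/sw1p1/eth0
devices are all hypothetical names made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_PHYS_ITEM_ID_LEN 32

/* Simplified userspace stand-ins for the kernel types (assumption). */
struct netdev_phys_item_id {
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_switch_parent_id_get)(struct net_device *dev,
					struct netdev_phys_item_id *psid);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	const char *switch_id;	/* hypothetical per-port private data */
};

/* Same dispatch logic as the wrapper added by the patch. */
int netdev_switch_parent_id_get(struct net_device *dev,
				struct netdev_phys_item_id *psid)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	if (!ops->ndo_switch_parent_id_get)
		return -EOPNOTSUPP;
	return ops->ndo_switch_parent_id_get(dev, psid);
}

/* Hypothetical "SOMEswitch" ndo: every port reports its chip's ID. */
static int someswitch_parent_id_get(struct net_device *dev,
				    struct netdev_phys_item_id *psid)
{
	size_t len = strlen(dev->switch_id);

	if (len > MAX_PHYS_ITEM_ID_LEN)
		return -EINVAL;
	memcpy(psid->id, dev->switch_id, len);
	psid->id_len = (unsigned char)len;
	return 0;
}

static const struct net_device_ops someswitch_ops = {
	.ndo_switch_parent_id_get = someswitch_parent_id_get,
};

/* An ordinary NIC driver that does not implement the ndo. */
static const struct net_device_ops plain_ops = {
	.ndo_switch_parent_id_get = NULL,
};

/* Returns 1 if both devices are ports of the same switch, 0 otherwise. */
int ports_share_switch(struct net_device *a, struct net_device *b)
{
	struct netdev_phys_item_id ida, idb;

	if (netdev_switch_parent_id_get(a, &ida) ||
	    netdev_switch_parent_id_get(b, &idb))
		return 0;
	return ida.id_len == idb.id_len &&
	       !memcmp(ida.id, idb.id, ida.id_len);
}

/* Example devices: two ports of switch 0, one of switch 1, one plain NIC. */
struct net_device sw0p1 = { &someswitch_ops, "SOMEswitch-0" };
struct net_device sw0p2 = { &someswitch_ops, "SOMEswitch-0" };
struct net_device sw1p1 = { &someswitch_ops, "SOMEswitch-1" };
struct net_device eth0  = { &plain_ops, NULL };
```

With these definitions, ports_share_switch(&sw0p1, &sw0p2) is true, while
comparing sw0p1 against sw1p1 or eth0 is false. For eth0 the wrapper returns
-EOPNOTSUPP, which is how a caller can tell the device is not a switch port.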