Message ID | 1416911328-10979-5-git-send-email-jiri@resnulli.us |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Tue, Nov 25, 2014 at 11:28:35AM +0100, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Reviewed-by: Thomas Graf <tgraf@suug.ch>

Looks good -- thanks for replacing 'sw' with 'switch'

Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>

> ---
> v2->v3:
> -fixed documentation typo pointed out by M. Braun
> -changed "sw" string to "switch" to avoid confusion
> v1->v2:
> -no change
> ---
>  Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>  MAINTAINERS                            |  7 ++++
>  include/linux/netdevice.h              | 10 ++++++
>  include/net/switchdev.h                | 30 +++++++++++++++++
>  net/Kconfig                            |  1 +
>  net/Makefile                           |  3 ++
>  net/switchdev/Kconfig                  | 13 ++++++++
>  net/switchdev/Makefile                 |  5 +++
>  net/switchdev/switchdev.c              | 33 +++++++++++++++++++
>  9 files changed, 161 insertions(+)
>  create mode 100644 Documentation/networking/switchdev.txt
>  create mode 100644 include/net/switchdev.h
>  create mode 100644 net/switchdev/Kconfig
>  create mode 100644 net/switchdev/Makefile
>  create mode 100644 net/switchdev/switchdev.c
>
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..f981a92
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +===========================
> +
> +Please note that the word "switch" is here used in very generic meaning.
> +This include devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Lets describe a topology a bit. Imagine the following example:
> +
> +        +----------------------------+    +---------------+
> +        |     SOME switch chip       |    |      CPU      |
> +        +----------------------------+    +---------------+
> +        port1 port2 port3 port4 MNGMNT    |     PCI-E     |
> +          |     |     |     |     |       +---------------+
> +         PHY   PHY    |     |     |         |  NIC0 NIC1
> +                      |     |     |         |   |    |
> +                      |     |     +- PCI-E -+   |    |
> +                      |     +------- MII -------+    |
> +                      +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is by managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example show the representation in kernel:
> +
> +        +----------------------------+    +---------------+
> +        |     SOME switch chip       |    |      CPU      |
> +        +----------------------------+    +---------------+
> +        sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
> +          |     |     |     |     |       +---------------+
> +         PHY   PHY    |     |     |         |  eth0 eth1
> +                      |     |     |         |   |    |
> +                      |     |     +- PCI-E -+   |    |
> +                      |     +------- MII -------+    |
> +                      +------------- MII ------------+
> +
> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference of the switch-port netdevice from the ordinary netdevice
> +is that is implements couple more NDOs:
> +
> +  ndo_switch_parent_id_get - This returns the same ID for two port netdevices
> +                             of the same physical switch chip. This is
> +                             mandatory to be implemented by all switch drivers
> +                             and serves the caller for recognition of a port
> +                             netdevice.
> +  ndo_switch_parent_* - Functions that serve for a manipulation of the switch
> +                        chip itself (it can be though of as a "parent" of the
> +                        port, therefore the name). They are not port-specific.
> +                        Caller might use arbitrary port netdevice of the same
> +                        switch and it will make no difference.
> +  ndo_switch_port_* - Functions that serve for a port-specific manipulation.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a545d68..05addb6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9058,6 +9058,13 @@ F:	lib/swiotlb.c
>  F:	arch/*/kernel/pci-swiotlb.c
>  F:	include/linux/swiotlb.h
>
> +SWITCHDEV
> +M:	Jiri Pirko <jiri@resnulli.us>
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	net/switchdev/
> +F:	include/net/switchdev.h
> +
>  SYNOPSYS ARC ARCHITECTURE
>  M:	Vineet Gupta <vgupta@synopsys.com>
>  S:	Supported
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5b491b3..ce096dc 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1018,6 +1018,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   *	performing GSO on a packet. The device returns true if it is
>   *	able to GSO the packet, false otherwise. If the return value is
>   *	false the stack will do software GSO.
> + *
> + * int (*ndo_switch_parent_id_get)(struct net_device *dev,
> + *				   struct netdev_phys_item_id *psid);
> + *	Called to get an ID of the switch chip this port is part of.
> + *	If driver implements this, it indicates that it represents a port
> + *	of a switch chip.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1171,6 +1177,10 @@ struct net_device_ops {
>  	int			(*ndo_get_lock_subclass)(struct net_device *dev);
>  	bool			(*ndo_gso_check) (struct sk_buff *skb,
>  						  struct net_device *dev);
> +#ifdef CONFIG_NET_SWITCHDEV
> +	int			(*ndo_switch_parent_id_get)(struct net_device *dev,
> +							    struct netdev_phys_item_id *psid);
> +#endif
>  };
>
>  /**
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> new file mode 100644
> index 0000000..7a52360
> --- /dev/null
> +++ b/include/net/switchdev.h
> @@ -0,0 +1,30 @@
> +/*
> + * include/net/switchdev.h - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#ifndef _LINUX_SWITCHDEV_H_
> +#define _LINUX_SWITCHDEV_H_
> +
> +#include <linux/netdevice.h>
> +
> +#ifdef CONFIG_NET_SWITCHDEV
> +
> +int netdev_switch_parent_id_get(struct net_device *dev,
> +				struct netdev_phys_item_id *psid);
> +
> +#else
> +
> +static inline int netdev_switch_parent_id_get(struct net_device *dev,
> +					      struct netdev_phys_item_id *psid)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/Kconfig b/net/Kconfig
> index 99815b5..ff9ffc1 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
>  source "net/netlink/Kconfig"
>  source "net/mpls/Kconfig"
>  source "net/hsr/Kconfig"
> +source "net/switchdev/Kconfig"
>
>  config RPS
>  	boolean
> diff --git a/net/Makefile b/net/Makefile
> index 7ed1970..95fc694 100644
> --- a/net/Makefile
> +++ b/net/Makefile
> @@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
>  obj-$(CONFIG_VSOCKETS)	+= vmw_vsock/
>  obj-$(CONFIG_NET_MPLS_GSO)	+= mpls/
>  obj-$(CONFIG_HSR)		+= hsr/
> +ifneq ($(CONFIG_NET_SWITCHDEV),)
> +obj-y				+= switchdev/
> +endif
> diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
> new file mode 100644
> index 0000000..1557545
> --- /dev/null
> +++ b/net/switchdev/Kconfig
> @@ -0,0 +1,13 @@
> +#
> +# Configuration for Switch device support
> +#
> +
> +config NET_SWITCHDEV
> +	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
> +	depends on INET
> +	---help---
> +	  This module provides glue between core networking code and device
> +	  drivers in order to support hardware switch chips in very generic
> +	  meaning of the word "switch". This include devices supporting L2/L3 but
> +	  also various flow offloading chips, including switches embedded into
> +	  SR-IOV NICs.
> diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
> new file mode 100644
> index 0000000..5ed63ed
> --- /dev/null
> +++ b/net/switchdev/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the Switch device API
> +#
> +
> +obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> new file mode 100644
> index 0000000..66973de
> --- /dev/null
> +++ b/net/switchdev/switchdev.c
> @@ -0,0 +1,33 @@
> +/*
> + * net/switchdev/switchdev.c - Switch device API
> + * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/init.h>
> +#include <linux/netdevice.h>
> +#include <net/switchdev.h>
> +
> +/**
> + *	netdev_switch_parent_id_get - Get ID of a switch
> + *	@dev: port device
> + *	@psid: switch ID
> + *
> + *	Get ID of a switch this port is part of.
> + */
> +int netdev_switch_parent_id_get(struct net_device *dev,
> +				struct netdev_phys_item_id *psid)
> +{
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +
> +	if (!ops->ndo_switch_parent_id_get)
> +		return -EOPNOTSUPP;
> +	return ops->ndo_switch_parent_id_get(dev, psid);
> +}
> +EXPORT_SYMBOL(netdev_switch_parent_id_get);
> --
> 1.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

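For readers following the patch, the dispatch pattern it introduces can be tried outside the kernel. What follows is only a user-space sketch under stated assumptions, not the kernel code: the struct layouts are cut down to the fields the wrapper touches, and `priv`, `struct someswitch_chip`, `someswitch_parent_id_get()`, `plain_nic_ops` and `ports_share_switch()` are hypothetical names made up for illustration. Only the body of `netdev_switch_parent_id_get()` mirrors the patch.

```c
#include <errno.h>
#include <string.h>

#define MAX_PHYS_ITEM_ID_LEN 32

/* Simplified stand-ins for the kernel types; NOT the real layouts. */
struct netdev_phys_item_id {
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_switch_parent_id_get)(struct net_device *dev,
					struct netdev_phys_item_id *psid);
};

struct net_device {
	const char *name;
	const struct net_device_ops *netdev_ops;
	const void *priv;	/* hypothetical per-driver data pointer */
};

/* Same logic as the wrapper added in net/switchdev/switchdev.c. */
int netdev_switch_parent_id_get(struct net_device *dev,
				struct netdev_phys_item_id *psid)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	if (!ops->ndo_switch_parent_id_get)
		return -EOPNOTSUPP;
	return ops->ndo_switch_parent_id_get(dev, psid);
}

/* Hypothetical "SOMEswitch" driver state: one ID per physical chip. */
struct someswitch_chip {
	unsigned char id[8];
};

/* Every port of the same chip reports the same parent ID. */
static int someswitch_parent_id_get(struct net_device *dev,
				    struct netdev_phys_item_id *psid)
{
	const struct someswitch_chip *chip = dev->priv;

	memcpy(psid->id, chip->id, sizeof(chip->id));
	psid->id_len = sizeof(chip->id);
	return 0;
}

/* Ops of a switch port: implements the new ndo. */
static const struct net_device_ops someswitch_ops = {
	.ndo_switch_parent_id_get = someswitch_parent_id_get,
};

/* Ops of an ordinary NIC (eth0/eth1 in the HOWTO): ndo left NULL. */
static const struct net_device_ops plain_nic_ops = { 0 };

/*
 * Hypothetical caller-side helper: returns 1 if both netdevices are
 * ports of the same switch, 0 if they belong to different switches,
 * or a negative errno (e.g. -EOPNOTSUPP) if either is not a port.
 */
int ports_share_switch(struct net_device *a, struct net_device *b)
{
	struct netdev_phys_item_id ida, idb;
	int err;

	err = netdev_switch_parent_id_get(a, &ida);
	if (err)
		return err;
	err = netdev_switch_parent_id_get(b, &idb);
	if (err)
		return err;
	return ida.id_len == idb.id_len &&
	       memcmp(ida.id, idb.id, ida.id_len) == 0;
}
```

In this sketch, two `struct net_device` instances that use `someswitch_ops` and point at the same `struct someswitch_chip` compare as the same switch, while a device using `plain_nic_ops` makes the wrapper return `-EOPNOTSUPP`. That is the "recognition of a port netdevice" role the HOWTO text gives to `ndo_switch_parent_id_get`.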
On 11/25/14 05:28, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>

I am not sure switch id is the right term. I have a network processor
that *does not* do switching. I am not sure if "chip" or "ASIC" or
"offload_id" would be the right term. switch doesn't sound right.

cheers,
jamal

On 11/25/14, 2:28 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> Reviewed-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2->v3:
> -fixed documentation typo pointed out by M. Braun
> -changed "sw" string to "switch" to avoid confusion

Still voting for something generic like "hw" or "offload" or "hw_offload"

> v1->v2:
> -no change
> ---

Tue, Nov 25, 2014 at 04:51:03PM CET, jhs@mojatatu.com wrote:
>On 11/25/14 05:28, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>
>I am not sure switch id is the right term. I have a network processor
>that *does not* do switching. I am not sure if "chip" or "ASIC" or

What does it do? "L3 switching"?

>"offload_id" would be the right term. switch doesn't sound right.

When we talk about this area, we use the word "switch". I know it is not
accurate, but in my opinion it is the closest we can get. "chip" and
"ASIC" are too generic I believe. I would not use "offload" because it can
be easily mistaken with NIC offloads + it is also not accurate.

>
>cheers,
>jamal
>

Tue, Nov 25, 2014 at 05:07:02PM CET, roopa@cumulusnetworks.com wrote:
>On 11/25/14, 2:28 AM, Jiri Pirko wrote:
>>The goal of this is to provide a possibility to support various switch
>>chips. Drivers should implement relevant ndos to do so. Now there is
>>only one ndo defined:
>>- for getting physical switch id is in place.
>>
>>Note that user can use random port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>>Reviewed-by: Thomas Graf <tgraf@suug.ch>
>>---
>>v2->v3:
>>-fixed documentation typo pointed out by M. Braun
>>-changed "sw" string to "switch" to avoid confusion
>
>Still voting for something generic like "hw" or "offload" or "hw_offload"

See my previous reply to Jamal.

>>v1->v2:
>>-no change
>>---

On 11/25/14 11:49, Jiri Pirko wrote:

>
> What does it do? "L3 switching"?
>

Absolutely not - that is too easy;-> Why not just a mellanox
chip for that? (Testing if Aviad is awake). But flows and associated
constructs apply.

>> "offload_id" would be the right term. switch doesn't sound right.
>
> When we talk about this area, we use the word "switch". I know it is not
> accurate, but in my opinion it is the closest we can get. "chip" and
> "ASIC" are too generic I believe. I would not use "offload" because it can
> be easily mistaken with NIC offloads + it is also not accurate.

I think this interface is usable for example to offload to user space
ala DPDK and friends just as it would be for ASICs or standard NIC
offload (which we already have with fdb offload).
I don't know what a good name is - but switch looks incorrect.

cheers,
jamal

On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
> On 11/25/14 11:49, Jiri Pirko wrote:
> >
> > What does it do? "L3 switching"?
> >
>
> Absolutely not - that is too easy;-> Why not just a mellanox chip for that? (Testing if Aviad is awake). But flows and associated constructs apply.

It would definitely help if you could expose some more details on the "some network processor" you have. We're all very eager ;-)

> I think this interface is usable for example to offload to user space ala DPDK and friends just as it would be for ASICs or standard NIC offload (which we already have with fdb offload).
> I dont know what a good name is - but switch looks incorrect.

I'm with Jiri but I agree it's not a perfect fit. I doubt there is but if you can come up with something that fits better I'm open to it.

I considered "dataplane" or "dp" for a bit but it's quite generic as well.
On 11/25/14 16:54, Thomas Graf wrote:
> On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
>
> It would definitely help if you could expose some more details on the "some network processor" you have. We're all very eager ;-)
>

Well, this thing doesnt run ovs ;-> (/me runs). If you come to netdev i may let you play with it ;-> Its a humongous device (think multi 100G ports).

On a serious note: Even if you took what Simon/Netronome has (yes, I know they use ovs;->) - there is really no need for a switch abstraction *at all* if all you want to do is hang a packet processing graph that ingresses at a port and egresses at another port. As you know, Linux supports it just fine with tc.

> I'm with Jiri but I agree it's not a perfect fit. I doubt there is but if you can come up with something that fits better I'm open to it.
>
> I considered "dataplane" or "dp" for a bit but it's quite generic as well.
>

The purpose is to offload. I think any name would be better than mapping it to a specific abstraction called "switch". Especially if it is hanging off a port and there is no switch in the pipeline.

cheers,
jamal
On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/25/14 16:54, Thomas Graf wrote:
>>
>> On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
>
>> It would definitely help if you could expose some more details on the
>> "some network processor" you have. We're all very eager ;-)
>>
>
> Well, this thing doesnt run ovs ;-> (/me runs). If you come to netdev i may let you play with it ;-> Its a humongous device (think multi 100G ports).
>
> On a serious note: Even if you took what Simon/Netronome has (yes, I know they use ovs;->) - there is really no need for a switch abstraction *at all* if all you want to is hang a packet processing graph that ingresses at a port and egress at another port.
> As you know, Linux supports it just fine with tc.

You have a pointer to the kernel driver for that HW? Can you show how you're using Linux tc netlink msg in kernel to program HW? I'd like to see the in-kernel API.
On 11/25/14 23:18, Scott Feldman wrote:
> On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> You have a pointer to the kernel driver for that HW?

I wasnt sure if that was a passive aggressive move there to question what i am claiming? (Only Canadians are allowed to be passive aggressive Scott). To answer your question: no, the code is currently littered with vendor SDK unfortunately (as you would know!). But hopefully if we get these changes in correctly it would not be hard to show the driver working fully in the kernel. There are definitely a few other pieces of hardware that are making me come back here and invest time and effort in these long discussions.

> Can you show how you're using Linux tc netlink msg in kernel to program HW? I'd like to see the in-kernel API.
>

Lets do the L2/port thing first. But yes, I am using Linux tc in kernel.

cheers,
jamal
On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
> On 11/25/14 23:18, Scott Feldman wrote:
> >On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> >You have a pointer to the kernel driver for that HW?
>
> I wasnt sure if that was a passive aggressive move there to question what i am claiming?(Only Canadians are allowed to be passive aggressive Scott). To answer your question, no code currently littered with vendor SDK unfortunately (as you would know!).
> But hopefully if we get these changes in correctly it would not be hard to show the driver working fully in the kernel.
> There are definitely a few other pieces of hardware that are making me come back here and invest time and effort in these long discussions.
>
> >Can you show how you're using Linux tc netlink msg in kernel to program HW? I'd like to see the in-kernel API.
> >
>
> Lets do the L2/port thing first. But yes, I am using Linux tc in kernel.

Jamal,

What is irritating in this context is that you are pushing back on Jiri and others while referring to proprietary and closed code which you are unwilling or unable to share. I don't see this as being passive aggressive, everybody is treated the same way in this regard.

It is exactly the point of this API and related discussions to decouple the control plane (tc) from any vendor specifics while allowing them to innovate, compete, and solve different use cases.

I think it's absolutely the right thing to write the API against code that is public, which in this case is rocker and the existing in-kernel NIC drivers.
On 11/26/14 11:08, Thomas Graf wrote:
> On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
>
> Jamal,
>
> What is irritating in this context is that you are pushing back on Jiri and others while referring to proprietary and closed code which you are unwilling or unable to share. I don't see this as being passive aggressive, everybody is treated the same way in this regard.
>

WTF? I said i have hardware that is not a switch because it doesnt do switching. This all started with the name being "switch" which I objected to. You ask me to describe hardware and then you come back and say I am using that to stop progress?

Where the hell did i push back on Jiri? Stop going around telling people i do. I invest my time and effort reviewing code, proposing ideas, posting etc calling meetings. In fact i initiated this whole effort to begin with.

There is no point to responding to any of your other comments.

cheers,
jamal
Wed, Nov 26, 2014 at 06:09:13PM CET, jhs@mojatatu.com wrote:
>On 11/26/14 11:08, Thomas Graf wrote:
>>On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
>>
>>Jamal,
>>
>>What is irritating in this context is that you are pushing back on Jiri and others while referring to proprietary and closed code which you are unwilling or unable to share. I don't see this as being passive aggressive, everybody is treated the same way in this regard.
>>
>
>WTF? I said i have hardware that is not a switch because it doesnt do switching. This all started with the name being "switch" which I objected to. You ask me to describe hardware and then you come back and say I am using that to stop progress?

Stay calm, I'm sure that this is just a misunderstanding.

>Where the hell did i push back on Jiri? Stop going around telling people i do. I invest my time and effort reviewing code, proposing ideas, posting etc calling meetings. In fact i initiated this whole effort to begin with.

I thought I started this :) Anyway, I much appreciate your involvement in this Jamal with putting the meetings together and stuff, that's for sure. We need to join forces, not to fight with each other.

>
>There is no point to responding to any of your other comments.
>
>cheers,
>jamal
On 11/26/14 at 06:59pm, Jiri Pirko wrote:
> Wed, Nov 26, 2014 at 06:09:13PM CET, jhs@mojatatu.com wrote:
> >On 11/26/14 11:08, Thomas Graf wrote:
> >>On 11/26/14 at 06:36am, Jamal Hadi Salim wrote:
> >>What is irritating in this context is that you are pushing back on Jiri and others while referring to proprietary and closed code which you are unwilling or unable to share. I don't see this as being passive aggressive, everybody is treated the same way in this regard.
> >
> >WTF? I said i have hardware that is not a switch because it doesnt do switching. This all started with the name being "switch" which I objected to. You ask me to describe hardware and then you come back and say I am using that to stop progress?
>
> Stay calm, I'm sure that this is just a misunderstanding.
>
> >Where the hell did i push back on Jiri? Stop going around telling people i do.

You are requesting a name change for a proprietary driver after confirming that you can't publish the code. We don't even know what the piece of hardware you refer to is capable of.

We've always written driver facing APIs for the drivers that are *in* the kernel which in this case is rocker, modelled after OF-DPA, existing NIC drivers, and DSA drivers.

I can live with the term switch, but if somebody can come up with a better name, cool. "Chip" or "ASIC" are probably not better choices though.
On 11/26/14 16:50, Thomas Graf wrote:
> You are requesting a name change for a proprietary driver after confirming that you can't publish the code. We don't even know what the piece of hardware you refer to is capable of.
>

I am not sure why there is such a misunderstanding. Here's the sequence of events.

Jiri/Scott: We'll call this offload thing hanging off a port_ops a "switch". It does one or more of L2, L3 and flows.
Jamal: I am not fond of that name because not everything that offloads off a port is a switch (some mention of fitting even with dpdk)
Jiri: What do you have - an L3 "switch"?
Jamal: No, it is something that does offloading of packet processing off a port with flows and action. Example a netronome would be a good fit (if you are to ignore Simon going for OVS).

And then things get out of control.

This has *nothing* to do with any driver or any code or anything specialized. Not every packet processing offload hanging off ports is a switch (I dont think even the patch was claiming that although by now ive lost track of where it started).

Yes, i cannot publish this code. You know that; Scott knows that and Jiri knows. (and thats why i thought it passive aggressive when Scott asked about the code when we are discussing a name change). The reason i am even involved in all this is so we can actually publish code and i can stop using proprietary SDK stuff. While i cant release the current code I want to share my experiences in trying to help make that API sane. Because i want to use it. I have been doing this offload shit for at least 15 years on Linux. I have something to say about it. Just throwing in some gauntlet when it serves some convenience and treating me like some guy who showed up off the street making claims is bordering on the ridiculous.

cheers,
jamal
On Tue, Nov 25, 2014 at 10:33:36PM -0500, Jamal Hadi Salim wrote:
> On 11/25/14 16:54, Thomas Graf wrote:
> >On 11/25/14 at 12:08pm, Jamal Hadi Salim wrote:
>
> >It would definitely help if you could expose some more details on the "some network processor" you have. We're all very eager ;-)
> >
>
> Well, this thing doesnt run ovs ;-> (/me runs). If you come to netdev i may let you play with it ;-> Its a humongous device (think multi 100G ports).
>
> On a serious note: Even if you took what Simon/Netronome has (yes, I know they use ovs;->)

FWIW, we are also interested in non-OVS use cases.

> - there is really no need for a switch abstraction *at all* if all you want to is hang a packet processing graph that ingresses at a port and egress at another port.
> As you know, Linux supports it just fine with tc.

I may be missing the point but I see two problems that are solved by the switch abstraction.

- Cases where no ports are configured.

  Perhaps no such use cases exist for the API in question. But it does seem plausible to me that non-physical ports could be added at run-time and that thus a "switch" could initially exist with no configured port. Something like how bridges initially have no ports (IIRC).

- Discovering the association between ports and "switches".

My recollection from the double round table discussion on the last day of the Düsseldorf sessions was that these were reasons that simply accessing any port belonging to the "switch" was not entirely satisfactory.

> >I'm with Jiri but I agree it's not a perfect fit. I doubt there is but if you can come up with something that fits better I'm open to it.
> >
> >I considered "dataplane" or "dp" for a bit but it's quite generic as well.
> >
>
> The purpose is to offload. I think any name would be better than mapping it to a specific abstraction called "switch". Especially if it is hanging off a port and there is no switch in the pipeline.
>
> cheers,
> jamal
On Wed, Nov 26, 2014 at 1:36 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 11/25/14 23:18, Scott Feldman wrote:
>>
>> On Tue, Nov 25, 2014 at 5:33 PM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
>>
>> You have a pointer to the kernel driver for that HW?
>
> I wasnt sure if that was a passive aggressive move there to question what i am claiming?(Only Canadians are allowed to be passive aggressive Scott). To answer your question, no code currently littered with vendor SDK unfortunately (as you would know!).

Drats, I was hoping there might be Open Source here. I'm actually not familiar with Netronome offerings. I went to their web page and all their Docs downloads require registration, so I should have guessed same-old-same-old. But you teased us with it, so I thought I would ask. Sorry for the trouble XOXOXOXO. I'm not Canadian, as far as I know.

> But hopefully if we get these changes in correctly it would not be hard to show the driver working fully in the kernel.
> There are definitely a few other pieces of hardware that are making me come back here and invest time and effort in these long discussions.

You have access to the inside scoop. We don't. Ok, I don't. We (think we) know what the traditional L2/L3 and OVS-style flow stuff looks like, but you know more, but you can't show us in code so it's frustrating. Not your fault. Just continue to guide us and give some disclaimer when we're close to some proprietary knowledge, but it is relevant to the discussion.

>> Can you show how you're using Linux tc netlink msg in kernel to program HW? I'd like to see the in-kernel API.
>>
>
> Lets do the L2/port thing first. But yes, I am using Linux tc in kernel.
>
> cheers,
> jamal
On 11/26/14 22:13, Simon Horman wrote:
> On Tue, Nov 25, 2014 at 10:33:36PM -0500, Jamal Hadi Salim wrote:
[..]
> I may be missing the point but I see two problems that are solved by the switch abstraction.
>
> - Cases where no ports are configured.
>
>   Perhaps no such use cases exist for the API in question. But it does seem plausible to me that non-physical ports could be added at run-time and that thus a "switch" could initially exist with no configured port. Something like how bridges initially have no ports (IIRC).
>
> - Discovering the association between ports and "switches".
>
> My recollection from the double round table discussion on the last day of the Düsseldorf sessions was that these were reasons that simply accessing any port belonging to the "switch" were not entirely satisfactory.
>

So in Du I illustrated in a slide the internals of the Realtek that Ben had patches on. Ben first exposes the realtek ports and when you wish you can build a bridge and attach the exposed ports and then hardware switching functionality is used. What is interesting about it is in fact you didnt need to use the switching on it. You could attach a filter to any of the exposed ports, then specify an action to do a redirect to another port for example. (Scott i know you were not there, but i cant find where those slides are posted; will send them when i do - or ask Thomas).

This is very easy to map to port/ingress classifier/actions in Linux. I was hoping i could produce a patch to do this - but waiting on Ben to complete the reverse engineering. In any case the realtek is a toy example but there's millions deployed and producing a patch for tc (if Jiri doesnt beat me to it) is a useful exercise.

My devices (as would a netronome) would apply the same concept. Essentially, you take an ingress packet arriving on a port, you apply a classifier to it, apply actions to it and eventually egress it to a port.

i.e Ingress packet-->port->classifier-->...actions..->egress port

I can model the above with tc.

cheers,
jamal
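The port -> classifier -> actions -> egress port chain described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is not the tc API, not kernel code, and not the hardware being discussed. Every name in it (`struct packet`, `struct filter`, `act_dec_ttl`, `act_redirect`, `port_ingress`) is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the tc or kernel API. */
struct packet {
	uint32_t dst_ip;	/* host byte order, for simplicity */
	uint8_t ttl;
	int port;		/* port the packet currently sits on */
};

/* An action mutates the packet; returning false drops it. */
typedef bool (*action_fn)(struct packet *pkt, int arg);

struct action {
	action_fn fn;
	int arg;
};

#define MAX_ACTIONS 4

/* A filter: a masked match on the destination plus an action chain. */
struct filter {
	uint32_t match, mask;
	struct action actions[MAX_ACTIONS];
	size_t n_actions;
};

static bool act_dec_ttl(struct packet *pkt, int arg)
{
	(void)arg;
	if (pkt->ttl <= 1)
		return false;	/* TTL expired: drop */
	pkt->ttl--;
	return true;
}

static bool act_redirect(struct packet *pkt, int egress_port)
{
	pkt->port = egress_port;
	return true;
}

/*
 * Run the first matching filter attached to the ingress port.
 * Returns the port the packet ends up on after the action chain,
 * or -1 if the packet was dropped or matched no filter.
 */
static int port_ingress(const struct filter *filters, size_t n,
			struct packet *pkt)
{
	for (size_t i = 0; i < n; i++) {
		const struct filter *f = &filters[i];

		if ((pkt->dst_ip & f->mask) != f->match)
			continue;
		for (size_t j = 0; j < f->n_actions; j++)
			if (!f->actions[j].fn(pkt, f->actions[j].arg))
				return -1;
		return pkt->port;
	}
	return -1;
}
```

In this model no "switch" object exists at all: the filter list hangs off the ingress port and a redirect action picks the egress port, which is the shape of pipeline the message argues tc can already express.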
On 11/27/14 00:58, Scott Feldman wrote:
> You have access to the inside scoop. We don't. Ok, I don't. We (think we) know what the traditional L2/L3 and OVS-style flow stuff looks like, but you know more, but you can't show us in code so it's frustrating. Not your fault. Just continue to guide us and give some disclaimer when we're close to some proprietary knowledge, but it is relevant to the discussion.
>

Scott, I am asking to offload basic functionality that Linux supports. I may be blind-sided and getting frustrated thinking it is obvious because i live through this stuff everyday; but I am trying all i can to share what you call proprietary knowledge whenever i can. If you think of this as "we need to offload all packet processing linux supports" you'll see where i am coming from.

cheers,
jamal
On 11/26/14 at 06:32pm, Jamal Hadi Salim wrote:
> Jiri/Scott: We'll call this offload thing hanging off a port_ops a "switch". It does one or more of L2, L3 and flows.
> Jamal: I am not fond of that name because not everything that offloads off a port is a switch (some mention of fitting even with dpdk)
> Jiri: What do you have - an L3 "switch"?
> Jamal: No, it is something that does offloading of packet processing off a port with flows and action. Example a netronome would be a good fit (if you are to ignore Simon going for OVS).

So what is your name suggestion?
On 11/27/14 08:03, Thomas Graf wrote:
> So what is your name suggestion?
>

I would have gone for _offload_ either as a prefix or suffix somewhere.

cheers,
jamal
Thu, Nov 27, 2014 at 02:32:32PM CET, jhs@mojatatu.com wrote:
>On 11/27/14 08:03, Thomas Graf wrote:
>
>>So what is your name suggestion?
>>
>
>I would have gone for _offload_ either as a prefix or suffix somewhere.

$ git grep offload net

Wouldn't it be confusing to add yet another, different "offload"? That's just confusing.

I still like "switch" the best. If it passes packets around, it's a "switch", +-. Everybody understands what's going on if you use "switch". If you use "offload", everybody is confused...

>
>cheers,
>jamal
On 11/27/14 08:50, Jiri Pirko wrote:
> $ git grep offload net
>
> Wouldn't it be confusing to add yet another, different "offload"? That's just confusing.
>
> I still like "switch" the best. If it passes packets around, it's a "switch", +-. Everybody understands what's going on if you use "switch". If you use "offload", everybody is confused...
>

Those are all *legitimate offloads* ;-> The macvlan one looks a little creepy. Perhaps we could eventually merge all that stuff together with this effort.

cheers,
jamal
diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
new file mode 100644
index 0000000..f981a92
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,59 @@
+Switch (and switch-ish) device drivers HOWTO
+===========================
+
+Please note that the word "switch" is here used in very generic meaning.
+This include devices supporting L2/L3 but also various flow offloading chips,
+including switches embedded into SR-IOV NICs.
+
+Lets describe a topology a bit. Imagine the following example:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       port1 port2 port3 port4 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  NIC0 NIC1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+       +----------------------------+    +---------------+
+       |     SOME switch chip       |    |      CPU      |
+       +----------------------------+    +---------------+
+       sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT    |     PCI-E     |
+         |     |     |     |     |       +---------------+
+        PHY   PHY    |     |     |         |  eth0 eth1
+                     |     |     |         |   |    |
+                     |     |     +- PCI-E -+   |    |
+                     |     +------- MII -------+    |
+                     +------------- MII ------------+
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+	ndo_switch_parent_id_get - This returns the same ID for two port netdevices
+				   of the same physical switch chip. This is
+				   mandatory to be implemented by all switch drivers
+				   and serves the caller for recognition of a port
+				   netdevice.
+	ndo_switch_parent_* - Functions that serve for a manipulation of the switch
+			      chip itself (it can be though of as a "parent" of the
+			      port, therefore the name). They are not port-specific.
+			      Caller might use arbitrary port netdevice of the same
+			      switch and it will make no difference.
+	ndo_switch_port_* - Functions that serve for a port-specific manipulation.
diff --git a/MAINTAINERS b/MAINTAINERS
index a545d68..05addb6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9058,6 +9058,13 @@ F: lib/swiotlb.c
 F: arch/*/kernel/pci-swiotlb.c
 F: include/linux/swiotlb.h
 
+SWITCHDEV
+M: Jiri Pirko <jiri@resnulli.us>
+L: netdev@vger.kernel.org
+S: Supported
+F: net/switchdev/
+F: include/net/switchdev.h
+
 SYNOPSYS ARC ARCHITECTURE
 M: Vineet Gupta <vgupta@synopsys.com>
 S: Supported
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b491b3..ce096dc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1018,6 +1018,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  *	performing GSO on a packet. The device returns true if it is
  *	able to GSO the packet, false otherwise. If the return value is
  *	false the stack will do software GSO.
+ *
+ * int (*ndo_switch_parent_id_get)(struct net_device *dev,
+ *				    struct netdev_phys_item_id *psid);
+ *	Called to get an ID of the switch chip this port is part of.
+ *	If driver implements this, it indicates that it represents a port
+ *	of a switch chip.
  */
 struct net_device_ops {
 	int (*ndo_init)(struct net_device *dev);
@@ -1171,6 +1177,10 @@ struct net_device_ops {
 	int (*ndo_get_lock_subclass)(struct net_device *dev);
 	bool (*ndo_gso_check) (struct sk_buff *skb,
 			       struct net_device *dev);
+#ifdef CONFIG_NET_SWITCHDEV
+	int (*ndo_switch_parent_id_get)(struct net_device *dev,
+					struct netdev_phys_item_id *psid);
+#endif
 };
 
 /**
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
new file mode 100644
index 0000000..7a52360
--- /dev/null
+++ b/include/net/switchdev.h
@@ -0,0 +1,30 @@
+/*
+ * include/net/switchdev.h - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _LINUX_SWITCHDEV_H_
+#define _LINUX_SWITCHDEV_H_
+
+#include <linux/netdevice.h>
+
+#ifdef CONFIG_NET_SWITCHDEV
+
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid);
+
+#else
+
+static inline int netdev_switch_parent_id_get(struct net_device *dev,
+					      struct netdev_phys_item_id *psid)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif
+
+#endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 99815b5..ff9ffc1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -228,6 +228,7 @@ source "net/vmw_vsock/Kconfig"
 source "net/netlink/Kconfig"
 source "net/mpls/Kconfig"
 source "net/hsr/Kconfig"
+source "net/switchdev/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index 7ed1970..95fc694 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -73,3 +73,6 @@ obj-$(CONFIG_OPENVSWITCH) += openvswitch/
 obj-$(CONFIG_VSOCKETS) += vmw_vsock/
 obj-$(CONFIG_NET_MPLS_GSO) += mpls/
 obj-$(CONFIG_HSR) += hsr/
+ifneq ($(CONFIG_NET_SWITCHDEV),)
+obj-y += switchdev/
+endif
diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
new file mode 100644
index 0000000..1557545
--- /dev/null
+++ b/net/switchdev/Kconfig
@@ -0,0 +1,13 @@
+#
+# Configuration for Switch device support
+#
+
+config NET_SWITCHDEV
+	boolean "Switch (and switch-ish) device support (EXPERIMENTAL)"
+	depends on INET
+	---help---
+	  This module provides glue between core networking code and device
+	  drivers in order to support hardware switch chips in very generic
+	  meaning of the word "switch". This include devices supporting L2/L3 but
+	  also various flow offloading chips, including switches embedded into
+	  SR-IOV NICs.
diff --git a/net/switchdev/Makefile b/net/switchdev/Makefile
new file mode 100644
index 0000000..5ed63ed
--- /dev/null
+++ b/net/switchdev/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Switch device API
+#
+
+obj-$(CONFIG_NET_SWITCHDEV) += switchdev.o
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
new file mode 100644
index 0000000..66973de
--- /dev/null
+++ b/net/switchdev/switchdev.c
@@ -0,0 +1,33 @@
+/*
+ * net/switchdev/switchdev.c - Switch device API
+ * Copyright (c) 2014 Jiri Pirko <jiri@resnulli.us>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/netdevice.h>
+#include <net/switchdev.h>
+
+/**
+ *	netdev_switch_parent_id_get - Get ID of a switch
+ *	@dev: port device
+ *	@psid: switch ID
+ *
+ *	Get ID of a switch this port is part of.
+ */
+int netdev_switch_parent_id_get(struct net_device *dev,
+				struct netdev_phys_item_id *psid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_switch_parent_id_get)
+		return -EOPNOTSUPP;
+	return ops->ndo_switch_parent_id_get(dev, psid);
+}
+EXPORT_SYMBOL(netdev_switch_parent_id_get);
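A minimal userspace sketch, not kernel code, of how a caller might use the helper in the patch to tell whether two port netdevices belong to the same switch chip. The struct definitions are simplified stand-ins for the kernel types, and `someswitch_parent_id_get()`, `struct someswitch_chip`, the `priv` field and `same_switch()` are hypothetical names introduced only for this illustration; the patch itself defines none of them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
#define MAX_PHYS_ITEM_ID_LEN 32

struct netdev_phys_item_id {
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

struct net_device;

struct net_device_ops {
	int (*ndo_switch_parent_id_get)(struct net_device *dev,
					struct netdev_phys_item_id *psid);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	void *priv;	/* hypothetical driver-private pointer */
};

/* Same logic as the helper added by the patch. */
static int netdev_switch_parent_id_get(struct net_device *dev,
				       struct netdev_phys_item_id *psid)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	if (!ops->ndo_switch_parent_id_get)
		return -EOPNOTSUPP;
	return ops->ndo_switch_parent_id_get(dev, psid);
}

/* Hypothetical driver: every port of one chip reports the chip's ID. */
struct someswitch_chip {
	unsigned char id[8];
};

static int someswitch_parent_id_get(struct net_device *dev,
				    struct netdev_phys_item_id *psid)
{
	const struct someswitch_chip *chip = dev->priv;

	memcpy(psid->id, chip->id, sizeof(chip->id));
	psid->id_len = sizeof(chip->id);
	return 0;
}

static const struct net_device_ops someswitch_port_ops = {
	.ndo_switch_parent_id_get = someswitch_parent_id_get,
};

/* Ordinary NIC: does not implement the NDO. */
static const struct net_device_ops plain_nic_ops = {
	.ndo_switch_parent_id_get = NULL,
};

/* Hypothetical caller-side check: true only if both are ports of one switch. */
static bool same_switch(struct net_device *a, struct net_device *b)
{
	struct netdev_phys_item_id ida, idb;

	if (netdev_switch_parent_id_get(a, &ida) ||
	    netdev_switch_parent_id_get(b, &idb))
		return false;
	return ida.id_len == idb.id_len &&
	       memcmp(ida.id, idb.id, ida.id_len) == 0;
}
```

With these stand-ins, two `sw0pX` devices sharing one `someswitch_chip` compare equal, while a device using `plain_nic_ops` makes the helper return `-EOPNOTSUPP`, matching the documentation's statement that implementing the NDO is what marks a netdevice as a switch port.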