
[net-next] rocker: forward packets to CPU when port is joined to openvswitch

Message ID 1437010754-29038-1-git-send-email-simon.horman@netronome.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Simon Horman July 16, 2015, 1:39 a.m. UTC
Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
There is scope to later refine what is passed up as per Open vSwitch flows
on a port.

This does not change the behaviour of rocker ports that are
not joined to Open vSwitch.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 drivers/net/ethernet/rocker/rocker.c | 62 +++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 12 deletions(-)
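To make the claim that ports not joined to Open vSwitch are unaffected concrete, here is a user-space model of which control entries a port requests in the learning/forwarding STP states, mirroring the logic the patch adds to rocker_port_stp_update(). forwarding_wants() and the CTRL_* names are hypothetical stand-ins for the driver's ROCKER_CTRL_* entries. This is a sketch of the decision logic, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's ROCKER_CTRL_* entries. */
enum ctrl {
	CTRL_LINK_LOCAL_MCAST,
	CTRL_LOCAL_ARP,
	CTRL_IPV4_MCAST,
	CTRL_IPV6_MCAST,
	CTRL_DFLT_BRIDGING,
	CTRL_DFLT_OVS,
	CTRL_MAX,
};

/* Model of the BR_STATE_LEARNING/BR_STATE_FORWARDING case after the
 * patch.  master_kind is the rtnl_link_ops->kind of the port's master
 * ("bridge", "openvswitch", ...) or NULL when there is none. */
static void forwarding_wants(const char *master_kind, bool want[CTRL_MAX])
{
	bool bridged = master_kind && !strcmp(master_kind, "bridge");
	bool ovsed = master_kind && !strcmp(master_kind, "openvswitch");

	assert(want != NULL);
	memset(want, 0, CTRL_MAX * sizeof(want[0]));

	if (!ovsed)
		want[CTRL_LINK_LOCAL_MCAST] = true;
	want[CTRL_IPV4_MCAST] = true;
	want[CTRL_IPV6_MCAST] = true;
	if (bridged)
		want[CTRL_DFLT_BRIDGING] = true;	/* unchanged by the patch */
	else if (ovsed)
		want[CTRL_DFLT_OVS] = true;		/* new: pass everything up */
	else
		want[CTRL_LOCAL_ARP] = true;		/* unchanged by the patch */
}
```

Passing "bridge" or NULL yields the same set of entries as before the patch; only "openvswitch" drops the link-local multicast entry and adds the catch-all one. Masters of other kinds, such as a bond, fall through to the same branch as NULL.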

Comments

Scott Feldman July 16, 2015, 6:40 a.m. UTC | #1
On Wed, Jul 15, 2015 at 6:39 PM, Simon Horman
<simon.horman@netronome.com> wrote:
> Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
> There is scope to later refine what is passed up as per Open vSwitch flows
> on a port.
>
> This does not change the behaviour of rocker ports that are
> not joined to Open vSwitch.
>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>

Acked-by: Scott Feldman <sfeldma@gmail.com>

Now, OVS flows on a port.  Strangely enough, that was the first RFC
implementation for switchdev/rocker: we hooked into the ovs kernel
module and programmed flows into hw.  We pulled all of that code
because, IIRC, the ovs folks didn't want us hooking into the kernel
module directly.  We dropped the ovs hooks and focused on hooking
the kernel's L2/L3.  The device (rocker) didn't really change: the
OF-DPA pipeline was used for both.  It might be interesting to try
hooking it again.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko July 16, 2015, 6:58 a.m. UTC | #2
Thu, Jul 16, 2015 at 08:40:31AM CEST, sfeldma@gmail.com wrote:
>Now, OVS flows on a port.  Strange enough, that was the first RFC
>implementation for switchdev/rocker where we hooked into ovs-kernel
>module and programmed flows into hw.  We pulled all of that code
>because, IIRC, the ovs folks didn't want us hooking into the kernel
>module directly.  We dropped the ovs hooks and focused on hooking
>kernel's L2/L3.  The device (rocker) didn't really change: OF-DPA
>pipeline was used for both.  Might be interesting to try hooking it
>again.


I think we now have the infrastructure in place for that. What we
need to do is introduce another generic switchdev object called
"ntupleflow", hook up again into the ovs datapath and cls_flower, and
insert/remove the object from that code. Should be pretty easy to do.
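The proposal can be pictured as a toy model: one new object type carrying a masked n-tuple key, and a single add/del pair that every producer calls. Everything here (OBJ_NTUPLEFLOW, struct ntupleflow, obj_add(), obj_del(), the fixed-size table) is hypothetical and only illustrates the idea; it is not the switchdev API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL model: "ntupleflow" was only proposed in this thread.
 * None of these types or functions are a kernel API. */
enum obj_id {
	OBJ_PORT_VLAN,
	OBJ_IPV4_FIB,
	OBJ_NTUPLEFLOW,		/* the proposed new object type */
};

struct ntuple_key {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t ip_proto;
};

struct ntupleflow {
	struct ntuple_key key;
	struct ntuple_key mask;	/* intended: set bits must match */
	uint32_t out_port;	/* action, reduced to one output port */
};

struct obj {
	enum obj_id id;
	union {
		struct ntupleflow flow;
	} u;
};

#define MAX_FLOWS 16

/* Toy stand-in for the device's flow table.  In the proposal, both an
 * ovs datapath hook and cls_flower would call the same add/del pair. */
static struct ntupleflow hw_table[MAX_FLOWS];
static size_t hw_count;

/* Field-wise compare; memcmp() would also compare struct padding. */
static bool key_eq(const struct ntuple_key *a, const struct ntuple_key *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->ip_proto == b->ip_proto;
}

static bool flow_eq(const struct ntupleflow *a, const struct ntupleflow *b)
{
	return key_eq(&a->key, &b->key) && key_eq(&a->mask, &b->mask) &&
	       a->out_port == b->out_port;
}

static int obj_add(const struct obj *obj)
{
	assert(obj != NULL);
	if (obj->id != OBJ_NTUPLEFLOW)
		return -EOPNOTSUPP;	/* other object types not modelled */
	if (hw_count == MAX_FLOWS)
		return -ENOSPC;
	hw_table[hw_count++] = obj->u.flow;
	return 0;
}

static int obj_del(const struct obj *obj)
{
	assert(obj != NULL);
	if (obj->id != OBJ_NTUPLEFLOW)
		return -EOPNOTSUPP;
	for (size_t i = 0; i < hw_count; i++) {
		if (flow_eq(&hw_table[i], &obj->u.flow)) {
			/* Entry order does not matter in this toy table. */
			hw_table[i] = hw_table[--hw_count];
			return 0;
		}
	}
	return -ENOENT;
}
```

The point of the design is that the object, not the producer, is what the driver sees, so an ovs hook and cls_flower would not each need their own driver entry points.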
Scott Feldman July 16, 2015, 7:09 a.m. UTC | #3
On Wed, Jul 15, 2015 at 11:58 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Jul 16, 2015 at 08:40:31AM CEST, sfeldma@gmail.com wrote:
>>Now, OVS flows on a port.  Strange enough, that was the first RFC
>>implementation for switchdev/rocker where we hooked into ovs-kernel
>>module and programmed flows into hw.  We pulled all of that code
>>because, IIRC, the ovs folks didn't want us hooking into the kernel
>>module directly.  We dropped the ovs hooks and focused on hooking
>>kernel's L2/L3.  The device (rocker) didn't really change: OF-DPA
>>pipeline was used for both.  Might be interesting to try hooking it
>>again.
>
>
> I think that now we have an infrastructure prepared for that. I mean,
> what we need to do is to introduce another generic switchdev object
> called "ntupleflow" and hook-up again into ovs datapath and cls_flower
> and insert/remove the object from those codes. Should be pretty easy to do.

That sounds right.  Is the ovs datapath hooking still happening in the
ovs-kernel module?  Remind me again, what was the objection the last
time we tried that?
Jiri Pirko July 16, 2015, 8:14 a.m. UTC | #4
Thu, Jul 16, 2015 at 09:09:39AM CEST, sfeldma@gmail.com wrote:
>On Wed, Jul 15, 2015 at 11:58 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Jul 16, 2015 at 08:40:31AM CEST, sfeldma@gmail.com wrote:
>>>Now, OVS flows on a port.  Strange enough, that was the first RFC
>>>implementation for switchdev/rocker where we hooked into ovs-kernel
>>>module and programmed flows into hw.  We pulled all of that code
>>>because, IIRC, the ovs folks didn't want us hooking into the kernel
>>>module directly.  We dropped the ovs hooks and focused on hooking
>>>kernel's L2/L3.  The device (rocker) didn't really change: OF-DPA
>>>pipeline was used for both.  Might be interesting to try hooking it
>>>again.
>>
>>
>> I think that now we have an infrastructure prepared for that. I mean,
>> what we need to do is to introduce another generic switchdev object
>> called "ntupleflow" and hook-up again into ovs datapath and cls_flower
>> and insert/remove the object from those codes. Should be pretty easy to do.
>
>That sounds right.  Is the ovs datapath hooking still happening in the
>ovs-kernel module?  Remind me again, what was the objection the last
>time we tried that?

Yep, we need to hook there. Otherwise it won't be transparent.

Last time the objection was that this would be ovs specific. But that is
past now. We have the switchdev infra with objects, and we have
cls_flower, which would use the same object. I say let's do this now.

John Fastabend July 16, 2015, 2:41 p.m. UTC | #5
On 15-07-16 01:14 AM, Jiri Pirko wrote:
> Thu, Jul 16, 2015 at 09:09:39AM CEST, sfeldma@gmail.com wrote:
>> On Wed, Jul 15, 2015 at 11:58 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Thu, Jul 16, 2015 at 08:40:31AM CEST, sfeldma@gmail.com wrote:
>>>> Now, OVS flows on a port.  Strange enough, that was the first RFC
>>>> implementation for switchdev/rocker where we hooked into ovs-kernel
>>>> module and programmed flows into hw.  We pulled all of that code
>>>> because, IIRC, the ovs folks didn't want us hooking into the kernel
>>>> module directly.  We dropped the ovs hooks and focused on hooking
>>>> kernel's L2/L3.  The device (rocker) didn't really change: OF-DPA
>>>> pipeline was used for both.  Might be interesting to try hooking it
>>>> again.
>>>
>>>
>>> I think that now we have an infrastructure prepared for that. I mean,
>>> what we need to do is to introduce another generic switchdev object
>>> called "ntupleflow" and hook-up again into ovs datapath and cls_flower
>>> and insert/remove the object from those codes. Should be pretty easy to do.
>>
>> That sounds right.  Is the ovs datapath hooking still happening in the
>> ovs-kernel module?  Remind me again, what was the objection the last
>> time we tried that?
> 
> Yep, we need to hook there. Otherwise it won't be transparent.
> 
> Last time the objection was that this would be ovs specific. But that is
> passed today. We have switchdev infra with objects, we have cls_flower
> which would use the same object. I say let's do this now.
> 

My objection wasn't that it was OVS specific; it was based on two
observations. First, the user-kernel interface for OVS would need
to be changed to use hardware optimally, and then userspace would need
to be changed to pack rules optimally for hardware. The reason is that
hardware typically has wildcards _and_ priority fields. This is a
different structure than we would want to use in software. Maybe
there is value in having a sub-optimal 'transparent' implementation
though. Note I can't see how you could possibly reverse engineer this
from what the kernel gets from userspace today and build out an
optimal solution.
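A simplified model of the hardware structure described above: each entry carries a value/mask wildcard and an explicit priority, and the highest-priority matching entry wins, so entries are allowed to overlap. The single 32-bit key and the names tcam_entry and tcam_lookup() are assumptions for illustration, not any particular device's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified TCAM-style entry: matches when
 * (key & mask) == (value & mask); prio breaks ties between overlaps. */
struct tcam_entry {
	uint32_t value;
	uint32_t mask;
	int prio;
	int action;
};

/* Returns the action of the highest-priority matching entry, or -1 on
 * a miss.  A linear scan stands in for the parallel hardware lookup. */
static int tcam_lookup(const struct tcam_entry *t, size_t n, uint32_t key)
{
	const struct tcam_entry *best = NULL;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((key & t[i].mask) != (t[i].value & t[i].mask))
			continue;
		if (!best || t[i].prio > best->prio)
			best = &t[i];
	}
	return best ? best->action : -1;
}
```

For example, a 192.168.1.0/24 entry at priority 10 plus a 192.168.1.5/32 exception at priority 20 only resolves cleanly because of the priority field; a table without priorities needs some other way to resolve overlaps, such as keeping its entries non-overlapping.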

Second, I was hoping to use the interface as a "better" ethtool flow
classifier with a control plane in user space that controllers in the
network could interface with. In this mode I'm not running on a TOR but
at the edge. In this case I want to do some pre-processing of packets
before sending them up to the kernel to complete processing. Examples
include partial completion of classification, and rule chaining where I
implement some rules in software and others in hardware. Perhaps this is
not OVS and I should just write a better ethtool flow classifier. But
with a 'bit' similar to how we do L2 I would get this.

All that said, seeing a switchdev infra object could be interesting.

.John
Jesse Gross July 17, 2015, 11:33 p.m. UTC | #6
On Thu, Jul 16, 2015 at 7:41 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 15-07-16 01:14 AM, Jiri Pirko wrote:
>> Thu, Jul 16, 2015 at 09:09:39AM CEST, sfeldma@gmail.com wrote:
>>> On Wed, Jul 15, 2015 at 11:58 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Thu, Jul 16, 2015 at 08:40:31AM CEST, sfeldma@gmail.com wrote:
>>>>> Now, OVS flows on a port.  Strange enough, that was the first RFC
>>>>> implementation for switchdev/rocker where we hooked into ovs-kernel
>>>>> module and programmed flows into hw.  We pulled all of that code
>>>>> because, IIRC, the ovs folks didn't want us hooking into the kernel
>>>>> module directly.  We dropped the ovs hooks and focused on hooking
>>>>> kernel's L2/L3.  The device (rocker) didn't really change: OF-DPA
>>>>> pipeline was used for both.  Might be interesting to try hooking it
>>>>> again.
>>>>
>>>>
>>>> I think that now we have an infrastructure prepared for that. I mean,
>>>> what we need to do is to introduce another generic switchdev object
>>>> called "ntupleflow" and hook-up again into ovs datapath and cls_flower
>>>> and insert/remove the object from those codes. Should be pretty easy to do.
>>>
>>> That sounds right.  Is the ovs datapath hooking still happening in the
>>> ovs-kernel module?  Remind me again, what was the objection the last
>>> time we tried that?
>>
>> Yep, we need to hook there. Otherwise it won't be transparent.
>>
>> Last time the objection was that this would be ovs specific. But that is
>> passed today. We have switchdev infra with objects, we have cls_flower
>> which would use the same object. I say let's do this now.
>>
>
> My objection wasn't that it was OVS specific but based on two
> observations. First the user-kernel interface for OVS would need
> to changed to optimally use hardware and then userspace would need
> to be changed to pack rules optimally for hardware. The reason is
> hardware has wildcards _and_ priority fields typically. This is a
> different structure than we would want to use in software. Maybe
> there is value in having a sub-optimal 'transparent' implementation
> though. Note I can't see how you can possibly reverse engineer this
> from what the kernel gets from userspace today and build out an
> optimal solution.

Yes, this was the main concern. Furthermore, things are likely to get
worse rather than better on this front (i.e. if/when OVS starts using
a more general BPF engine rather than its own flow processor).
David Miller July 21, 2015, 1:26 a.m. UTC | #7
From: Simon Horman <simon.horman@netronome.com>
Date: Thu, 16 Jul 2015 10:39:14 +0900

> Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
> There is scope to later refine what is passed up as per Open vSwitch flows
> on a port.
> 
> This does not change the behaviour of rocker ports that are
> not joined to Open vSwitch.
> 
> Signed-off-by: Simon Horman <simon.horman@netronome.com>

Applied, thanks Simon.

Patch

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index c0051673c9fa..0c8e7ceb4205 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -202,6 +202,7 @@  enum {
 	ROCKER_CTRL_IPV4_MCAST,
 	ROCKER_CTRL_IPV6_MCAST,
 	ROCKER_CTRL_DFLT_BRIDGING,
+	ROCKER_CTRL_DFLT_OVS,
 	ROCKER_CTRL_MAX,
 };
 
@@ -321,9 +322,21 @@  static u16 rocker_port_vlan_to_vid(const struct rocker_port *rocker_port,
 	return ntohs(vlan_id);
 }
 
+static bool rocker_port_is_slave(const struct rocker_port *rocker_port,
+				   const char *kind)
+{
+	return rocker_port->bridge_dev &&
+		!strcmp(rocker_port->bridge_dev->rtnl_link_ops->kind, kind);
+}
+
 static bool rocker_port_is_bridged(const struct rocker_port *rocker_port)
 {
-	return !!rocker_port->bridge_dev;
+	return rocker_port_is_slave(rocker_port, "bridge");
+}
+
+static bool rocker_port_is_ovsed(const struct rocker_port *rocker_port)
+{
+	return rocker_port_is_slave(rocker_port, "openvswitch");
 }
 
 #define ROCKER_OP_FLAG_REMOVE		BIT(0)
@@ -3275,6 +3288,12 @@  static struct rocker_ctrl {
 		.bridge = true,
 		.copy_to_cpu = true,
 	},
+	[ROCKER_CTRL_DFLT_OVS] = {
+		/* pass all pkts up to CPU */
+		.eth_dst = zero_mac,
+		.eth_dst_mask = zero_mac,
+		.acl = true,
+	},
 };
 
 static int rocker_port_ctrl_vlan_acl(struct rocker_port *rocker_port,
@@ -3787,11 +3806,14 @@  static int rocker_port_stp_update(struct rocker_port *rocker_port,
 		break;
 	case BR_STATE_LEARNING:
 	case BR_STATE_FORWARDING:
-		want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
+		if (!rocker_port_is_ovsed(rocker_port))
+			want[ROCKER_CTRL_LINK_LOCAL_MCAST] = true;
 		want[ROCKER_CTRL_IPV4_MCAST] = true;
 		want[ROCKER_CTRL_IPV6_MCAST] = true;
 		if (rocker_port_is_bridged(rocker_port))
 			want[ROCKER_CTRL_DFLT_BRIDGING] = true;
+		else if (rocker_port_is_ovsed(rocker_port))
+			want[ROCKER_CTRL_DFLT_OVS] = true;
 		else
 			want[ROCKER_CTRL_LOCAL_ARP] = true;
 		break;
@@ -5251,23 +5273,39 @@  static int rocker_port_bridge_leave(struct rocker_port *rocker_port)
 	return err;
 }
 
+
+static int rocker_port_ovs_changed(struct rocker_port *rocker_port,
+				   struct net_device *master)
+{
+	int err;
+
+	rocker_port->bridge_dev = master;
+
+	err = rocker_port_fwd_disable(rocker_port, SWITCHDEV_TRANS_NONE, 0);
+	if (err)
+		return err;
+	err = rocker_port_fwd_enable(rocker_port, SWITCHDEV_TRANS_NONE, 0);
+
+	return err;
+}
+
 static int rocker_port_master_changed(struct net_device *dev)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
 	struct net_device *master = netdev_master_upper_dev_get(dev);
 	int err = 0;
 
-	/* There are currently three cases handled here:
-	 * 1. Joining a bridge
-	 * 2. Leaving a previously joined bridge
-	 * 3. Other, e.g. being added to or removed from a bond or openvswitch,
-	 *    in which case nothing is done
-	 */
-	if (master && master->rtnl_link_ops &&
-	    !strcmp(master->rtnl_link_ops->kind, "bridge"))
-		err = rocker_port_bridge_join(rocker_port, master);
-	else if (rocker_port_is_bridged(rocker_port))
+	/* N.B: Do nothing if the type of master is not supported */
+	if (master && master->rtnl_link_ops) {
+		if (!strcmp(master->rtnl_link_ops->kind, "bridge"))
+			err = rocker_port_bridge_join(rocker_port, master);
+		else if (!strcmp(master->rtnl_link_ops->kind, "openvswitch"))
+			err = rocker_port_ovs_changed(rocker_port, master);
+	} else if (rocker_port_is_bridged(rocker_port)) {
 		err = rocker_port_bridge_leave(rocker_port);
+	} else if (rocker_port_is_ovsed(rocker_port)) {
+		err = rocker_port_ovs_changed(rocker_port, NULL);
+	}
 
 	return err;
 }
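The ROCKER_CTRL_DFLT_OVS entry added by the patch sets both eth_dst and eth_dst_mask to zero_mac. Assuming the usual convention that a set mask bit means "must match" and a clear bit is a wildcard, an all-zero mask matches every destination, which is how the entry passes all packets up to the CPU. eth_dst_match() below is a hypothetical user-space illustration of that masking, not rocker or OF-DPA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Masked Ethernet destination compare: a set bit in mask must match,
 * a clear bit is a wildcard.  With an all-zero mask every address
 * matches, as with the ROCKER_CTRL_DFLT_OVS entry. */
static bool eth_dst_match(const uint8_t dst[ETH_ALEN],
			  const uint8_t want[ETH_ALEN],
			  const uint8_t mask[ETH_ALEN])
{
	assert(dst && want && mask);
	for (int i = 0; i < ETH_ALEN; i++)
		if ((dst[i] & mask[i]) != (want[i] & mask[i]))
			return false;
	return true;
}
```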