Patchwork [1/2] net: Add layer 2 hardware acceleration operations for macvlan devices

login
register
mail settings
Submitter Neil Horman
Date Oct. 4, 2013, 8:10 p.m.
Message ID <1380917405-23801-2-git-send-email-nhorman@tuxdriver.com>
Download mbox | patch
Permalink /patch/280729/
State RFC
Delegated to: David Miller
Headers show

Comments

Neil Horman - Oct. 4, 2013, 8:10 p.m.
Add a operations structure that allows a network interface to export the fact
that it supports package forwarding in hardware between physical interfaces and
other mac layer devices assigned to it (such as macvlans).  this operaions
structure can be used by virtual mac devices to bypass software switching so
that forwarding can be done in hardware more efficiently.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/macvlan.c      | 32 ++++++++++++++++++++++++++++++++
 include/linux/if_macvlan.h |  1 +
 include/linux/netdevice.h  | 22 ++++++++++++++++++++++
 include/linux/skbuff.h     |  9 ++++++---
 net/core/dev.c             |  3 +++
 5 files changed, 64 insertions(+), 3 deletions(-)
David Miller - Oct. 7, 2013, 7:52 p.m.
From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri,  4 Oct 2013 16:10:04 -0400

> @@ -426,9 +426,12 @@ struct sk_buff {
>  	char			cb[48] __aligned(8);
>  
>  	unsigned long		_skb_refdst;
> -#ifdef CONFIG_XFRM
> -	struct	sec_path	*sp;
> -#endif
> +
> +	union {
> +		struct	sec_path	*sp;
> +		void 			*accel_priv;
> +	};
> +

I'm not %100 sure these two things are really mutually exclusive.

What if bridging ebtables does an input route lookup?  That can
populate the security path.

Also, why have you not added this to the usual netdev_ops and
hw_features?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman - Oct. 7, 2013, 9:20 p.m.
Forgive the poor reply format, Dave, I deleted your email (to fast on the
trigger apparently), so I have to reconstruct it.

>> @@ -426,9 +426,12 @@ struct sk_buff {
>>  	char			cb[48] __aligned(8);
>>  
>>  	unsigned long		_skb_refdst;
>> -#ifdef CONFIG_XFRM
>> -	struct	sec_path	*sp;
>> -#endif
>> +
>> +	union {
>> +		struct	sec_path	*sp;
>> +		void 			*accel_priv;
>> +	};
>> +
>
>I'm not %100 sure these two things are really mutually exclusive.
>
>What if bridging ebtables does an input route lookup?  That can
>populate the security path.
>
You are mostly likely right, thats why this is an RFC, I haven't really thought
through that bit fully yet, to be perfectly honest.  I wanted a place for a
pointer to the accelerated data path data to live, and that looked like a
reasonably safe place at the time, but as you point out, its not.  I'll need to
find a better place for it.

>Also, why have you not added this to the usual netdev_ops and
>hw_features?

Thats me experimenting.  I was thinking that origionally this functionality
might be grouped separately, so that we could handle it independently of the
standard network device operations (you might have noticed in v1 of my patch I
had a size_t variable in there, so I thought the separation might be
organizationally nice).  It was also something I was tinkering with for
potential future work to support other data plane accelerators (like the FM6000
switch chip from intel) in a manner that didn't pollute the more typical host network
devices.  Like I said though, just experimenting at the moment....

Regards
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 7, 2013, 9:34 p.m.
From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 7 Oct 2013 17:20:00 -0400

> Thats me experimenting.  I was thinking that origionally this functionality
> might be grouped separately, so that we could handle it independently of the
> standard network device operations (you might have noticed in v1 of my patch I
> had a size_t variable in there, so I thought the separation might be
> organizationally nice).  It was also something I was tinkering with for
> potential future work to support other data plane accelerators (like the FM6000
> switch chip from intel) in a manner that didn't pollute the more typical host network
> devices.  Like I said though, just experimenting at the moment....

Can these dataplane devices still act like a normal networking port and
send and receive packets at the host level?

If yes, that would be an extremely strong argument for netdev_ops.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Fastabend - Oct. 7, 2013, 10:39 p.m.
On 10/07/2013 02:34 PM, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 7 Oct 2013 17:20:00 -0400
>
>> Thats me experimenting.  I was thinking that origionally this functionality
>> might be grouped separately, so that we could handle it independently of the
>> standard network device operations (you might have noticed in v1 of my patch I
>> had a size_t variable in there, so I thought the separation might be
>> organizationally nice).  It was also something I was tinkering with for
>> potential future work to support other data plane accelerators (like the FM6000
>> switch chip from intel) in a manner that didn't pollute the more typical host network
>> devices.  Like I said though, just experimenting at the moment....
>

We can do something like the dcbnl ops and add another pointer off
the net device structure and then use the skb->dev field to find the
correct set of ops? This seems like the simplest option to me and
isolates the ops structure.

Is there some information loss from hanging it off the netdevice
structure vs the skb? I can't see any.

> Can these dataplane devices still act like a normal networking port and
> send and receive packets at the host level?
>

Yes they act like normal networking ports except for there is a
switching component in the hardware. These patches are not looking at
virtual or multiple physical functions at the moment.

> If yes, that would be an extremely strong argument for netdev_ops.

I agree.
Neil Horman - Oct. 8, 2013, 12:52 a.m.
On Mon, Oct 07, 2013 at 03:39:01PM -0700, John Fastabend wrote:
> On 10/07/2013 02:34 PM, David Miller wrote:
> >From: Neil Horman <nhorman@tuxdriver.com>
> >Date: Mon, 7 Oct 2013 17:20:00 -0400
> >
> >>Thats me experimenting.  I was thinking that origionally this functionality
> >>might be grouped separately, so that we could handle it independently of the
> >>standard network device operations (you might have noticed in v1 of my patch I
> >>had a size_t variable in there, so I thought the separation might be
> >>organizationally nice).  It was also something I was tinkering with for
> >>potential future work to support other data plane accelerators (like the FM6000
> >>switch chip from intel) in a manner that didn't pollute the more typical host network
> >>devices.  Like I said though, just experimenting at the moment....
> >
> 
> We can do something like the dcbnl ops and add another pointer off
> the net device structure and then use the skb->dev field to find the
> correct set of ops? This seems like the simplest option to me and
> isolates the ops structure.
> 
We certainly could do that, or perhaps, for what we're trying to do here, just
using standard netdev_ops is sufficient.  I kind of like the separation (like
the dcbnl_ops), but like I said, experimenting.  I'll try the next version with
the accel methods added to the netdev structure for comparison.

> Is there some information loss from hanging it off the netdevice
> structure vs the skb? I can't see any.
> 
No, not that I'm aware of.  The only reason I added it to the skb in this
version was that, by doing so, I was able to make dual use of the netdev's
standard tx path.

> >Can these dataplane devices still act like a normal networking port and
> >send and receive packets at the host level?
> >
> 
> Yes they act like normal networking ports except for there is a
> switching component in the hardware. These patches are not looking at
> virtual or multiple physical functions at the moment.
> 
To be clear, as John says, these patches aren't addressing any dataplane
acceleration devices beyond the internal switching capabilities of the ixgbe
cards.  That said, other chips will have varying degrees of capabilities, from
simple L2 switching, to full content addressable memories that allow for l2/l3
forwarding, as well as higher level routing functions.  Again however, these
patches are just to integrate macvlans with johns virtual station interface
work.

> >If yes, that would be an extremely strong argument for netdev_ops.
> 
> I agree.
In this specific case, that may well be the case, yes.  I'm not so sure of that
for more advanced switching/routing accelerators, but we probably should do what
makes sense now, and worry about future bridges when we forward over them
(pardon the pun :) ).

Neil

> 
> 
> -- 
> John Fastabend         Intel Corporation
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..38d0fc5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -297,7 +297,17 @@  netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	int ret;
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fwd_priv) {
+		skb->dev = vlan->lowerdev;
+		skb->accel_priv = vlan->fwd_priv;
+		ret = dev_queue_xmit(skb);
+		if (likely(ret == NETDEV_TX_OK))
+			goto update_stats;
+	}
+
+	skb->accel_priv = NULL;
 	ret = macvlan_queue_xmit(skb, dev);
+update_stats:
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct macvlan_pcpu_stats *pcpu_stats;
 
@@ -347,6 +357,18 @@  static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	if (fwd_accel_supports(lowerdev, FA_FLG_STA_SUPPORT)) {
+		vlan->fwd_priv = fwd_accel_add_station(lowerdev, dev);
+		/*
+		 * If we get a NULL pointer back, or if we get an error
+		 * then we should just fall through to the non accelerated path
+		 */
+		if (IS_ERR_OR_NULL(vlan->fwd_priv))
+			vlan->fwd_priv = NULL;
+		else
+			return 0;
+	}
+
 	err = -EBUSY;
 	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
 		goto out;
@@ -367,6 +389,10 @@  hash_add:
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
+	if (vlan->fwd_priv) {
+		fwd_accel_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
 	return err;
 }
 
@@ -391,6 +417,11 @@  static int macvlan_stop(struct net_device *dev)
 
 hash_del:
 	macvlan_hash_del(vlan, !dev->dismantle);
+	if (vlan->fwd_priv) {
+		fwd_accel_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
+
 	return 0;
 }
 
@@ -801,6 +832,7 @@  int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
+
 	port = macvlan_port_get_rtnl(lowerdev);
 
 	/* Only 1 macvlan device can be created in passthru mode */
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index ddd33fd..c270285 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,6 +61,7 @@  struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	void			*fwd_priv;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49ac..ea18f07 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1100,6 +1100,27 @@  struct net_device_ops {
 };
 
 /*
+ * Flags to ennumerate hardware acceleration support
+ */
+#define FA_FLG_STA_SUPPORT (1 << 1)
+
+#define fwd_accel_supports(dev, feature) (dev->fwd_ops->flags & feature)
+#define fwd_accel_add_station(pdev, vdev) dev->fwd_ops->fwd_accel_add_station(pdev, vdev)
+#define fwd_accel_del_station(pdev, priv) dev->fwd_ops->fwd_accel_del_station(pdev, priv)
+
+struct forwarding_accel_ops {
+	unsigned int flags;
+
+	/*
+	 * fwd_accel_[add|del]_station must be set if
+	 * FA_FLG_STA_SUPPORT is set
+	 */
+	void*	(*fwd_accel_add_station)(struct net_device *pdev,
+					struct net_device *vdev);
+	void	(*fwd_accel_del_station)(struct net_device *pdev, void *priv);
+};
+
+/*
  *	The DEVICE structure.
  *	Actually, this whole structure is a big mistake.  It mixes I/O
  *	data with strictly "high-level" data, and it has to know about
@@ -1183,6 +1204,7 @@  struct net_device {
 	/* Management operations */
 	const struct net_device_ops *netdev_ops;
 	const struct ethtool_ops *ethtool_ops;
+	const struct forwarding_accel_ops *fwd_ops;
 
 	/* Hardware header description */
 	const struct header_ops *header_ops;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2ddb48d..0be9152 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -426,9 +426,12 @@  struct sk_buff {
 	char			cb[48] __aligned(8);
 
 	unsigned long		_skb_refdst;
-#ifdef CONFIG_XFRM
-	struct	sec_path	*sp;
-#endif
+
+	union {
+		struct	sec_path	*sp;
+		void 			*accel_priv;
+	};
+
 	unsigned int		len,
 				data_len;
 	__u16			mac_len,
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..5f99382 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5992,6 +5992,7 @@  struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
 }
 
 static const struct ethtool_ops default_ethtool_ops;
+static const struct forwarding_accel_ops default_fwd_ops;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops)
@@ -6090,6 +6091,8 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
+	if (!dev->fwd_ops)
+		dev->fwd_ops = &default_fwd_ops;
 	return dev;
 
 free_all: