diff mbox

[net-next] gro: Handle inline VLAN tags

Message ID 1353097030.2743.28.camel@bwh-desktop.uk.solarflarecom.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Ben Hutchings Nov. 16, 2012, 8:17 p.m. UTC
The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
RX accleration) were made largely consistent in 2.6.37, with tags
pulled out by software as necessary.  However GRO doesn't do this, so
it is not effective for VLAN-tagged packets received on devices
without VLAN RX acceleration.

napi_gro_frags() must not free the skb and does not advance the
skb->data pointer, so cannot use vlan_untag().  Extract the core of
vlan_untag() into a new function __vlan_untag() that allows the offset
to the VLAN tag to be specified and returns an error code.  Add
kernel-doc comments for both those functions.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
over a VLAN from ~8.0 to ~9.3 Gbit/s.

Ben.

 include/linux/if_vlan.h |    6 ++++
 net/8021q/vlan_core.c   |   60 ++++++++++++++++++++++++++++++++---------------
 net/core/dev.c          |   27 ++++++++++++++++----
 3 files changed, 68 insertions(+), 25 deletions(-)

Comments

Eric Dumazet Nov. 16, 2012, 11:01 p.m. UTC | #1
On Fri, 2012-11-16 at 20:17 +0000, Ben Hutchings wrote:
> The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> RX accleration) were made largely consistent in 2.6.37, with tags
> pulled out by software as necessary.  However GRO doesn't do this, so
> it is not effective for VLAN-tagged packets received on devices
> without VLAN RX acceleration.
> 
> napi_gro_frags() must not free the skb and does not advance the
> skb->data pointer, so cannot use vlan_untag().  Extract the core of
> vlan_untag() into a new function __vlan_untag() that allows the offset
> to the VLAN tag to be specified and returns an error code.  Add
> kernel-doc comments for both those functions.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> over a VLAN from ~8.0 to ~9.3 Gbit/s.
> 
> Ben.
> 

> index b4978e2..9d658eb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3668,6 +3668,13 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
>  
>  gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
>  {
> +	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
> +	    !vlan_tx_tag_present(skb)) {
> +		skb = vlan_untag(skb);
> +		if (unlikely(!skb))
> +			return GRO_DROP;
> +	}
> +
>  	skb_gro_reset_offset(skb);

I am not very convinced.

So for some drivers _not_ doing vlan acceleration, we are slowing down
GRO.

I mean, driver authors should know if they need to call a helper before
calling napi_gro_receive()

To date, only two drivers would need this, and it was discovered very
recently.

Also at GRO point, we totally own the skb and the vlan decap cannot
possibly fail (and free the skb)

A packet sniffer should see all skbs delivered to napi_gro_receive()



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 17, 2012, midnight UTC | #2
On Fri, 2012-11-16 at 15:01 -0800, Eric Dumazet wrote:
> On Fri, 2012-11-16 at 20:17 +0000, Ben Hutchings wrote:
> > The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> > RX accleration) were made largely consistent in 2.6.37, with tags
> > pulled out by software as necessary.  However GRO doesn't do this, so
> > it is not effective for VLAN-tagged packets received on devices
> > without VLAN RX acceleration.
> > 
> > napi_gro_frags() must not free the skb and does not advance the
> > skb->data pointer, so cannot use vlan_untag().  Extract the core of
> > vlan_untag() into a new function __vlan_untag() that allows the offset
> > to the VLAN tag to be specified and returns an error code.  Add
> > kernel-doc comments for both those functions.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> > Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> > a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> > over a VLAN from ~8.0 to ~9.3 Gbit/s.
> > 
> > Ben.
> > 
> 
> > index b4978e2..9d658eb 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3668,6 +3668,13 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
> >  
> >  gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
> >  {
> > +	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
> > +	    !vlan_tx_tag_present(skb)) {
> > +		skb = vlan_untag(skb);
> > +		if (unlikely(!skb))
> > +			return GRO_DROP;
> > +	}
> > +
> >  	skb_gro_reset_offset(skb);
> 
> I am not very convinced.
> 
> So for some drivers _not_ doing vlan acceleration, we are slowing down
> GRO.

It's a single comparison, hinted as 'unlikely'.

> I mean, driver authors should know if they need to call a helper before
> calling napi_gro_receive()
> 
> To date, only two drivers would need this, and it was discovered very
> recently.

It's not only two drivers.  qlcnic fakes RX VLAN acceleration precisely
to work around this limitation.  netxen, niu and Calxeda's xgmac might
also benefit (it's not clear whether the hardware does checksum
validation for VLAN-tagged packets).

> Also at GRO point, we totally own the skb and the vlan decap cannot
> possibly fail (and free the skb)
> 
> A packet sniffer should see all skbs delivered to napi_gro_receive()

I'm not sure what you mean by this.  Is your point that the
copy-on-write is never needed?  It is still possible for pskb_may_pull()
to fail.

Ben.
Eric Dumazet Nov. 17, 2012, 12:16 a.m. UTC | #3
On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:

> I'm not sure what you mean by this.  Is your point that the
> copy-on-write is never needed?  It is still possible for pskb_may_pull()
> to fail.
> 

A packet sniffer should have a copy of bad frames, even if dropped later
in our stacks.

GRO layer is not allowed to drop a frame, even if not 'correct'.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 17, 2012, 12:32 a.m. UTC | #4
On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
> On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
> 
> > I'm not sure what you mean by this.  Is your point that the
> > copy-on-write is never needed?  It is still possible for pskb_may_pull()
> > to fail.
> > 
> 
> A packet sniffer should have a copy of bad frames, even if dropped later
> in our stacks.
> 
> GRO layer is not allowed to drop a frame, even if not 'correct'.

What do you think the accelerated hardware does with frames that have a
truncated VLAN tag?

Ben.
Eric Dumazet Nov. 17, 2012, 1:09 a.m. UTC | #5
On Sat, 2012-11-17 at 00:32 +0000, Ben Hutchings wrote:
> On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
> > On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
> > 
> > > I'm not sure what you mean by this.  Is your point that the
> > > copy-on-write is never needed?  It is still possible for pskb_may_pull()
> > > to fail.
> > > 
> > 
> > A packet sniffer should have a copy of bad frames, even if dropped later
> > in our stacks.
> > 
> > GRO layer is not allowed to drop a frame, even if not 'correct'.
> 
> What do you think the accelerated hardware does with frames that have a
> truncated VLAN tag?

The hardware should send us the frame, exactly like when RX checksum is
wrong.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Gallatin Nov. 17, 2012, 1:17 a.m. UTC | #6
On 11/16/12 15:17, Ben Hutchings wrote:
> The receive paths for skbs with inline and out-of-line VLAN tags (VLAN
> RX accleration) were made largely consistent in 2.6.37, with tags
> pulled out by software as necessary.  However GRO doesn't do this, so
> it is not effective for VLAN-tagged packets received on devices
> without VLAN RX acceleration.
>
> napi_gro_frags() must not free the skb and does not advance the
> skb->data pointer, so cannot use vlan_untag().  Extract the core of
> vlan_untag() into a new function __vlan_untag() that allows the offset
> to the VLAN tag to be specified and returns an error code.  Add
> kernel-doc comments for both those functions.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Tested with sfc using both napi_gro_receive() and napi_gro_frags().  On
> a Core i7 920 (Nehalem) system it increased TCP/IPv4 receive throughput
> over a VLAN from ~8.0 to ~9.3 Gbit/s.


I verified similar results on myri10ge, using my recent GRO
patchset (minus the in-driver vtag removal) with just
napi_gro_frags().

I've no strong feeling as to whether or not this belongs in GRO or the
driver.  I'm just glad that it is being discussed.

Thank you,

Drew

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 20, 2012, 12:10 a.m. UTC | #7
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Nov 2012 17:09:19 -0800

> On Sat, 2012-11-17 at 00:32 +0000, Ben Hutchings wrote:
>> On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
>> > On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
>> > 
>> > > I'm not sure what you mean by this.  Is your point that the
>> > > copy-on-write is never needed?  It is still possible for pskb_may_pull()
>> > > to fail.
>> > > 
>> > 
>> > A packet sniffer should have a copy of bad frames, even if dropped later
>> > in our stacks.
>> > 
>> > GRO layer is not allowed to drop a frame, even if not 'correct'.
>> 
>> What do you think the accelerated hardware does with frames that have a
>> truncated VLAN tag?
> 
> The hardware should send us the frame, exactly like when RX checksum is
> wrong.

I agree with Eric, and therefore will not apply this patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Gallatin Nov. 26, 2012, 3:04 p.m. UTC | #8
On 11/19/12 19:10, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 16 Nov 2012 17:09:19 -0800
>
>> On Sat, 2012-11-17 at 00:32 +0000, Ben Hutchings wrote:
>>> On Fri, 2012-11-16 at 16:16 -0800, Eric Dumazet wrote:
>>>> On Sat, 2012-11-17 at 00:00 +0000, Ben Hutchings wrote:
>>>>
>>>>> I'm not sure what you mean by this.  Is your point that the
>>>>> copy-on-write is never needed?  It is still possible for pskb_may_pull()
>>>>> to fail.
>>>>>
>>>>
>>>> A packet sniffer should have a copy of bad frames, even if dropped later
>>>> in our stacks.
>>>>
>>>> GRO layer is not allowed to drop a frame, even if not 'correct'.
>>>
>>> What do you think the accelerated hardware does with frames that have a
>>> truncated VLAN tag?
>>
>> The hardware should send us the frame, exactly like when RX checksum is
>> wrong.
>
> I agree with Eric, and therefore will not apply this patch.
>

David,

How do you feel about the patchset I posted on 11/14/2012
([PATCH net-next 0/3] myri10ge: LRO to GRO conversion,
http://marc.info/?l=linux-netdev&m=135289838223920&w=2)
which moves myri10ge from LRO to GRO?

Specifically, if doing vlan decap in GRO is not OK, then how
about doing it in the driver?

BTW, if I have bungled something it the myri10ge patchset submission,
I do apologize. I don't frequently submit patches, and it is likely
I screwed up some convention..

Thanks,

Drew
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 28, 2012, 4:46 p.m. UTC | #9
From: Andrew Gallatin <gallatin@myri.com>
Date: Mon, 26 Nov 2012 10:04:44 -0500

> How do you feel about the patchset I posted on 11/14/2012
> ([PATCH net-next 0/3] myri10ge: LRO to GRO conversion,
> http://marc.info/?l=linux-netdev&m=135289838223920&w=2)
> which moves myri10ge from LRO to GRO?
> 
> Specifically, if doing vlan decap in GRO is not OK, then how
> about doing it in the driver?

I'm going to trust Eric's judgment on this one, as he has been
thinking more about the long term GRO issues than I have.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Gallatin Nov. 28, 2012, 5:30 p.m. UTC | #10
On 11/28/12 11:46, David Miller wrote:
> From: Andrew Gallatin <gallatin@myri.com>
> Date: Mon, 26 Nov 2012 10:04:44 -0500
>
>> How do you feel about the patchset I posted on 11/14/2012
>> ([PATCH net-next 0/3] myri10ge: LRO to GRO conversion,
>> http://marc.info/?l=linux-netdev&m=135289838223920&w=2)
>> which moves myri10ge from LRO to GRO?
>>
>> Specifically, if doing vlan decap in GRO is not OK, then how
>> about doing it in the driver?
>
> I'm going to trust Eric's judgment on this one, as he has been
> thinking more about the long term GRO issues than I have.
>
> Thanks.
>

Perfect, thanks.  Eric seemed OK with doing the vlan decap
in the driver.  But.. I didn't get an "nack" or an "applied" from
you in response to that myri10ge patchset.  Do I need to re-submit it?

Sorry, I'm not terribly familiar with the etiquette for this sort
of thing.

Thanks,

Drew
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 28, 2012, 6:39 p.m. UTC | #11
From: Andrew Gallatin <gallatin@myri.com>
Date: Wed, 28 Nov 2012 12:30:12 -0500

> Perfect, thanks.  Eric seemed OK with doing the vlan decap
> in the driver.  But.. I didn't get an "nack" or an "applied" from
> you in response to that myri10ge patchset.  Do I need to re-submit it?
> 
> Sorry, I'm not terribly familiar with the etiquette for this sort
> of thing.

Please just repost the patch set.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index d06cc5c..a2167c3 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -91,6 +91,7 @@  extern struct net_device *vlan_dev_real_dev(const struct net_device *dev);
 extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 
 extern bool vlan_do_receive(struct sk_buff **skb);
+extern int __vlan_untag(struct sk_buff *skb, int offset);
 extern struct sk_buff *vlan_untag(struct sk_buff *skb);
 
 extern int vlan_vid_add(struct net_device *dev, unsigned short vid);
@@ -126,6 +127,11 @@  static inline bool vlan_do_receive(struct sk_buff **skb)
 	return false;
 }
 
+static inline int __vlan_untag(struct sk_buff *skb, int offset)
+{
+	return 0;
+}
+
 static inline struct sk_buff *vlan_untag(struct sk_buff *skb)
 {
 	return skb;
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 65e06ab..8486430 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -93,20 +93,53 @@  u16 vlan_dev_vlan_id(const struct net_device *dev)
 }
 EXPORT_SYMBOL(vlan_dev_vlan_id);
 
-static struct sk_buff *vlan_reorder_header(struct sk_buff *skb)
+/**
+ *	__vlan_untag - pull VLAN tag out of 802.1q packet header
+ *	@skb: sk_buff to edit; may be cloned but not shared.
+ *	@offset: Offset from @skb->data to VLAN tag.  Must be either
+ *		0 or %ETH_HLEN.
+ *
+ *	This updates the @mac_header but no other header offset.  The
+ *	caller is expected to check the @protocol and that there is no
+ *	out-of-line tag before calling this.
+ */
+int __vlan_untag(struct sk_buff *skb, int offset)
 {
+	struct vlan_hdr *vhdr;
+	u16 vlan_tci;
+
+	if (unlikely(!pskb_may_pull(skb, offset + VLAN_HLEN)))
+		return -EINVAL;
+
+	vhdr = (struct vlan_hdr *) (skb->data + offset);
+	vlan_tci = ntohs(vhdr->h_vlan_TCI);
+	__vlan_hwaccel_put_tag(skb, vlan_tci);
+
+	skb->len -= VLAN_HLEN;
+	skb_postpull_rcsum(skb, skb->data + offset, VLAN_HLEN);
+	skb->data += VLAN_HLEN;
+	vlan_set_encap_proto(skb, vhdr);
+
 	if (skb_cow(skb, skb_headroom(skb)) < 0)
-		return NULL;
-	memmove(skb->data - ETH_HLEN, skb->data - VLAN_ETH_HLEN, 2 * ETH_ALEN);
+		return -ENOMEM;
+
+	memmove(skb->data + offset - ETH_HLEN,
+		skb->data + offset - VLAN_ETH_HLEN, 2 * ETH_ALEN);
 	skb->mac_header += VLAN_HLEN;
-	return skb;
+	return 0;
 }
 
+/**
+ *	vlan_untag - pull VLAN tag out of packet header, if appropriate
+ *	@skb: sk_buff to edit; may be cloned or shared.
+ *
+ *	If @skb has an inline VLAN tag and no out-of-line VLAN tag,
+ *	pull the tag out-of-line and reset all header offsets.  Return
+ *	the edited sk_buff.  If allocation fails or the VLAN tag is
+ *	invalid, free @skb and return NULL.
+ */
 struct sk_buff *vlan_untag(struct sk_buff *skb)
 {
-	struct vlan_hdr *vhdr;
-	u16 vlan_tci;
-
 	if (unlikely(vlan_tx_tag_present(skb))) {
 		/* vlan_tci is already set-up so leave this for another time */
 		return skb;
@@ -116,18 +149,7 @@  struct sk_buff *vlan_untag(struct sk_buff *skb)
 	if (unlikely(!skb))
 		goto err_free;
 
-	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
-		goto err_free;
-
-	vhdr = (struct vlan_hdr *) skb->data;
-	vlan_tci = ntohs(vhdr->h_vlan_TCI);
-	__vlan_hwaccel_put_tag(skb, vlan_tci);
-
-	skb_pull_rcsum(skb, VLAN_HLEN);
-	vlan_set_encap_proto(skb, vhdr);
-
-	skb = vlan_reorder_header(skb);
-	if (unlikely(!skb))
+	if (unlikely(__vlan_untag(skb, 0)))
 		goto err_free;
 
 	skb_reset_network_header(skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index b4978e2..9d658eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3668,6 +3668,13 @@  static void skb_gro_reset_offset(struct sk_buff *skb)
 
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
+	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
+	    !vlan_tx_tag_present(skb)) {
+		skb = vlan_untag(skb);
+		if (unlikely(!skb))
+			return GRO_DROP;
+	}
+
 	skb_gro_reset_offset(skb);
 
 	return napi_skb_finish(__napi_gro_receive(napi, skb), skb);
@@ -3743,11 +3750,8 @@  static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 	eth = skb_gro_header_fast(skb, off);
 	if (skb_gro_header_hard(skb, hlen)) {
 		eth = skb_gro_header_slow(skb, hlen, off);
-		if (unlikely(!eth)) {
-			napi_reuse_skb(napi, skb);
-			skb = NULL;
-			goto out;
-		}
+		if (unlikely(!eth))
+			goto fail;
 	}
 
 	skb_gro_pull(skb, sizeof(*eth));
@@ -3758,8 +3762,19 @@  static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 	 */
 	skb->protocol = eth->h_proto;
 
-out:
+	if (unlikely(skb->protocol == htons(ETH_P_8021Q)) &&
+	    !vlan_tx_tag_present(skb)) {
+		if (unlikely(__vlan_untag(skb, sizeof(*eth))))
+			goto fail;
+		skb_gro_reset_offset(skb);
+		skb_gro_pull(skb, sizeof(*eth));
+	}
+
 	return skb;
+
+fail:
+	napi_reuse_skb(napi, skb);
+	return NULL;
 }
 
 gro_result_t napi_gro_frags(struct napi_struct *napi)