diff mbox

[net-next] skbuff: clear tx zero-copy flag

Message ID 1310195566.25391.6.camel@localhost.localdomain
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Shirley Ma July 9, 2011, 7:12 a.m. UTC
Hello Dave,

This patch clears tx zero-copy flag as needed.

Sign-off-by: Shirley Ma <xma@us.ibm.com>
---

 net/core/skbuff.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 9, 2011, 9:55 a.m. UTC | #1
From: Shirley Ma <mashirle@us.ibm.com>
Date: Sat, 09 Jul 2011 00:12:46 -0700

> This patch clears tx zero-copy flag as needed.
> 
> Signed-off-by: Shirley Ma <xma@us.ibm.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 25, 2011, 12:42 a.m. UTC | #2
Shirley Ma <mashirle@us.ibm.com> wrote:
> 
> This patch clears tx zero-copy flag as needed.
> 
> Sign-off-by: Shirley Ma <xma@us.ibm.com>

I think we also need to copy and clear this flag on the splice
read path as that takes a direct page reference.

I hope there isn't any other path that does this.

Cheers,
Michael S. Tsirkin July 25, 2011, 8:07 a.m. UTC | #3
On Mon, Jul 25, 2011 at 08:42:00AM +0800, Herbert Xu wrote:
> Shirley Ma <mashirle@us.ibm.com> wrote:
> > 
> > This patch clears tx zero-copy flag as needed.
> > 
> > Sign-off-by: Shirley Ma <xma@us.ibm.com>
> 
> I think we also need to copy and clear this flag on the splice
> read path as that takes a direct page reference.
> 
> I hope there isn't any other path that does this.
> 
> Cheers,

When there's a way for an skb to get into the
host networking stack, (e.g. when tap gains zero copy
support) we'll need to handle that.

However macvtap passes an skb directly to the
lower device, so as long as macvtap is the only user
of that interface, we are fine I think - there's
no way for an skb to get from macvtap to splice
read path I think.

Right?

> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 25, 2011, 8:40 a.m. UTC | #4
On Mon, Jul 25, 2011 at 11:07:43AM +0300, Michael S. Tsirkin wrote:
>
> However macvtap passes an skb directly to the
> lower device, so as long as macvtap is the only user
> of that interface, we are fine I think - there's
> no way for an skb to get from macvtap to splice
> read path I think.
> 
> Right?

Yes, as long as you can guarantee that the skb never loops back
then you should be fine.

However, does macvtap really bypass everything, including the
qdisc layer? The qdisc layer is certainly capable of looping
the skb back with the redirect action.

Cheers,
Michael S. Tsirkin July 25, 2011, 9:44 a.m. UTC | #5
On Mon, Jul 25, 2011 at 04:40:57PM +0800, Herbert Xu wrote:
> On Mon, Jul 25, 2011 at 11:07:43AM +0300, Michael S. Tsirkin wrote:
> >
> > However macvtap passes an skb directly to the
> > lower device, so as long as macvtap is the only user
> > of that interface, we are fine I think - there's
> > no way for an skb to get from macvtap to splice
> > read path I think.
> > 
> > Right?
> 
> Yes, as long as you can guarantee that the skb never loops back
> then you should be fine.
> 
> However, does macvtap really bypass everything, including the
> qdisc layer? The qdisc layer is certainly capable of looping
> the skb back with the redirect action.
> 
> Cheers,

No, I don't think macvtap bypasses the qdisc.
Is the action in question here?
static int tcf_mirred(struct sk_buff *skb,
			const struct tc_action *a,
                      struct tcf_result *res)

if yes that seems to always clone an skb, which in turn
does the copy so we are fine?

> -- 
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 25, 2011, 9:57 a.m. UTC | #6
On Mon, Jul 25, 2011 at 12:44:14PM +0300, Michael S. Tsirkin wrote:
>
> if yes that seems to always clone an skb, which in turn
> does the copy so we are fine?

Yes you're right, it should be safe.

However, I think we should add a WARN_ON to the splice skb path
so that should a packet find its way through a path that we haven't
thought of then at least we'll know about it.

Thanks,
David Miller July 25, 2011, 10:02 a.m. UTC | #7
From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Mon, 25 Jul 2011 17:57:11 +0800

> However, I think we should add a WARN_ON to the splice skb path
> so that should a packet find its way through a path that we haven't
> thought of then at least we'll know about it.

Good idea.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin July 25, 2011, 10:53 a.m. UTC | #8
On Mon, Jul 25, 2011 at 03:02:29AM -0700, David Miller wrote:
> From: Herbert Xu <herbert@gondor.hengli.com.au>
> Date: Mon, 25 Jul 2011 17:57:11 +0800
> 
> > However, I think we should add a WARN_ON to the splice skb path
> > so that should a packet find its way through a path that we haven't
> > thought of then at least we'll know about it.
> 
> Good idea.

Another place like this is skb_split, I think.
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9577a2..d220119 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -677,6 +677,7 @@  struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
 		if (skb_copy_ubufs(skb, gfp_mask))
 			return NULL;
+		skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 	}
 
 	n = skb + 1;
@@ -801,6 +802,7 @@  struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 				kfree(n);
 				goto out;
 			}
+			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
@@ -893,6 +895,7 @@  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
 			if (skb_copy_ubufs(skb, gfp_mask))
 				goto nofrags;
+			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 			get_page(skb_shinfo(skb)->frags[i].page);