
[RFC,v2,02/12] sock: skb_copy_ubufs support for compound pages

Message ID 20170222163901.90834-3-willemdebruijn.kernel@gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Willem de Bruijn Feb. 22, 2017, 4:38 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Refine skb_copy_ubufs to support compound pages. With the upcoming TCP
and UDP zerocopy sendmsg support, such fragments may appear.

These skbuffs can have both kernel and zerocopy fragments, e.g., when
corking. Avoid unnecessary copying of fragments that have no userspace
reference.

It is not safe to modify skb frags when the skbuff is shared. This
should not happen. Fail loudly if we find an unexpected edge case.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/core/skbuff.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
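
At its heart the change is a per-fragment decision made before any
allocation. The following is a condensed, commented restatement of that
logic, folding in the get_page() suggestion from the review below; the
helper name and signature are illustrative only, not part of the patch:

	/* Illustrative restatement of the per-frag logic this patch adds.
	 * Returns the destination page to chain for this frag, or NULL on
	 * allocation failure; *needs_copy tells the caller whether a memcpy
	 * into the returned page is still required.
	 */
	static struct page *frag_dest_page(struct sk_buff *skb, int i,
					   gfp_t gfp_mask, bool *needs_copy)
	{
		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
		struct page *page = skb_frag_page(f);
		unsigned int order = 0;

		if (page_count(page) == 1) {
			/* only this skb references the page: no userspace
			 * reference remains, so keep the page as-is
			 */
			get_page(page);
			*needs_copy = false;
			return page;
		}

		if (f->size > PAGE_SIZE) {
			/* zerocopy frags may be compound; match them with a
			 * compound destination page
			 */
			order = get_order(f->size);
			gfp_mask |= __GFP_COMP;
		}

		*needs_copy = true;
		return alloc_pages(gfp_mask, order);
	}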

Comments

Eric Dumazet Feb. 22, 2017, 8:33 p.m. UTC | #1
On Wed, 2017-02-22 at 11:38 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Refine skb_copy_ubufs to support compound pages. With the upcoming TCP
> and UDP zerocopy sendmsg support, such fragments may appear.
> 
> These skbuffs can have both kernel and zerocopy fragments, e.g., when
> corking. Avoid unnecessary copying of fragments that have no userspace
> reference.
> 
> It is not safe to modify skb frags when the skbuff is shared. This
> should not happen. Fail loudly if we find an unexpected edge case.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/core/skbuff.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f3557958e9bf..67e4216fca01 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -944,6 +944,9 @@ EXPORT_SYMBOL_GPL(skb_morph);
>   *	If this function is called from an interrupt gfp_mask() must be
>   *	%GFP_ATOMIC.
>   *
> + *	skb_shinfo(skb) can only be safely modified when not accessed
> + *	concurrently. Fail if the skb is shared or cloned.
> + *
>   *	Returns 0 on success or a negative error code on failure
>   *	to allocate kernel memory to copy to.
>   */
> @@ -954,11 +957,29 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  	struct page *page, *head = NULL;
>  	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
>  
> +	if (skb_shared(skb) || skb_cloned(skb)) {
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +
>  	for (i = 0; i < num_frags; i++) {
>  		u8 *vaddr;
> +		unsigned int order = 0;
> +		gfp_t mask = gfp_mask;
>  		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
>  
> -		page = alloc_page(gfp_mask);
> +		page = skb_frag_page(f);
> +		if (page_count(page) == 1) {
> +			skb_frag_ref(skb, i);

This could be: get_page(page);

> +			goto copy_done;
> +		}
> +
> +		if (f->size > PAGE_SIZE) {
> +			order = get_order(f->size);
> +			mask |= __GFP_COMP;

Note that this would probably fail under memory pressure.

We could instead try to explode the few segments into order-0 only
pages.

Hopefully this case should not be frequent.
Willem de Bruijn Feb. 23, 2017, 1:51 a.m. UTC | #2
>>
>> -             page = alloc_page(gfp_mask);
>> +             page = skb_frag_page(f);
>> +             if (page_count(page) == 1) {
>> +                     skb_frag_ref(skb, i);
>
> This could be: get_page(page);

Ah, indeed. Thanks.

>
>> +                     goto copy_done;
>> +             }
>> +
>> +             if (f->size > PAGE_SIZE) {
>> +                     order = get_order(f->size);
>> +                     mask |= __GFP_COMP;
>
> Note that this would probably fail under memory pressure.
>
> We could instead try to explode the few segments into order-0 only
> pages.

Good point. I'll revise to use only order-0 here.
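
For illustration of that revision, copying one oversized fragment into a
chain of order-0 pages could look roughly like the sketch below. The
helper name and structure are hypothetical; frag-array rebuilding,
MAX_SKB_FRAGS accounting and error unwinding are deliberately left to the
caller, as in the existing error path:

	/* Hypothetical sketch: copy one frag, possibly larger than PAGE_SIZE,
	 * into order-0 pages chained via page_private(), mapping the source
	 * one sub-page at a time so highmem stays correct. Destination pages
	 * are not packed across source sub-page boundaries; a real revision
	 * would also rebuild the frag array to match the new page count.
	 */
	static struct page *copy_frag_to_order0(const skb_frag_t *f,
						gfp_t gfp_mask,
						struct page *head)
	{
		unsigned int off = f->page_offset;
		unsigned int left = skb_frag_size(f);
		struct page *src = skb_frag_page(f);

		while (left) {
			unsigned int in_page = PAGE_SIZE - (off & (PAGE_SIZE - 1));
			unsigned int chunk = min(left, in_page);
			struct page *dst = alloc_page(gfp_mask);
			u8 *vaddr;

			if (!dst)
				return NULL;	/* caller unwinds the chain */

			vaddr = kmap_atomic(src + (off >> PAGE_SHIFT));
			memcpy(page_address(dst),
			       vaddr + (off & (PAGE_SIZE - 1)), chunk);
			kunmap_atomic(vaddr);

			off += chunk;
			left -= chunk;

			set_page_private(dst, (unsigned long)head);
			head = dst;
		}

		return head;
	}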

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f3557958e9bf..67e4216fca01 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -944,6 +944,9 @@ EXPORT_SYMBOL_GPL(skb_morph);
  *	If this function is called from an interrupt gfp_mask() must be
  *	%GFP_ATOMIC.
  *
+ *	skb_shinfo(skb) can only be safely modified when not accessed
+ *	concurrently. Fail if the skb is shared or cloned.
+ *
  *	Returns 0 on success or a negative error code on failure
  *	to allocate kernel memory to copy to.
  */
@@ -954,11 +957,29 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	struct page *page, *head = NULL;
 	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
 
+	if (skb_shared(skb) || skb_cloned(skb)) {
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
 	for (i = 0; i < num_frags; i++) {
 		u8 *vaddr;
+		unsigned int order = 0;
+		gfp_t mask = gfp_mask;
 		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
 
-		page = alloc_page(gfp_mask);
+		page = skb_frag_page(f);
+		if (page_count(page) == 1) {
+			skb_frag_ref(skb, i);
+			goto copy_done;
+		}
+
+		if (f->size > PAGE_SIZE) {
+			order = get_order(f->size);
+			mask |= __GFP_COMP;
+		}
+
+		page = alloc_pages(mask, order);
 		if (!page) {
 			while (head) {
 				struct page *next = (struct page *)page_private(head);
@@ -971,6 +992,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 		memcpy(page_address(page),
 		       vaddr + f->page_offset, skb_frag_size(f));
 		kunmap_atomic(vaddr);
+copy_done:
 		set_page_private(page, (unsigned long)head);
 		head = page;
 	}