Message ID | 20170222163901.90834-3-willemdebruijn.kernel@gmail.com |
---|---
State | RFC, archived |
Delegated to: | David Miller |
On Wed, 2017-02-22 at 11:38 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Refine skb_copy_ubufs to support compound pages. With upcoming TCP
> and UDP zerocopy sendmsg, such fragments may appear.
>
> These skbuffs can have both kernel and zerocopy fragments, e.g., when
> corking. Avoid unnecessary copying of fragments that have no userspace
> reference.
>
> It is not safe to modify skb frags when the skbuff is shared. This
> should not happen. Fail loudly if we find an unexpected edge case.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/core/skbuff.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f3557958e9bf..67e4216fca01 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -944,6 +944,9 @@ EXPORT_SYMBOL_GPL(skb_morph);
>   *	If this function is called from an interrupt gfp_mask() must be
>   *	%GFP_ATOMIC.
>   *
> + *	skb_shinfo(skb) can only be safely modified when not accessed
> + *	concurrently. Fail if the skb is shared or cloned.
> + *
>   *	Returns 0 on success or a negative error code on failure
>   *	to allocate kernel memory to copy to.
>   */
> @@ -954,11 +957,29 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  	struct page *page, *head = NULL;
>  	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
>
> +	if (skb_shared(skb) || skb_cloned(skb)) {
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +
>  	for (i = 0; i < num_frags; i++) {
>  		u8 *vaddr;
> +		unsigned int order = 0;
> +		gfp_t mask = gfp_mask;
>  		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
>
> -		page = alloc_page(gfp_mask);
> +		page = skb_frag_page(f);
> +		if (page_count(page) == 1) {
> +			skb_frag_ref(skb, i);

This could be : get_page(page);

> +			goto copy_done;
> +		}
> +
> +		if (f->size > PAGE_SIZE) {
> +			order = get_order(f->size);
> +			mask |= __GFP_COMP;

Note that this would probably fail under memory pressure.

We could instead try to explode the few segments into order-0 only
pages. Hopefully this case should not be frequent.
>> -		page = alloc_page(gfp_mask);
>> +		page = skb_frag_page(f);
>> +		if (page_count(page) == 1) {
>> +			skb_frag_ref(skb, i);
>
> This could be : get_page(page);

Ah, indeed. Thanks.

>> +			goto copy_done;
>> +		}
>> +
>> +		if (f->size > PAGE_SIZE) {
>> +			order = get_order(f->size);
>> +			mask |= __GFP_COMP;
>
> Note that this would probably fail under memory pressure.
>
> We could instead try to explode the few segments into order-0 only
> pages.

Good point. I'll revise to use only order-0 here.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f3557958e9bf..67e4216fca01 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -944,6 +944,9 @@ EXPORT_SYMBOL_GPL(skb_morph);
  *	If this function is called from an interrupt gfp_mask() must be
  *	%GFP_ATOMIC.
  *
+ *	skb_shinfo(skb) can only be safely modified when not accessed
+ *	concurrently. Fail if the skb is shared or cloned.
+ *
  *	Returns 0 on success or a negative error code on failure
  *	to allocate kernel memory to copy to.
  */
@@ -954,11 +957,29 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	struct page *page, *head = NULL;
 	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
 
+	if (skb_shared(skb) || skb_cloned(skb)) {
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
 	for (i = 0; i < num_frags; i++) {
 		u8 *vaddr;
+		unsigned int order = 0;
+		gfp_t mask = gfp_mask;
 		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
 
-		page = alloc_page(gfp_mask);
+		page = skb_frag_page(f);
+		if (page_count(page) == 1) {
+			skb_frag_ref(skb, i);
+			goto copy_done;
+		}
+
+		if (f->size > PAGE_SIZE) {
+			order = get_order(f->size);
+			mask |= __GFP_COMP;
+		}
+
+		page = alloc_pages(mask, order);
 		if (!page) {
 			while (head) {
 				struct page *next = (struct page *)page_private(head);
@@ -971,6 +992,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 		memcpy(page_address(page), vaddr + f->page_offset,
 		       skb_frag_size(f));
 		kunmap_atomic(vaddr);
+copy_done:
 		set_page_private(page, (unsigned long)head);
 		head = page;
 	}