From patchwork Mon Mar 25 12:15:50 2013
X-Patchwork-Submitter: Pablo Neira Ayuso
X-Patchwork-Id: 230656
From: pablo@netfilter.org
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: [PATCH 10/12] netfilter: nfnetlink_queue: zero copy support
Date: Mon, 25 Mar 2013 13:15:50 +0100
Message-Id: <1364213752-3934-11-git-send-email-pablo@netfilter.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1364213752-3934-1-git-send-email-pablo@netfilter.org>
References: <1364213752-3934-1-git-send-email-pablo@netfilter.org>
X-Mailing-List: netfilter-devel@vger.kernel.org

From: Eric Dumazet

nfqnl_build_packet_message() currently copies the packet data into the
netlink message, while it could use zero copy instead.

Make sure the skb 'copy' is the last component of the cooked netlink
message, as we can't add anything after it.

Patch cooked in Copenhagen at Netfilter Workshop ;)

Still to be addressed in separate patches:

- GRO/GSO packets are segmented in nf_queue() and checksummed in
  nfqnl_build_packet_message(). Proper support for GSO/GRO packets
  (no segmentation and no checksumming) needs application cooperation,
  if we want no regressions.
Signed-off-by: Eric Dumazet
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nfnetlink_queue_core.c |   94 ++++++++++++++++++++++++++--------
 1 file changed, 72 insertions(+), 22 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 350c50f..da91b86 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -217,14 +217,59 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
         spin_unlock_bh(&queue->lock);
 }
 
+static void
+nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+        int i, j = 0;
+        int plen = 0; /* length of skb->head fragment */
+        struct page *page;
+        unsigned int offset;
+
+        /* dont bother with small payloads */
+        if (len <= skb_tailroom(to)) {
+                skb_copy_bits(from, 0, skb_put(to, len), len);
+                return;
+        }
+
+        if (hlen) {
+                skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+                len -= hlen;
+        } else {
+                plen = min_t(int, skb_headlen(from), len);
+                if (plen) {
+                        page = virt_to_head_page(from->head);
+                        offset = from->data - (unsigned char *)page_address(page);
+                        __skb_fill_page_desc(to, 0, page, offset, plen);
+                        get_page(page);
+                        j = 1;
+                        len -= plen;
+                }
+        }
+
+        to->truesize += len + plen;
+        to->len += len + plen;
+        to->data_len += len + plen;
+
+        for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+                if (!len)
+                        break;
+                skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+                skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+                len -= skb_shinfo(to)->frags[j].size;
+                skb_frag_ref(to, j);
+                j++;
+        }
+        skb_shinfo(to)->nr_frags = j;
+}
+
 static struct sk_buff *
 nfqnl_build_packet_message(struct nfqnl_instance *queue,
                            struct nf_queue_entry *entry,
                            __be32 **packet_id_ptr)
 {
-        sk_buff_data_t old_tail;
         size_t size;
         size_t data_len = 0, cap_len = 0;
+        int hlen = 0;
         struct sk_buff *skb;
         struct nlattr *nla;
         struct nfqnl_msg_packet_hdr *pmsg;
@@ -246,8 +291,10 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 #endif
                 + nla_total_size(sizeof(u_int32_t))     /* mark */
                 + nla_total_size(sizeof(struct nfqnl_msg_packet_hw))
-                + nla_total_size(sizeof(struct nfqnl_msg_packet_timestamp)
-                + nla_total_size(sizeof(u_int32_t)));   /* cap_len */
+                + nla_total_size(sizeof(u_int32_t));    /* cap_len */
+
+        if (entskb->tstamp.tv64)
+                size += nla_total_size(sizeof(struct nfqnl_msg_packet_timestamp));
 
         outdev = entry->outdev;
 
@@ -265,7 +312,16 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
                 if (data_len == 0 || data_len > entskb->len)
                         data_len = entskb->len;
 
-                size += nla_total_size(data_len);
+
+                if (!entskb->head_frag ||
+                    skb_headlen(entskb) < L1_CACHE_BYTES ||
+                    skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
+                        hlen = skb_headlen(entskb);
+
+                if (skb_has_frag_list(entskb))
+                        hlen = entskb->len;
+                hlen = min_t(int, data_len, hlen);
+                size += sizeof(struct nlattr) + hlen;
                 cap_len = entskb->len;
                 break;
         }
@@ -277,7 +333,6 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
         if (!skb)
                 return NULL;
 
-        old_tail = skb->tail;
         nlh = nlmsg_put(skb, 0, 0,
                         NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
                         sizeof(struct nfgenmsg), 0);
@@ -382,31 +437,26 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
                         goto nla_put_failure;
         }
 
+        if (ct && nfqnl_ct_put(skb, ct, ctinfo) < 0)
+                goto nla_put_failure;
+
+        if (cap_len > 0 && nla_put_be32(skb, NFQA_CAP_LEN, htonl(cap_len)))
+                goto nla_put_failure;
+
         if (data_len) {
                 struct nlattr *nla;
-                int sz = nla_attr_size(data_len);
 
-                if (skb_tailroom(skb) < nla_total_size(data_len)) {
-                        printk(KERN_WARNING "nf_queue: no tailroom!\n");
-                        kfree_skb(skb);
-                        return NULL;
-                }
+                if (skb_tailroom(skb) < sizeof(*nla) + hlen)
+                        goto nla_put_failure;
 
-                nla = (struct nlattr *)skb_put(skb, nla_total_size(data_len));
+                nla = (struct nlattr *)skb_put(skb, sizeof(*nla));
                 nla->nla_type = NFQA_PAYLOAD;
-                nla->nla_len = sz;
+                nla->nla_len = nla_attr_size(data_len);
 
-                if (skb_copy_bits(entskb, 0, nla_data(nla), data_len))
-                        BUG();
+                nfqnl_zcopy(skb, entskb, data_len, hlen);
         }
 
-        if (ct && nfqnl_ct_put(skb, ct, ctinfo) < 0)
-                goto nla_put_failure;
-
-        if (cap_len > 0 && nla_put_be32(skb, NFQA_CAP_LEN, htonl(cap_len)))
-                goto nla_put_failure;
-
-        nlh->nlmsg_len = skb->tail - old_tail;
+        nlh->nlmsg_len = skb->len;
         return skb;
 
 nla_put_failure:
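
The core idea above is "gather, don't copy": emit the NFQA_PAYLOAD attribute
header, then reference the original packet data in place, which is also why the
payload must be the last component of the cooked netlink message. As a rough
user-space analogy only (hypothetical code, not part of the patch; struct
fake_hdr and its fields are invented for illustration), a writev() gather write
sends a header from one buffer and the payload from another without an
intermediate memcpy():

/* Hypothetical user-space sketch: gather the message with writev() so the
 * payload is never copied into the header buffer.  Nothing here is kernel
 * API; fake_hdr merely stands in for the nlmsghdr/nlattr headers. */
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

struct fake_hdr {
        unsigned int   total_len;    /* length of header + payload */
        unsigned short payload_type; /* stand-in for an NFQA_PAYLOAD-like tag */
        unsigned short payload_len;
};

int main(void)
{
        char packet[] = "original packet bytes, left in their own buffer";
        struct fake_hdr hdr = {
                .total_len    = sizeof(hdr) + sizeof(packet),
                .payload_type = 10,
                .payload_len  = sizeof(packet),
        };
        struct iovec iov[2] = {
                { .iov_base = &hdr,   .iov_len = sizeof(hdr)    },
                { .iov_base = packet, .iov_len = sizeof(packet) }, /* payload last */
        };

        /* One gather write: header and payload come from separate buffers;
         * the caller never copies the payload into the header buffer. */
        if (writev(STDOUT_FILENO, iov, 2) < 0)
                perror("writev");
        return 0;
}

In the kernel patch the equivalent step is nfqnl_zcopy(), which takes
references on the source skb's pages with skb_frag_ref() instead of copying
them into the netlink skb.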