From patchwork Wed Apr 3 15:39:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076122 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9Jf68v2z9sRt for ; Thu, 4 Apr 2019 02:40:34 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726864AbfDCPkL (ORCPT ); Wed, 3 Apr 2019 11:40:11 -0400 Received: from mail.us.es ([193.147.175.20]:56674 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726396AbfDCPkK (ORCPT ); Wed, 3 Apr 2019 11:40:10 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 35B28172C8A for ; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 23A70DA712 for ; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 1892ADA70E; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 051CFDA706; Wed, 3 Apr 2019 17:40:07 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:06 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id AEEB34265A32; Wed, 3 Apr 2019 17:40:06 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 1/8] net: use kfree_skb_list() from ip_do_fragment() Date: Wed, 3 Apr 2019 17:39:54 +0200 Message-Id: <20190403154001.8177-2-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Just like 46cfd725c377 ("net: use kfree_skb_list() helper in more places"). Signed-off-by: Pablo Neira Ayuso Acked-by: Florian Westphal --- net/ipv4/ip_output.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c80188875f39..10b35328cfbc 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -693,11 +693,8 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, return 0; } - while (frag) { - skb = frag->next; - kfree_skb(frag); - frag = skb; - } + kfree_skb_list(frag); + IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS); return err; From patchwork Wed Apr 3 15:39:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076118 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JX42XQz9sSM for ; Thu, 4 Apr 2019 02:40:28 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726882AbfDCPkO (ORCPT ); Wed, 3 Apr 2019 11:40:14 -0400 Received: from mail.us.es ([193.147.175.20]:56692 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726264AbfDCPkM (ORCPT ); Wed, 3 Apr 2019 11:40:12 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 085FA172C81 for ; Wed, 3 Apr 2019 17:40:10 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id E6145DA715 for ; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id E4050DA714; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id B4B91DA716; Wed, 3 Apr 2019 17:40:07 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:07 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id 705EA4265A31; Wed, 3 Apr 2019 17:40:07 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 2/8] net: ipv4: add skbuff fraglist split iterator Date: Wed, 3 Apr 2019 17:39:55 +0200 Message-Id: <20190403154001.8177-3-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds skbuff fraglist split iterator. The API consists of: * ip_fraglist_init(), that initializes the internal state of the fraglist iterator. * ip_fraglist_prepare(), that restores the IPv4 and transport headers on the fragment. * ip_fraglist_next(), that retrieves the fragment from the fraglist and updates the internal state of the iterator to point to the next fragment in the fraglist. The ip_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath for refragmentation. Signed-off-by: Pablo Neira Ayuso --- include/net/ip.h | 23 ++++++++++++++ net/ipv4/ip_output.c | 88 ++++++++++++++++++++++++++++++++-------------------- 2 files changed, 78 insertions(+), 33 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index aa09ae5f01a5..cc3d83680f4f 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -165,6 +165,29 @@ int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb); int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb); int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)); + +struct ip_fraglist_iter { + struct sk_buff *frag_list; + struct sk_buff *frag; + struct iphdr *iph; + int offset; + unsigned int hlen; +}; + +void ip_fraglist_init(struct sk_buff *skb, struct iphdr *iph, + unsigned int hlen, struct ip_fraglist_iter *iter); +void ip_fraglist_prepare(struct sk_buff *skb, struct ip_fraglist_iter *iter); + +static inline struct sk_buff *ip_fraglist_next(struct ip_fraglist_iter *iter) +{ + struct sk_buff *skb = iter->frag; + + iter->frag = skb->next; + skb_mark_not_on_list(skb); + + return skb; +} + void ip_send_check(struct iphdr *ip); int __ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb); int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 10b35328cfbc..d53fa795feb4 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -562,6 +562,54 @@ static int ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, return ip_do_fragment(net, sk, skb, output); } +void ip_fraglist_init(struct sk_buff *skb, struct iphdr *iph, + unsigned int hlen, struct ip_fraglist_iter *iter) +{ + unsigned int first_len = skb_pagelen(skb); + + iter->frag_list = skb_shinfo(skb)->frag_list; + iter->frag = iter->frag_list; + skb_frag_list_init(skb); + + iter->offset = 0; + iter->iph = iph; + iter->hlen = hlen; + + skb->data_len = first_len - skb_headlen(skb); + skb->len = first_len; + iph->tot_len = htons(first_len); + iph->frag_off = htons(IP_MF); + ip_send_check(iph); +} +EXPORT_SYMBOL(ip_fraglist_init); + +void ip_fraglist_prepare(struct sk_buff *skb, struct ip_fraglist_iter *iter) +{ + unsigned int hlen = iter->hlen; + struct iphdr *iph = iter->iph; + struct sk_buff *frag; + + frag = iter->frag; + frag->ip_summed = CHECKSUM_NONE; + skb_reset_transport_header(frag); + __skb_push(frag, hlen); + skb_reset_network_header(frag); + memcpy(skb_network_header(frag), iph, hlen); + iter->iph = ip_hdr(frag); + iph = iter->iph; + iph->tot_len = htons(frag->len); + ip_copy_metadata(frag, skb); + if (iter->offset == 0) + ip_options_fragment(frag); + iter->offset += skb->len - hlen; + iph->frag_off = htons(iter->offset >> 3); + if (frag->next) + iph->frag_off |= htons(IP_MF); + /* Ready, complete checksum */ + ip_send_check(iph); +} +EXPORT_SYMBOL(ip_fraglist_prepare); + /* * This IP datagram is too large to be sent in one piece. Break it up into * smaller pieces (each of size equal to IP header plus @@ -579,6 +627,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int offset; __be16 not_last_frag; struct rtable *rt = skb_rtable(skb); + struct ip_fraglist_iter iter; int err = 0; /* for offloaded checksums cleanup checksum before fragmentation */ @@ -643,49 +692,22 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, } /* Everything is OK. Generate! */ - - err = 0; - offset = 0; - frag = skb_shinfo(skb)->frag_list; - skb_frag_list_init(skb); - skb->data_len = first_len - skb_headlen(skb); - skb->len = first_len; - iph->tot_len = htons(first_len); - iph->frag_off = htons(IP_MF); - ip_send_check(iph); + ip_fraglist_init(skb, iph, hlen, &iter); for (;;) { /* Prepare header of the next frame, * before previous one went down. */ - if (frag) { - frag->ip_summed = CHECKSUM_NONE; - skb_reset_transport_header(frag); - __skb_push(frag, hlen); - skb_reset_network_header(frag); - memcpy(skb_network_header(frag), iph, hlen); - iph = ip_hdr(frag); - iph->tot_len = htons(frag->len); - ip_copy_metadata(frag, skb); - if (offset == 0) - ip_options_fragment(frag); - offset += skb->len - hlen; - iph->frag_off = htons(offset>>3); - if (frag->next) - iph->frag_off |= htons(IP_MF); - /* Ready, complete checksum */ - ip_send_check(iph); - } + if (iter.frag) + ip_fraglist_prepare(skb, &iter); err = output(net, sk, skb); if (!err) IP_INC_STATS(net, IPSTATS_MIB_FRAGCREATES); - if (err || !frag) + if (err || !iter.frag) break; - skb = frag; - frag = skb->next; - skb_mark_not_on_list(skb); + skb = ip_fraglist_next(&iter); } if (err == 0) { @@ -693,7 +715,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, return 0; } - kfree_skb_list(frag); + kfree_skb_list(iter.frag_list); IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS); return err; From patchwork Wed Apr 3 15:39:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076121 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9Jb2WKNz9sNd for ; Thu, 4 Apr 2019 02:40:31 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727053AbfDCPka (ORCPT ); Wed, 3 Apr 2019 11:40:30 -0400 Received: from mail.us.es ([193.147.175.20]:56730 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726879AbfDCPkO (ORCPT ); Wed, 3 Apr 2019 11:40:14 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 5C1411798BE for ; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 45E36DA791 for ; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 388C4DA71F; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 704D9DA70A; Wed, 3 Apr 2019 17:40:08 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:08 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id 2C8EA4265A31; Wed, 3 Apr 2019 17:40:08 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 3/8] net: ipv6: add skbuff fraglist split iterator Date: Wed, 3 Apr 2019 17:39:56 +0200 Message-Id: <20190403154001.8177-4-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds skbuff fraglist split iterator. The API consists of: * ip6_fraglist_init(), that initializes the internal state of the fraglist iterator. * ip6_fraglist_prepare(), that restores the IPv4 and transport headers on the fragment. * ip6_fraglist_next(), that retrieves the fragment from the fraglist and updates the internal state of the iterator to point to the next fragment in the fraglist. The ip6_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath for refragmentation. Signed-off-by: Pablo Neira Ayuso --- include/net/ipv6.h | 25 ++++++++++ net/ipv6/ip6_output.c | 132 +++++++++++++++++++++++++++++--------------------- 2 files changed, 102 insertions(+), 55 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index daf80863d3a5..acefbc718abe 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -154,6 +154,31 @@ struct frag_hdr { #define IP6_MF 0x0001 #define IP6_OFFSET 0xFFF8 +struct ip6_fraglist_iter { + struct ipv6hdr *tmp_hdr; + struct sk_buff *frag_list; + struct sk_buff *frag; + int offset; + unsigned int hlen; + __be32 frag_id; + u8 nexthdr; +}; + +int ip6_fraglist_init(struct sk_buff *skb, unsigned int hlen, u8 *prevhdr, + u8 nexthdr, __be32 frag_id, + struct ip6_fraglist_iter *iter); +void ip6_fraglist_prepare(struct sk_buff *skb, struct ip6_fraglist_iter *iter); + +static inline struct sk_buff *ip6_fraglist_next(struct ip6_fraglist_iter *iter) +{ + struct sk_buff *skb = iter->frag; + + iter->frag = skb->next; + skb_mark_not_on_list(skb); + + return skb; +} + #define IP6_REPLY_MARK(net, mark) \ ((net)->ipv6.sysctl.fwmark_reflect ? (mark) : 0) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index edbd12067170..006b954bee89 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -592,6 +592,73 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from) skb_copy_secmark(to, from); } +int ip6_fraglist_init(struct sk_buff *skb, unsigned int hlen, u8 *prevhdr, + u8 nexthdr, __be32 frag_id, + struct ip6_fraglist_iter *iter) +{ + unsigned int first_len; + struct frag_hdr *fh; + + /* BUILD HEADER */ + *prevhdr = NEXTHDR_FRAGMENT; + iter->tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC); + if (!iter->tmp_hdr) + return -ENOMEM; + + iter->frag_list = skb_shinfo(skb)->frag_list; + iter->frag = iter->frag_list; + skb_frag_list_init(skb); + + iter->offset = 0; + iter->hlen = hlen; + iter->frag_id = frag_id; + iter->nexthdr = nexthdr; + + __skb_pull(skb, hlen); + fh = __skb_push(skb, sizeof(struct frag_hdr)); + __skb_push(skb, hlen); + skb_reset_network_header(skb); + memcpy(skb_network_header(skb), iter->tmp_hdr, hlen); + + fh->nexthdr = nexthdr; + fh->reserved = 0; + fh->frag_off = htons(IP6_MF); + fh->identification = frag_id; + + first_len = skb_pagelen(skb); + skb->data_len = first_len - skb_headlen(skb); + skb->len = first_len; + ipv6_hdr(skb)->payload_len = htons(first_len - sizeof(struct ipv6hdr)); + + return 0; +} +EXPORT_SYMBOL(ip6_fraglist_init); + +void ip6_fraglist_prepare(struct sk_buff *skb, + struct ip6_fraglist_iter *iter) +{ + struct sk_buff *frag = iter->frag; + unsigned int hlen = iter->hlen; + struct frag_hdr *fh; + + frag->ip_summed = CHECKSUM_NONE; + skb_reset_transport_header(frag); + fh = __skb_push(frag, sizeof(struct frag_hdr)); + __skb_push(frag, hlen); + skb_reset_network_header(frag); + memcpy(skb_network_header(frag), iter->tmp_hdr, hlen); + iter->offset += skb->len - hlen - sizeof(struct frag_hdr); + fh->nexthdr = iter->nexthdr; + fh->reserved = 0; + fh->frag_off = htons(iter->offset); + if (frag->next) + fh->frag_off |= htons(IP6_MF); + fh->identification = iter->frag_id; + ipv6_hdr(frag)->payload_len = htons(frag->len - sizeof(struct ipv6hdr)); + ip6_copy_metadata(frag, skb); +} +EXPORT_SYMBOL(ip6_fraglist_prepare); + int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)) { @@ -599,7 +666,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? inet6_sk(skb->sk) : NULL; - struct ipv6hdr *tmp_hdr; struct frag_hdr *fh; unsigned int mtu, hlen, left, len; int hroom, troom; @@ -649,6 +715,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, hroom = LL_RESERVED_SPACE(rt->dst.dev); if (skb_has_frag_list(skb)) { unsigned int first_len = skb_pagelen(skb); + struct ip6_fraglist_iter iter; struct sk_buff *frag2; if (first_len - hlen > mtu || @@ -676,74 +743,29 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, skb->truesize -= frag->truesize; } - err = 0; - offset = 0; - /* BUILD HEADER */ - - *prevhdr = NEXTHDR_FRAGMENT; - tmp_hdr = kmemdup(skb_network_header(skb), hlen, GFP_ATOMIC); - if (!tmp_hdr) { - err = -ENOMEM; + err = ip6_fraglist_init(skb, hlen, prevhdr, nexthdr, frag_id, + &iter); + if (err < 0) goto fail; - } - frag = skb_shinfo(skb)->frag_list; - skb_frag_list_init(skb); - - __skb_pull(skb, hlen); - fh = __skb_push(skb, sizeof(struct frag_hdr)); - __skb_push(skb, hlen); - skb_reset_network_header(skb); - memcpy(skb_network_header(skb), tmp_hdr, hlen); - - fh->nexthdr = nexthdr; - fh->reserved = 0; - fh->frag_off = htons(IP6_MF); - fh->identification = frag_id; - - first_len = skb_pagelen(skb); - skb->data_len = first_len - skb_headlen(skb); - skb->len = first_len; - ipv6_hdr(skb)->payload_len = htons(first_len - - sizeof(struct ipv6hdr)); for (;;) { /* Prepare header of the next frame, * before previous one went down. */ - if (frag) { - frag->ip_summed = CHECKSUM_NONE; - skb_reset_transport_header(frag); - fh = __skb_push(frag, sizeof(struct frag_hdr)); - __skb_push(frag, hlen); - skb_reset_network_header(frag); - memcpy(skb_network_header(frag), tmp_hdr, - hlen); - offset += skb->len - hlen - sizeof(struct frag_hdr); - fh->nexthdr = nexthdr; - fh->reserved = 0; - fh->frag_off = htons(offset); - if (frag->next) - fh->frag_off |= htons(IP6_MF); - fh->identification = frag_id; - ipv6_hdr(frag)->payload_len = - htons(frag->len - - sizeof(struct ipv6hdr)); - ip6_copy_metadata(frag, skb); - } + if (iter.frag) + ip6_fraglist_prepare(skb, &iter); err = output(net, sk, skb); if (!err) IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGCREATES); - if (err || !frag) + if (err || !iter.frag) break; - skb = frag; - frag = skb->next; - skb_mark_not_on_list(skb); + skb = ip6_fraglist_next(&iter); } - kfree(tmp_hdr); + kfree(iter.tmp_hdr); if (err == 0) { IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), @@ -751,7 +773,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, return 0; } - kfree_skb_list(frag); + kfree_skb_list(iter.frag_list); IP6_INC_STATS(net, ip6_dst_idev(&rt->dst), IPSTATS_MIB_FRAGFAILS); From patchwork Wed Apr 3 15:39:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076120 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JY6dgmz9sRt for ; Thu, 4 Apr 2019 02:40:29 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727039AbfDCPk2 (ORCPT ); Wed, 3 Apr 2019 11:40:28 -0400 Received: from mail.us.es ([193.147.175.20]:56758 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726183AbfDCPkO (ORCPT ); Wed, 3 Apr 2019 11:40:14 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id AA5D017AE52 for ; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 943FEDA738 for ; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 9350CDA70E; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 2751CDA717; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:09 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id D25704265A31; Wed, 3 Apr 2019 17:40:08 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 4/8] net: ipv4: split skbuff into fragments transformer Date: Wed, 3 Apr 2019 17:39:57 +0200 Message-Id: <20190403154001.8177-5-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch exposes a new API to refragment a skbuff. The allows you to split either linear skbuff or force the refragmentation of an existing fraglist. The API consists of: * ip_frag_init(), that initializes the internal state of the transformer. * ip_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv4 and transport headers, and it also copies the payload for this fragment. The ip_frag_state object stores the internal state of the splitter. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this transformer from the bridge codepath for refragmentation. Signed-off-by: Pablo Neira Ayuso --- include/net/ip.h | 16 +++++ net/ipv4/ip_output.c | 200 ++++++++++++++++++++++++++++----------------------- 2 files changed, 128 insertions(+), 88 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index cc3d83680f4f..94f654e6bd41 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -188,6 +188,22 @@ static inline struct sk_buff *ip_fraglist_next(struct ip_fraglist_iter *iter) return skb; } +struct ip_frag_state { + struct iphdr *iph; + unsigned int hlen; + unsigned int ll_rs; + unsigned int mtu; + unsigned int left; + int offset; + int ptr; + __be16 not_last_frag; +}; + +void ip_frag_init(struct sk_buff *skb, unsigned int hlen, unsigned int ll_rs, + unsigned int mtu, struct ip_frag_state *state); +struct sk_buff *ip_frag_next(struct sk_buff *skb, + struct ip_frag_state *state); + void ip_send_check(struct iphdr *ip); int __ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb); int ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index d53fa795feb4..ab39bbfe8197 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -610,6 +610,111 @@ void ip_fraglist_prepare(struct sk_buff *skb, struct ip_fraglist_iter *iter) } EXPORT_SYMBOL(ip_fraglist_prepare); +void ip_frag_init(struct sk_buff *skb, unsigned int hlen, + unsigned int ll_rs, unsigned int mtu, + struct ip_frag_state *state) +{ + struct iphdr *iph = ip_hdr(skb); + + state->hlen = hlen; + state->ll_rs = ll_rs; + state->mtu = mtu; + + state->left = skb->len - hlen; /* Space per frame */ + state->ptr = hlen; /* Where to start from */ + + state->offset = (ntohs(iph->frag_off) & IP_OFFSET) << 3; + state->not_last_frag = iph->frag_off & htons(IP_MF); +} +EXPORT_SYMBOL(ip_frag_init); + +struct sk_buff *ip_frag_next(struct sk_buff *skb, struct ip_frag_state *state) +{ + unsigned int len = state->left; + struct sk_buff *skb2; + struct iphdr *iph; + + len = state->left; + /* IF: it doesn't fit, use 'mtu' - the data space left */ + if (len > state->mtu) + len = state->mtu; + /* IF: we are not sending up to and including the packet end + then align the next start on an eight byte boundary */ + if (len < state->left) { + len &= ~7; + } + + /* Allocate buffer */ + skb2 = alloc_skb(len + state->hlen + state->ll_rs, GFP_ATOMIC); + if (!skb2) + return ERR_PTR(-ENOMEM); + + /* + * Set up data on packet + */ + + ip_copy_metadata(skb2, skb); + skb_reserve(skb2, state->ll_rs); + skb_put(skb2, len + state->hlen); + skb_reset_network_header(skb2); + skb2->transport_header = skb2->network_header + state->hlen; + + /* + * Charge the memory for the fragment to any owner + * it might possess + */ + + if (skb->sk) + skb_set_owner_w(skb2, skb->sk); + + /* + * Copy the packet header into the new buffer. + */ + + skb_copy_from_linear_data(skb, skb_network_header(skb2), state->hlen); + + /* + * Copy a block of the IP datagram. + */ + if (skb_copy_bits(skb, state->ptr, skb_transport_header(skb2), len)) + BUG(); + state->left -= len; + + /* + * Fill in the new header fields. + */ + iph = ip_hdr(skb2); + iph->frag_off = htons((state->offset >> 3)); + + if (IPCB(skb)->flags & IPSKB_FRAG_PMTU) + iph->frag_off |= htons(IP_DF); + + /* ANK: dirty, but effective trick. Upgrade options only if + * the segment to be fragmented was THE FIRST (otherwise, + * options are already fixed) and make it ONCE + * on the initial skb, so that all the following fragments + * will inherit fixed options. + */ + if (state->offset == 0) + ip_options_fragment(skb); + + /* + * Added AC : If we are fragmenting a fragment that's not the + * last fragment then keep MF on each bit + */ + if (state->left > 0 || state->not_last_frag) + iph->frag_off |= htons(IP_MF); + state->ptr += len; + state->offset += len; + + iph->tot_len = htons(len + state->hlen); + + ip_send_check(iph); + + return skb2; +} +EXPORT_SYMBOL(ip_frag_next); + /* * This IP datagram is too large to be sent in one piece. Break it up into * smaller pieces (each of size equal to IP header plus @@ -621,13 +726,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)) { struct iphdr *iph; - int ptr; struct sk_buff *skb2; - unsigned int mtu, hlen, left, len, ll_rs; - int offset; - __be16 not_last_frag; struct rtable *rt = skb_rtable(skb); + unsigned int mtu, hlen, ll_rs; struct ip_fraglist_iter iter; + struct ip_frag_state state; int err = 0; /* for offloaded checksums cleanup checksum before fragmentation */ @@ -731,105 +834,26 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, } slow_path: - iph = ip_hdr(skb); - - left = skb->len - hlen; /* Space per frame */ - ptr = hlen; /* Where to start from */ - /* * Fragment the datagram. */ - offset = (ntohs(iph->frag_off) & IP_OFFSET) << 3; - not_last_frag = iph->frag_off & htons(IP_MF); + ip_frag_init(skb, hlen, ll_rs, mtu, &state); /* * Keep copying data until we run out. */ - while (left > 0) { - len = left; - /* IF: it doesn't fit, use 'mtu' - the data space left */ - if (len > mtu) - len = mtu; - /* IF: we are not sending up to and including the packet end - then align the next start on an eight byte boundary */ - if (len < left) { - len &= ~7; - } - - /* Allocate buffer */ - skb2 = alloc_skb(len + hlen + ll_rs, GFP_ATOMIC); - if (!skb2) { - err = -ENOMEM; + while (state.left > 0) { + skb2 = ip_frag_next(skb, &state); + if (IS_ERR(skb2)) { + err = PTR_ERR(skb2); goto fail; } /* - * Set up data on packet - */ - - ip_copy_metadata(skb2, skb); - skb_reserve(skb2, ll_rs); - skb_put(skb2, len + hlen); - skb_reset_network_header(skb2); - skb2->transport_header = skb2->network_header + hlen; - - /* - * Charge the memory for the fragment to any owner - * it might possess - */ - - if (skb->sk) - skb_set_owner_w(skb2, skb->sk); - - /* - * Copy the packet header into the new buffer. - */ - - skb_copy_from_linear_data(skb, skb_network_header(skb2), hlen); - - /* - * Copy a block of the IP datagram. - */ - if (skb_copy_bits(skb, ptr, skb_transport_header(skb2), len)) - BUG(); - left -= len; - - /* - * Fill in the new header fields. - */ - iph = ip_hdr(skb2); - iph->frag_off = htons((offset >> 3)); - - if (IPCB(skb)->flags & IPSKB_FRAG_PMTU) - iph->frag_off |= htons(IP_DF); - - /* ANK: dirty, but effective trick. Upgrade options only if - * the segment to be fragmented was THE FIRST (otherwise, - * options are already fixed) and make it ONCE - * on the initial skb, so that all the following fragments - * will inherit fixed options. - */ - if (offset == 0) - ip_options_fragment(skb); - - /* - * Added AC : If we are fragmenting a fragment that's not the - * last fragment then keep MF on each bit - */ - if (left > 0 || not_last_frag) - iph->frag_off |= htons(IP_MF); - ptr += len; - offset += len; - - /* * Put this fragment into the sending queue. */ - iph->tot_len = htons(len + hlen); - - ip_send_check(iph); - err = output(net, sk, skb2); if (err) goto fail; From patchwork Wed Apr 3 15:39:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076113 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JS11xrz9sNd for ; Thu, 4 Apr 2019 02:40:24 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726905AbfDCPkR (ORCPT ); Wed, 3 Apr 2019 11:40:17 -0400 Received: from mail.us.es ([193.147.175.20]:56802 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726892AbfDCPkO (ORCPT ); Wed, 3 Apr 2019 11:40:14 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 2FA8B1878AF for ; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 19777DA712 for ; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 0D945DA70A; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id BF7D0DA702; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:09 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id 784434265A31; Wed, 3 Apr 2019 17:40:09 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 5/8] net: ipv6: split skbuff into fragments transformer Date: Wed, 3 Apr 2019 17:39:58 +0200 Message-Id: <20190403154001.8177-6-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch exposes a new API to refragment a skbuff. The allows you to split either linear skbuff or force the refragmentation of an existing fraglist. The API consists of: * ip6_frag_init(), that initializes the internal state of the transformer. * ip6_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv6 and transport headers, and it also copies the payload for this fragment. The ip6_frag_state object stores the internal state of the splitter. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this transformer from the bridge codepath for refragmentation. Signed-off-by: Pablo Neira Ayuso --- include/net/ipv6.h | 19 ++++++ net/ipv6/ip6_output.c | 183 +++++++++++++++++++++++++++++--------------------- 2 files changed, 126 insertions(+), 76 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index acefbc718abe..21bb830e9679 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -179,6 +179,25 @@ static inline struct sk_buff *ip6_fraglist_next(struct ip6_fraglist_iter *iter) return skb; } +struct ip6_frag_state { + u8 *prevhdr; + unsigned int hlen; + unsigned int mtu; + unsigned int left; + int offset; + int ptr; + int hroom; + int troom; + __be32 frag_id; + u8 nexthdr; +}; + +void ip6_frag_init(struct sk_buff *skb, unsigned int hlen, unsigned int mtu, + unsigned short needed_tailroom, int hdr_room, u8 *prevhdr, + u8 nexthdr, __be32 frag_id, struct ip6_frag_state *state); +struct sk_buff *ip6_frag_next(struct sk_buff *skb, + struct ip6_frag_state *state); + #define IP6_REPLY_MARK(net, mark) \ ((net)->ipv6.sysctl.fwmark_reflect ? (mark) : 0) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 006b954bee89..8fae19e492ed 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -659,6 +659,103 @@ void ip6_fraglist_prepare(struct sk_buff *skb, } EXPORT_SYMBOL(ip6_fraglist_prepare); +void ip6_frag_init(struct sk_buff *skb, unsigned int hlen, unsigned int mtu, + unsigned short needed_tailroom, int hdr_room, u8 *prevhdr, + u8 nexthdr, __be32 frag_id, struct ip6_frag_state *state) +{ + state->prevhdr = prevhdr; + state->nexthdr = nexthdr; + state->frag_id = frag_id; + + state->hlen = hlen; + state->mtu = mtu; + + state->left = skb->len - hlen; /* Space per frame */ + state->ptr = hlen; /* Where to start from */ + + state->hroom = hdr_room; + state->troom = needed_tailroom; + + state->offset = 0; +} +EXPORT_SYMBOL(ip6_frag_init); + +struct sk_buff *ip6_frag_next(struct sk_buff *skb, struct ip6_frag_state *state) +{ + u8 *prevhdr = state->prevhdr, *fragnexthdr_offset; + struct sk_buff *frag; + struct frag_hdr *fh; + unsigned int len; + + len = state->left; + /* IF: it doesn't fit, use 'mtu' - the data space left */ + if (len > state->mtu) + len = state->mtu; + /* IF: we are not sending up to and including the packet end + then align the next start on an eight byte boundary */ + if (len < state->left) + len &= ~7; + + /* Allocate buffer */ + frag = alloc_skb(len + state->hlen + sizeof(struct frag_hdr) + + state->hroom + state->troom, GFP_ATOMIC); + if (!frag) + return ERR_PTR(-ENOMEM); + + /* + * Set up data on packet + */ + + ip6_copy_metadata(frag, skb); + skb_reserve(frag, state->hroom); + skb_put(frag, len + state->hlen + sizeof(struct frag_hdr)); + skb_reset_network_header(frag); + fh = (struct frag_hdr *)(skb_network_header(frag) + state->hlen); + frag->transport_header = (frag->network_header + state->hlen + + sizeof(struct frag_hdr)); + + /* + * Charge the memory for the fragment to any owner + * it might possess + */ + if (skb->sk) + skb_set_owner_w(frag, skb->sk); + + /* + * Copy the packet header into the new buffer. + */ + skb_copy_from_linear_data(skb, skb_network_header(frag), state->hlen); + + fragnexthdr_offset = skb_network_header(frag); + fragnexthdr_offset += prevhdr - skb_network_header(skb); + *fragnexthdr_offset = NEXTHDR_FRAGMENT; + + /* + * Build fragment header. + */ + fh->nexthdr = state->nexthdr; + fh->reserved = 0; + fh->identification = state->frag_id; + + /* + * Copy a block of the IP datagram. + */ + BUG_ON(skb_copy_bits(skb, state->ptr, skb_transport_header(frag), + len)); + state->left -= len; + + fh->frag_off = htons(state->offset); + if (state->left > 0) + fh->frag_off |= htons(IP6_MF); + ipv6_hdr(frag)->payload_len = htons(frag->len - sizeof(struct ipv6hdr)); + + state->ptr += len; + state->offset += len; + + return frag; +} +EXPORT_SYMBOL(ip6_frag_next); + int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, int (*output)(struct net *, struct sock *, struct sk_buff *)) { @@ -666,11 +763,10 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ? inet6_sk(skb->sk) : NULL; - struct frag_hdr *fh; - unsigned int mtu, hlen, left, len; - int hroom, troom; + struct ip6_frag_state state; + unsigned int mtu, hlen; + int hroom, err = 0; __be32 frag_id; - int ptr, offset = 0, err = 0; u8 *prevhdr, nexthdr = 0; err = ip6_find_1stfragopt(skb, &prevhdr); @@ -790,91 +886,26 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, } slow_path: - left = skb->len - hlen; /* Space per frame */ - ptr = hlen; /* Where to start from */ - /* * Fragment the datagram. */ - troom = rt->dst.dev->needed_tailroom; + ip6_frag_init(skb, hlen, mtu, rt->dst.dev->needed_tailroom, + LL_RESERVED_SPACE(rt->dst.dev), prevhdr, nexthdr, frag_id, + &state); /* * Keep copying data until we run out. */ - while (left > 0) { - u8 *fragnexthdr_offset; - - len = left; - /* IF: it doesn't fit, use 'mtu' - the data space left */ - if (len > mtu) - len = mtu; - /* IF: we are not sending up to and including the packet end - then align the next start on an eight byte boundary */ - if (len < left) { - len &= ~7; - } - /* Allocate buffer */ - frag = alloc_skb(len + hlen + sizeof(struct frag_hdr) + - hroom + troom, GFP_ATOMIC); - if (!frag) { - err = -ENOMEM; + while (state.left > 0) { + frag = ip6_frag_next(skb, &state); + if (IS_ERR(frag)) { + err = PTR_ERR(frag); goto fail; } /* - * Set up data on packet - */ - - ip6_copy_metadata(frag, skb); - skb_reserve(frag, hroom); - skb_put(frag, len + hlen + sizeof(struct frag_hdr)); - skb_reset_network_header(frag); - fh = (struct frag_hdr *)(skb_network_header(frag) + hlen); - frag->transport_header = (frag->network_header + hlen + - sizeof(struct frag_hdr)); - - /* - * Charge the memory for the fragment to any owner - * it might possess - */ - if (skb->sk) - skb_set_owner_w(frag, skb->sk); - - /* - * Copy the packet header into the new buffer. - */ - skb_copy_from_linear_data(skb, skb_network_header(frag), hlen); - - fragnexthdr_offset = skb_network_header(frag); - fragnexthdr_offset += prevhdr - skb_network_header(skb); - *fragnexthdr_offset = NEXTHDR_FRAGMENT; - - /* - * Build fragment header. - */ - fh->nexthdr = nexthdr; - fh->reserved = 0; - fh->identification = frag_id; - - /* - * Copy a block of the IP datagram. - */ - BUG_ON(skb_copy_bits(skb, ptr, skb_transport_header(frag), - len)); - left -= len; - - fh->frag_off = htons(offset); - if (left > 0) - fh->frag_off |= htons(IP6_MF); - ipv6_hdr(frag)->payload_len = htons(frag->len - - sizeof(struct ipv6hdr)); - - ptr += len; - offset += len; - - /* * Put this fragment into the sending queue. */ err = output(net, sk, frag); From patchwork Wed Apr 3 15:39:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076109 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JM53s2z9sNd for ; Thu, 4 Apr 2019 02:40:19 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727020AbfDCPkS (ORCPT ); Wed, 3 Apr 2019 11:40:18 -0400 Received: from mail.us.es ([193.147.175.20]:56834 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726918AbfDCPkO (ORCPT ); Wed, 3 Apr 2019 11:40:14 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id AC8EC1878C5 for ; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 9E7D7DA706 for ; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 7D27EDA70E; Wed, 3 Apr 2019 17:40:12 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 6ED2DDA709; Wed, 3 Apr 2019 17:40:10 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:10 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id 2BA8E4265A31; Wed, 3 Apr 2019 17:40:10 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 6/8] net: ipv4: place cb handling away from fraglist iterator Date: Wed, 3 Apr 2019 17:39:59 +0200 Message-Id: <20190403154001.8177-7-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Deal with the IPCB() area away from the fraglist iterator. The bridge codebase has its own control buffer layout, move specific IP control buffer into function. Signed-off-by: Pablo Neira Ayuso --- net/ipv4/ip_output.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index ab39bbfe8197..1dc731d7be0e 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -526,9 +526,6 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from) skb_copy_hash(to, from); - /* Copy the flags to each fragment. */ - IPCB(to)->flags = IPCB(from)->flags; - #ifdef CONFIG_NET_SCHED to->tc_index = from->tc_index; #endif @@ -583,6 +580,18 @@ void ip_fraglist_init(struct sk_buff *skb, struct iphdr *iph, } EXPORT_SYMBOL(ip_fraglist_init); +static void ip_fraglist_ipcb_prepare(struct sk_buff *skb, + struct ip_fraglist_iter *iter) +{ + struct sk_buff *to = iter->frag; + + /* Copy the flags to each fragment. */ + IPCB(to)->flags = IPCB(skb)->flags; + + if (iter->offset == 0) + ip_options_fragment(to); +} + void ip_fraglist_prepare(struct sk_buff *skb, struct ip_fraglist_iter *iter) { unsigned int hlen = iter->hlen; @@ -599,8 +608,6 @@ void ip_fraglist_prepare(struct sk_buff *skb, struct ip_fraglist_iter *iter) iph = iter->iph; iph->tot_len = htons(frag->len); ip_copy_metadata(frag, skb); - if (iter->offset == 0) - ip_options_fragment(frag); iter->offset += skb->len - hlen; iph->frag_off = htons(iter->offset >> 3); if (frag->next) @@ -800,8 +807,10 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, for (;;) { /* Prepare header of the next frame, * before previous one went down. */ - if (iter.frag) + if (iter.frag) { + ip_fraglist_ipcb_prepare(skb, &iter); ip_fraglist_prepare(skb, &iter); + } err = output(net, sk, skb); From patchwork Wed Apr 3 15:40:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076115 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JV2KNZz9sRt for ; Thu, 4 Apr 2019 02:40:26 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726886AbfDCPkR (ORCPT ); Wed, 3 Apr 2019 11:40:17 -0400 Received: from mail.us.es ([193.147.175.20]:56858 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726930AbfDCPkP (ORCPT ); Wed, 3 Apr 2019 11:40:15 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 5CFE31878B4 for ; Wed, 3 Apr 2019 17:40:13 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 4DAB5DA715 for ; Wed, 3 Apr 2019 17:40:13 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 4CC5FDA713; Wed, 3 Apr 2019 17:40:13 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 20E84DA703; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:11 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id D46F64265A31; Wed, 3 Apr 2019 17:40:10 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 7/8] net: ipv4: place cb handling away from fragment transformer Date: Wed, 3 Apr 2019 17:40:00 +0200 Message-Id: <20190403154001.8177-8-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Deal with the IPCB() area away from the fragment transformer. The bridge codebase has its own control buffer layout, move specific IP control buffer into function. Signed-off-by: Pablo Neira Ayuso --- net/ipv4/ip_output.c | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 1dc731d7be0e..bb0a13f78b43 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -635,6 +635,25 @@ void ip_frag_init(struct sk_buff *skb, unsigned int hlen, } EXPORT_SYMBOL(ip_frag_init); +static void ip_frag_ipcb(struct sk_buff *from, struct sk_buff *to, + bool first_frag, struct ip_frag_state *state) +{ + /* Copy the flags to each fragment. */ + IPCB(to)->flags = IPCB(from)->flags; + + if (IPCB(from)->flags & IPSKB_FRAG_PMTU) + state->iph->frag_off |= htons(IP_DF); + + /* ANK: dirty, but effective trick. Upgrade options only if + * the segment to be fragmented was THE FIRST (otherwise, + * options are already fixed) and make it ONCE + * on the initial skb, so that all the following fragments + * will inherit fixed options. + */ + if (first_frag) + ip_options_fragment(from); +} + struct sk_buff *ip_frag_next(struct sk_buff *skb, struct ip_frag_state *state) { unsigned int len = state->left; @@ -693,18 +712,6 @@ struct sk_buff *ip_frag_next(struct sk_buff *skb, struct ip_frag_state *state) iph = ip_hdr(skb2); iph->frag_off = htons((state->offset >> 3)); - if (IPCB(skb)->flags & IPSKB_FRAG_PMTU) - iph->frag_off |= htons(IP_DF); - - /* ANK: dirty, but effective trick. Upgrade options only if - * the segment to be fragmented was THE FIRST (otherwise, - * options are already fixed) and make it ONCE - * on the initial skb, so that all the following fragments - * will inherit fixed options. - */ - if (state->offset == 0) - ip_options_fragment(skb); - /* * Added AC : If we are fragmenting a fragment that's not the * last fragment then keep MF on each bit @@ -854,11 +861,14 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, */ while (state.left > 0) { + bool first_frag = (state.offset == 0); + skb2 = ip_frag_next(skb, &state); if (IS_ERR(skb2)) { err = PTR_ERR(skb2); goto fail; } + ip_frag_ipcb(skb, skb2, first_frag, &state); /* * Put this fragment into the sending queue. From patchwork Wed Apr 3 15:40:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 1076111 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=netfilter.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 44Z9JQ4v8Mz9sSM for ; Thu, 4 Apr 2019 02:40:22 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727025AbfDCPkV (ORCPT ); Wed, 3 Apr 2019 11:40:21 -0400 Received: from mail.us.es ([193.147.175.20]:56834 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726264AbfDCPkT (ORCPT ); Wed, 3 Apr 2019 11:40:19 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 64C401878B6 for ; Wed, 3 Apr 2019 17:40:14 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 4E15DDA713 for ; Wed, 3 Apr 2019 17:40:14 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 43347DA70F; Wed, 3 Apr 2019 17:40:14 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id D076EDA701; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Wed, 03 Apr 2019 17:40:11 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (sys.soleta.eu [212.170.55.40]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id 8E4444265A31; Wed, 3 Apr 2019 17:40:11 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: fw@strlen.de, eric.dumazet@gmail.com, davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH net-next, RFC 8/8] netfilter: bridge: add basic conntrack support Date: Wed, 3 Apr 2019 17:40:01 +0200 Message-Id: <20190403154001.8177-9-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20190403154001.8177-1-pablo@netfilter.org> References: <20190403154001.8177-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds native connection tracking support for the bridge. This patch register two hooks, one at bridge prerouting to call nf_conntrack_in() and another at the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets and it passes them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is created in the conntrack table. The conntrack bridge postrouting hook confirms new entries, ie. this is the first packet seen of this flow, then - if needed - it refragments the skbuff into the original fragments, leaving the geometry in untouched, unless this skbuff has been already linearized - eg. passed up to nfqueue and conntrack helpers - or skbuff has been cloned either for the local delivery for tcpdump or in case bridge port flooding is needed. The maximum fragment length is stored and it is used refragment the skbuff. In case the maximum fragment length is larger than the output port, the packet is silently discarded, ie. no refragmentation occurs. The packet defragmentation is still done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function bumps the IP fragmentation statistics. The new fraglist iterator and fragment transformer APIs is used to implement the refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. Health check validations are only performed in case of defragmentation, to follow the principle that conntrack should not drop packets, so users can do it through policy, ie. drop invalid packets. Like br_netfilter, there is no need to refragment packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots that were placed there a bit of time ago already, eg. af_packet, that show that skbuff fraglist from the netif_rx path is supported already. This is work-in-progress, missing parts are: 1. A track extension to allow userspace to pick up selected packets for tracking. We cannot inconditionally enable conntrack as in IPv4 and IPv6, we need an extension to specify what zone this conntrack belongs to to support VLAN scenarios. 2. Complete IPv6 support. 3. Netnamespace support. This patch is based on original work from Florian Westphal. Signed-off-by: Pablo Neira Ayuso --- net/bridge/br_private.h | 2 +- net/bridge/netfilter/Kconfig | 14 + net/bridge/netfilter/Makefile | 3 + net/bridge/netfilter/nf_conntrack_bridge.c | 411 +++++++++++++++++++++++++++++ 4 files changed, 429 insertions(+), 1 deletion(-) create mode 100644 net/bridge/netfilter/nf_conntrack_bridge.c diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 00deef7fc1f3..d8d150677fa1 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -430,7 +430,7 @@ struct br_input_skb_cb { int igmp; int mrouters_only; #endif - + u16 nf_frag_max_size; bool proxyarp_replied; bool src_port_isolated; diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig index 9a0159aebe1a..eb61197d8af8 100644 --- a/net/bridge/netfilter/Kconfig +++ b/net/bridge/netfilter/Kconfig @@ -18,6 +18,20 @@ config NF_LOG_BRIDGE tristate "Bridge packet logging" select NF_LOG_COMMON +config NF_CONNTRACK_BRIDGE + tristate "IPv4/IPV6 bridge connection tracking support" + depends on NF_CONNTRACK + default n + help + Connection tracking keeps a record of what packets have passed + through your machine, in order to figure out how they are related + into connections. This is used to enhance packet filtering via + stateful policies. Enable this if you want native tracking from + the bridge. This is provides a replacement for the `br_netfilter' + infrastructure. + + To compile it as a module, choose M here. If unsure, say N. + endif # NF_TABLES_BRIDGE menuconfig BRIDGE_NF_EBTABLES diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile index 9b868861f21a..9d7767322a64 100644 --- a/net/bridge/netfilter/Makefile +++ b/net/bridge/netfilter/Makefile @@ -5,6 +5,9 @@ obj-$(CONFIG_NFT_BRIDGE_REJECT) += nft_reject_bridge.o +# connection tracking +obj-$(CONFIG_NF_CONNTRACK_BRIDGE) += nf_conntrack_bridge.o + # packet logging obj-$(CONFIG_NF_LOG_BRIDGE) += nf_log_bridge.o diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c new file mode 100644 index 000000000000..46fa52778d2c --- /dev/null +++ b/net/bridge/netfilter/nf_conntrack_bridge.c @@ -0,0 +1,411 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include + +#include "../br_private.h" + +struct nf_ct_bridge_frag_data { + char mac[ETH_HLEN]; + u16 vlan_tci; + __be16 vlan_proto; +}; + +/* Best effort variant of ip_do_fragment which preserves geometry, unless skbuff + * has been linearized or cloned. + */ +static int br_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb, + struct nf_ct_bridge_frag_data *data, + int (*output)(struct net *, struct sock *sk, + const struct nf_ct_bridge_frag_data *data, + struct sk_buff *)) +{ + int frag_max_size = BR_INPUT_SKB_CB(skb)->nf_frag_max_size; + unsigned int hlen, ll_rs, mtu; + struct ip_frag_state state; + struct iphdr *iph; + int err; + + /* for offloaded checksums cleanup checksum before fragmentation */ + if (skb->ip_summed == CHECKSUM_PARTIAL && + (err = skb_checksum_help(skb))) + goto blackhole; + + iph = ip_hdr(skb); + + /* + * Setup starting values + */ + + hlen = iph->ihl * 4; + frag_max_size -= hlen; + ll_rs = LL_RESERVED_SPACE(skb->dev); + mtu = skb->dev->mtu; + + if (skb_has_frag_list(skb)) { + unsigned int first_len = skb_pagelen(skb); + struct ip_fraglist_iter iter; + struct sk_buff *frag; + + if (first_len - hlen > mtu || + skb_headroom(skb) < ll_rs) + goto blackhole; + + if (skb_cloned(skb)) + goto slow_path; + + skb_walk_frags(skb, frag) { + if (frag->len > mtu || + skb_headroom(frag) < hlen + ll_rs) + goto blackhole; + + if (skb_shared(frag)) + goto slow_path; + } + + ip_fraglist_init(skb, iph, hlen, &iter); + + for (;;) { + if (iter.frag) + ip_fraglist_prepare(skb, &iter); + + err = output(net, sk, data, skb); + if (err || !iter.frag) + break; + + skb = ip_fraglist_next(&iter); + } + return err; + } +slow_path: + /* This is a linearized skbuff, the original geometry is lost for us. + * This may also be a clone skbuff, we could preserve the geometry for + * the copies but probably not worth the effort. + */ + ip_frag_init(skb, hlen, ll_rs, frag_max_size, &state); + + while (state.left > 0) { + struct sk_buff *skb2; + + skb2 = ip_frag_next(skb, &state); + if (IS_ERR(skb2)) { + err = PTR_ERR(skb2); + goto blackhole; + } + + err = output(net, sk, data, skb2); + if (err) + goto blackhole; + } + consume_skb(skb); + return err; + +blackhole: + kfree_skb(skb); + return 0; +} + +/* ip_defrag() expects IPCB() in place. */ +static void br_skb_cb_save(struct sk_buff *skb, struct br_input_skb_cb *cb, + size_t inet_skb_parm_size) +{ + memcpy(cb, skb->cb, sizeof(*cb)); + memset(skb->cb, 0, inet_skb_parm_size); +} + +static void br_skb_cb_restore(struct sk_buff *skb, + const struct br_input_skb_cb *cb, + u16 fragsz) +{ + memcpy(skb->cb, cb, sizeof(*cb)); + BR_INPUT_SKB_CB(skb)->nf_frag_max_size = fragsz; +} + +static int nf_ct_br_defrag4_check(const struct sk_buff *skb) +{ + const struct iphdr *iph; + int nhoff, len; + + nhoff = skb_network_offset(skb); + iph = ip_hdr(skb); + if (iph->ihl < 5 || + iph->version != 4) + return -1; + + len = ntohs(iph->tot_len); + if (skb->len < nhoff + len || + len < (iph->ihl * 4)) + return -1; + + return 0; +} + +static unsigned int nf_ct_br_defrag4(struct sk_buff *skb, + const struct nf_hook_state *state) +{ + u16 zone_id = NF_CT_DEFAULT_ZONE_ID; + enum ip_conntrack_info ctinfo; + struct br_input_skb_cb cb; + const struct nf_conn *ct; + int err; + + if (!ip_is_fragment(ip_hdr(skb))) + return NF_ACCEPT; + + if (nf_ct_br_defrag4_check(skb)) + return NF_ACCEPT; + + ct = nf_ct_get(skb, &ctinfo); + if (ct) + zone_id = nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)); + + br_skb_cb_save(skb, &cb, sizeof(struct inet_skb_parm)); + local_bh_disable(); + err = ip_defrag(state->net, skb, + IP_DEFRAG_CONNTRACK_BRIDGE_IN + zone_id); + local_bh_enable(); + if (!err) { + br_skb_cb_restore(skb, &cb, IPCB(skb)->frag_max_size); + skb->ignore_df = 1; + return NF_ACCEPT; + } + + return NF_STOLEN; +} + +static int nf_ct_br_defrag6_check(const struct sk_buff *skb) +{ + const struct ipv6hdr *hdr; + int nhoff, len; + + nhoff = skb_network_offset(skb); + hdr = ipv6_hdr(skb); + if (hdr->version != 6) + return -1; + + len = ntohs(hdr->payload_len) + sizeof(struct ipv6hdr) + nhoff; + if (skb->len < len) + return -1; + + return 0; +} + +static unsigned int nf_ct_br_defrag6(struct sk_buff *skb, + const struct nf_hook_state *state) +{ + u16 zone_id = NF_CT_DEFAULT_ZONE_ID; + enum ip_conntrack_info ctinfo; + struct br_input_skb_cb cb; + const struct nf_conn *ct; + int err; + + ct = nf_ct_get(skb, &ctinfo); + if (ct) + zone_id = nf_ct_zone_id(nf_ct_zone(ct), CTINFO2DIR(ctinfo)); + + if (nf_ct_br_defrag6_check(skb)) + return NF_ACCEPT; + + br_skb_cb_save(skb, &cb, sizeof(struct inet6_skb_parm)); + + err = nf_ct_frag6_gather(state->net, skb, + IP_DEFRAG_CONNTRACK_BRIDGE_IN + zone_id); + /* queued */ + if (err == -EINPROGRESS) + return NF_STOLEN; + + br_skb_cb_restore(skb, &cb, IP6CB(skb)->frag_max_size); + return err == 0 ? NF_ACCEPT : NF_DROP; +} + +static unsigned int nf_ct_bridge_pre_ipv4(struct sk_buff *skb, + const struct nf_hook_state *state) +{ + u32 len; + + if (!pskb_may_pull(skb, sizeof(struct iphdr))) + return NF_ACCEPT; + + len = ntohs(ip_hdr(skb)->tot_len); + if (pskb_trim_rcsum(skb, len)) + return NF_DROP; + + return nf_ct_br_defrag4(skb, state); +} + +static unsigned int nf_ct_bridge_pre_ipv6(struct sk_buff *skb, + const struct nf_hook_state *state) +{ + u32 len; + + if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) + return NF_ACCEPT; + + len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len); + if (pskb_trim_rcsum(skb, len)) + return NF_DROP; + + return nf_ct_br_defrag6(skb, state); +} + +static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct nf_hook_state bridge_state = *state; + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; + int ret; + + ct = nf_ct_get(skb, &ctinfo); + if (ct && !nf_ct_is_template(ct)) + return NF_ACCEPT; + + BR_INPUT_SKB_CB(skb)->nf_frag_max_size = 0; + + switch (skb->protocol) { + case htons(ETH_P_IP): + bridge_state.pf = NFPROTO_IPV4; + ret = nf_ct_bridge_pre_ipv4(skb, &bridge_state); + break; + case htons(ETH_P_IPV6): + bridge_state.pf = NFPROTO_IPV6; + ret = nf_ct_bridge_pre_ipv6(skb, &bridge_state); + default: + return NF_ACCEPT; + } + + if (ret != NF_ACCEPT) + return ret; + + ret = nf_conntrack_in(skb, &bridge_state); + + return ret; +} + +static void nf_ct_bridge_refrag_prepare(struct sk_buff *skb, + struct nf_ct_bridge_frag_data *data) +{ + data->vlan_tci = skb->vlan_tci; + data->vlan_proto = skb->vlan_proto; + skb_copy_from_linear_data_offset(skb, -ETH_HLEN, data->mac, ETH_HLEN); +} + +static unsigned int +nf_ct_bridge_refrag(struct sk_buff *skb, const struct nf_hook_state *state, + int (*output)(struct net *, struct sock *sk, + const struct nf_ct_bridge_frag_data *data, + struct sk_buff *)) +{ + struct nf_ct_bridge_frag_data data; + + if (!BR_INPUT_SKB_CB(skb)->nf_frag_max_size) + return NF_ACCEPT; + + nf_ct_bridge_refrag_prepare(skb, &data); + switch (skb->protocol) { + case htons(ETH_P_IP): + br_ip_fragment(state->net, state->sk, skb, &data, output); + break; + case htons(ETH_P_IPV6): + return NF_DROP; + default: + WARN_ON_ONCE(1); + return NF_DROP; + } + + return NF_STOLEN; +} + +/* Actually only slow path refragmentation needs this. */ +static int nf_ct_bridge_frag_prepare(struct sk_buff *skb, + const struct nf_ct_bridge_frag_data *data) +{ + int err; + + err = skb_cow_head(skb, ETH_HLEN); + if (err) { + kfree_skb(skb); + return -ENOMEM; + } + if (data->vlan_tci) { + skb->vlan_tci = skb->vlan_tci; + skb->vlan_proto = data->vlan_proto; + } + skb_copy_to_linear_data_offset(skb, -ETH_HLEN, data->mac, ETH_HLEN); + skb_reset_mac_header(skb); + + return 0; +} + +static int nf_ct_bridge_refrag_post(struct net *net, struct sock *sk, + const struct nf_ct_bridge_frag_data *data, + struct sk_buff *skb) +{ + int err; + + err = nf_ct_bridge_frag_prepare(skb, data); + if (err < 0) + return err; + + return br_dev_queue_push_xmit(net, sk, skb); +} + +static unsigned int nf_ct_bridge_post(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + int ret; + + ret = nf_conntrack_confirm(skb); + if (ret != NF_ACCEPT) + return ret; + + return nf_ct_bridge_refrag(skb, state, nf_ct_bridge_refrag_post); +} + +static struct nf_hook_ops nf_ct_bridge_hook_ops[] __read_mostly = { + { + .hook = nf_ct_bridge_pre, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_INET_PRE_ROUTING, + .priority = NF_IP_PRI_CONNTRACK, + }, + { + .hook = nf_ct_bridge_post, + .pf = NFPROTO_BRIDGE, + .hooknum = NF_INET_POST_ROUTING, + .priority = NF_IP_PRI_CONNTRACK_CONFIRM, + }, +}; + +static int __init nf_conntrack_l3proto_bridge_init(void) +{ + return nf_register_net_hooks(&init_net, nf_ct_bridge_hook_ops, + ARRAY_SIZE(nf_ct_bridge_hook_ops)); +} + +static void __exit nf_conntrack_l3proto_bridge_fini(void) +{ + nf_unregister_net_hooks(&init_net, nf_ct_bridge_hook_ops, + ARRAY_SIZE(nf_ct_bridge_hook_ops)); +} + +module_init(nf_conntrack_l3proto_bridge_init); +module_exit(nf_conntrack_l3proto_bridge_fini); + +MODULE_ALIAS("nf_conntrack-" __stringify(AF_BRIDGE)); +MODULE_LICENSE("GPL");