diff mbox

[net-next,fragmentation,icmp,v4,3/4] bridge_netfilter: No ICMP packet on IPv4 defragmentation timeout

Message ID 1431570482-9236-4-git-send-email-azhou@nicira.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Andy Zhou May 14, 2015, 2:28 a.m. UTC
Currently, on defragmentation timeout error, ICMP error message
will be generated. This is fine when they are used in a routing context,
but does not make sense in the context of bridging netfilter.

This patch adds a bit (IPSKB_NO_FRAG_ICMP) in IPCB to control
whether ICMP error message should be generated. br_netfiler sets
this bit.

Signed-off-by: Andy Zhou <azhou@nicira.com>
---
 include/net/inet_frag.h   |  4 +++-
 include/net/ip.h          |  1 +
 net/bridge/br_netfilter.c |  5 +++++
 net/ipv4/ip_fragment.c    | 12 ++++++++++--
 4 files changed, 19 insertions(+), 3 deletions(-)

Comments

Florian Westphal May 14, 2015, 8:59 a.m. UTC | #1
Andy Zhou <azhou@nicira.com> wrote:
> Currently, on defragmentation timeout error, ICMP error message
> will be generated. This is fine when they are used in a routing context,
> but does not make sense in the context of bridging netfilter.
> 
> This patch adds a bit (IPSKB_NO_FRAG_ICMP) in IPCB to control
> whether ICMP error message should be generated. br_netfiler sets
> this bit.

Could you please explain why we need this patch?
After the previous change (patch 2 in your series), we will already
bail out before hitting

/* Send an ICMP "Fragment Reassembly Timeout" message. */
icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);

in ip_expire() based on qp->user value test?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Zhou May 14, 2015, 7:54 p.m. UTC | #2
On Thu, May 14, 2015 at 1:59 AM, Florian Westphal <fw@strlen.de> wrote:
> Andy Zhou <azhou@nicira.com> wrote:
>> Currently, on defragmentation timeout error, ICMP error message
>> will be generated. This is fine when they are used in a routing context,
>> but does not make sense in the context of bridging netfilter.
>>
>> This patch adds a bit (IPSKB_NO_FRAG_ICMP) in IPCB to control
>> whether ICMP error message should be generated. br_netfiler sets
>> this bit.
>
> Could you please explain why we need this patch?
> After the previous change (patch 2 in your series), we will already
> bail out before hitting
>
> /* Send an ICMP "Fragment Reassembly Timeout" message. */
> icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
>
> in ip_expire() based on qp->user value test?

Letting caller making the decision seems to be a better design choice
rather than implicit
logic embedded within the function.  It is also more flexible and
robust when we need to extend 'user'
range down the road.

>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 14, 2015, 9:42 p.m. UTC | #3
From: Andy Zhou <azhou@nicira.com>
Date: Thu, 14 May 2015 12:54:23 -0700

> On Thu, May 14, 2015 at 1:59 AM, Florian Westphal <fw@strlen.de> wrote:
>> Andy Zhou <azhou@nicira.com> wrote:
>>> Currently, on defragmentation timeout error, ICMP error message
>>> will be generated. This is fine when they are used in a routing context,
>>> but does not make sense in the context of bridging netfilter.
>>>
>>> This patch adds a bit (IPSKB_NO_FRAG_ICMP) in IPCB to control
>>> whether ICMP error message should be generated. br_netfiler sets
>>> this bit.
>>
>> Could you please explain why we need this patch?
>> After the previous change (patch 2 in your series), we will already
>> bail out before hitting
>>
>> /* Send an ICMP "Fragment Reassembly Timeout" message. */
>> icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
>>
>> in ip_expire() based on qp->user value test?
> 
> Letting caller making the decision seems to be a better design choice
> rather than implicit
> logic embedded within the function.  It is also more flexible and
> robust when we need to extend 'user'
> range down the road.

You're doing useless testing for a hypothetical situation which may
never come to fruitation at all.

I don't think that ever qualifies as a legitimate objection.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Zhou May 15, 2015, 4:21 a.m. UTC | #4
O.K. I will drop this patch in the next Rev.

On Thu, May 14, 2015 at 2:42 PM, David Miller <davem@davemloft.net> wrote:
> From: Andy Zhou <azhou@nicira.com>
> Date: Thu, 14 May 2015 12:54:23 -0700
>
>> On Thu, May 14, 2015 at 1:59 AM, Florian Westphal <fw@strlen.de> wrote:
>>> Andy Zhou <azhou@nicira.com> wrote:
>>>> Currently, on defragmentation timeout error, ICMP error message
>>>> will be generated. This is fine when they are used in a routing context,
>>>> but does not make sense in the context of bridging netfilter.
>>>>
>>>> This patch adds a bit (IPSKB_NO_FRAG_ICMP) in IPCB to control
>>>> whether ICMP error message should be generated. br_netfiler sets
>>>> this bit.
>>>
>>> Could you please explain why we need this patch?
>>> After the previous change (patch 2 in your series), we will already
>>> bail out before hitting
>>>
>>> /* Send an ICMP "Fragment Reassembly Timeout" message. */
>>> icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
>>>
>>> in ip_expire() based on qp->user value test?
>>
>> Letting caller making the decision seems to be a better design choice
>> rather than implicit
>> logic embedded within the function.  It is also more flexible and
>> robust when we need to extend 'user'
>> range down the road.
>
> You're doing useless testing for a hypothetical situation which may
> never come to fruitation at all.
>
> I don't think that ever qualifies as a legitimate objection.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 8d17655..e3c8840 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -22,12 +22,14 @@  struct netns_frags {
  * @INET_FRAG_LAST_IN: final fragment has arrived
  * @INET_FRAG_COMPLETE: frag queue has been processed and is due for destruction
  * @INET_FRAG_EVICTED: frag queue is being evicted
+ * @INET_FRAG_NO_ICMP: Do not send icmp message on incomplete defrag
  */
 enum {
 	INET_FRAG_FIRST_IN	= BIT(0),
 	INET_FRAG_LAST_IN	= BIT(1),
 	INET_FRAG_COMPLETE	= BIT(2),
-	INET_FRAG_EVICTED	= BIT(3)
+	INET_FRAG_EVICTED	= BIT(3),
+	INET_FRAG_NO_ICMP	= BIT(4)
 };
 
 /**
diff --git a/include/net/ip.h b/include/net/ip.h
index 43f6f39..af44c9f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -45,6 +45,7 @@  struct inet_skb_parm {
 #define IPSKB_FRAG_COMPLETE	BIT(3)
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
+#define IPSKB_NO_FRAG_ICMP	BIT(6)
 
 	u16			frag_max_size;
 };
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index ab55e24..6a2adba 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -663,6 +663,11 @@  static unsigned int br_nf_pre_routing(const struct nf_hook_ops *ops,
 	if (br_parse_ip_options(skb))
 		return NF_DROP;
 
+	/* In case this is a fragmented packet, do not send icmp packet on
+	 * defragmentation error
+	 */
+	IPCB(skb)->flags |= IPSKB_NO_FRAG_ICMP;
+
 	nf_bridge_put(skb->nf_bridge);
 	if (!nf_bridge_alloc(skb))
 		return NF_DROP;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 47fa64e..60afaa7 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -226,7 +226,8 @@  static void ip_expire(unsigned long arg)
 		/* Only an end host needs to send an ICMP
 		 * "Fragment Reassembly Timeout" message, per RFC792.
 		 */
-		if (frag_expire_skip_icmp(qp->user) &&
+		if ((qp->q.flags & INET_FRAG_NO_ICMP) ||
+		    frag_expire_skip_icmp(qp->user) ||
 		    (skb_rtable(head)->rt_type != RTN_LOCAL))
 			goto out_rcu_unlock;
 
@@ -330,6 +331,7 @@  static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	int ihl, end;
 	int err = -ENOENT;
 	u8 ecn;
+	bool no_icmp;
 
 	if (qp->q.flags & INET_FRAG_COMPLETE)
 		goto err;
@@ -347,6 +349,7 @@  static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	offset &= IP_OFFSET;
 	offset <<= 3;		/* offset is in 8-byte chunks */
 	ihl = ip_hdrlen(skb);
+	no_icmp = IPCB(skb)->flags & IPSKB_NO_FRAG_ICMP;
 
 	/* Determine the position of this fragment. */
 	end = offset + skb->len - ihl;
@@ -485,7 +488,12 @@  found:
 	    skb->len + ihl > qp->q.max_size)
 		qp->q.max_size = skb->len + ihl;
 
-	if (qp->q.flags == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
+	if (no_icmp) {
+		qp->q.flags |= INET_FRAG_NO_ICMP;
+	}
+
+	if (((qp->q.flags & ~INET_FRAG_NO_ICMP) ==
+		(INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN)) &&
 	    qp->q.meat == qp->q.len) {
 		unsigned long orefdst = skb->_skb_refdst;