diff mbox

xt_TCPMSS: SYN packets are allowed to contain data

Message ID 4B5773C2.2010000@simon.arlott.org.uk
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Simon Arlott Jan. 20, 2010, 9:21 p.m. UTC
The TCPMSS target is dropping SYN packets where:
  1) There is data, or
  2) The data offset makes the TCP header larger than
  the packet.

Both of these result in an error level printk.

This change fixes the drop of SYN packets with data
(because the MSS option can safely be modified) and
passes packets with no MSS option instead of adding
one (which is not valid).

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
---
Tested mangle OUTPUT rule with IPv4 and IPv6.
SYN with data not tested.

 net/netfilter/xt_TCPMSS.c |   82 +++++++-------------------------------------
 1 files changed, 13 insertions(+), 69 deletions(-)

Comments

Jan Engelhardt Jan. 20, 2010, 9:39 p.m. UTC | #1
On Wednesday 2010-01-20 22:21, Simon Arlott wrote:

>The TCPMSS target is dropping SYN packets where:
>  1) There is data, or
>  2) The data offset makes the TCP header larger than
>  the packet.
>
>Both of these result in an error level printk.
>
>This change fixes the drop of SYN packets with data
>(because the MSS option can safely be modified) and
>passes packets with no MSS option instead of adding
>one (which is not valid).

Can you explain why the automatic addition of a MSS option is removed?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt Jan. 20, 2010, 9:41 p.m. UTC | #2
On Wednesday 2010-01-20 22:39, Jan Engelhardt wrote:

>On Wednesday 2010-01-20 22:21, Simon Arlott wrote:
>
>>The TCPMSS target is dropping SYN packets where:
>>  1) There is data, or
>>  2) The data offset makes the TCP header larger than
>>  the packet.
>>
>>Both of these result in an error level printk.
>>
>>This change fixes the drop of SYN packets with data
>>(because the MSS option can safely be modified) and
>>passes packets with no MSS option instead of adding
>>one (which is not valid).
>
>Can you explain why the automatic addition of a MSS option is removed?

That is, of course, for the git log. If I followed the thread right, it 
was that adding the option could exceed the MTU. Well, can't we check 
for the outgoing MTU?
Simon Arlott Jan. 20, 2010, 9:51 p.m. UTC | #3
On 20/01/10 21:41, Jan Engelhardt wrote:
> On Wednesday 2010-01-20 22:39, Jan Engelhardt wrote:
> 
>>On Wednesday 2010-01-20 22:21, Simon Arlott wrote:
>>
>>>The TCPMSS target is dropping SYN packets where:
>>>  1) There is data, or
>>>  2) The data offset makes the TCP header larger than
>>>  the packet.
>>>
>>>Both of these result in an error level printk.
>>>
>>>This change fixes the drop of SYN packets with data
>>>(because the MSS option can safely be modified) and
>>>passes packets with no MSS option instead of adding
>>>one (which is not valid).
>>
>>Can you explain why the automatic addition of a MSS option is removed?
> 
> That is, of course, for the git log. If I followed the thread right, it 
> was that adding the option could exceed the MTU. Well, can't we check 
> for the outgoing MTU?

The MSS option is for the MRU of whoever sent the SYN packet. There's no
way of knowing this information so it's not possible to avoid using an
MSS that is too large. With no option, "any" segment size could be used,
which implies 536 to match the MRU of 576.
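
As a rough illustration of where 536 comes from, here is a minimal sketch of the arithmetic. The helper name and constants are made up for this example and are not part of the patch; it assumes IPv4 and TCP headers without options.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, not taken from xt_TCPMSS.c. */
enum {
	IPV4_MIN_DATAGRAM = 576, /* datagram size every IPv4 host must accept */
	IPV4_HDR_MIN      = 20,  /* IPv4 header without options */
	TCP_HDR_MIN       = 20,  /* TCP header without options */
};

/* Hypothetical helper: the segment size implied by a given datagram size. */
static uint16_t implied_mss(uint16_t datagram_size)
{
	/* The datagram must at least hold both headers. */
	assert(datagram_size > IPV4_HDR_MIN + TCP_HDR_MIN);
	return (uint16_t)(datagram_size - IPV4_HDR_MIN - TCP_HDR_MIN);
}
```

With these assumptions, implied_mss(IPV4_MIN_DATAGRAM) gives the 536 mentioned above, and a 1500-byte Ethernet MTU would give 1460.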

The other reason for not being able to add it is that it may increase the
packet size beyond an MRU/MTU limit if there is data. There's no guarantee
we'll see an ICMP error message if this occurs, because the limit doesn't
have to be local and the return path does not need to be the same. The
original host won't know that the packet is going to be increased in size.
Amos Jeffries Jan. 20, 2010, 10:22 p.m. UTC | #4
On Wed, 20 Jan 2010 21:51:33 +0000, Simon Arlott <simon@fire.lp0.eu> wrote:
> On 20/01/10 21:41, Jan Engelhardt wrote:
>> On Wednesday 2010-01-20 22:39, Jan Engelhardt wrote:
>> 
>>>On Wednesday 2010-01-20 22:21, Simon Arlott wrote:
>>>
>>>>The TCPMSS target is dropping SYN packets where:
>>>>  1) There is data, or
>>>>  2) The data offset makes the TCP header larger than
>>>>  the packet.
>>>>
>>>>Both of these result in an error level printk.
>>>>
>>>>This change fixes the drop of SYN packets with data
>>>>(because the MSS option can safely be modified) and
>>>>passes packets with no MSS option instead of adding
>>>>one (which is not valid).
>>>
>>>Can you explain why the automatic addition of a MSS option is removed?
>> 
>> That is, of course, for the git log. If I followed the thread right, it
>> was that adding the option could exceed the MTU. Well, can't we check
>> for the outgoing MTU?
> 
> The MSS option is for the MRU of whoever sent the SYN packet. There's no
> way of knowing this information so it's not possible to avoid using an
> MSS that is too large. With no option, "any" segment size could be used,
> which implies 536 to match the MRU of 576.
> 
> The other reason for not being able to add it is that it may increase the
> packet size beyond an MRU/MTU limit if there is data. There's no guarantee
> we'll see an ICMP error message if this occurs, because the limit doesn't
> have to be local and the return path does not need to be the same. The
> original host won't know that the packet is going to be increased in size.

(I know little, so just my 2c)

So... packets are 'tunneled' down a link where MSS is required/added.
However packets which will not fit into the MTU of that 'tunnel' are sent
down it without MSS and without fragmentation? I wonder what would happen
if all TCP MTUs worked that way...

Maybe I've misunderstood how path MTU discovery works. But aren't it and
TCP built on the premise that the origin host always receives the ACKs
regardless of the reverse route? And isn't PMTU discovery built on that
same guarantee, so that the ICMP error returns to the same source the ACK
would go to?

If ICMP is administratively crippled to break TCP, it's not iptables' fault,
nor that of the admin who is using TCP/ICMP correctly to signal available MTU.

AYJ
Patrick McHardy Jan. 20, 2010, 11:14 p.m. UTC | #5
Jan Engelhardt wrote:
> On Wednesday 2010-01-20 22:39, Jan Engelhardt wrote:
> 
>> On Wednesday 2010-01-20 22:21, Simon Arlott wrote:
>>
>>> The TCPMSS target is dropping SYN packets where:
>>>  1) There is data, or
>>>  2) The data offset makes the TCP header larger than
>>>  the packet.
>>>
>>> Both of these result in an error level printk.
>>>
>>> This change fixes the drop of SYN packets with data
>>> (because the MSS option can safely be modified) and
>>> passes packets with no MSS option instead of adding
>>> one (which is not valid).
>> Can you explain why the automatic addition of a MSS option is removed?
> 
> That is, of course, for the git log. If I followed the thread right, it 
> was that adding the option could exceed the MTU. Well, can't we check 
> for the outgoing MTU?

We certainly can, and in fact the packet would get fragmented
by the IP layer in case we would exceed the PMTU. Additionally
we currently check that the packet contains no data, even with
the first version of this patch, so there's no way the packet
could exceed the MTU.

This feature has been there from day one since the TCPMSS target
has been merged and people are using this with knowledge of their
MTUs to work around broken ISPs. I'm not applying this.

The first version seemed fine to me though :)
Simon Arlott Jan. 21, 2010, 12:47 p.m. UTC | #6
On Wed, January 20, 2010 23:14, Patrick McHardy wrote:
> Jan Engelhardt wrote:
>> On Wednesday 2010-01-20 22:39, Jan Engelhardt wrote:
>>> On Wednesday 2010-01-20 22:21, Simon Arlott wrote:
>>>> The TCPMSS target is dropping SYN packets where:
>>>>  1) There is data, or
>>>>  2) The data offset makes the TCP header larger than
>>>>  the packet.
>>>>
>>>> Both of these result in an error level printk.
>>>>
>>>> This change fixes the drop of SYN packets with data
>>>> (because the MSS option can safely be modified) and
>>>> passes packets with no MSS option instead of adding
>>>> one (which is not valid).
>>> Can you explain why the automatic addition of a MSS option is removed?
>>
>> That is, of course, for the git log. If I followed the thread right, it
>> was that adding the option could exceed the MTU. Well, can't we check
>> for the outgoing MTU?
>
> We certainly can, and in fact the packet would get fragmented
> by the IP layer in case we would exceed the PMTU. Additionally
> we currently check that the packet contains no data, even with
> the first version of this patch, so there's no way the packet
> could exceed the MTU.

If DF is set and the MTU is exceeded (for the SYN packet) at a
hop further away, the original host will not understand that it
needs to allow for the MSS option being added.

(Header + Data + New MSS Option) can't exceed 576 bytes and
there's no way to know that more than 576 bytes is allowed
because the ICMP error message may not go via the same host that
is mangling the packet.
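
A minimal sketch of that size constraint, assuming the option being added is the usual 4-byte MSS option and that 576 bytes is the only size known to be safe. The function and macro names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSS_OPTION_LEN 4u   /* kind (1) + length (1) + MSS value (2) */
#define SAFE_DATAGRAM  576u /* size assumed safe without path knowledge */
#define TCP_HDR_MIN    20u  /* TCP header without options */

/* Hypothetical check: would Header + Data + New MSS Option still fit
 * in 576 bytes? All lengths are in bytes. */
static bool fits_with_added_mss(size_t ip_hdr_len, size_t tcp_hdr_len,
				size_t data_len)
{
	assert(tcp_hdr_len >= TCP_HDR_MIN);
	return ip_hdr_len + tcp_hdr_len + data_len + MSS_OPTION_LEN
		<= SAFE_DATAGRAM;
}
```

With 20-byte IP and TCP headers, a SYN carrying up to 532 bytes of data would still fit after the option is added; 533 bytes would not.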

Of course, it could just allow fragmentation for this one SYN
packet but that doesn't work for IPv6.

> This feature has been there from day one since the TCPMSS target
> has been merged and people are using this with knowledge of their
> MTUs to work around broken ISPs. I'm not applying this.

The TCPMSS target can be applied to more than just one direction
of traffic. I'm modifying incoming traffic too, so adding the MSS
option and setting it to over 536 is wrong (although the first ICMP
error will fix it).

Existing users use this target precisely because their hosts are
sending an unwanted MSS value, so it will never need to be added.

> The first version seemed fine to me though :)

The first version is ok with me. Only SYN packets with data and
no MSS option will be dropped. William objects to ever adding the
MSS option.

Ideally, though, SYN packets with data and no MSS option should
be accepted without adding an option. Dropping arbitrary traffic
(especially when new kernels allow data to be sent with SYN
packets) is not a good idea. If that is ok with you then I'll
make another patch to do it and update the comments.
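
To spell out the behaviour I have in mind, here is a rough userspace sketch of the decision, under my own reading of the cases discussed in this thread. The enum and function names are invented for illustration and this is not the code of any posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for one SYN packet. */
enum mss_action {
	MSS_DROP,  /* malformed: header claims to be larger than the packet */
	MSS_CLAMP, /* MSS option present: lower it in place if needed */
	MSS_ADD,   /* no MSS option and no data: add the option */
	MSS_PASS,  /* no MSS option but data present: leave untouched */
};

/* tcplen: bytes from the start of the TCP header to the end of the packet.
 * hdrlen: TCP header length in bytes (doff * 4). */
static enum mss_action decide(unsigned int tcplen, unsigned int hdrlen,
			      bool has_mss_option)
{
	assert(hdrlen >= 20); /* doff below 5 is not a valid TCP header */

	if (tcplen < hdrlen)
		return MSS_DROP;
	if (has_mss_option)
		return MSS_CLAMP; /* size is unchanged, so data is harmless */
	if (tcplen == hdrlen)
		return MSS_ADD;   /* growing a data-less SYN by 4 bytes */
	return MSS_PASS;
}
```

The only case that still drops is the malformed one; a SYN with data and no MSS option simply passes through.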
Jan Engelhardt Jan. 21, 2010, 12:58 p.m. UTC | #7
On Thursday 2010-01-21 13:47, Simon Arlott wrote:
>
>The TCPMSS target can be applied to more than just one direction
>of traffic. I'm modifying incoming traffic too, so adding the MSS
>option and setting it to over 536 is wrong (although the first ICMP
>error will fix it).
>
>Existing users use this target precisely because their hosts are
>sending an unwanted MSS value, so it will never need to be added.

Ah, so they should be using TCPOPTSTRIP ;-)

Patrick McHardy Jan. 21, 2010, 1:02 p.m. UTC | #8
Simon Arlott wrote:
> On Wed, January 20, 2010 23:14, Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>>> Can you explain why the automatic addition of a MSS option is removed?
>>> That is, of course, for the git log. If I followed the thread right, it
>>> was that adding the option could exceed the MTU. Well, can't we check
>>> for the outgoing MTU?
>> We certainly can, and in fact the packet would get fragmented
>> by the IP layer in case we would exceed the PMTU. Additionally
>> we currently check that the packet contains no data, even with
>> the first version of this patch, so there's no way the packet
>> could exceed the MTU.
> 
> If DF is set and the MTU is exceeded (for the SYN packet) at a
> hop further away, the original host will not understand that it
> needs to allow for the MSS option being added.

Yes, but we don't add it for SYNs containing data.

> (Header + Data + New MSS Option) can't exceed 576 bytes and
> there's no way to know that more than 576 bytes is allowed
> because the ICMP error message may not go via the same host that
> is mangling the packet.
> 
> Of course, it could just allow fragmentation for this one SYN
> packet but that doesn't work for IPv6.
> 
>> This feature has been there from day one since the TCPMSS target
>> has been merged and people are using this with knowledge of their
>> MTUs to work around broken ISPs. I'm not applying this.
> 
> The TCPMSS target can be applied to more than just one direction
> of traffic. I'm modifying incoming traffic too, so adding the MSS
> option and setting it to over 536 is wrong (although the first ICMP
> error will fix it).

It might be wrong, but so is dropping ICMP fragmentation required
packets. This is a workaround for broken behaviour and you should
of course only use MSS values that you know are valid.

> Existing users use this target precisely because their hosts are
> sending an unwanted MSS value, so it will never need to be added.

It's mainly used for ISPs suppressing ICMP fragmentation required
messages. That affects hosts not adding an MSS option as well.

>> The first version seemed fine to me though :)
> 
> The first version is ok with me. Only SYN packets with data and
> no MSS option will be dropped. William objects to ever adding the
> MSS option.

Well, he's about 10 years late.

> Ideally, though, SYN packets with data and no MSS option should
> be accepted without adding an option. Dropping arbitrary traffic
> (especially when new kernels allow data to be sent with SYN
> packets) is not a good idea. If that is ok with you then I'll
> make another patch to do it and update the comments.

I agree, it shouldn't drop packets unless it really has to.
Please go ahead with a new patch.

Patch

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index eda64c1..3648761 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -41,7 +41,7 @@  optlen(const u_int8_t *opt, unsigned int offset)
 		return opt[offset+1];
 }
 
-static int
+static unsigned int
 tcpmss_mangle_packet(struct sk_buff *skb,
 		     const struct xt_tcpmss_info *info,
 		     unsigned int in_mtu,
@@ -50,27 +50,18 @@  tcpmss_mangle_packet(struct sk_buff *skb,
 {
 	struct tcphdr *tcph;
 	unsigned int tcplen, i;
-	__be16 oldval;
 	u16 newmss;
 	u8 *opt;
 
 	if (!skb_make_writable(skb, skb->len))
-		return -1;
+		return NF_DROP;
 
 	tcplen = skb->len - tcphoff;
 	tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
 
-	/* Since it passed flags test in tcp match, we know it is is
-	   not a fragment, and has data >= tcp header length.  SYN
-	   packets should not contain data: if they did, then we risk
-	   running over MTU, sending Frag Needed and breaking things
-	   badly. --RR */
-	if (tcplen != tcph->doff*4) {
-		if (net_ratelimit())
-			printk(KERN_ERR "xt_TCPMSS: bad length (%u bytes)\n",
-			       skb->len);
-		return -1;
-	}
+	/* Header cannot be larger than the packet */
+	if (tcplen < tcph->doff*4)
+		return NF_DROP;
 
 	if (info->mss == XT_TCPMSS_CLAMP_PMTU) {
 		if (dst_mtu(skb_dst(skb)) <= minlen) {
@@ -78,13 +69,13 @@  tcpmss_mangle_packet(struct sk_buff *skb,
 				printk(KERN_ERR "xt_TCPMSS: "
 				       "unknown or invalid path-MTU (%u)\n",
 				       dst_mtu(skb_dst(skb)));
-			return -1;
+			return NF_DROP;
 		}
 		if (in_mtu <= minlen) {
 			if (net_ratelimit())
 				printk(KERN_ERR "xt_TCPMSS: unknown or "
 				       "invalid path-MTU (%u)\n", in_mtu);
-			return -1;
+			return NF_DROP;
 		}
 		newmss = min(dst_mtu(skb_dst(skb)), in_mtu) - minlen;
 	} else
@@ -103,7 +94,7 @@  tcpmss_mangle_packet(struct sk_buff *skb,
 			 * on MSS being set correctly.
 			 */
 			if (oldmss <= newmss)
-				return 0;
+				return XT_CONTINUE;
 
 			opt[i+2] = (newmss & 0xff00) >> 8;
 			opt[i+3] = newmss & 0x00ff;
@@ -111,40 +102,12 @@  tcpmss_mangle_packet(struct sk_buff *skb,
 			inet_proto_csum_replace2(&tcph->check, skb,
 						 htons(oldmss), htons(newmss),
 						 0);
-			return 0;
+			return XT_CONTINUE;
 		}
 	}
 
-	/*
-	 * MSS Option not found ?! add it..
-	 */
-	if (skb_tailroom(skb) < TCPOLEN_MSS) {
-		if (pskb_expand_head(skb, 0,
-				     TCPOLEN_MSS - skb_tailroom(skb),
-				     GFP_ATOMIC))
-			return -1;
-		tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
-	}
-
-	skb_put(skb, TCPOLEN_MSS);
-
-	opt = (u_int8_t *)tcph + sizeof(struct tcphdr);
-	memmove(opt + TCPOLEN_MSS, opt, tcplen - sizeof(struct tcphdr));
-
-	inet_proto_csum_replace2(&tcph->check, skb,
-				 htons(tcplen), htons(tcplen + TCPOLEN_MSS), 1);
-	opt[0] = TCPOPT_MSS;
-	opt[1] = TCPOLEN_MSS;
-	opt[2] = (newmss & 0xff00) >> 8;
-	opt[3] = newmss & 0x00ff;
-
-	inet_proto_csum_replace4(&tcph->check, skb, 0, *((__be32 *)opt), 0);
-
-	oldval = ((__be16 *)tcph)[6];
-	tcph->doff += TCPOLEN_MSS/4;
-	inet_proto_csum_replace2(&tcph->check, skb,
-				 oldval, ((__be16 *)tcph)[6], 0);
-	return TCPOLEN_MSS;
+	/* MSS Option not found */
+	return XT_CONTINUE;
 }
 
 static u_int32_t tcpmss_reverse_mtu(const struct sk_buff *skb,
@@ -177,22 +140,11 @@  static unsigned int
 tcpmss_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 {
 	struct iphdr *iph = ip_hdr(skb);
-	__be16 newlen;
-	int ret;
 
-	ret = tcpmss_mangle_packet(skb, par->targinfo,
+	return tcpmss_mangle_packet(skb, par->targinfo,
 				   tcpmss_reverse_mtu(skb, PF_INET),
 				   iph->ihl * 4,
 				   sizeof(*iph) + sizeof(struct tcphdr));
-	if (ret < 0)
-		return NF_DROP;
-	if (ret > 0) {
-		iph = ip_hdr(skb);
-		newlen = htons(ntohs(iph->tot_len) + ret);
-		csum_replace2(&iph->check, iph->tot_len, newlen);
-		iph->tot_len = newlen;
-	}
-	return XT_CONTINUE;
 }
 
 #if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
@@ -202,23 +154,15 @@  tcpmss_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 	struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	u8 nexthdr;
 	int tcphoff;
-	int ret;
 
 	nexthdr = ipv6h->nexthdr;
 	tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr);
 	if (tcphoff < 0)
 		return NF_DROP;
-	ret = tcpmss_mangle_packet(skb, par->targinfo,
+	return tcpmss_mangle_packet(skb, par->targinfo,
 				   tcpmss_reverse_mtu(skb, PF_INET6),
 				   tcphoff,
 				   sizeof(*ipv6h) + sizeof(struct tcphdr));
-	if (ret < 0)
-		return NF_DROP;
-	if (ret > 0) {
-		ipv6h = ipv6_hdr(skb);
-		ipv6h->payload_len = htons(ntohs(ipv6h->payload_len) + ret);
-	}
-	return XT_CONTINUE;
 }
 #endif
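
For readers who want to try the clamping logic outside the kernel, the following is a minimal userspace sketch of walking a TCP options buffer and lowering an existing MSS option, as the patched tcpmss_mangle_packet() does when it finds one. It is an approximation under stated assumptions, not the kernel code: the function name and macros are hypothetical, the loop stops at an end-of-list or malformed option, and the checksum update that the patch performs with inet_proto_csum_replace2() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL     0 /* end of option list */
#define OPT_NOP     1 /* one-byte padding */
#define OPT_MSS     2 /* maximum segment size */
#define OPT_MSS_LEN 4 /* kind + length + 16-bit value */

/* Lower the MSS option found in opt[0..len) to new_mss if it is larger.
 * Returns true only when the option bytes were changed. */
static bool clamp_mss_option(uint8_t *opt, size_t len, uint16_t new_mss)
{
	size_t i = 0;

	assert(opt != NULL || len == 0);

	while (i < len) {
		uint8_t kind = opt[i];

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {
			i++;
			continue;
		}
		/* All other options carry a length byte of at least 2. */
		if (i + 1 >= len || opt[i + 1] < 2 || i + opt[i + 1] > len)
			break;
		if (kind == OPT_MSS && opt[i + 1] == OPT_MSS_LEN) {
			uint16_t old_mss =
				(uint16_t)((opt[i + 2] << 8) | opt[i + 3]);

			if (old_mss <= new_mss)
				return false; /* never raise the MSS */
			opt[i + 2] = (uint8_t)(new_mss >> 8);
			opt[i + 3] = (uint8_t)(new_mss & 0xff);
			return true;
		}
		i += opt[i + 1];
	}
	return false; /* no MSS option: leave the packet alone */
}
```

A buffer holding two NOPs followed by an MSS option of 1460 would be rewritten to 1400 when called with new_mss set to 1400, while a buffer with no MSS option is returned unchanged, mirroring the XT_CONTINUE path in the patch.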