Patchwork BUG: unable to handle kernel paging request at 0000000000609920 in networking code on 3.2.23.

Submitter Florian Westphal
Date Jan. 24, 2013, 11:08 p.m.
Message ID <20130124230850.GI8541@breakpoint.cc>
Permalink /patch/215507/
State Superseded

Comments

Florian Westphal - Jan. 24, 2013, 11:08 p.m.
Rafal Kupka <rkupka@telemetry.com> wrote:

[ cc nf-devel ]

> > After upgrade to 3.2.23 (debian backports 2.6.32-45 package) from 2.6.32 I
> > experience server crash.
> New round of tests on 3.2.35-2~bpo60+1. Still similar crashes.
> 
> > Iptables:
> > 
> > Chain INPUT (policy ACCEPT)
> > target     prot opt source               destination
> > dumbtcp    tcp  --  0.0.0.0/0            91.217.135.0/24
> > 
> > Chain OUTPUT (policy ACCEPT)
> > target     prot opt source               destination
> > dumbtcp    tcp  --  91.217.135.0/24      0.0.0.0/0
> > 
> > Chain dumbtcp (2 references)
> > target     prot opt source               destination
> > TCPOPTSTRIP  tcp  --  0.0.0.0/0            0.0.0.0/0            tcpflags: 0x02/0x02 TCPOPTSTRIP options 3,4,5,8,19
> > ECN        tcp  --  0.0.0.0/0            0.0.0.0/0            ECN TCP remove
> 
> These Netfilter rules are causing it: either the ECN or the TCPOPTSTRIP module.

I don't see any relevant changes in either TCPOPTSTRIP or ECN.

> 3.2.35 calltrace:
> [15368.854247] Call Trace:
> [15368.856749]  <IRQ> 
> [15368.858898]  [<ffffffff812a02a8>] ? skb_release_data+0x6c/0xe4
> [15368.864791]  [<ffffffff812a08b0>] ? __kfree_skb+0x11/0x73
> [15368.870254]  [<ffffffff812e5c5f>] ? tcp_rcv_state_process+0x74/0x8d9
> [15368.876632]  [<ffffffff812ed0b7>] ? tcp_v4_do_rcv+0x388/0x3eb
> [15368.882448]  [<ffffffff812ee54e>] ? tcp_v4_rcv+0x447/0x6ed
> [15368.888007]  [<ffffffff812cb746>] ? nf_hook_slow+0x68/0xfd
> [15368.893572]  [<ffffffff812d197e>] ? T.1004+0x4f/0x4f
> [15368.898614]  [<ffffffff812d1abb>] ? ip_local_deliver_finish+0x13d/0x1aa
> [15368.905301]  [<ffffffff812aab66>] ? __netif_receive_skb+0x47d/0x4b0
> [15368.911642]  [<ffffffff81013a01>] ? read_tsc+0x5/0x16
> [15368.916768]  [<ffffffff812aadc7>] ? netif_receive_skb+0x67/0x6d
> [15368.922757]  [<ffffffff812ab335>] ? napi_gro_receive+0x1f/0x2c
> [15368.928661]  [<ffffffff812aaea1>] ? napi_skb_finish+0x1c/0x31
> [15368.934495]  [<ffffffffa0049a61>] ? e1000_clean_rx_irq+0x1ea/0x29a [e1000e]
> [15368.941533]  [<ffffffffa0049fa2>] ? e1000_clean+0x71/0x229 [e1000e]
> [15368.947875]  [<ffffffff8103b982>] ? __wake_up+0x35/0x46
> [15368.953171]  [<ffffffff812ab460>] ? net_rx_action+0xa8/0x207
> [15368.958908]  [<ffffffff81046351>] ? finish_task_switch+0x50/0xc7
> [15368.964995]  [<ffffffff8104f2ca>] ? __do_softirq+0xc4/0x1a0
> [15368.970636]  [<ffffffff81097ec6>] ? handle_irq_event_percpu+0x163/0x181
> [15368.977324]  [<ffffffff8136f8ac>] ? call_softirq+0x1c/0x30
> [15368.982884]  [<ffffffff8100fa3f>] ? do_softirq+0x3f/0x79
> [15368.988266]  [<ffffffff8104f09a>] ? irq_exit+0x44/0xb5
> [15368.993473]  [<ffffffff8100f38a>] ? do_IRQ+0x94/0xaa
> [15368.998489]  [<ffffffff8136836e>] ? common_interrupt+0x6e/0x6e
> [15369.004397]  <EOI> 
> [15369.006537]  [<ffffffff81107e38>] ? fput+0x17a/0x1a2
> [15369.011576]  [<ffffffff81046351>] ? finish_task_switch+0x50/0xc7
> [15369.017653]  [<ffffffff81366b46>] ? __schedule+0x57a/0x5cd
> [15369.023209]  [<ffffffff81368416>] ? retint_careful+0x14/0x32

However, both do seem to be missing a few sanity checks.
TCPOPTSTRIP in particular looks like it can read/write beyond the end of
the packet when tcph->doff is bogus.

Something like this (not even compile tested):


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt - Jan. 25, 2013, 12:36 a.m.
On Friday 2013-01-25 00:08, Florian Westphal wrote:
>@@ -35,10 +35,18 @@ tcpoptstrip_mangle_packet(struct sk_buff *skb,
> {
> 	unsigned int optl, i, j;
> 	struct tcphdr *tcph;
>+	struct tcphdr _tcph;
> 	u_int16_t n, o;
> 	u_int8_t *opt;
> 
>-	if (!skb_make_writable(skb, skb->len))
>+	if (skb->len < minlen)
>+		return XT_CONTINUE;
>+
>+	tcph = skb_header_pointer(skb, tcphoff, sizeof(_tcph), &_tcph);
>+	if (!tcph)
>+		return XT_CONTINUE; /* no options -> nothing to do */

To the best of my analysis, the "no options" comment is incorrect here,
because you are not even looking at the options so far, but only tcph.
The prose should probably be something like:

	if ((iph->frag_off & htons(IP_OFFSET)) != 0)
		/* not the first fragment - lost case */
		return XT_CONTINUE;
	if (ntohs(iph->tot_len) < iph->ihl * 4 + sizeof(_tcph))
		/* fragment boundary within tcphdr */
		return XT_CONTINUE;
	tcph = skb_header_pointer(skb, tcphoff, sizeof(_tcph), &_tcph);
	if (tcph == NULL)
		/* packet way too short for some reason
		   (should not occur since we tested for fragment
		   boundary) */
		return NF_DROP;
	if (tcph->doff * 4 - sizeof(_tcph) == 0)
		/* no options */
		return XT_CONTINUE;
	if (ntohs(iph->tot_len) < iph->ihl * 4 + tcph->doff * 4)
		/* fragment boundary within tcpoptions */
		return XT_CONTINUE;

	skb_make_writable...
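
For readers who want to experiment with the order of these checks outside the kernel, here is a minimal userspace sketch over a raw IPv4 packet buffer. It is not kernel code and not part of any patch in this thread: the names classify, be16, enum verdict, IP_OFFSET_MASK and TCP_MIN_HDR are made up for illustration, and the two extra V_DROP cases (truncated IPv4 header, data offset below 20 bytes) are assumptions added so that the sketch never reads outside its buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_OFFSET_MASK 0x1FFFu /* low 13 bits of the IPv4 flags/offset field */
#define TCP_MIN_HDR    20u     /* TCP header without options */

/* Hypothetical stand-ins for XT_CONTINUE / NF_DROP / "go on and mangle". */
enum verdict { V_CONTINUE, V_DROP, V_MANGLE };

/* Read a big-endian 16-bit field from the packet. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * pkt points at the start of an IPv4 header; caplen is the number of
 * bytes actually present in the buffer.
 */
static enum verdict classify(const uint8_t *pkt, size_t caplen)
{
	size_t ihl, tot_len, doff;

	assert(pkt != NULL);
	if (caplen < 20)
		return V_DROP;     /* assumption: not even a minimal IPv4 header */

	ihl = (size_t)(pkt[0] & 0x0F) * 4;
	tot_len = be16(pkt + 2);
	if (ihl < 20 || tot_len > caplen)
		return V_DROP;     /* assumption: malformed or truncated packet */

	if ((be16(pkt + 6) & IP_OFFSET_MASK) != 0)
		return V_CONTINUE; /* not the first fragment */
	if (tot_len < ihl + TCP_MIN_HDR)
		return V_CONTINUE; /* fragment boundary within the TCP header */

	/* Safe to read: ihl + 12 < ihl + 20 <= tot_len <= caplen. */
	doff = (size_t)(pkt[ihl + 12] >> 4) * 4;
	if (doff < TCP_MIN_HDR)
		return V_DROP;     /* assumption: bogus data offset */
	if (doff == TCP_MIN_HDR)
		return V_CONTINUE; /* no options */
	if (tot_len < ihl + doff)
		return V_CONTINUE; /* fragment boundary within the TCP options */

	return V_MANGLE;           /* options lie fully inside the packet */
}
```

Only a V_MANGLE result would correspond to reaching the skb_make_writable step; every other path returns before any option byte is touched.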
Florian Westphal - Jan. 25, 2013, 8:28 a.m.
Jan Engelhardt <jengelh@inai.de> wrote:
> On Friday 2013-01-25 00:08, Florian Westphal wrote:
> >@@ -35,10 +35,18 @@ tcpoptstrip_mangle_packet(struct sk_buff *skb,
> > {
> > 	unsigned int optl, i, j;
> > 	struct tcphdr *tcph;
> >+	struct tcphdr _tcph;
> > 	u_int16_t n, o;
> > 	u_int8_t *opt;
> > 
> >-	if (!skb_make_writable(skb, skb->len))
> >+	if (skb->len < minlen)
> >+		return XT_CONTINUE;
> >+
> >+	tcph = skb_header_pointer(skb, tcphoff, sizeof(_tcph), &_tcph);
> >+	if (!tcph)
> >+		return XT_CONTINUE; /* no options -> nothing to do */
> 
> To the best of my analysis, the "no options" comment is incorrect here,
> because you are not even looking at the options so far, but only tcph.

Yup.

> The prose should probably be something like:
 
> 	if ((iph->frag_off & htons(IP_OFFSET)) != 0)
> 		/* not the first fragment - lost case */
> 		return XT_CONTINUE;
[..]

Care to submit a patch?

Patch

diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index 4bf3dc4..a1f8a59 100644
--- a/net/ipv4/netfilter/ipt_ECN.c
+++ b/net/ipv4/netfilter/ipt_ECN.c
@@ -86,7 +86,7 @@  ecn_tg(struct sk_buff *skb, const struct xt_action_param *par)
 			return NF_DROP;
 
 	if (einfo->operation & (IPT_ECN_OP_SET_ECE | IPT_ECN_OP_SET_CWR) &&
-	    ip_hdr(skb)->protocol == IPPROTO_TCP)
+	   (ip_hdr(skb)->frag_off & htons(IP_OFFSET)) == 0)
 		if (!set_ect_tcp(skb, einfo))
 			return NF_DROP;
 
diff --git a/net/netfilter/xt_TCPOPTSTRIP.c b/net/netfilter/xt_TCPOPTSTRIP.c
index 25fd1c4..ebb9451 100644
--- a/net/netfilter/xt_TCPOPTSTRIP.c
+++ b/net/netfilter/xt_TCPOPTSTRIP.c
@@ -35,10 +35,18 @@  tcpoptstrip_mangle_packet(struct sk_buff *skb,
 {
 	unsigned int optl, i, j;
 	struct tcphdr *tcph;
+	struct tcphdr _tcph;
 	u_int16_t n, o;
 	u_int8_t *opt;
 
-	if (!skb_make_writable(skb, skb->len))
+	if (skb->len < minlen)
+		return XT_CONTINUE;
+
+	tcph = skb_header_pointer(skb, tcphoff, sizeof(_tcph), &_tcph);
+	if (!tcph)
+		return XT_CONTINUE; /* no options -> nothing to do */
+
+	if (!skb_make_writable(skb, tcphoff + (tcph->doff * 4)))
 		return NF_DROP;
 
 	tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
@@ -76,6 +84,9 @@  tcpoptstrip_mangle_packet(struct sk_buff *skb,
 static unsigned int
 tcpoptstrip_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 {
+	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET))
+		return XT_CONTINUE;
+
 	return tcpoptstrip_mangle_packet(skb, par->targinfo, ip_hdrlen(skb),
 	       sizeof(struct iphdr) + sizeof(struct tcphdr));
 }
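
The concern behind the TCPOPTSTRIP hunk is that the option walk trusts tcph->doff. A rough userspace sketch of a walk that validates the data offset and every option length before writing anything could look like the following. It is an illustration under stated assumptions, not the kernel implementation: strip_option, OPT_EOL and OPT_NOP are hypothetical names, it handles a single option kind, and it does not update the TCP checksum, which a real implementation would have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_EOL 0 /* end of option list */
#define OPT_NOP 1 /* one-byte padding option */

/*
 * Overwrite every option of kind `strip` with NOP bytes.
 * tcp points at the TCP header; avail is the number of bytes available
 * from that point on. Returns the number of options overwritten, or -1
 * if the header or an option is malformed (options seen before the
 * malformed one may already have been overwritten).
 * The checksum is NOT updated in this sketch.
 */
static int strip_option(uint8_t *tcp, size_t avail, uint8_t strip)
{
	size_t doff, i, optlen;
	int stripped = 0;

	assert(tcp != NULL);
	if (avail < 20)
		return -1;                 /* no room for a TCP header */
	doff = (size_t)(tcp[12] >> 4) * 4;
	if (doff < 20 || doff > avail)
		return -1;                 /* bogus data offset */

	for (i = 20; i < doff; i += optlen) {
		if (tcp[i] == OPT_EOL)
			break;
		if (tcp[i] == OPT_NOP) {
			optlen = 1;        /* single byte, no length field */
			continue;
		}
		if (i + 1 >= doff)
			return -1;         /* length byte is missing */
		optlen = tcp[i + 1];
		if (optlen < 2 || optlen > doff - i)
			return -1;         /* option runs past the header */
		if (tcp[i] == strip) {
			memset(tcp + i, OPT_NOP, optlen);
			stripped++;
		}
	}
	return stripped;
}
```

With a 24-byte header carrying a window-scale option (kind 3, length 3) followed by one NOP, strip_option(hdr, 24, 3) overwrites the three option bytes with NOPs and returns 1, while a data offset that claims 60 bytes in the same 24-byte buffer is rejected before any option byte is read.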