
3.9.5+: Crash in tcp_input.c:4810.

Message ID 1372813467.4979.46.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet July 3, 2013, 1:04 a.m. UTC
On Mon, 2013-07-01 at 11:10 -0700, Ben Greear wrote:

> offset: -1459  start: -1146162927 seq: -1146161468 size: 16047 copy: 3576
> ...
> 
> There were 80 total splats of this nature grouped together, and then
> the system recovered and continued to function normally as far as I
> can tell.  The later splats are a bit farther apart...maybe the
> TCP connection is dying.
> 
> It appears my 'work-around' is poor at best, but I'd rather kill
> a TCP connection and spam the logs than crash the OS.
> 
> I'd be more than happy to add more/different debugging code.

It would be nice to pinpoint the origin of the bug. Really.

This BUG_ON() is at least 7 years old. I do not think the invariant has
changed, has it?

Sure, we can avoid crashes, but it looks like we could randomly corrupt
TCP payload or other kernel memory if it turns out this is caused by a
buggy driver.

Is it happening while collapsing the receive queue, or the ofo queue?

In the receive queue, any skb skb2 that follows an skb skb1 must have

TCP_SKB_CB(skb1)->end_seq >= TCP_SKB_CB(skb2)->seq

Only in the ofo queue could this not be respected, and that case should
be handled properly in tcp_collapse_ofo_queue()
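
As an illustration only (untested sketch, hypothetical helper, not in
any tree), the check behind that invariant amounts to walking
neighbouring skbs and warning when a hole shows up:

/* Untested debug sketch: warn if two neighbouring skbs in a TCP queue
 * leave a hole, i.e. TCP_SKB_CB(skb1)->end_seq < TCP_SKB_CB(skb2)->seq.
 * Assumes it lives in net/ipv4/tcp_input.c, so the usual headers apply.
 */
static void tcp_check_collapse_invariant(struct sk_buff_head *list)
{
        struct sk_buff *skb1;

        skb_queue_walk(list, skb1) {
                struct sk_buff *skb2;

                if (skb_queue_is_last(list, skb1))
                        break;
                skb2 = skb_queue_next(list, skb1);
                /* before(a, b): a is older than b, modulo seq wraparound */
                if (before(TCP_SKB_CB(skb1)->end_seq, TCP_SKB_CB(skb2)->seq))
                        pr_warn("tcp: hole between skbs: end_seq %u seq %u\n",
                                TCP_SKB_CB(skb1)->end_seq,
                                TCP_SKB_CB(skb2)->seq);
        }
}

For sk_receive_queue a warning here would indicate a real bug; for the
ofo queue such holes are expected, and tcp_collapse_ofo_queue() is
supposed to split the collapse work into contiguous ranges around them.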




Comments

Ben Greear July 3, 2013, 3:21 a.m. UTC | #1
On 07/02/2013 06:04 PM, Eric Dumazet wrote:
> On Mon, 2013-07-01 at 11:10 -0700, Ben Greear wrote:
>
>> offset: -1459  start: -1146162927 seq: -1146161468 size: 16047 copy: 3576
>> ...
>>
>> There were 80 total splats of this nature grouped together, and then
>> the system recovered and continued to function normally as far as I
>> can tell.  The later splats are a bit farther apart...maybe the
>> TCP connection is dying.
>>
>> It appears my 'work-around' is poor at best, but I'd rather kill
>> a TCP connection and spam the logs than crash the OS.
>>
>> I'd be more than happy to add more/different debugging code.
>
> It would be nice to pinpoint the origin of the bug. Really.
>
> This BUG_ON() is at least 7 years old. I do not think the invariant has
> changed, has it?
>
> Sure, we can avoid crashes, but it looks like we could randomly corrupt
> TCP payload or other kernel memory if it turns out this is caused by a
> buggy driver.
>
> Is it happening while collapsing the receive queue, or the ofo queue?

What kinds of things could a driver do to cause this?  Maybe modify an
skb after it has sent it up the stack, or something like that?
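
Purely as a hypothetical sketch of that class of bug (not taken from
ath9k or any real driver), the pattern would look something like this:

/* Hypothetical illustration only: the skb is handed to the stack and
 * then modified afterwards.  All names here are made up. */
static void buggy_rx_complete(struct net_device *dev, struct sk_buff *skb,
                              unsigned int len)
{
        skb_put(skb, len);
        skb->protocol = eth_type_trans(skb, dev);
        netif_rx(skb);          /* ownership passes to the stack here */

        /* BUG: the driver no longer owns the skb; trimming it now races
         * with the stack, and skb->len can end up disagreeing with the
         * seq/end_seq bookkeeping TCP stored in TCP_SKB_CB(). */
        skb_trim(skb, len - ETH_FCS_LEN);
}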

We haven't been able to reproduce on a clean 3.10 yet...but it often takes days,
so we'll leave the test up through end of this week if we don't hit it
sooner...

I'll add your patch to my 3.9 tree.

Thanks,
Ben


> In the receive queue, any skb skb2 that follows an skb skb1 must have
>
> TCP_SKB_CB(skb1)->end_seq >= TCP_SKB_CB(skb2)->seq
>
> Only in the ofo queue could this not be respected, and that case should
> be handled properly in tcp_collapse_ofo_queue()
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 28af45a..d77f1f0 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4457,7 +4457,12 @@ restart:
>   			int offset = start - TCP_SKB_CB(skb)->seq;
>   			int size = TCP_SKB_CB(skb)->end_seq - start;
>
> -			BUG_ON(offset < 0);
> +			if (unlikely(offset < 0)) {
> +				pr_err("tcp_collapse() bug on %s offset:%d size:%d copy:%d skb->len %u truesize %u, nskb->len %u\n",
> +					list == &sk->sk_receive_queue ? "receive_queue" : "ofo_queue",
> +					offset, size, copy, skb->len, skb->truesize, nskb->len);
> +				return;
> +			}
>   			if (size > 0) {
>   				size = min(copy, size);
>   				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
>
Eric Dumazet July 3, 2013, 4:41 a.m. UTC | #2
On Tue, 2013-07-02 at 20:21 -0700, Ben Greear wrote:

> What kinds of things could a driver do to cause this?  Maybe modify an
> skb after it has sent it up the stack, or something like that?
> 

It might be a genuine TCP bug, who knows...


> We haven't been able to reproduce on a clean 3.10 yet...but it often takes days,
> so we'll leave the test up through end of this week if we don't hit it
> sooner...

TCP collapse is/should be really rare, but losses can of course trigger
it.

> 
> I'll add your patch to my 3.9 tree.

Thanks




Ben Greear July 3, 2013, 4:49 a.m. UTC | #3
On 07/02/2013 09:41 PM, Eric Dumazet wrote:
> On Tue, 2013-07-02 at 20:21 -0700, Ben Greear wrote:
>
>> What kinds of things could a driver do to cause this?  Maybe modify an
>> skb after it has sent it up the stack, or something like that?
>>
>
> It might be a genuine TCP bug, who knows...
>
>
>> We haven't been able to reproduce on a clean 3.10 yet...but it often takes days,
>> so we'll leave the test up through end of this week if we don't hit it
>> sooner...
>
> TCP collapse is/should be really rare, but losses can of course trigger
> it.

Well, network emulators are easy to come by in the office....  Maybe running
a bunch of TCP connections through a lossy network would exercise this code
path a bit?  Aside from random pkt loss, any other types of network conditions
that might help trigger this faster?

I'll set up some tests using some wired ethernet...if we can trigger it there
then we at least know it doesn't depend on ath9k...

Thanks,
Ben
Eric Dumazet July 3, 2013, 5:02 a.m. UTC | #4
On Tue, 2013-07-02 at 21:49 -0700, Ben Greear wrote:

> Well, network emulators are easy to come by in the office....  Maybe running
> a bunch of TCP connections through a lossy network would exercise this code
> path a bit?  Aside from random pkt loss, any other types of network conditions
> that might help trigger this faster?
> 
> I'll set up some tests using some wired ethernet...if we can trigger it there
> then we at least know it doesn't depend on ath9k...

I tried a lot of things, including netem with many reorders and/or
packetdrill tests, but so far not a single warning from tcp_collapse()



Ben Greear July 8, 2013, 5:23 p.m. UTC | #5
On 07/02/2013 10:02 PM, Eric Dumazet wrote:
> On Tue, 2013-07-02 at 21:49 -0700, Ben Greear wrote:
>
>> Well, network emulators are easy to come by in the office....  Maybe running
>> a bunch of TCP connections through a lossy network would exercise this code
>> path a bit?  Aside from random pkt loss, any other types of network conditions
>> that might help trigger this faster?
>>
>> I'll set up some tests using some wired ethernet...if we can trigger it there
>> then we at least know it doesn't depend on ath9k...
>
> I tried a lot of things, including netem with many reorders and/or
> packetdrill tests, but so far not a single warning from tcp_collapse()

We ran a 5+ day test using an unmodified 3.10 kernel and did not trigger
the bug.

So, I'm guessing the problem is either fixed upstream or is exacerbated
or caused by our local patches.  Sometime soon we'll start porting local
patches to newer kernels...we'll see what happens then.

Thanks,
Ben
Eric Dumazet July 8, 2013, 6:21 p.m. UTC | #6
On Mon, 2013-07-08 at 10:23 -0700, Ben Greear wrote:

> We ran a 5+ day test using an unmodified 3.10 kernel and did not trigger
> the bug.

Using wired ethernet only, or any kind of adapter, including ath9k?

> So, I'm guessing the problem is either fixed upstream or is exacerbated
> or caused by our local patches.  Sometime soon we'll start porting local
> patches to newer kernels...we'll see what happens then.

Thanks


Ben Greear July 8, 2013, 6:30 p.m. UTC | #7
On 07/08/2013 11:21 AM, Eric Dumazet wrote:
> On Mon, 2013-07-08 at 10:23 -0700, Ben Greear wrote:
>
>> We ran a 5+ day test using an unmodified 3.10 kernel and did not trigger
>> the bug.
>
> Using wired ethernet only, or any kind of adapter, including ath9k?

Exact same hardware and configuration:

ath9k, around 240 wifi
stations trying to connect to APs that can handle a bit less
than 240 total, starting TCP traffic when stations are connected.
It appears that the constant churn of stations going up and down
is key, but of course that is par for the course, especially in
the wifi stack.

Some of our local wifi patches make the system work considerably faster when
we have hundreds of wifi stations, so timing will be different on upstream
kernels, and of course we could have bugs :)

Thanks,
Ben
Eric Dumazet July 8, 2013, 7:01 p.m. UTC | #8
On Mon, 2013-07-08 at 11:30 -0700, Ben Greear wrote:
> On 07/08/2013 11:21 AM, Eric Dumazet wrote:
> > On Mon, 2013-07-08 at 10:23 -0700, Ben Greear wrote:
> >
> >> We ran a 5+ day test using an unmodified 3.10 kernel and did not trigger
> >> the bug.
> >
> > Using wired ethernet only, or any kind of adapter, including ath9k?
> 
> Exact same hardware and configuration:
> 
> ath9k, around 240 wifi
> stations trying to connect to APs that can handle a bit less
> than 240 total, starting TCP traffic when stations are connected.
> It appears that the constant churn of stations going up and down
> is key, but of course that is par for the course, especially in
> the wifi stack.
> 
> Some of our local wifi patches make the system work considerably faster when
> we have hundreds of wifi stations, so timing will be different on upstream
> kernels, and of course we could have bugs :)

There is this thing in ath9k about aggregating two frags.

drivers/net/wireless/ath/ath9k/recv.c line 1298 contains:

RX_STAT_INC(rx_frags);

Could you check these stats (I do not know whether they are reported by
ethtool -S or another debugging facility) and see if rx_frags is ever
increasing?
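
For reference, and only from memory (the exact macro differs between
kernel versions), RX_STAT_INC() just bumps a per-device counter that is
later dumped through the ath9k debugfs "recv" file, roughly:

#ifdef CONFIG_ATH9K_DEBUGFS
#define RX_STAT_INC(c)          (sc->debug.stats.rxstats.c++)
#else
#define RX_STAT_INC(c)          do { } while (0)
#endif

So rx_frags going up would mean the driver took that two-fragment
aggregation path at least that many times.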



Ben Greear July 8, 2013, 7:59 p.m. UTC | #9
On 07/08/2013 12:01 PM, Eric Dumazet wrote:
> On Mon, 2013-07-08 at 11:30 -0700, Ben Greear wrote:
>> On 07/08/2013 11:21 AM, Eric Dumazet wrote:
>>> On Mon, 2013-07-08 at 10:23 -0700, Ben Greear wrote:
>>>
>>>> We ran a 5+ day test using an unmodified 3.10 kernel and did not trigger
>>>> the bug.
>>>
>>> Using wired ethernet only, or any kind of adapter, including ath9k?
>>
>> Exact same hardware and configuration:
>>
>> ath9k, around 240 wifi
>> stations trying to connect to APs that can handle a bit less
>> than 240 total, starting TCP traffic when stations are connected.
>> It appears that the constant churn of stations going up and down
>> is key, but of course that is par for the course, especially in
>> the wifi stack.
>>
>> Some of our local wifi patches make the system work considerably faster when
>> we have hundreds of wifi stations, so timing will be different on upstream
>> kernels, and of course we could have bugs :)
>
> There is this thing in ath9k about aggregating two frags
>
> drivers/net/wireless/ath/ath9k/recv.c line 1298 contains:
>
> RX_STAT_INC(rx_frags);
>
> Could you check these stats (I do not know whether they are reported by
> ethtool -S or another debugging facility) and see if rx_frags is ever
> increasing?

They are in debugfs, and they appear to increase fairly often, for
instance:

[root@lec2010-ath9k-1 lanforge]# cat /debug/ieee80211/wiphy0/ath9k/recv|tail -5
            RX-Pkts-All :  288009442
           RX-Bytes-All : 4067932166
             RX-Beacons :   14826735
               RX-Frags :       3944
            RX-Spectral :          0

I don't have the stats from the system that reproduced the bug
(it has been rebooted), but if I do see the bug again, I'll
grab the rx-frags and other stats just in case they show
some anomaly.

Thanks,
Ben

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 28af45a..d77f1f0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4457,7 +4457,12 @@  restart:
 			int offset = start - TCP_SKB_CB(skb)->seq;
 			int size = TCP_SKB_CB(skb)->end_seq - start;
 
-			BUG_ON(offset < 0);
+			if (unlikely(offset < 0)) {
+				pr_err("tcp_collapse() bug on %s offset:%d size:%d copy:%d skb->len %u truesize %u, nskb->len %u\n",
+					list == &sk->sk_receive_queue ? "receive_queue" : "ofo_queue",
+					offset, size, copy, skb->len, skb->truesize, nskb->len);
+				return;
+			}
 			if (size > 0) {
 				size = min(copy, size);
 				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))