diff mbox

[Bugme-new,Bug,38102] New: BUG kmalloc-2048: Poison overwritten

Message ID 1309792323.2247.33.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet July 4, 2011, 3:12 p.m. UTC
Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
> On Mon, 4 Jul 2011 16:27:26 +0200
> Michael Büsch <m@bues.ch> wrote:
> > We do this in b43, which has exactly the same DMA engine.
> 
> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>  But that's a bug. We should do a wmb() on the RX side before advancing the
>  descriptor ring pointer.)

I am wondering what happens if RX ring is set to 64, and we receive
exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?

Alexey, could you try this patch please ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexey Zaytsev July 4, 2011, 8:25 p.m. UTC | #1
On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>> On Mon, 4 Jul 2011 16:27:26 +0200
>> Michael Büsch <m@bues.ch> wrote:
>> > We do this in b43, which has exactly the same DMA engine.
>>
>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>  descriptor ring pointer.)
>
> I am wondering what happens if RX ring is set to 64, and we receive
> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>
> Alexey, could you try this patch please ?

Sorry, did not help.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 4, 2011, 10:29 p.m. UTC | #2
On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>>> On Mon, 4 Jul 2011 16:27:26 +0200
>>> Michael Büsch <m@bues.ch> wrote:
>>> > We do this in b43, which has exactly the same DMA engine.
>>>
>>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>>  descriptor ring pointer.)
>>
>> I am wondering what happens if RX ring is set to 64, and we receive
>> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>>
>> Alexey, could you try this patch please ?
>
> Sorry, did not help.
>

Ran a few rounds of tcpdump. Seeing a significant number or duplicate
ACKs from the problematic machine. Not seeing them when testing
between this machine and an other linux box. Or the illumos machine
and the other linux box.

Dumps are available here:

http://zaytsev.su/tmp/caps/

dump1-3 - between the problematic machine an the illumos box,
collected on illumos side. All show dups.
dump5 - between an other linux box and the illumos machine, no dups.
Collcted on the illumos side.
dump-linux - between 2 linux machines, collected on the
non-problematic side. No dups, no corruptions.

192.168.0.33 - the problematic machine.
192.168.0.72 - the illumos machine.
192.168.0.122 - an other linux machine.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 3:44 a.m. UTC | #3
Le mardi 05 juillet 2011 à 02:29 +0400, Alexey Zaytsev a écrit :
> On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> > On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
> >>> On Mon, 4 Jul 2011 16:27:26 +0200
> >>> Michael Büsch <m@bues.ch> wrote:
> >>> > We do this in b43, which has exactly the same DMA engine.
> >>>
> >>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
> >>>  But that's a bug. We should do a wmb() on the RX side before advancing the
> >>>  descriptor ring pointer.)
> >>
> >> I am wondering what happens if RX ring is set to 64, and we receive
> >> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
> >>
> >> Alexey, could you try this patch please ?
> >
> > Sorry, did not help.
> >
> 
> Ran a few rounds of tcpdump. Seeing a significant number or duplicate
> ACKs from the problematic machine. Not seeing them when testing
> between this machine and an other linux box. Or the illumos machine
> and the other linux box.
> 
> Dumps are available here:
> 
> http://zaytsev.su/tmp/caps/
> 
> dump1-3 - between the problematic machine an the illumos box,
> collected on illumos side. All show dups.
> dump5 - between an other linux box and the illumos machine, no dups.
> Collcted on the illumos side.
> dump-linux - between 2 linux machines, collected on the
> non-problematic side. No dups, no corruptions.
> 
> 192.168.0.33 - the problematic machine.
> 192.168.0.72 - the illumos machine.
> 192.168.0.122 - an other linux machine.

??

I dont care about duplicate acks at this point.

Thats a separate issue (TCP layer)

Do you still have memory scribbles ?

I wonder if the problem is not coming from the "fast recovery" added in
commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
(simplified))

Maybe we should do instead a fast dequeue of packets (recycling them
instead of pushing them to upper stack) in case too many packets are
ready to be delivered, and always make sure NIC has a reserve of
available buffers for DMA accesses, before it can assert ISTAT_RFO



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 5, 2011, 3:56 a.m. UTC | #4
On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 02:29 +0400, Alexey Zaytsev a écrit :
>> On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> > On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>> >>> On Mon, 4 Jul 2011 16:27:26 +0200
>> >>> Michael Büsch <m@bues.ch> wrote:
>> >>> > We do this in b43, which has exactly the same DMA engine.
>> >>>
>> >>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>> >>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>> >>>  descriptor ring pointer.)
>> >>
>> >> I am wondering what happens if RX ring is set to 64, and we receive
>> >> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>> >>
>> >> Alexey, could you try this patch please ?
>> >
>> > Sorry, did not help.
>> >
>>
>> Ran a few rounds of tcpdump. Seeing a significant number or duplicate
>> ACKs from the problematic machine. Not seeing them when testing
>> between this machine and an other linux box. Or the illumos machine
>> and the other linux box.
>>
>> Dumps are available here:
>>
>> http://zaytsev.su/tmp/caps/
>>
>> dump1-3 - between the problematic machine an the illumos box,
>> collected on illumos side. All show dups.
>> dump5 - between an other linux box and the illumos machine, no dups.
>> Collcted on the illumos side.
>> dump-linux - between 2 linux machines, collected on the
>> non-problematic side. No dups, no corruptions.
>>
>> 192.168.0.33 - the problematic machine.
>> 192.168.0.72 - the illumos machine.
>> 192.168.0.122 - an other linux machine.
>
> ??
>
> I dont care about duplicate acks at this point.
>
> Thats a separate issue (TCP layer)
>

Maybe some tx packets are just sent out more then once? Or a single
packet is sent out instead of some other packets?
The delays between two dups is short, and they come in bursts, up to a
few hundreds of duplicate packets at a time.

> Do you still have memory scribbles ?
Yes.

>
> I wonder if the problem is not coming from the "fast recovery" added in
> commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
> (simplified))
>

I've tested back to 2.6.27. I did not test all releases of course, so
maybe this was fixed, and then broken again.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 4:11 a.m. UTC | #5
Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
> On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I dont care about duplicate acks at this point.
> >
> > Thats a separate issue (TCP layer)
> >
> 
> Maybe some tx packets are just sent out more then once? Or a single
> packet is sent out instead of some other packets?
> The delays between two dups is short, and they come in bursts, up to a
> few hundreds of duplicate packets at a time.
> 

Thats a completely different problem. SSH is very expensive for your
receiver (your dump1 file has small packets (560 bytes)), and it cannot
cope with the stress.

You're filling the b44 rx ring, and b44 driver has no choice to zap 200
packets at once. This sure is a problem for tcp, as it stalls the thing.

You could avoid this by doing this at b44 machine (the receiver)

echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem

So that sender wont be able to push so many packets

> > Do you still have memory scribbles ?
> Yes.

OK

> 
> >
> > I wonder if the problem is not coming from the "fast recovery" added in
> > commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
> > (simplified))
> >
> 
> I've tested back to 2.6.27. I did not test all releases of course, so
> maybe this was fixed, and then broken again.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 4:14 a.m. UTC | #6
Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > I dont care about duplicate acks at this point.
> > >
> > > Thats a separate issue (TCP layer)
> > >
> > 
> > Maybe some tx packets are just sent out more then once? Or a single
> > packet is sent out instead of some other packets?
> > The delays between two dups is short, and they come in bursts, up to a
> > few hundreds of duplicate packets at a time.
> > 
> 
> Thats a completely different problem. SSH is very expensive for your
> receiver (your dump1 file has small packets (560 bytes)), and it cannot
> cope with the stress.
> 
> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
> packets at once. This sure is a problem for tcp, as it stalls the thing.
> 
> You could avoid this by doing this at b44 machine (the receiver)
> 
> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
> 
> So that sender wont be able to push so many packets

You can also try using more packets in rx ring : (default is 200
packets, limit ~511)

ethtool -G eth0 rx 400



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 5, 2011, 4:17 a.m. UTC | #7
On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
>> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > >
>> > > I dont care about duplicate acks at this point.
>> > >
>> > > Thats a separate issue (TCP layer)
>> > >
>> >
>> > Maybe some tx packets are just sent out more then once? Or a single
>> > packet is sent out instead of some other packets?
>> > The delays between two dups is short, and they come in bursts, up to a
>> > few hundreds of duplicate packets at a time.
>> >
>>
>> Thats a completely different problem. SSH is very expensive for your
>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>> cope with the stress.
>>
>> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>
>> You could avoid this by doing this at b44 machine (the receiver)
>>
>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>
>> So that sender wont be able to push so many packets
>
> You can also try using more packets in rx ring : (default is 200
> packets, limit ~511)
>
> ethtool -G eth0 rx 400
>
>
Check out starting at packet 302893. 383 _identical_ ACKs were sent
out by the b44 machine within 30 milliseconds.

>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 5, 2011, 4:18 a.m. UTC | #8
On Tue, Jul 5, 2011 at 08:17, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
>>> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
>>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > >
>>> > > I dont care about duplicate acks at this point.
>>> > >
>>> > > Thats a separate issue (TCP layer)
>>> > >
>>> >
>>> > Maybe some tx packets are just sent out more then once? Or a single
>>> > packet is sent out instead of some other packets?
>>> > The delays between two dups is short, and they come in bursts, up to a
>>> > few hundreds of duplicate packets at a time.
>>> >
>>>
>>> Thats a completely different problem. SSH is very expensive for your
>>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>>> cope with the stress.
>>>
>>> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
>>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>>
>>> You could avoid this by doing this at b44 machine (the receiver)
>>>
>>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>>
>>> So that sender wont be able to push so many packets
>>
>> You can also try using more packets in rx ring : (default is 200
>> packets, limit ~511)
>>
>> ethtool -G eth0 rx 400
>>
>>
> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> out by the b44 machine within 30 milliseconds.

In dump1.pcap, that is.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 4:21 a.m. UTC | #9
Le mardi 05 juillet 2011 à 05:44 +0200, Eric Dumazet a écrit :

> Maybe we should do instead a fast dequeue of packets (recycling them
> instead of pushing them to upper stack) in case too many packets are
> ready to be delivered, and always make sure NIC has a reserve of
> available buffers for DMA accesses, before it can assert ISTAT_RFO
> 
> 

Another way would be to add Explicit Congestion Notification when too
many packets are received in a burst, but unfortunately not enough TCP
flows are ECN ready :)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 4:25 a.m. UTC | #10
Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
> >
> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> out by the b44 machine within 30 milliseconds.


As I said, b44 driver lost at least 200 consecutive frames (source says
recovery takes about 20 ms)

TCP then do its normal job.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 5, 2011, 4:29 a.m. UTC | #11
On Tue, Jul 5, 2011 at 08:25, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
>> >
>> Check out starting at packet 302893. 383 _identical_ ACKs were sent
>> out by the b44 machine within 30 milliseconds.
>
>
> As I said, b44 driver lost at least 200 consecutive frames (source says
> recovery takes about 20 ms)
>
> TCP then do its normal job.
>

From my understanding, after a frame is lost, TCP would be waiting for
a retransmit. Or at least, it would not be sending 400 duplicate ACKs
for the single last frame received, right? Let me run tcpdump on the
b44 side now. I'm quite sure I won't see any ACK dups leaving the
stack.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 4:38 a.m. UTC | #12
Le mardi 05 juillet 2011 à 08:29 +0400, Alexey Zaytsev a écrit :
> On Tue, Jul 5, 2011 at 08:25, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
> >> >
> >> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> >> out by the b44 machine within 30 milliseconds.
> >
> >
> > As I said, b44 driver lost at least 200 consecutive frames (source says
> > recovery takes about 20 ms)
> >
> > TCP then do its normal job.
> >
> 
> From my understanding, after a frame is lost, TCP would be waiting for
> a retransmit. Or at least, it would not be sending 400 duplicate ACKs
> for the single last frame received, right? Let me run tcpdump on the
> b44 side now. I'm quite sure I won't see any ACK dups leaving the
> stack.

Wow, I believe you are on a wrong track. Honestly.

Try to unpplug the wire for 100ms, and watch your "duplicate acks
disease".

Thats exactly what is happening with b44 driver doing a "fast recovery"
right now.

Thats a moot point. Running tcpdump on your b44 machine will kill your
performance even more, it wont solve the b44 bug.

If you prefer to 'fix tcp', please open another thread.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Zaytsev July 5, 2011, 4:57 a.m. UTC | #13
On Tue, Jul 5, 2011 at 08:38, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 08:29 +0400, Alexey Zaytsev a écrit :
>> On Tue, Jul 5, 2011 at 08:25, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
>> >> >
>> >> Check out starting at packet 302893. 383 _identical_ ACKs were sent
>> >> out by the b44 machine within 30 milliseconds.
>> >
>> >
>> > As I said, b44 driver lost at least 200 consecutive frames (source says
>> > recovery takes about 20 ms)
>> >
>> > TCP then do its normal job.
>> >
>>
>> From my understanding, after a frame is lost, TCP would be waiting for
>> a retransmit. Or at least, it would not be sending 400 duplicate ACKs
>> for the single last frame received, right? Let me run tcpdump on the
>> b44 side now. I'm quite sure I won't see any ACK dups leaving the
>> stack.
>
> Wow, I believe you are on a wrong track. Honestly.
>
> Try to unpplug the wire for 100ms, and watch your "duplicate acks
> disease".
>
> Thats exactly what is happening with b44 driver doing a "fast recovery"
> right now.
>
> Thats a moot point. Running tcpdump on your b44 machine will kill your
> performance even more, it wont solve the b44 bug.
>
> If you prefer to 'fix tcp', please open another thread.

Ran tcpdump. You are right, I was wrong. Sorry for the noise.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..51072a3 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -829,6 +829,7 @@  static int b44_rx(struct b44 *bp, int budget)
 	}
 
 	bp->rx_cons = cons;
+	wmb();
 	bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
 
 	return received;