diff mbox

[Bugme-new,Bug,38102] New: BUG kmalloc-2048: Poison overwritten

Message ID 1309707971.2523.20.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet July 3, 2011, 3:46 p.m. UTC
Le dimanche 03 juillet 2011 à 01:25 +0400, Alexey Zaytsev a écrit :
> On Fri, Jul 1, 2011 at 10:01, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> > On Thu, Jun 30, 2011 at 01:51, Andrew Morton <akpm@linux-foundation.org> wrote:
> >>
> >> (switched to email.  Please respond via emailed reply-to-all, not via the
> >> bugzilla web interface).
> >>
> >> On Thu, 23 Jun 2011 17:33:54 GMT
> >> bugzilla-daemon@bugzilla.kernel.org wrote:
> >>
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=38102
> >>>
> >>>            Summary: BUG kmalloc-2048: Poison overwritten
> >>>            Product: Drivers
> >>>            Version: 2.5
> >>>     Kernel Version: 3.0.0-rc4
> >>
> >> Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44.
> >
> > Actually, not sure about the version. 39 was the first one I've been
> > using in the scenario. Checking older versions now.
> > And git-log does not show a lot of changes to the b44 driver, so it
> > might be something unrelated.
> >
> 
> I've checked back as far as 2.6.27, and the problem is still there.
> I've also looked through the allocation-related code, and it seemed
> sane. I'm not sure I understand the 1GB dma workaround, but this path
> is never hit in my case. So adding the driver authors to CC. This
> could be something different, but I've been unable to reproduce using
> an other machine with an rtl8139 nic.

Hmm, looking at b44 code, I believe there is a race there.

Could you try following patch ?

Thanks



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexey Zaytsev July 4, 2011, 11:48 a.m. UTC | #1
On Sun, Jul 3, 2011 at 19:46, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le dimanche 03 juillet 2011 à 01:25 +0400, Alexey Zaytsev a écrit :
>> On Fri, Jul 1, 2011 at 10:01, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> > On Thu, Jun 30, 2011 at 01:51, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >>
>> >> (switched to email.  Please respond via emailed reply-to-all, not via the
>> >> bugzilla web interface).
>> >>
>> >> On Thu, 23 Jun 2011 17:33:54 GMT
>> >> bugzilla-daemon@bugzilla.kernel.org wrote:
>> >>
>> >>> https://bugzilla.kernel.org/show_bug.cgi?id=38102
>> >>>
>> >>>            Summary: BUG kmalloc-2048: Poison overwritten
>> >>>            Product: Drivers
>> >>>            Version: 2.5
>> >>>     Kernel Version: 3.0.0-rc4
>> >>
>> >> Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44.
>> >
>> > Actually, not sure about the version. 39 was the first one I've been
>> > using in the scenario. Checking older versions now.
>> > And git-log does not show a lot of changes to the b44 driver, so it
>> > might be something unrelated.
>> >
>>
>> I've checked back as far as 2.6.27, and the problem is still there.
>> I've also looked through the allocation-related code, and it seemed
>> sane. I'm not sure I understand the 1GB dma workaround, but this path
>> is never hit in my case. So adding the driver authors to CC. This
>> could be something different, but I've been unable to reproduce using
>> an other machine with an rtl8139 nic.
>
> Hmm, looking at b44 code, I believe there is a race there.
>
> Could you try following patch ?
>

This might fix a potential problem, but unfortunately did not help here.

There is an other place that looks suspicious to me:

 812                         struct sk_buff *copy_skb;
 813
 814                         b44_recycle_rx(bp, cons, bp->rx_prod);
 815                         copy_skb = netdev_alloc_skb(bp->dev, len + 2);
 816                         if (copy_skb == NULL)
 817                                 goto drop_it_no_recycle;
 818
 819                         skb_reserve(copy_skb, 2);
 820                         skb_put(copy_skb, len);
 821                         /* DMA sync done above, copy just the
actual packet */
 822                         skb_copy_from_linear_data_offset(skb,
RX_PKT_OFFSET,
 823
copy_skb->data, len);
 824                         skb = copy_skb;


The skb is reinserted into the ring before its data is copied, it
seems. But this can't be the cause of my problem, as it would lead to
data corruption at most, not a write-after-free.

And an other question. Why so we have the logic to work-around the 1Gb
DMA limit instead of just setting the dma mask?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 4, 2011, 1:05 p.m. UTC | #2
On Mon, 4 Jul 2011 15:48:31 +0400
Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> The skb is reinserted into the ring before its data is copied, it
> seems. But this can't be the cause of my problem, as it would lead to
> data corruption at most, not a write-after-free.

Recycling the skb does not imply that the device can reuse it immediately. The device is told at the very end of the RX function (after the loop) that it's now safe to put stuff into the recyceled/new buffers.

> And an other question. Why so we have the logic to work-around the 1Gb
> DMA limit instead of just setting the dma mask?

Because the DMA mask does not work correctly on all arches for masks smaller than 4G.

And btw, I dont understand what that wmb() patch is supposed to fix. There may be a wmb() missing, but rather after the ctrl _and_ the address assignment to the descriptor.
But I don't think this can cause this use-after-free anyway.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 4, 2011, 1:57 p.m. UTC | #3
Le lundi 04 juillet 2011 à 13:05 +0000, Michael Büsch a écrit :
> On Mon, 4 Jul 2011 15:48:31 +0400
> Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> > The skb is reinserted into the ring before its data is copied, it
> > seems. But this can't be the cause of my problem, as it would lead to
> > data corruption at most, not a write-after-free.
> 
> Recycling the skb does not imply that the device can reuse it immediately. The device is told at the very end of the RX function (after the loop) that it's now safe to put stuff into the recyceled/new buffers.
> 
> > And an other question. Why so we have the logic to work-around the 1Gb
> > DMA limit instead of just setting the dma mask?
> 
> Because the DMA mask does not work correctly on all arches for masks smaller than 4G.
> 
> And btw, I dont understand what that wmb() patch is supposed to fix. There may be a wmb() missing, but rather after the ctrl _and_ the address assignment to the descriptor.
> But I don't think this can cause this use-after-free anyway.
> 

I dont have the b44 specs, but :

For sure, addr should be set before ctl, just in case ctl allows chip to
start a dma transfert (to previous packet), because a OWN bit is unset
for example...

A second wmb() is not necessary.
It will be done eventually at next packet (we have a ring of 200
packets)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 4, 2011, 2 p.m. UTC | #4
Le lundi 04 juillet 2011 à 15:48 +0400, Alexey Zaytsev a écrit :

> 
> This might fix a potential problem, but unfortunately did not help here.
> 
> There is an other place that looks suspicious to me:
> 
>  812                         struct sk_buff *copy_skb;
>  813
>  814                         b44_recycle_rx(bp, cons, bp->rx_prod);
>  815                         copy_skb = netdev_alloc_skb(bp->dev, len + 2);
>  816                         if (copy_skb == NULL)
>  817                                 goto drop_it_no_recycle;
>  818
>  819                         skb_reserve(copy_skb, 2);
>  820                         skb_put(copy_skb, len);
>  821                         /* DMA sync done above, copy just the
> actual packet */
>  822                         skb_copy_from_linear_data_offset(skb,
> RX_PKT_OFFSET,
>  823
> copy_skb->data, len);
>  824                         skb = copy_skb;
> 
> 
> The skb is reinserted into the ring before its data is copied, it
> seems. But this can't be the cause of my problem, as it would lead to
> data corruption at most, not a write-after-free.
> 
> And an other question. Why so we have the logic to work-around the 1Gb
> DMA limit instead of just setting the dma mask?

Your problem is in RX side : NIC actually writes to a buffer that is
supposedly not its property.

If DMA workaround is triggered, all frames are copied, so bug has no
chance to trigger, because we feed a totally new frame to upper stack,
and keep reuse 'DMA' frames for the device itself.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 4, 2011, 2:27 p.m. UTC | #5
On Mon, 04 Jul 2011 15:57:02 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I dont have the b44 specs, but :

It uses the 30-bit version of the Broadcom HND engine, for which complete
specs are available here:
http://bcm-v4.sipsolutions.net/802.11/DMA

> For sure, addr should be set before ctl, just in case ctl allows chip to
> start a dma transfert (to previous packet), because a OWN bit is unset
> for example...

Certainly not.
The device does not know about the buffer until this line:
 bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
which advances the DMA descriptor pointer.
However, I do think we probably need a wmb() right before that bw32() line, to make sure
memory is committed before we tell the device it's OK to access it.
We do this in b43, which has exactly the same DMA engine.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 4, 2011, 2:31 p.m. UTC | #6
On Mon, 04 Jul 2011 16:00:49 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > And an other question. Why so we have the logic to work-around the 1Gb
> > DMA limit instead of just setting the dma mask?
> 
> Your problem is in RX side : NIC actually writes to a buffer that is
> supposedly not its property.

The problem is on both sides, because some Linux architectures simply
do not support any DMA mask less than 32. This applied to i386 (IA32) last
time I looked.
The b44 DMA engine can only address 30-bits.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 4, 2011, 2:43 p.m. UTC | #7
On Mon, 4 Jul 2011 16:27:26 +0200
Michael Büsch <m@bues.ch> wrote:
> We do this in b43, which has exactly the same DMA engine.

(Ok, it turns out we don't do this in b43 (We only do it on the TX side).
 But that's a bug. We should do a wmb() on the RX side before advancing the
 descriptor ring pointer.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 4, 2011, 2:45 p.m. UTC | #8
Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit :
> On Mon, 04 Jul 2011 16:00:49 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > And an other question. Why so we have the logic to work-around the 1Gb
> > > DMA limit instead of just setting the dma mask?
> > 
> > Your problem is in RX side : NIC actually writes to a buffer that is
> > supposedly not its property.
> 
> The problem is on both sides, because some Linux architectures simply
> do not support any DMA mask less than 32. This applied to i386 (IA32) last
> time I looked.
> The b44 DMA engine can only address 30-bits.


Michael, traces provided by Alexey are in the RX path.

NIC does a DMA  (Receives an UDP frame) into a 2048 bytes buffers that
was freed.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 4, 2011, 2:51 p.m. UTC | #9
On Mon, 04 Jul 2011 16:45:05 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit :
> > On Mon, 04 Jul 2011 16:00:49 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > And an other question. Why so we have the logic to work-around the 1Gb
> > > > DMA limit instead of just setting the dma mask?
> > > 
> > > Your problem is in RX side : NIC actually writes to a buffer that is
> > > supposedly not its property.
> > 
> > The problem is on both sides, because some Linux architectures simply
> > do not support any DMA mask less than 32. This applied to i386 (IA32) last
> > time I looked.
> > The b44 DMA engine can only address 30-bits.
> 
> 
> Michael, traces provided by Alexey are in the RX path.
> 
> NIC does a DMA  (Receives an UDP frame) into a 2048 bytes buffers that
> was freed.

Yeah sure. That's obvious from the logs.
By "the problem" I meant "the 30bit limitation".
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 4, 2011, 2:53 p.m. UTC | #10
Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
> On Mon, 4 Jul 2011 16:27:26 +0200
> Michael Büsch <m@bues.ch> wrote:
> > We do this in b43, which has exactly the same DMA engine.
> 
> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>  But that's a bug. We should do a wmb() on the RX side before advancing the
>  descriptor ring pointer.)


Also it appears rx_ring (or tx_ring ) are not necessarily 4K aligned if
kzalloc() path is taken.

(SL?B DEBUG -> kzalloc(4096) might not be 4K aligned)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..80f2fdc 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -689,9 +689,9 @@  static int b44_alloc_rx_skb(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 		ctrl |= DESC_CTRL_EOT;
 
 	dp = &bp->rx_ring[dest_idx];
-	dp->ctrl = cpu_to_le32(ctrl);
 	dp->addr = cpu_to_le32((u32) mapping + bp->dma_offset);
-
+	wmb();
+	dp->ctrl = cpu_to_le32(ctrl);
 	if (bp->flags & B44_FLAG_RX_RING_HACK)
 		b44_sync_dma_desc_for_device(bp->sdev, bp->rx_ring_dma,
 			                    dest_idx * sizeof(*dp),