Message ID | 1309707971.2523.20.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Sun, Jul 3, 2011 at 19:46, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le dimanche 03 juillet 2011 à 01:25 +0400, Alexey Zaytsev a écrit : >> On Fri, Jul 1, 2011 at 10:01, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote: >> > On Thu, Jun 30, 2011 at 01:51, Andrew Morton <akpm@linux-foundation.org> wrote: >> >> >> >> (switched to email. Please respond via emailed reply-to-all, not via the >> >> bugzilla web interface). >> >> >> >> On Thu, 23 Jun 2011 17:33:54 GMT >> >> bugzilla-daemon@bugzilla.kernel.org wrote: >> >> >> >>> https://bugzilla.kernel.org/show_bug.cgi?id=38102 >> >>> >> >>> Summary: BUG kmalloc-2048: Poison overwritten >> >>> Product: Drivers >> >>> Version: 2.5 >> >>> Kernel Version: 3.0.0-rc4 >> >> >> >> Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44. >> > >> > Actually, not sure about the version. 39 was the first one I've been >> > using in the scenario. Checking older versions now. >> > And git-log does not show a lot of changes to the b44 driver, so it >> > might be something unrelated. >> > >> >> I've checked back as far as 2.6.27, and the problem is still there. >> I've also looked through the allocation-related code, and it seemed >> sane. I'm not sure I understand the 1GB dma workaround, but this path >> is never hit in my case. So adding the driver authors to CC. This >> could be something different, but I've been unable to reproduce using >> an other machine with an rtl8139 nic. > > Hmm, looking at b44 code, I believe there is a race there. > > Could you try following patch ? > This might fix a potential problem, but unfortunately did not help here. There is an other place that looks suspicious to me: 812 struct sk_buff *copy_skb; 813 814 b44_recycle_rx(bp, cons, bp->rx_prod); 815 copy_skb = netdev_alloc_skb(bp->dev, len + 2); 816 if (copy_skb == NULL) 817 goto drop_it_no_recycle; 818 819 skb_reserve(copy_skb, 2); 820 skb_put(copy_skb, len); 821 /* DMA sync done above, copy just the actual packet */ 822 skb_copy_from_linear_data_offset(skb, RX_PKT_OFFSET, 823 copy_skb->data, len); 824 skb = copy_skb; The skb is reinserted into the ring before its data is copied, it seems. But this can't be the cause of my problem, as it would lead to data corruption at most, not a write-after-free. And an other question. Why so we have the logic to work-around the 1Gb DMA limit instead of just setting the dma mask? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 4 Jul 2011 15:48:31 +0400 Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote: > The skb is reinserted into the ring before its data is copied, it > seems. But this can't be the cause of my problem, as it would lead to > data corruption at most, not a write-after-free. Recycling the skb does not imply that the device can reuse it immediately. The device is told at the very end of the RX function (after the loop) that it's now safe to put stuff into the recyceled/new buffers. > And an other question. Why so we have the logic to work-around the 1Gb > DMA limit instead of just setting the dma mask? Because the DMA mask does not work correctly on all arches for masks smaller than 4G. And btw, I dont understand what that wmb() patch is supposed to fix. There may be a wmb() missing, but rather after the ctrl _and_ the address assignment to the descriptor. But I don't think this can cause this use-after-free anyway. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 04 juillet 2011 à 13:05 +0000, Michael Büsch a écrit : > On Mon, 4 Jul 2011 15:48:31 +0400 > Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote: > > The skb is reinserted into the ring before its data is copied, it > > seems. But this can't be the cause of my problem, as it would lead to > > data corruption at most, not a write-after-free. > > Recycling the skb does not imply that the device can reuse it immediately. The device is told at the very end of the RX function (after the loop) that it's now safe to put stuff into the recyceled/new buffers. > > > And an other question. Why so we have the logic to work-around the 1Gb > > DMA limit instead of just setting the dma mask? > > Because the DMA mask does not work correctly on all arches for masks smaller than 4G. > > And btw, I dont understand what that wmb() patch is supposed to fix. There may be a wmb() missing, but rather after the ctrl _and_ the address assignment to the descriptor. > But I don't think this can cause this use-after-free anyway. > I dont have the b44 specs, but : For sure, addr should be set before ctl, just in case ctl allows chip to start a dma transfert (to previous packet), because a OWN bit is unset for example... A second wmb() is not necessary. It will be done eventually at next packet (we have a ring of 200 packets) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 04 juillet 2011 à 15:48 +0400, Alexey Zaytsev a écrit : > > This might fix a potential problem, but unfortunately did not help here. > > There is an other place that looks suspicious to me: > > 812 struct sk_buff *copy_skb; > 813 > 814 b44_recycle_rx(bp, cons, bp->rx_prod); > 815 copy_skb = netdev_alloc_skb(bp->dev, len + 2); > 816 if (copy_skb == NULL) > 817 goto drop_it_no_recycle; > 818 > 819 skb_reserve(copy_skb, 2); > 820 skb_put(copy_skb, len); > 821 /* DMA sync done above, copy just the > actual packet */ > 822 skb_copy_from_linear_data_offset(skb, > RX_PKT_OFFSET, > 823 > copy_skb->data, len); > 824 skb = copy_skb; > > > The skb is reinserted into the ring before its data is copied, it > seems. But this can't be the cause of my problem, as it would lead to > data corruption at most, not a write-after-free. > > And an other question. Why so we have the logic to work-around the 1Gb > DMA limit instead of just setting the dma mask? Your problem is in RX side : NIC actually writes to a buffer that is supposedly not its property. If DMA workaround is triggered, all frames are copied, so bug has no chance to trigger, because we feed a totally new frame to upper stack, and keep reuse 'DMA' frames for the device itself. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 04 Jul 2011 15:57:02 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > I dont have the b44 specs, but : It uses the 30-bit version of the Broadcom HND engine, for which complete specs are available here: http://bcm-v4.sipsolutions.net/802.11/DMA > For sure, addr should be set before ctl, just in case ctl allows chip to > start a dma transfert (to previous packet), because a OWN bit is unset > for example... Certainly not. The device does not know about the buffer until this line: bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc)); which advances the DMA descriptor pointer. However, I do think we probably need a wmb() right before that bw32() line, to make sure memory is committed before we tell the device it's OK to access it. We do this in b43, which has exactly the same DMA engine. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 04 Jul 2011 16:00:49 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > > And an other question. Why so we have the logic to work-around the 1Gb > > DMA limit instead of just setting the dma mask? > > Your problem is in RX side : NIC actually writes to a buffer that is > supposedly not its property. The problem is on both sides, because some Linux architectures simply do not support any DMA mask less than 32. This applied to i386 (IA32) last time I looked. The b44 DMA engine can only address 30-bits. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 4 Jul 2011 16:27:26 +0200
Michael Büsch <m@bues.ch> wrote:
> We do this in b43, which has exactly the same DMA engine.
(Ok, it turns out we don't do this in b43 (We only do it on the TX side).
But that's a bug. We should do a wmb() on the RX side before advancing the
descriptor ring pointer.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit : > On Mon, 04 Jul 2011 16:00:49 +0200 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > And an other question. Why so we have the logic to work-around the 1Gb > > > DMA limit instead of just setting the dma mask? > > > > Your problem is in RX side : NIC actually writes to a buffer that is > > supposedly not its property. > > The problem is on both sides, because some Linux architectures simply > do not support any DMA mask less than 32. This applied to i386 (IA32) last > time I looked. > The b44 DMA engine can only address 30-bits. Michael, traces provided by Alexey are in the RX path. NIC does a DMA (Receives an UDP frame) into a 2048 bytes buffers that was freed. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 04 Jul 2011 16:45:05 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit : > > On Mon, 04 Jul 2011 16:00:49 +0200 > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > And an other question. Why so we have the logic to work-around the 1Gb > > > > DMA limit instead of just setting the dma mask? > > > > > > Your problem is in RX side : NIC actually writes to a buffer that is > > > supposedly not its property. > > > > The problem is on both sides, because some Linux architectures simply > > do not support any DMA mask less than 32. This applied to i386 (IA32) last > > time I looked. > > The b44 DMA engine can only address 30-bits. > > > Michael, traces provided by Alexey are in the RX path. > > NIC does a DMA (Receives an UDP frame) into a 2048 bytes buffers that > was freed. Yeah sure. That's obvious from the logs. By "the problem" I meant "the 30bit limitation". -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit : > On Mon, 4 Jul 2011 16:27:26 +0200 > Michael Büsch <m@bues.ch> wrote: > > We do this in b43, which has exactly the same DMA engine. > > (Ok, it turns out we don't do this in b43 (We only do it on the TX side). > But that's a bug. We should do a wmb() on the RX side before advancing the > descriptor ring pointer.) Also it appears rx_ring (or tx_ring ) are not necessarily 4K aligned if kzalloc() path is taken. (SL?B DEBUG -> kzalloc(4096) might not be 4K aligned) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/b44.c b/drivers/net/b44.c index a69331e..80f2fdc 100644 --- a/drivers/net/b44.c +++ b/drivers/net/b44.c @@ -689,9 +689,9 @@ static int b44_alloc_rx_skb(struct b44 *bp, int src_idx, u32 dest_idx_unmasked) ctrl |= DESC_CTRL_EOT; dp = &bp->rx_ring[dest_idx]; - dp->ctrl = cpu_to_le32(ctrl); dp->addr = cpu_to_le32((u32) mapping + bp->dma_offset); - + wmb(); + dp->ctrl = cpu_to_le32(ctrl); if (bp->flags & B44_FLAG_RX_RING_HACK) b44_sync_dma_desc_for_device(bp->sdev, bp->rx_ring_dma, dest_idx * sizeof(*dp),