diff mbox

[Bugme-new,Bug,38102] New: BUG kmalloc-2048: Poison overwritten

Message ID 20110705220644.GB12118@hmsreliant.think-freely.org
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman July 5, 2011, 10:06 p.m. UTC
On Tue, Jul 05, 2011 at 10:15:40PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> > Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > > descriptor to be processed next (the documentation isnt' very verbose on the
> > > subject), along with the EOT bit on a descriptor.  It seems like both the
> > > register and the bit are capable of conveying the same (or at least overlapping)
> > > information.
> > > 
> > > I think what I'm having the most trouble with is understanding when the hw looks
> > > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > > the next descriptor?  
> 
> Since there is no OWN bit (at least not on the online doc I got : it
> says the rx_ring is read only by the NIC), I would say we really need to
> advance DMARX_PTR to signal NIC a new entry is available for following
> incoming frames.
> 
> This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
> chip could loop on a circular rx_ring.
> 
Agree, although that still leaves open the question of what exactly should get
written into the DMARX_PTR.  Is it an index of the descriptor number, or a byte
offset.

Regardless, I think we ned to fix up the looping so as to prevent an EOT reset
jumping outside of our valid ring window.  Alexey, theres better ways to do
this, but if in the interim, you could please try this patch, it makes the valid
receive window for b44 cover the entire ring, so as to avoid this problem.  It
will at least help support or refute this theory.  Note its not exactly the same
as my previous patch.  If you set the default ring pending to 512, the math in
the b44_alloc_rx_skb path is wrong, we skip the EOT bit as well as the first
entry in the ring.  At 511 it should work out properly.

Thanks
Neil


> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Michael Büsch July 6, 2011, 3:32 p.m. UTC | #1
On Tue, 5 Jul 2011 18:06:44 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Tue, Jul 05, 2011 at 10:15:40PM +0200, Eric Dumazet wrote:
> > Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> > > Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > > > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > > > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > > > descriptor to be processed next (the documentation isnt' very verbose on the
> > > > subject), along with the EOT bit on a descriptor.  It seems like both the
> > > > register and the bit are capable of conveying the same (or at least overlapping)
> > > > information.
> > > > 
> > > > I think what I'm having the most trouble with is understanding when the hw looks
> > > > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > > > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > > > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > > > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > > > the next descriptor?  
> > 
> > Since there is no OWN bit (at least not on the online doc I got : it
> > says the rx_ring is read only by the NIC), I would say we really need to
> > advance DMARX_PTR to signal NIC a new entry is available for following
> > incoming frames.
> > 
> > This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
> > chip could loop on a circular rx_ring.
> > 
> Agree, although that still leaves open the question of what exactly should get
> written into the DMARX_PTR.  Is it an index of the descriptor number, or a byte
> offset.
> 
> Regardless, I think we ned to fix up the looping so as to prevent an EOT reset
> jumping outside of our valid ring window.  Alexey, theres better ways to do
> this, but if in the interim, you could please try this patch, it makes the valid
> receive window for b44 cover the entire ring, so as to avoid this problem.  It
> will at least help support or refute this theory.  Note its not exactly the same
> as my previous patch.  If you set the default ring pending to 512, the math in
> the b44_alloc_rx_skb path is wrong, we skip the EOT bit as well as the first
> entry in the ring.  At 511 it should work out properly.
> 
> Thanks
> Neil
> 
> 
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 3d247f3..b7f5ed1 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -57,7 +57,7 @@
>  #define B44_MAX_MTU			1500
>  
>  #define B44_RX_RING_SIZE		512
> -#define B44_DEF_RX_RING_PENDING		200
> +#define B44_DEF_RX_RING_PENDING		511
>  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
>  				 B44_RX_RING_SIZE)
>  #define B44_TX_RING_SIZE		512

You guys are mixing up quite a bit of stuff here...

The EOT bit has _nothing_ to do with the descriptor pointers.
It simply marks the last descriptor in the (linear) descriptor
page, so that it becomes an actual ring:

   DDDDDDDDDDDDDDDDDDDDDDDDDDDE
   |                          O
   |                          T
   ^--------------------------|

It doesn't say anything about the read and write pointers
to the ring.

The B44_DMARX_PTR is the write-end pointer. It points one entry
beyond the end of the write area. Then there's the software pointer
where we keep track of the read position.

   -rx_cons     DMARX_PTR-
   v                     v
   DDDDDDDDDDDDDDDDDDDDDDDDDDE
   ^                    ^    O
   |                    |    T
   Device might write from
   here to here.

On an RX interrupt (or poll), we read the _actual_ device write
pointer. (B44_DMARX_STAT & DMARX_STAT_CDMASK). If that is equal
to our stored rx_cons, the device didn't write anything.
So we read buffers until we hit the _actual_ device write pointer.
So rx_cons is equal to (B44_DMARX_STAT & DMARX_STAT_CDMASK), except
that it lags behind by one IRQ/poll.
If we read are done, we set the DMARX_PTR write pointer to one desc
beyond the buffer that we just ate. So the device is free to continue
writing the ring _up to_ the position we left.

I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
descriptors, as this is a byte pointer). This seems kind of arbitrary.
In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
But I don't think it really matters. It only limits the usable DMA
area before the first interrupt (or poll) occurs. After the final
B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
will be usable.

Summary: I don't see where the DMA engine code is broken (except for
the minor missing wmb(), which doesn't trigger this memory corruption, though)

I hope that helps to clear up stuff...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 6, 2011, 4 p.m. UTC | #2
Le mercredi 06 juillet 2011 à 17:32 +0200, Michael Büsch a écrit :

> You guys are mixing up quite a bit of stuff here...
> 
> The EOT bit has _nothing_ to do with the descriptor pointers.
> It simply marks the last descriptor in the (linear) descriptor
> page, so that it becomes an actual ring:
> 
>    DDDDDDDDDDDDDDDDDDDDDDDDDDDE
>    |                          O
>    |                          T
>    ^--------------------------|
> 
> It doesn't say anything about the read and write pointers
> to the ring.
> 
> The B44_DMARX_PTR is the write-end pointer. It points one entry
> beyond the end of the write area. Then there's the software pointer
> where we keep track of the read position.
> 
>    -rx_cons     DMARX_PTR-
>    v                     v
>    DDDDDDDDDDDDDDDDDDDDDDDDDDE
>    ^                    ^    O
>    |                    |    T
>    Device might write from
>    here to here.
> 
> On an RX interrupt (or poll), we read the _actual_ device write
> pointer. (B44_DMARX_STAT & DMARX_STAT_CDMASK). If that is equal
> to our stored rx_cons, the device didn't write anything.
> So we read buffers until we hit the _actual_ device write pointer.
> So rx_cons is equal to (B44_DMARX_STAT & DMARX_STAT_CDMASK), except
> that it lags behind by one IRQ/poll.
> If we read are done, we set the DMARX_PTR write pointer to one desc
> beyond the buffer that we just ate. So the device is free to continue
> writing the ring _up to_ the position we left.

Not exactly :

If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
and advance DMARX_PTR to 201*sizeof(descriptor).

> 
> I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
> descriptors, as this is a byte pointer). This seems kind of arbitrary.
> In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
> But I don't think it really matters. It only limits the usable DMA
> area before the first interrupt (or poll) occurs. After the final
> B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
> will be usable.

Yes, this is probably a small bug, we should fix it for correctness.

> 
> Summary: I don't see where the DMA engine code is broken (except for
> the minor missing wmb(), which doesn't trigger this memory corruption, though)
> 
> I hope that helps to clear up stuff...

Well, you describe (nicely, thanks !) your understanding of how work the
driver and chip.

Problem is we suspect a wrong statement or wrong hardware ;)

Another problem is Alexey doesnt answer anymore, and I dont have this
(old) hardware...

Other point : Do you know why b44_get_ringparam() doesnt  set
ering->tx_max_pending and ering->tx_pending

The comment seems wrong :
/* XXX ethtool lacks a tx_max_pending, oops... */



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Büsch July 6, 2011, 4:12 p.m. UTC | #3
On Wed, 06 Jul 2011 18:00:19 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Not exactly :
> 
> If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
> and advance DMARX_PTR to 201*sizeof(descriptor).

I don't think so. Why do you think this is the case?
We allocate a new descriptor buffer for the consumed buffer at exactly
the same place (which is "cons").
(Alternatively, we leave the buffer in place, and just copy the data to a new buffer).
And DMARX_PTR is updated to the last "cons", which is one beyond
the last buffer that we consumed (and pushed up the net stack).

(Note that "beyond" always means "beyond" in the sense of a DMA _ring_,
which honors the EOT bit. This is done by masking cons with RX_RING_SIZE-1,
which is thus assumed to be a power of two).

> > I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
> > descriptors, as this is a byte pointer). This seems kind of arbitrary.
> > In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
> > But I don't think it really matters. It only limits the usable DMA
> > area before the first interrupt (or poll) occurs. After the final
> > B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
> > will be usable.
> 
> Yes, this is probably a small bug, we should fix it for correctness.

I wouldn't call this a bug.

> > Summary: I don't see where the DMA engine code is broken (except for
> > the minor missing wmb(), which doesn't trigger this memory corruption, though)
> > 
> > I hope that helps to clear up stuff...
> 
> Well, you describe (nicely, thanks !) your understanding of how work the
> driver and chip.
> 
> Problem is we suspect a wrong statement or wrong hardware ;)
>
> Another problem is Alexey doesnt answer anymore, and I dont have this
> (old) hardware...

I do really think that either
1) His device is broken. Chips break. I already saw several devices
with HND DMA engine that broke down after some time of use.
or
2) The bug is at another place, but not in the lowlevel DMA handling.

Could this possibly be some kind of context issue? threaded IRQs?
The net subsys was rather picky about context, last time I looked
into it. see the .._ni() style functions.

> Other point : Do you know why b44_get_ringparam() doesnt  set
> ering->tx_max_pending and ering->tx_pending
> 
> The comment seems wrong :
> /* XXX ethtool lacks a tx_max_pending, oops... */

Well, I _guess_ that ering->tx_max_pending was added later? (I didn't
even check if it's there _now_, though.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 6, 2011, 4:35 p.m. UTC | #4
Le mercredi 06 juillet 2011 à 18:12 +0200, Michael Büsch a écrit :
> On Wed, 06 Jul 2011 18:00:19 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Not exactly :
> > 
> > If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
> > and advance DMARX_PTR to 201*sizeof(descriptor).
> 
> I don't think so. Why do you think this is the case?
> We allocate a new descriptor buffer for the consumed buffer at exactly
> the same place (which is "cons").
> (Alternatively, we leave the buffer in place, and just copy the data to a new buffer).
> And DMARX_PTR is updated to the last "cons", which is one beyond
> the last buffer that we consumed (and pushed up the net stack).

Not at all.

If it was true, b44_recycle_rx() would not even exist as is.

When we allocate a new buffer, we put it at rx_prod index, which is the
slot _after_ window, not the first slot.

First time we dequeue a packet from NIC, rx_prod is something like 200


Oh, it seems we do the following in b44_init_hw()

bp->rx_prod = bp->rx_pending;

But this seems completely wrong, if b44_init_rings() was not able to
allocate rx_pending buffers (b44_alloc_rx_skb() can return NULL)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..b7f5ed1 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -57,7 +57,7 @@ 
 #define B44_MAX_MTU			1500
 
 #define B44_RX_RING_SIZE		512
-#define B44_DEF_RX_RING_PENDING		200
+#define B44_DEF_RX_RING_PENDING		511
 #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
 				 B44_RX_RING_SIZE)
 #define B44_TX_RING_SIZE		512