diff mbox

[Bugme-new,Bug,38102] New: BUG kmalloc-2048: Poison overwritten

Message ID 20110705180650.GF2959@hmsreliant.think-freely.org
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman July 5, 2011, 6:06 p.m. UTC
On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 12:42 -0400, Neil Horman a écrit :
> > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> 
> > > So all descriptors before prod are guaranteed to be ready for host
> > > consume... Fact that a dma access is running on 'next descriptor' should
> > > be irrelevant.
> > > 
> > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > there where we do advance to the next descriptor.
> 
> Yes, but we advance up to 'prod', which is the very last safe
> descriptor.
> 
> If hardware advertises descriptor X being ready to be handled by host,
> while DMA on this X descriptor is not yet finished, this would be a
> really useless hardware ;)
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Something else just jumped out at me.  During b44_open, we call b44_init_rings.
This function allocates bp->rx_pending skb's and iteratively puts them in the rx
dma ring. bp->rx_pending is initalized to B44_DEF_RX_RING_PENDING, which is
defined as 200 (just about half of the 512 entries that the dma ring actually
supports in the hardware.  This is normally ok, as subsequent calls to
b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
The problem with this however is that b44_alloc_rx_skb only sets the
DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
hardware should wrap around and reset the index counter.  If a large volume of
traffic is pushed through the adapter early on after initalization, or if the
cpu is busy during init, it would be possible that the ring buffer would fill up
prior to having additional entries added to the ring, the result being that the
dma engine would reach the end of the allocated descriptors, not see an EOT bit
set, and continue on using unallocated descriptors.

Just a theory, but it would be interesting to see if the problem subsided if you
ensured that you allocated  a full descriptor ring on b44_open
Neil
 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet July 5, 2011, 6:13 p.m. UTC | #1
Le mardi 05 juillet 2011 à 14:06 -0400, Neil Horman a écrit :
> On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> > Le mardi 05 juillet 2011 à 12:42 -0400, Neil Horman a écrit :
> > > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> > 
> > > > So all descriptors before prod are guaranteed to be ready for host
> > > > consume... Fact that a dma access is running on 'next descriptor' should
> > > > be irrelevant.
> > > > 
> > > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > > there where we do advance to the next descriptor.
> > 
> > Yes, but we advance up to 'prod', which is the very last safe
> > descriptor.
> > 
> > If hardware advertises descriptor X being ready to be handled by host,
> > while DMA on this X descriptor is not yet finished, this would be a
> > really useless hardware ;)
> > 
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 
> Something else just jumped out at me.  During b44_open, we call b44_init_rings.
> This function allocates bp->rx_pending skb's and iteratively puts them in the rx
> dma ring. bp->rx_pending is initalized to B44_DEF_RX_RING_PENDING, which is
> defined as 200 (just about half of the 512 entries that the dma ring actually
> supports in the hardware.  This is normally ok, as subsequent calls to
> b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
> The problem with this however is that b44_alloc_rx_skb only sets the
> DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
> hardware should wrap around and reset the index counter.  If a large volume of
> traffic is pushed through the adapter early on after initalization, or if the
> cpu is busy during init, it would be possible that the ring buffer would fill up
> prior to having additional entries added to the ring, the result being that the
> dma engine would reach the end of the allocated descriptors, not see an EOT bit
> set, and continue on using unallocated descriptors.
> 
> Just a theory, but it would be interesting to see if the problem subsided if you
> ensured that you allocated  a full descriptor ring on b44_open
> Neil
>  
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 3d247f3..1b58a7c 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -57,7 +57,7 @@
>  #define B44_MAX_MTU			1500
>  
>  #define B44_RX_RING_SIZE		512
> -#define B44_DEF_RX_RING_PENDING		200
> +#define B44_DEF_RX_RING_PENDING		512
>  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
>  				 B44_RX_RING_SIZE)
>  #define B44_TX_RING_SIZE		512

No

Please take time to read the driver again.

200 desc are setup, and NIC is not allowed to use more than 200 descs.

( B44_DMARX_PTR )

We carefuly advance this pointer after a new desc(s) is(are) setup



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 6:32 p.m. UTC | #2
Le mardi 05 juillet 2011 à 20:13 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 14:06 -0400, Neil Horman a écrit :
> > On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> > > Le mardi 05 juillet 2011 à 12:42 -0400, Neil Horman a écrit :
> > > > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> > > 
> > > > > So all descriptors before prod are guaranteed to be ready for host
> > > > > consume... Fact that a dma access is running on 'next descriptor' should
> > > > > be irrelevant.
> > > > > 
> > > > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > > > there where we do advance to the next descriptor.
> > > 
> > > Yes, but we advance up to 'prod', which is the very last safe
> > > descriptor.
> > > 
> > > If hardware advertises descriptor X being ready to be handled by host,
> > > while DMA on this X descriptor is not yet finished, this would be a
> > > really useless hardware ;)
> > > 
> > > 
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > 
> > Something else just jumped out at me.  During b44_open, we call b44_init_rings.
> > This function allocates bp->rx_pending skb's and iteratively puts them in the rx
> > dma ring. bp->rx_pending is initalized to B44_DEF_RX_RING_PENDING, which is
> > defined as 200 (just about half of the 512 entries that the dma ring actually
> > supports in the hardware.  This is normally ok, as subsequent calls to
> > b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
> > The problem with this however is that b44_alloc_rx_skb only sets the
> > DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
> > hardware should wrap around and reset the index counter.  If a large volume of
> > traffic is pushed through the adapter early on after initalization, or if the
> > cpu is busy during init, it would be possible that the ring buffer would fill up
> > prior to having additional entries added to the ring, the result being that the
> > dma engine would reach the end of the allocated descriptors, not see an EOT bit
> > set, and continue on using unallocated descriptors.
> > 
> > Just a theory, but it would be interesting to see if the problem subsided if you
> > ensured that you allocated  a full descriptor ring on b44_open
> > Neil
> >  
> > diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> > index 3d247f3..1b58a7c 100644
> > --- a/drivers/net/b44.c
> > +++ b/drivers/net/b44.c
> > @@ -57,7 +57,7 @@
> >  #define B44_MAX_MTU			1500
> >  
> >  #define B44_RX_RING_SIZE		512
> > -#define B44_DEF_RX_RING_PENDING		200
> > +#define B44_DEF_RX_RING_PENDING		512
> >  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
> >  				 B44_RX_RING_SIZE)
> >  #define B44_TX_RING_SIZE		512
> 
> No
> 
> Please take time to read the driver again.
> 
> 200 desc are setup, and NIC is not allowed to use more than 200 descs.
> 
> ( B44_DMARX_PTR )
> 
> We carefuly advance this pointer after a new desc(s) is(are) setup
> 
> 

Then, maybe the driver model is completely wrong, and should really
setup 512 buffers, or use less descs but set EOT on last one.

Currently it uses a 200 sliding window out of the 512 descs.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 6:45 p.m. UTC | #3
Le mardi 05 juillet 2011 à 20:32 +0200, Eric Dumazet a écrit :

> Then, maybe the driver model is completely wrong, and should really
> setup 512 buffers, or use less descs but set EOT on last one.
> 
> Currently it uses a 200 sliding window out of the 512 descs.
> 
> 

One thing we could do would be to allocate a special guard buffer and
set all 'out of window' descriptors to point to this guard buffer, and
periodically check if buffer is dirtied by the card.

(first word would be enough)

(instead of setting desc->addr to NULL, set to
dma_map_single(guard_buffer))



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 5, 2011, 7:53 p.m. UTC | #4
On Tue, Jul 05, 2011 at 08:45:16PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 20:32 +0200, Eric Dumazet a écrit :
> 
> > Then, maybe the driver model is completely wrong, and should really
> > setup 512 buffers, or use less descs but set EOT on last one.
> > 
> > Currently it uses a 200 sliding window out of the 512 descs.
> > 
> > 
> 
> One thing we could do would be to allocate a special guard buffer and
> set all 'out of window' descriptors to point to this guard buffer, and
> periodically check if buffer is dirtied by the card.
> 
> (first word would be enough)
> 
> (instead of setting desc->addr to NULL, set to
> dma_map_single(guard_buffer))
> 
I think this is a goo idea, at least for testing.  It seems odd to me that we
have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
descriptor to be processed next (the documentation isnt' very verbose on the
subject), along with the EOT bit on a descriptor.  It seems like both the
register and the bit are capable of conveying the same (or at least overlapping)
information.

I think what I'm having the most trouble with is understanding when the hw looks
at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
register?  Of does it stall until such time as the DMARX_PTR register is rotated
around?  What if it doesn't see the EOT bit set?  Does it just keep going with
the next descriptor?  

Also, there seems to be some inconsistency in the settnig of the B44_DMARX_PTR
register.  In bnx2_init_hw its set to the value of bp->rx_pending, which is
defined as being 200.  But in b44_rx its advanced by sizeof(struct dma_desc) for
every iteration.  So in b44_init_hw we write the value 200 to it, ostensibly
indicating a limit of 200 descriptors, but in b44_rx we iteratively write the
values 0, 8, 16, 24...4*n to the register to indicate which descriptor we're
indexing?  Something really doesn't sit right with me there.  In the former case
we treat the register as holding  number of entries, and in the latter we treat
it as holding a byte offset into an array.  Or am I missing something?

Regards
Neil

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 8:02 p.m. UTC | #5
Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> I think this is a goo idea, at least for testing.  It seems odd to me that we
> have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> descriptor to be processed next (the documentation isnt' very verbose on the
> subject), along with the EOT bit on a descriptor.  It seems like both the
> register and the bit are capable of conveying the same (or at least overlapping)
> information.
> 
> I think what I'm having the most trouble with is understanding when the hw looks
> at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> register?  Of does it stall until such time as the DMARX_PTR register is rotated
> around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> the next descriptor?  
> 
> Also, there seems to be some inconsistency in the settnig of the B44_DMARX_PTR
> register.  In bnx2_init_hw its set to the value of bp->rx_pending, which is
> defined as being 200.  But in b44_rx its advanced by sizeof(struct dma_desc) for
> every iteration.  So in b44_init_hw we write the value 200 to it, ostensibly
> indicating a limit of 200 descriptors, but in b44_rx we iteratively write the
> values 0, 8, 16, 24...4*n to the register to indicate which descriptor we're
> indexing?  Something really doesn't sit right with me there.  In the former case
> we treat the register as holding  number of entries, and in the latter we treat
> it as holding a byte offset into an array.  Or am I missing something?
> 

Yes, definitely this needs some clarification.

More over, when we hit the last entry (currently at slot 511), the EOT
instructs hardware to go back to slot 0, while our real window is
511-200 -> 511 . Slot 0 contains garbage (well, an old value)

Its late here so I dont plan to send a patch before 8/10 hours.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 5, 2011, 8:15 p.m. UTC | #6
Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > descriptor to be processed next (the documentation isnt' very verbose on the
> > subject), along with the EOT bit on a descriptor.  It seems like both the
> > register and the bit are capable of conveying the same (or at least overlapping)
> > information.
> > 
> > I think what I'm having the most trouble with is understanding when the hw looks
> > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > the next descriptor?  

Since there is no OWN bit (at least not on the online doc I got : it
says the rx_ring is read only by the NIC), I would say we really need to
advance DMARX_PTR to signal NIC a new entry is available for following
incoming frames.

This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
chip could loop on a circular rx_ring.




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..1b58a7c 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -57,7 +57,7 @@ 
 #define B44_MAX_MTU			1500
 
 #define B44_RX_RING_SIZE		512
-#define B44_DEF_RX_RING_PENDING		200
+#define B44_DEF_RX_RING_PENDING		512
 #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
 				 B44_RX_RING_SIZE)
 #define B44_TX_RING_SIZE		512