diff mbox

[net] atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring

Message ID 1374857234-1442-1-git-send-email-nhorman@tuxdriver.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman July 26, 2013, 4:47 p.m. UTC
atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
guarantees about the suitability of the memory for use in DMA.  As a result
we've gotten reports of atl1c drivers occasionally hanging and needing to be
reset:
https://bugzilla.kernel.org/show_bug.cgi?id=54021

Fix this by modifying the call to use the internal version __netdev_alloc_skb,
where you can set the gfp_mask explicitly to include GFP_DMA.

Tested by two reporters in the above bug, who have the hardware to validate it.
Both report immediate cessation of the problem with this patch

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jay Cliburn <jcliburn@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Tested-by: Luis Henrix <luis.henrix@gmail.com>
Tested-by: Vincent Alquier <vincent.alquier@gmail.com>
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Luis Henriques July 26, 2013, 4:56 p.m. UTC | #1
Neil Horman <nhorman@tuxdriver.com> writes:

> atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> guarantees about the suitability of the memory for use in DMA.  As a result
> we've gotten reports of atl1c drivers occasionally hanging and needing to be
> reset:
> https://bugzilla.kernel.org/show_bug.cgi?id=54021
>
> Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> where you can set the gfp_mask explicitly to include GFP_DMA.
>
> Tested by two reporters in the above bug, who have the hardware to validate it.
> Both report immediate cessation of the problem with this patch
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Jay Cliburn <jcliburn@gmail.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: stable@vger.kernel.org
> Tested-by: Luis Henrix <luis.henrix@gmail.com>

Thanks Neil!

Would it be possible for you to update my name and email?  (I've just
updated my bz account to remove my gmail account address -- I thought
I had done this ages ago.)

Tested-by: Luis Henriques <luis.henriques@canonical.com>

Cheers,
Neil Horman July 26, 2013, 5:02 p.m. UTC | #2
On Fri, Jul 26, 2013 at 05:56:16PM +0100, Luis Henriques wrote:
> Neil Horman <nhorman@tuxdriver.com> writes:
> 
> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> > guarantees about the suitability of the memory for use in DMA.  As a result
> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> > reset:
> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> >
> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> > where you can set the gfp_mask explicitly to include GFP_DMA.
> >
> > Tested by two reporters in the above bug, who have the hardware to validate it.
> > Both report immediate cessation of the problem with this patch
> >
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: Jay Cliburn <jcliburn@gmail.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > CC: stable@vger.kernel.org
> > Tested-by: Luis Henrix <luis.henrix@gmail.com>
> 
> Thanks Neil!
> 
> Would it be possible for you to update my name and email?  (I've just
> updated my bz account to remove my gmail account address -- I thought
> I had done this ages ago.)
> 
> Tested-by: Luis Henriques <luis.henriques@canonical.com>
> 
I'm sure if Dave has time, he will square that up.
Neil

> Cheers,
> -- 
> Luis
> 
> 
> > Tested-by: Vincent Alquier <vincent.alquier@gmail.com> ---
> > drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 +- 1 file
> > changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > index 786a874..d5e38d1 100644
> > --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> > @@ -1660,7 +1660,7 @@ static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter)
> >  	while (next_info->flags & ATL1C_BUFFER_FREE) {
> >  		rfd_desc = ATL1C_RFD_DESC(rfd_ring, rfd_next_to_use);
> >  
> > -		skb = netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len);
> > +		skb = __netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len, GFP_ATOMIC|GFP_DMA);
> >  		if (unlikely(!skb)) {
> >  			if (netif_msg_rx_err(adapter))
> >  				dev_warn(&pdev->dev, "alloc rx buffer failed\n");
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 26, 2013, 10:56 p.m. UTC | #3
From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 26 Jul 2013 13:02:51 -0400

> On Fri, Jul 26, 2013 at 05:56:16PM +0100, Luis Henriques wrote:
>> Neil Horman <nhorman@tuxdriver.com> writes:
>> 
>> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
>> > guarantees about the suitability of the memory for use in DMA.  As a result
>> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
>> > reset:
>> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
>> >
>> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
>> > where you can set the gfp_mask explicitly to include GFP_DMA.
>> >
>> > Tested by two reporters in the above bug, who have the hardware to validate it.
>> > Both report immediate cessation of the problem with this patch
>> >
>> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> > CC: Jay Cliburn <jcliburn@gmail.com>
>> > CC: "David S. Miller" <davem@davemloft.net>
>> > CC: stable@vger.kernel.org
>> > Tested-by: Luis Henrix <luis.henrix@gmail.com>
>> 
>> Thanks Neil!
>> 
>> Would it be possible for you to update my name and email?  (I've just
>> updated my bz account to remove my gmail account address -- I thought
>> I had done this ages ago.)
>> 
>> Tested-by: Luis Henriques <luis.henriques@canonical.com>
>> 
> I'm sure if Dave has time, he will square that up.

I took care of this while applying Neil's patch, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings July 27, 2013, 12:02 a.m. UTC | #4
On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> guarantees about the suitability of the memory for use in DMA.  As a result
> we've gotten reports of atl1c drivers occasionally hanging and needing to be
> reset:
> https://bugzilla.kernel.org/show_bug.cgi?id=54021
> 
> Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> where you can set the gfp_mask explicitly to include GFP_DMA.

This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
region (< 16 MB).  pci_map_single() takes care of allocating a bounce
buffer if necessary.

Ben.

> Tested by two reporters in the above bug, who have the hardware to validate it.
> Both report immediate cessation of the problem with this patch
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: Jay Cliburn <jcliburn@gmail.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: stable@vger.kernel.org
> Tested-by: Luis Henrix <luis.henrix@gmail.com>
> Tested-by: Vincent Alquier <vincent.alquier@gmail.com>
> ---
>  drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 786a874..d5e38d1 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1660,7 +1660,7 @@ static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter)
>  	while (next_info->flags & ATL1C_BUFFER_FREE) {
>  		rfd_desc = ATL1C_RFD_DESC(rfd_ring, rfd_next_to_use);
>  
> -		skb = netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len);
> +		skb = __netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len, GFP_ATOMIC|GFP_DMA);
>  		if (unlikely(!skb)) {
>  			if (netif_msg_rx_err(adapter))
>  				dev_warn(&pdev->dev, "alloc rx buffer failed\n");
Ben Hutchings July 27, 2013, 12:24 a.m. UTC | #5
On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> > guarantees about the suitability of the memory for use in DMA.  As a result
> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> > reset:
> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> > 
> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> > where you can set the gfp_mask explicitly to include GFP_DMA.
> 
> This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
> region (< 16 MB).  pci_map_single() takes care of allocating a bounce
> buffer if necessary.
> 
> Ben.
> 
> > Tested by two reporters in the above bug, who have the hardware to validate it.
> > Both report immediate cessation of the problem with this patch
[...]

So perhaps the chip somehow fails to support a full 32-bit address
(which is the current DMA mask), though given that there are 64 address
bits in RX descriptors this seems unlikely.  And the most likely result
of that would be memory corruption, not a stall.

Alternately, perhaps more likely, there's something wrong with the
driver's error handling.  If atl1_alloc_rx_buffer() fails then the RX
queue could run dry.  Depending on how the hardware is designed, that
could result in a complete RX stall (no RX buffers available => no RX
completions => no attempt to allocate more RX buffers).

Maybe your change makes it less likely for atl1_alloc_rx_buffer() to
fail.  On a modern PC the (ISA) DMA zone is basically unused whereas
bounce buffers might be more contended.  Did you try adding some logging
for failure of pci_map_single()?

Ben.
Luis Henriques July 27, 2013, 4:25 p.m. UTC | #6
David Miller <davem@davemloft.net> writes:

> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Fri, 26 Jul 2013 13:02:51 -0400
>
>> On Fri, Jul 26, 2013 at 05:56:16PM +0100, Luis Henriques wrote:
>>> Neil Horman <nhorman@tuxdriver.com> writes:
>>> 
>>> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
>>> > guarantees about the suitability of the memory for use in DMA.  As a result
>>> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
>>> > reset:
>>> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
>>> >
>>> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
>>> > where you can set the gfp_mask explicitly to include GFP_DMA.
>>> >
>>> > Tested by two reporters in the above bug, who have the hardware to validate it.
>>> > Both report immediate cessation of the problem with this patch
>>> >
>>> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>> > CC: Jay Cliburn <jcliburn@gmail.com>
>>> > CC: "David S. Miller" <davem@davemloft.net>
>>> > CC: stable@vger.kernel.org
>>> > Tested-by: Luis Henrix <luis.henrix@gmail.com>
>>> 
>>> Thanks Neil!
>>> 
>>> Would it be possible for you to update my name and email?  (I've just
>>> updated my bz account to remove my gmail account address -- I thought
>>> I had done this ages ago.)
>>> 
>>> Tested-by: Luis Henriques <luis.henriques@canonical.com>
>>> 
>> I'm sure if Dave has time, he will square that up.
>
> I took care of this while applying Neil's patch, thanks.

Thanks!

Cheers,
Luis Henriques July 27, 2013, 7:30 p.m. UTC | #7
Ben Hutchings <bhutchings@solarflare.com> writes:

> On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
>> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
>> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
>> > guarantees about the suitability of the memory for use in DMA.  As a result
>> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
>> > reset:
>> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
>> > 
>> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
>> > where you can set the gfp_mask explicitly to include GFP_DMA.
>> 
>> This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
>> region (< 16 MB).  pci_map_single() takes care of allocating a bounce
>> buffer if necessary.
>> 
>> Ben.
>> 
>> > Tested by two reporters in the above bug, who have the hardware to validate it.
>> > Both report immediate cessation of the problem with this patch
> [...]
>
> So perhaps the chip somehow fails to support a full 32-bit address
> (which is the current DMA mask), though given that there are 64 address
> bits in RX descriptors this seems unlikely.  And the most likely result
> of that would be memory corruption, not a stall.
>
> Alternately, perhaps more likely, there's something wrong with the
> driver's error handling.  If atl1_alloc_rx_buffer() fails then the RX
> queue could run dry.  Depending on how the hardware is designed, that
> could result in a complete RX stall (no RX buffers available => no RX
> completions => no attempt to allocate more RX buffers).
>
> Maybe your change makes it less likely for atl1_alloc_rx_buffer() to
> fail.  On a modern PC the (ISA) DMA zone is basically unused whereas
> bounce buffers might be more contended.  Did you try adding some logging
> for failure of pci_map_single()?
>
> Ben.

Just to add a little bit more context (and hopefully not noise), I
started seeing this issue on 3.7.  Bisection resulted on the following
first bad commit:

69b08f6 net: use bigger pages in __netdev_alloc_frag

Reverting this commit (and e5e6730 "skbuff: Move definition of
NETDEV_FRAG_PAGE_MAX_SIZE") solved the problem.

Note also that I'm seeing this issue on a 32 bits system (64 bits
isn't supported).  This initially made me think the problem could be
related with this as 69b08f6 log explicitly refers to 32/64 bit
archs.  But I failed to find any obvious issue with the patch.

Cheers,
Eric Dumazet July 27, 2013, 7:49 p.m. UTC | #8
On Sat, 2013-07-27 at 20:30 +0100, Luis Henriques wrote:
> Ben Hutchings <bhutchings@solarflare.com> writes:
> 
> > On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
> >> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> >> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> >> > guarantees about the suitability of the memory for use in DMA.  As a result
> >> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> >> > reset:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> >> > 
> >> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> >> > where you can set the gfp_mask explicitly to include GFP_DMA.
> >> 
> >> This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
> >> region (< 16 MB).  pci_map_single() takes care of allocating a bounce
> >> buffer if necessary.
> >> 
> >> Ben.
> >> 
> >> > Tested by two reporters in the above bug, who have the hardware to validate it.
> >> > Both report immediate cessation of the problem with this patch
> > [...]
> >
> > So perhaps the chip somehow fails to support a full 32-bit address
> > (which is the current DMA mask), though given that there are 64 address
> > bits in RX descriptors this seems unlikely.  And the most likely result
> > of that would be memory corruption, not a stall.
> >
> > Alternately, perhaps more likely, there's something wrong with the
> > driver's error handling.  If atl1_alloc_rx_buffer() fails then the RX
> > queue could run dry.  Depending on how the hardware is designed, that
> > could result in a complete RX stall (no RX buffers available => no RX
> > completions => no attempt to allocate more RX buffers).
> >
> > Maybe your change makes it less likely for atl1_alloc_rx_buffer() to
> > fail.  On a modern PC the (ISA) DMA zone is basically unused whereas
> > bounce buffers might be more contended.  Did you try adding some logging
> > for failure of pci_map_single()?
> >
> > Ben.
> 
> Just to add a little bit more context (and hopefully not noise), I
> started seeing this issue on 3.7.  Bisection resulted on the following
> first bad commit:
> 
> 69b08f6 net: use bigger pages in __netdev_alloc_frag
> 
> Reverting this commit (and e5e6730 "skbuff: Move definition of
> NETDEV_FRAG_PAGE_MAX_SIZE") solved the problem.
> 
> Note also that I'm seeing this issue on a 32 bits system (64 bits
> isn't supported).  This initially made me think the problem could be
> related with this as 69b08f6 log explicitly refers to 32/64 bit
> archs.  But I failed to find any obvious issue with the patch.
> 
> Cheers,

Unfortunately, nothing makes sense here. It looks like a possible
hardware defect on some memory ranges, as only atl1c is impacted.

Restricting allocations to DMA zone would probably avoid this bug
completely, but it adds obvious problems as DMA zone is so small.

Say you have some tcp flows with application not reading received queue
fast enough, DMA zone will be depleted and network will just hang.

This driver never used GFP_DMA, so adding them now seems quite strange.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings July 27, 2013, 9:30 p.m. UTC | #9
On Sat, 2013-07-27 at 20:30 +0100, Luis Henriques wrote:
> Ben Hutchings <bhutchings@solarflare.com> writes:
> 
> > On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
> >> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> >> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> >> > guarantees about the suitability of the memory for use in DMA.  As a result
> >> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> >> > reset:
> >> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> >> > 
> >> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> >> > where you can set the gfp_mask explicitly to include GFP_DMA.
> >> 
> >> This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
> >> region (< 16 MB).  pci_map_single() takes care of allocating a bounce
> >> buffer if necessary.
[...]
> Just to add a little bit more context (and hopefully not noise), I
> started seeing this issue on 3.7.  Bisection resulted on the following
> first bad commit:
> 
> 69b08f6 net: use bigger pages in __netdev_alloc_frag
> 
> Reverting this commit (and e5e6730 "skbuff: Move definition of
> NETDEV_FRAG_PAGE_MAX_SIZE") solved the problem.
> 
> Note also that I'm seeing this issue on a 32 bits system (64 bits
> isn't supported).  This initially made me think the problem could be
> related with this as 69b08f6 log explicitly refers to 32/64 bit
> archs.  But I failed to find any obvious issue with the patch.

Then it seems like this patch works because passing the GFP_DMA flag to
__netdev_alloc_skb() disables the use of __netdev_alloc_frag() and
results in it calling __alloc_skb().  A better workaround would be for
atl1c to call __alloc_skb() directly.

Perhaps the controller doesn't split RX DMA across PCIe page boundaries
(4K), or some other boundaries at smaller intervals than
NETDEV_FRAG_PAGE_MAX_SIZE.

But I think that perhaps the use of __netdev_alloc_frag() should be made
opt-in.  I doubt this is the only driver whose DMA requirements have
been broken.  Since the Linux DMA API lacks any way for devices to
specify boundaries which would then be observed by pci_map_single(), I
don't think this can be considered simply a driver bug.

Ben.
Eric Dumazet July 27, 2013, 11:59 p.m. UTC | #10
On Sat, 2013-07-27 at 22:30 +0100, Ben Hutchings wrote:
> On Sat, 2013-07-27 at 20:30 +0100, Luis Henriques wrote:
> > Ben Hutchings <bhutchings@solarflare.com> writes:
> > 
> > > On Sat, 2013-07-27 at 01:02 +0100, Ben Hutchings wrote:
> > >> On Fri, 2013-07-26 at 12:47 -0400, Neil Horman wrote:
> > >> > atl1c uses netdev_alloc_skb to refill its rx dma ring, but that call makes no
> > >> > guarantees about the suitability of the memory for use in DMA.  As a result
> > >> > we've gotten reports of atl1c drivers occasionally hanging and needing to be
> > >> > reset:
> > >> > https://bugzilla.kernel.org/show_bug.cgi?id=54021
> > >> > 
> > >> > Fix this by modifying the call to use the internal version __netdev_alloc_skb,
> > >> > where you can set the gfp_mask explicitly to include GFP_DMA.
> > >> 
> > >> This is a really bad idea.  GFP_DMA means allocation from the ISA DMA
> > >> region (< 16 MB).  pci_map_single() takes care of allocating a bounce
> > >> buffer if necessary.
> [...]
> > Just to add a little bit more context (and hopefully not noise), I
> > started seeing this issue on 3.7.  Bisection resulted on the following
> > first bad commit:
> > 
> > 69b08f6 net: use bigger pages in __netdev_alloc_frag
> > 
> > Reverting this commit (and e5e6730 "skbuff: Move definition of
> > NETDEV_FRAG_PAGE_MAX_SIZE") solved the problem.
> > 
> > Note also that I'm seeing this issue on a 32 bits system (64 bits
> > isn't supported).  This initially made me think the problem could be
> > related with this as 69b08f6 log explicitly refers to 32/64 bit
> > archs.  But I failed to find any obvious issue with the patch.
> 
> Then it seems like this patch works because passing the GFP_DMA flag to
> __netdev_alloc_skb() disables the use of __netdev_alloc_frag() and
> results in it calling __alloc_skb().  A better workaround would be for
> atl1c to call __alloc_skb() directly.
> 
> Perhaps the controller doesn't split RX DMA across PCIe page boundaries
> (4K), or some other boundaries at smaller intervals than
> NETDEV_FRAG_PAGE_MAX_SIZE.
> 
> But I think that perhaps the use of __netdev_alloc_frag() should be made
> opt-in.  I doubt this is the only driver whose DMA requirements have
> been broken.  Since the Linux DMA API lacks any way for devices to
> specify boundaries which would then be observed by pci_map_single(), I
> don't think this can be considered simply a driver bug.

It is a driver bug to assume anything about alloc_skb(), as there is no
specification about skb->head being aligned to whatever boundary.

The only guarantee is the one provided by kmalloc(), that is 8 bytes.

I specifically asked this exact question in 

https://bugzilla.kernel.org/show_bug.cgi?id=54021#c19

And the Qualcom guys checked and said it was ok. Who should we trust ?

I hope you understand kmalloc(~2000 bytes) never made the assumption the
area fit a single page. SLOB or even SLUB/SLAB with some debugging
features...

If a hardware needs frame being in a single 4K page, its driver must do
its own allocation, or add appropriate aligning logic.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 28, 2013, 3:02 a.m. UTC | #11
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 27 Jul 2013 16:59:43 -0700

> If a hardware needs frame being in a single 4K page, its driver must do
> its own allocation, or add appropriate aligning logic.

Whatever the cause the original patch in this thread is not the
correct fix and I've thus reverted it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman July 28, 2013, 10:44 a.m. UTC | #12
On Sat, Jul 27, 2013 at 08:02:05PM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 27 Jul 2013 16:59:43 -0700
> 
> > If a hardware needs frame being in a single 4K page, its driver must do
> > its own allocation, or add appropriate aligning logic.
> 
> Whatever the cause the original patch in this thread is not the
> correct fix and I've thus reverted it.
> 

So what exactly is the consensus here?  We can certainly modify the patch to
ensure allocation on a 4k boundary, and within a unique 4k page, but thats just
a guess at the problem, and the Qualcomm people have been silent on the issue
for weeks now.

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 786a874..d5e38d1 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -1660,7 +1660,7 @@  static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter)
 	while (next_info->flags & ATL1C_BUFFER_FREE) {
 		rfd_desc = ATL1C_RFD_DESC(rfd_ring, rfd_next_to_use);
 
-		skb = netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len);
+		skb = __netdev_alloc_skb(adapter->netdev, adapter->rx_buffer_len, GFP_ATOMIC|GFP_DMA);
 		if (unlikely(!skb)) {
 			if (netif_msg_rx_err(adapter))
 				dev_warn(&pdev->dev, "alloc rx buffer failed\n");