Patchwork Hang: 2.6.32.4 sky2/DMAR (was [PATCH] sky2: Fix WARNING: at lib/dma-debug.c:902 check_sync)

Submitter Stephen Hemminger
Date Jan. 27, 2010, 5:56 p.m.
Message ID <20100127095614.14313677@nehalam>
Permalink /patch/43828/
State RFC
Delegated to: David Miller

Comments

Stephen Hemminger - Jan. 27, 2010, 5:56 p.m.
On Wed, 27 Jan 2010 11:57:35 -0500
Michael Breuer <mbreuer@majjas.com> wrote:

> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
> > On Wed, 27 Jan 2010 10:34:51 -0500
> > Michael Breuer<mbreuer@majjas.com>  wrote:
> >
> >    
> >> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
> >>      
> >>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
> >>>
> >>>        
> >>>> When the packets were dropped, there was a different sequence in the
> >>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
> >>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
> >>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
> >>>>
> >>>>          
> >>> Anyway, I'd be interested if the switch matters here.
> >>>
> >>> Plus one more test: could you try to load sky2 with the parameter:
> >>> "copybreak=1" (the rest as in any recent test, which gave you dmar
> >>> errors; any switch).
> >>>
> >>> Thanks,
> >>> Jarek P.
> >>>
> >>>        
> >> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
> >> to confirm that I haven't inadvertently fixed something. However, given
> >> that it might be copybreak-related, I looked at sky2.c again and I'm
> >> wondering about the copybreak max size in sky2_rx_start:
> >>
> >>     size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
> >>
> >>     /* Stopping point for hardware truncation */
> >>     thresh = (size - 8) / sizeof(u32);
> >>
> >>     sky2->rx_nfrags = size >> PAGE_SHIFT;
> >>     BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
> >>
> >>     /* Compute residue after pages */
> >>     size -= sky2->rx_nfrags << PAGE_SHIFT;
> >>
> >>     /* Optimize to handle small packets and headers */
> >>     if (size < copybreak)
> >>             size = copybreak;
> >>     if (size < ETH_HLEN)
> >>             size = ETH_HLEN;
> >>
> >>
> >> Why would increasing size to copybreak be valid here?
> >>
> >> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
> >> correctly, if size is ever less than copybreak it's because there isn't
> >> enough space left for anything larger. If so, wouldn't increasing size
> >> potentially corrupt something? I'd further guess that the resulting
> >> condition manifests sooner (or at least with a more visible effect) when
> >> using DMAR.
> >>
> >> In any event, why "copybreak" as the minimum buffer size? I'd suggest
> >> that if it isn't possible to allocate at least MTU + overhead that
> >> sky2_rx_start ought to be delayed until there is room.
> >>      
> > This code is where driver decides how much data will be received in skb
> > data area and the remaining data spills over into skb frags.
> > Copybreak is the threshold so that packets less than size are copied
> > to a new skb.  The code doing the copying there assumes the data is
> > totally contained in the skb (not in frags). The size increase there
> > is to make sure that assumption is always true.  I suppose you
> > could do something perverse like setting copybreak really huge
> > and confuse driver, but that is a user error.
> >
> >    
> Ok - but I'm wondering under what circumstances size would be < 
> copybreak in the first place after computing the residue. If size ends 
> up being unreasonably small, is simply increasing the number to whatever 
> copybreak is correct? Assuming my testing is correct, then the crash 
> I've been experiencing when using dmar (only) seems related to the value 
> of copybreak. I don't think the other use (skb reuse) is the issue (but 
> hey, I could have missed something). The crash occurs when copybreak is 
> the default of 128, didn't happen when I set copybreak to 1.

Does this change it? If so, the dma code (not sky2) is buggy and not
rounding up properly.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Breuer - Jan. 27, 2010, 5:58 p.m.
On 1/27/2010 12:56 PM, Stephen Hemminger wrote:
> On Wed, 27 Jan 2010 11:57:35 -0500
> Michael Breuer<mbreuer@majjas.com>  wrote:
>
>    
>> On 1/27/2010 11:50 AM, Stephen Hemminger wrote:
>>      
>>> On Wed, 27 Jan 2010 10:34:51 -0500
>>> Michael Breuer<mbreuer@majjas.com>   wrote:
>>>
>>>
>>>        
>>>> On 01/23/2010 06:21 PM, Jarek Poplawski wrote:
>>>>
>>>>          
>>>>> On Fri, Jan 22, 2010 at 06:50:21PM -0500, Michael Breuer wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> When the packets were dropped, there was a different sequence in the
>>>>>> log - DISCOVER/OFFER repeated. The "normal" is that the sequence
>>>>>> appeared correct and complete - DISCOVER/OFFER/REQUEST/ACK - or
>>>>>> INFORM/ACK (vs. INFORM repeatedly sans ACK) as the case may be.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Anyway, I'd be interested if the switch matters here.
>>>>>
>>>>> Plus one more test: could you try to load sky2 with the parameter:
>>>>> "copybreak=1" (the rest as in any recent test, which gave you dmar
>>>>> errors; any switch).
>>>>>
>>>>> Thanks,
>>>>> Jarek P.
>>>>>
>>>>>
>>>>>            
>>>> Ok - now up 80+ hours with copybreak=1. I'm going to redo w/o copybreak
>>>> to confirm that I haven't inadvertently fixed something. However, given
>>>> that it might be copybreak-related, I looked at sky2.c again and I'm
>>>> wondering about the copybreak max size in sky2_rx_start:
>>>>
>>>>      size = roundup(sky2->netdev->mtu + ETH_HLEN + VLAN_HLEN, 8);
>>>>
>>>>      /* Stopping point for hardware truncation */
>>>>      thresh = (size - 8) / sizeof(u32);
>>>>
>>>>      sky2->rx_nfrags = size >> PAGE_SHIFT;
>>>>      BUG_ON(sky2->rx_nfrags > ARRAY_SIZE(re->frag_addr));
>>>>
>>>>      /* Compute residue after pages */
>>>>      size -= sky2->rx_nfrags << PAGE_SHIFT;
>>>>
>>>>      /* Optimize to handle small packets and headers */
>>>>      if (size < copybreak)
>>>>              size = copybreak;
>>>>      if (size < ETH_HLEN)
>>>>              size = ETH_HLEN;
>>>>
>>>>
>>>> Why would increasing size to copybreak be valid here?
>>>>
>>>> Guessing a bit as I'm not sure about rx_nfrags, but if I read this
>>>> correctly, if size is ever less than copybreak it's because there isn't
>>>> enough space left for anything larger. If so, wouldn't increasing size
>>>> potentially corrupt something? I'd further guess that the resulting
>>>> condition manifests sooner (or at least with a more visible effect) when
>>>> using DMAR.
>>>>
>>>> In any event, why "copybreak" as the minimum buffer size? I'd suggest
>>>> that if it isn't possible to allocate at least MTU + overhead that
>>>> sky2_rx_start ought to be delayed until there is room.
>>>>
>>>>          
>>> This code is where driver decides how much data will be received in skb
>>> data area and the remaining data spills over into skb frags.
>>> Copybreak is the threshold so that packets less than size are copied
>>> to a new skb.  The code doing the copying there assumes the data is
>>> totally contained in the skb (not in frags). The size increase there
>>> is to make sure that assumption is always true.  I suppose you
>>> could do something perverse like setting copybreak really huge
>>> and confuse driver, but that is a user error.
>>>
>>>
>>>        
>> Ok - but I'm wondering under what circumstances size would be <
>> copybreak in the first place after computing the residue. If size ends
>> up being unreasonably small, is simply increasing the number to whatever
>> copybreak is correct? Assuming my testing is correct, then the crash
>> I've been experiencing when using dmar (only) seems related to the value
>> of copybreak. I don't think the other use (skb reuse) is the issue (but
>> hey, I could have missed something). The crash occurs when copybreak is
>> the default of 128, didn't happen when I set copybreak to 1.
>>      
> Does this change it? If so, the dma code (not sky2) is buggy and not
> rounding up properly.
>
> --- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>   	if (likely(skb)) {
> +		unsigned dma_align = dma_get_cache_alignment();
> +		unsigned dma_size = ALIGN(length+1, dma_align);
> +
>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> -					    length, PCI_DMA_FROMDEVICE);
> +					    dma_size, PCI_DMA_FROMDEVICE);
>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>   		skb->ip_summed = re->skb->ip_summed;
>   		skb->csum = re->skb->csum;
>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> -					       length, PCI_DMA_FROMDEVICE);
> +					       dma_size, PCI_DMA_FROMDEVICE);
>   		re->skb->ip_summed = CHECKSUM_NONE;
>   		skb_put(skb, length);
>   	}
>    
Ok - will queue this - want to reconfirm that the system still crashes 
w/o this (or copybreak). That should take a few days.
Michael Breuer - Jan. 27, 2010, 6:08 p.m.
On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> --- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
> +++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>
>   	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>   	if (likely(skb)) {
> +		unsigned dma_align = dma_get_cache_alignment();
> +		unsigned dma_size = ALIGN(length+1, dma_align);
> +
>   		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> -					    length, PCI_DMA_FROMDEVICE);
> +					    dma_size, PCI_DMA_FROMDEVICE);
>   		skb_copy_from_linear_data(re->skb, skb->data, length);
>   		skb->ip_summed = re->skb->ip_summed;
>   		skb->csum = re->skb->csum;
>   		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> -					       length, PCI_DMA_FROMDEVICE);
> +					       dma_size, PCI_DMA_FROMDEVICE);
>   		re->skb->ip_summed = CHECKSUM_NONE;
>   		skb_put(skb, length);
>   	}
>    
This doesn't apply - I'm missing some intermediate patch.

I've got (both in 2.6.32.4 and 2.6.33-rc5): pci_unmap_len(re, data_size) 
vs. "length". I assume that I can just replace the pci_unmap_len with 
dma_size... but perhaps the intermediate change may have affected this 
as well?
Michael Breuer - Jan. 27, 2010, 6:45 p.m.
On 1/27/2010 1:08 PM, Michael Breuer wrote:
> On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
>> --- a/drivers/net/sky2.c    2010-01-27 09:46:10.940005248 -0800
>> +++ b/drivers/net/sky2.c    2010-01-27 09:53:47.141267850 -0800
>> @@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
>>
>>       skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
>>       if (likely(skb)) {
>> +        unsigned dma_align = dma_get_cache_alignment();
>> +        unsigned dma_size = ALIGN(length+1, dma_align);
>> +
>>           pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
>> -                        length, PCI_DMA_FROMDEVICE);
>> +                        dma_size, PCI_DMA_FROMDEVICE);
>>           skb_copy_from_linear_data(re->skb, skb->data, length);
>>           skb->ip_summed = re->skb->ip_summed;
>>           skb->csum = re->skb->csum;
>>           pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
>> -                           length, PCI_DMA_FROMDEVICE);
>> +                           dma_size, PCI_DMA_FROMDEVICE);
>>           re->skb->ip_summed = CHECKSUM_NONE;
>>           skb_put(skb, length);
>>       }
> This doesn't apply - I'm missing some intermediate patch.
>
> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
> data_size) vs., "length." I assume that I can just replace the 
> pci_unmap_len with dma_size... but perhaps the intermediate change may 
> have affected this as well?
>
Never mind - that was from one of the earlier patches I had been trying 
out. I will try the above patch after reestablishing that the system still 
crashes without copybreak=1.

Jarek Poplawski - Jan. 27, 2010, 7:23 p.m.
On Wed, Jan 27, 2010 at 01:45:27PM -0500, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
> >On 01/27/2010 12:56 PM, Stephen Hemminger wrote:
> >>--- a/drivers/net/sky2.c    2010-01-27 09:46:10.940005248 -0800
> >>+++ b/drivers/net/sky2.c    2010-01-27 09:53:47.141267850 -0800
> >>@@ -2257,13 +2257,16 @@ static struct sk_buff *receive_copy(stru
> >>
> >>      skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
> >>      if (likely(skb)) {
> >>+        unsigned dma_align = dma_get_cache_alignment();
> >>+        unsigned dma_size = ALIGN(length+1, dma_align);
> >>+
> >>          pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
> >>-                        length, PCI_DMA_FROMDEVICE);
> >>+                        dma_size, PCI_DMA_FROMDEVICE);
> >>          skb_copy_from_linear_data(re->skb, skb->data, length);
> >>          skb->ip_summed = re->skb->ip_summed;
> >>          skb->csum = re->skb->csum;
> >>          pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
> >>-                           length, PCI_DMA_FROMDEVICE);
> >>+                           dma_size, PCI_DMA_FROMDEVICE);
> >>          re->skb->ip_summed = CHECKSUM_NONE;
> >>          skb_put(skb, length);
> >>      }
> >This doesn't apply - I'm missing some intermediate patch.
> >
> >I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re,
> >data_size) vs., "length." I assume that I can just replace the
> >pci_unmap_len with dma_size... but perhaps the intermediate change
> >may have affected this as well?
> >
> Never mind - that was from one of the earlier patches I had been
> trying out. will try the above patch after reestablishing that the
> system still crashes without copybreak=1.
> 

Stephen, I'm not sure this patch can show much after the patch with
"legal" dma_size == re->data_addr didn't help. It looks like David
was right: dma_sync can't affect dmar, because it doesn't use it at
all.

Then I'd rather suggest testing whether using copybreak more often, e.g.
with copybreak=1000 or even more, can trigger these errors faster.

Jarek P.
Jarek Poplawski - Jan. 27, 2010, 7:32 p.m.
On Wed, Jan 27, 2010 at 08:23:12PM +0100, Jarek Poplawski wrote:
> Stephen, I'm not sure this patch can show much after the patch with
> "legal" dma_size == re->data_addr didn't help. It looks like David

Of course: "dma_size == re->data_size".

Jarek P.
Michael Breuer - Jan. 28, 2010, 3:32 p.m.
On 01/27/2010 01:45 PM, Michael Breuer wrote:
> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>
>>
>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
>> data_size) vs., "length." I assume that I can just replace the 
>> pci_unmap_len with dma_size... but perhaps the intermediate change 
>> may have affected this as well?
>>
> Never mind - that was from one of the earlier patches I had been 
> trying out. will try the above patch after reestablishing that the 
> system still crashes without copybreak=1.
Just FYI - still crashes with default copybreak.  Didn't get the netdev 
watchdog this time - just DMAR and then HW watchdog reboot (see below).

So what's known to be required to cause this crash:

1) sky2 @ 1Gb
2) High sustained RX load (> 40MBps)
3) Uptime (I can't cause this to happen just after boot).
4) DMAR enabled (doesn't crash w/o DMAR).
5) copybreak != 1

What might be required but is unproven:
1) cifs traffic (I've only seen this when the high traffic was due to a 
Win7 box doing backup). I've tried but have been unable to recreate by 
just copying large files. Backups done from a Mac OS laptop don't 
trigger the issue even though that machine is also connecting with CIFS 
(TimeMachine works better that way).
2) DHCP traffic. There has always been some sort of DHCP exchange in the 
log before the first indication of a problem (DMAR).
3) Total throughput since boot. Don't know about this - however the uptime 
component before the latest crash was the shortest yet. In preparation I 
moved a bunch of large files around on the Windows box to ensure a 
larger than normal backup run. I also ran manually before going to bed 
(then moved the files around again). Didn't crash when I was watching - 
but did overnight. Total uptime before this crash was only about 6 
hours. Previously (with less backup data) the system didn't crash until 
24-36 hours.

Observations:

Copybreak: I did play for an hour or so yesterday with copybreak=1000. 
Ran traffic, etc. No crash, but throughput was lower and the system was 
clearly working way harder than normal. Given the whine of the fans I'm 
not keen on leaving the system in that state for any extended period of 
time.

MTU: Increasing the MTU to 9000 yesterday after the system had been up 
for some time (copybreak=1) crashed the system immediately. Subsequently 
I have been able to change the mtu without crashes (although the driver 
does end up in some sort of state that requires a restart after lowering 
the mtu). I suspect that over time something is being corrupted 
resulting in the crash when changing mtu. Whatever is becoming corrupted 
is probably related to the other crash as well. That suggests to me that 
copybreak=1 is preventing or delaying the manifestation of the 
underlying issue but is unrelated to the source of corruption.

[no messages in the prior three minutes - there was a dhcp exchange 
(request/ack) at 06:02:27]
Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffdd06bfe000
Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
[No further messages until restart at 06:09:46.]

Michael Breuer - Jan. 28, 2010, 4:43 p.m.
On 01/28/2010 10:32 AM, Michael Breuer wrote:
> On 01/27/2010 01:45 PM, Michael Breuer wrote:
>> On 1/27/2010 1:08 PM, Michael Breuer wrote:
>>>
>>>
>>> I've got (both in 2.6.32.4 and 2.6.33-rc5: pci_unmap_len(re, 
>>> data_size) vs., "length." I assume that I can just replace the 
>>> pci_unmap_len with dma_size... but perhaps the intermediate change 
>>> may have affected this as well?
>>>
>> Never mind - that was from one of the earlier patches I had been 
>> trying out. will try the above patch after reestablishing that the 
>> system still crashes without copybreak=1.
> Just FYI - still crashes with default copybreak.  Didn't get the 
> netdev watchdog this time - just DMAR and then HW watchdog reboot (see 
> below).
>
> So what's known to be required to cause this crash:
>
> 1) sky2 @ 1Gb
> 2) High sustained RX load (> 40MBps)
> 3) Uptime (I can't cause this to happen just after boot).
> 4) DMAR enabled (doesn't crash w/o DMAR).
> 5) copybreak != 1
>
> What might be required but is unproven:
> 1) cifs traffic (I've only seen this when the high traffic was due to 
> a Win7 box doing backup). I've tried but have been unable to recreate 
> by just copying large files. Backups done from a Mac OS laptop don't 
> trigger the issue even though that machine is also connecting with 
> CIFS (TimeMachine works better that way).
> 2) DHCP traffic. There has always been some sort of DHCP exchange in 
> the log before the first indication of a problem (DMAR).
> 3) Total throughput since boot. DK about this - however the uptime 
> component before the latest crash was the shortest yet. In preparation 
> I moved a bunch of large files around on the Windows box to ensure a 
> larger than normal backup run. I also ran manually before going to bed 
> (then moved the files around again). Didn't crash when I was watching 
> - but did overnight. Total uptime before this crash was only about 6 
> hours. Previously (with less backup data) the system didn't crash 
> until 24-36 hours.
>
> Observations:
>
> Copybreak: I did play for an hour or so yesterday with copybreak=1000. 
> Ran traffic, etc. No crash, but throughput was lower and the system 
> was clearly working way harder than normal. Given the whine of the 
> fans I'm not keen on leaving the system in that state for any extended 
> period of time.
>
> MTU: Increasing the MTU to 9000 yesterday after the system had been up 
> for some time (copybreak=1) crashed the system immediately. 
> Subsequently I have been able to change the mtu without crashes 
> (although the driver does end up in some sort of state that requires a 
> restart after lowering the mtu). I suspect that over time something is 
> being corrupted resulting in the crash when changing mtu. Whatever is 
> becoming corrupted is probably related to the other crash as well. 
> That suggests to me that copybreak=1 is preventing or delaying the 
> manifestation of the underlying issue but is unrelated to the source 
> of corruption.
>
> [no messages in the prior three minutes - there was a dhcp exchange 
> (request/ack) at 06:02:27]
> Jan 28 06:05:58 mail kernel: DRHD: handling fault status reg 2
> Jan 28 06:05:58 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffdd06bfe000
> Jan 28 06:05:58 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
> Jan 28 06:05:58 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
> [No further messages until restart at 06:09:46.]
>
Update: I played with dma-debug. It was being disabled due to lack of 
memory. I forced it back on while pumping traffic through and got this:
Jan 28 11:39:30 mail kernel: ------------[ cut here ]------------
Jan 28 11:39:30 mail kernel: WARNING: at lib/dma-debug.c:902 check_sync+0xc1/0x43f()
Jan 28 11:39:30 mail kernel: Hardware name: System Product Name
Jan 28 11:39:30 mail kernel: sky2 0000:06:00.0: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x0000ffff4fe37022] [size=1520 bytes]
Jan 28 11:39:30 mail kernel: Modules linked in: microcode(+) 
ip6table_filter ip6table_mangle ip6_tables iptable_raw iptable_mangle 
ipt_MASQUERADE iptable_nat nf_nat bridge stp appletalk psnap llc nfsd 
lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc 
acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns 
nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport 
ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_ens1371 
gameport snd_rawmidi gspca_spca505 snd_hda_intel snd_ac97_codec 
gspca_main snd_hda_codec videodev snd_hwdep snd_seq v4l1_compat i2c_i801 
pcspkr ac97_bus v4l2_compat_ioctl32 snd_seq_device asus_atk0110 hwmon 
snd_pcm firewire_ohci firewire_core crc_itu_t sky2 snd_timer snd 
iTCO_wdt iTCO_vendor_support wmi soundcore snd_page_alloc fbcon tileblit 
font bitblit softcursor raid456 async_raid6_recov async_pq raid6_pq 
async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi 
pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit 
cfbcopyarea i2c_core cfb
Jan 28 11:39:30 mail kernel: imgblt cfbfillrect [last unloaded: ip6_tables]
Jan 28 11:39:30 mail kernel: Pid: 5327, comm: bash Tainted: G        W  2.6.32.4MMAPDMARAF3SKY2PSKBMAYPULL-00912-g914160d-dirty #6
Jan 28 11:39:30 mail kernel: Call Trace:
Jan 28 11:39:30 mail kernel: <IRQ>  [<ffffffff810536ee>] warn_slowpath_common+0x7c/0x94
Jan 28 11:39:30 mail kernel: [<ffffffff8105375d>] warn_slowpath_fmt+0x41/0x43
Jan 28 11:39:30 mail kernel: [<ffffffff8127b891>] check_sync+0xc1/0x43f
Jan 28 11:39:30 mail kernel: [<ffffffff8146c51a>] ? _spin_unlock_irqrestore+0x29/0x41
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ? __netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffff8127bf62>] debug_dma_sync_single_for_cpu+0x42/0x44
Jan 28 11:39:30 mail kernel: [<ffffffff813cac10>] ? __netdev_alloc_skb+0x34/0x50
Jan 28 11:39:30 mail kernel: [<ffffffffa019aee8>] sky2_poll+0x4d5/0xb06 [sky2]
Jan 28 11:39:30 mail kernel: [<ffffffff81044840>] ? enqueue_entity+0x26c/0x279
Jan 28 11:39:30 mail kernel: [<ffffffff8107decf>] ? clockevents_program_event+0x7a/0x83
Jan 28 11:39:30 mail kernel: [<ffffffff813d18ae>] net_rx_action+0xb5/0x1f3
Jan 28 11:39:30 mail kernel: [<ffffffff8105af0f>] __do_softirq+0xf8/0x1cd
Jan 28 11:39:30 mail kernel: [<ffffffff810a3006>] ? handle_IRQ_event+0x119/0x12b
Jan 28 11:39:30 mail kernel: [<ffffffff81012e1c>] call_softirq+0x1c/0x30
Jan 28 11:39:30 mail kernel: [<ffffffff810143a3>] do_softirq+0x4b/0xa6
Jan 28 11:39:30 mail kernel: [<ffffffff8105aaef>] irq_exit+0x4a/0x8c
Jan 28 11:39:30 mail kernel: [<ffffffff81470575>] do_IRQ+0xa5/0xbc
Jan 28 11:39:30 mail kernel: [<ffffffff81012613>] ret_from_intr+0x0/0x16
Jan 28 11:39:30 mail kernel: <EOI>
Jan 28 11:39:30 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 28 11:39:30 mail kernel: DMA-API: debugging out of memory - disabling




Patch

--- a/drivers/net/sky2.c	2010-01-27 09:46:10.940005248 -0800
+++ b/drivers/net/sky2.c	2010-01-27 09:53:47.141267850 -0800
@@ -2257,13 +2257,16 @@  static struct sk_buff *receive_copy(stru
 
 	skb = netdev_alloc_skb_ip_align(sky2->netdev, length);
 	if (likely(skb)) {
+		unsigned dma_align = dma_get_cache_alignment();
+		unsigned dma_size = ALIGN(length+1, dma_align);
+
 		pci_dma_sync_single_for_cpu(sky2->hw->pdev, re->data_addr,
-					    length, PCI_DMA_FROMDEVICE);
+					    dma_size, PCI_DMA_FROMDEVICE);
 		skb_copy_from_linear_data(re->skb, skb->data, length);
 		skb->ip_summed = re->skb->ip_summed;
 		skb->csum = re->skb->csum;
 		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
-					       length, PCI_DMA_FROMDEVICE);
+					       dma_size, PCI_DMA_FROMDEVICE);
 		re->skb->ip_summed = CHECKSUM_NONE;
 		skb_put(skb, length);
 	}