Message ID | E1Kt0Cr-0008PB-Ux@approx.mit.edu |
---|---|
State | Not Applicable, archived |
Delegated to: | Jeff Garzik |
Sanjoy Mahajan wrote:
>> There is also lots of opportunity for BIOS bugs to be affecting
>> things so please make sure that you have the latest bios.
>
> I was about to burn the CD to update the bios to 2.23 when the failure
> recurred. So, with the caveat that the bios is still 2.20, I've
> attached logs from ethregs and ethtool before and after
>   ethtool -r eth0
> (which fixed the dhcp).
>
> Here is the e1000e driver version:
>
> $ grep e1000e /var/log/dmesg
> [ 23.988317] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
> [ 23.988390] e1000e: Copyright (c) 1999-2008 Intel Corporation.
> [ 23.988505] e1000e 0000:02:00.0: Disabling L1 ASPM

hm, does your kernel have CONFIG_PM defined?

if it happens again please include lspci -vvv before and after
ethtool -r (see below)

> Here are diffs of the attached before and after logs:
>
> --- ethtool-before.log 2008-10-23 09:14:41.000000000 -0400
> +++ ethtool-after.log 2008-10-23 09:17:54.000000000 -0400
> @@ -33,8 +33,8 @@
>        Pass MAC control frames: don't pass
>        Receive buffer size: 2048
>  0x02808: RDLEN (Receive desc length) 0x00001000
> -0x02810: RDH (Receive desc head) 0x000000BB
> -0x02818: RDT (Receive desc tail) 0x000000B9
> +0x02810: RDH (Receive desc head) 0x00000051
> +0x02818: RDT (Receive desc tail) 0x0000004F

this indicates the device was actually receiving packets okay (RDH)
and the driver was returning buffers to hardware (RDT)

>  0x02820: RDTR (Receive delay timer) 0x00000000
>  0x00400: TCTL (Transmit ctrl register) 0x3103F0FA
>        Transmitter: enabled
> @@ -42,7 +42,7 @@
>        Software XOFF Transmission: disabled
>        Re-transmit on late collision: enabled
>  0x03808: TDLEN (Transmit desc length) 0x00001000
> -0x03810: TDH (Transmit desc head) 0x00000018
> -0x03818: TDT (Transmit desc tail) 0x00000018
> +0x03810: TDH (Transmit desc head) 0x00000075
> +0x03818: TDT (Transmit desc tail) 0x00000075

device was also claiming successfully transmitting, so I don't know
why the DHCP packets don't work, can you tcpdump on the network or
the dhcp server by chance? I'm looking to see if the server receives
the transmits and then replies.

>   RAL[0] 52411600
>   RAH[0] 8000de50
> - RAL[1] 00003333
> + RAL[1] 005e0001
>   RAH[1] 8000fb00
> - RAL[2] 52ff3333
> - RAH[2] 8000de50
> - RAL[3] 00003333
> - RAH[3] 80000100
> - RAL[4] 005e0001
> + RAL[2] 00003333
> + RAH[2] 8000fb00
> + RAL[3] 52ff3333
> + RAH[3] 8000de50
> + RAL[4] 00003333
>   RAH[4] 80000100
> - RAL[5] 00000000
> - RAH[5] 00000000
> + RAL[5] 005e0001
> + RAH[5] 80000100

after resume, one multicast address is added and one is missing from
the list of addresses the adapter will listen on. I reordered but
here are the diffs

before:
  RAL[5] 00000000
  RAH[5] 00000000
after:
  RAL[5] 005e0001
  RAH[5] 8000fb00

I don't know which protocol added 01005e0000fb as a multicast address
only after suspend.

can you ifconfig eth0 promisc before doing suspend? I'd be curious
if that fixed it.

>   RAL[6] 00000000
>   RAH[6] 00000000
>   RAL[7] 00000000
> @@ -390,7 +390,7 @@
>   GSCL_2 00000000
>   GSCL_3 00000000
>   GSCL_4 00000000
> - FACTPS a1041046
> + FACTPS 21041046

FACTPS bits are reserved in our manuals (but have to do with PCIe
power state changes), but I can't help but wonder if there isn't
something with ASPM L0s or L1 on your system (where we had trouble
with that feature on your laptop) when coming out of resume. The
lspci output would show us the difference if there was one.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
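The receive-address comparison discussed in this message can be checked mechanically. The sketch below is an illustration only and not part of the original thread: it assumes the usual e1000-family layout, in which RAL holds the first four MAC bytes in little-endian order, the low 16 bits of RAH hold the last two, and bit 31 of RAH marks the entry as valid. The helper names `decode_rar` and `rar_set` are hypothetical; the register values are copied from the quoted diff.

```python
def decode_rar(ral: int, rah: int):
    """Return the MAC string for one RAL/RAH pair, or None if the entry is unused.

    Assumption: bit 31 of RAH is the "address valid" flag and the address
    bytes are stored little-endian (RAL = bytes 0-3, RAH[15:0] = bytes 4-5).
    """
    if not rah & 0x80000000:
        return None
    raw = ral.to_bytes(4, "little") + (rah & 0xFFFF).to_bytes(2, "little")
    return ":".join(f"{b:02x}" for b in raw)


def rar_set(pairs):
    """Collect the valid addresses from a list of (RAL, RAH) pairs."""
    decoded = (decode_rar(ral, rah) for ral, rah in pairs)
    return {mac for mac in decoded if mac is not None}


# RAL[0..5] / RAH[0..5] before and after, taken from the diff above.
before = [
    (0x52411600, 0x8000DE50),
    (0x00003333, 0x8000FB00),
    (0x52FF3333, 0x8000DE50),
    (0x00003333, 0x80000100),
    (0x005E0001, 0x80000100),
    (0x00000000, 0x00000000),
]
after = [
    (0x52411600, 0x8000DE50),
    (0x005E0001, 0x8000FB00),
    (0x00003333, 0x8000FB00),
    (0x52FF3333, 0x8000DE50),
    (0x00003333, 0x80000100),
    (0x005E0001, 0x80000100),
]

added = sorted(rar_set(after) - rar_set(before))
removed = sorted(rar_set(before) - rar_set(after))
print("added:  ", added)
print("removed:", removed)
```

Comparing the two tables as sets ignores slot order, so entries that merely moved to a different index do not show up as changes. Under the layout assumed here, the only address reported as added is 01:00:5e:00:00:fb.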
Sanjoy Mahajan wrote:
>> hm, does your kernel have CONFIG_PM defined?
>
> It does have that defined:

ok

>> if it happens again please include lspci -vvv before and after
>> ethtool -r (see below)
>
> I will. Now I'm running BIOS 2.23, so I'm curious whether that
> 'upgrade' fixes the problem.
>
> I say 'upgrade' because now S3 sleep and wakeup often take 60 seconds.
> I've also noticed ACPI errors in the 'dmesg'. Once I have something
> reproducible I'll file a bugzilla report.

ick, it would be nice if the system vendors actually tested their acpi
implementations on multiple OSes.

>> device was also claiming successfully transmitting, so I don't know
>> why the DHCP packets don't work, can you tcpdump on the network or
>> the dhcp server by chance?
>
> I'll do that too on the next failure. Is 'tcpdump host 18.38.0.1'
> sufficient or do I need a few -v switches?

I'm mostly looking for the conversation back and forth, so that should
be fine. Keep in mind that the first dhcp packet is usually a
broadcast (not to a particular IP)

>> can you ifconfig eth0 promisc before doing suspend? I'd be curious
>> if that fixed it.
>
> If/when it reproduces, I'll add that line to the pre-suspend code. (I
> use 's2ram', which I think sleeps with 'echo mem > /sys/power/state'
> and does a vt switch on wakeup).

okay, thanks

> Generally: For making debugging go smoothly, is it worth running a
> vanilla kernel rather than the Debian one? I could try 2.6.26.7 or
> 2.6.27.3. Is running 2.6.27.y not as useful as running 2.6.26.y, in
> case the bug is merely hidden but not solved in the new kernel? On
> the other hand, I'm tempted to try 2.6.27.y in case it fixes the slow
> suspend/resume.

I think you should definitely try 2.6.27.y, the e1000e versions in the
kernel are different than what is in ubuntu at least, so not sure if
that applies to debian.
--- ethtool-before.log 2008-10-23 09:14:41.000000000 -0400
+++ ethtool-after.log 2008-10-23 09:17:54.000000000 -0400
@@ -33,8 +33,8 @@
       Pass MAC control frames: don't pass
       Receive buffer size: 2048
 0x02808: RDLEN (Receive desc length) 0x00001000
-0x02810: RDH (Receive desc head) 0x000000BB
-0x02818: RDT (Receive desc tail) 0x000000B9
+0x02810: RDH (Receive desc head) 0x00000051
+0x02818: RDT (Receive desc tail) 0x0000004F
 0x02820: RDTR (Receive delay timer) 0x00000000
 0x00400: TCTL (Transmit ctrl register) 0x3103F0FA
       Transmitter: enabled
@@ -42,7 +42,7 @@
       Software XOFF Transmission: disabled
       Re-transmit on late collision: enabled
 0x03808: TDLEN (Transmit desc length) 0x00001000
-0x03810: TDH (Transmit desc head) 0x00000018
-0x03818: TDT (Transmit desc tail) 0x00000018
+0x03810: TDH (Transmit desc head) 0x00000075
+0x03818: TDT (Transmit desc tail) 0x00000075
 0x03820: TIDV (Transmit delay timer) 0x00000008
 PHY type: IGP2
--- ethregs-before.log 2008-10-23 09:13:50.000000000 -0400
+++ ethregs-after.log 2008-10-23 09:17:38.000000000 -0400
@@ -40,11 +40,11 @@
  FCRTL 800047f8
  FCRTH 00004800
  PSRCTL 00040402
- RDBAL 37c84000
+ RDBAL 37dde000
  RDBAH 00000000
  RDLEN 00001000
- RDH 000000bf
- RDT 000000bd
+ RDH 000000e3
+ RDT 000000e1
  RDTR 00000000
  RXDCTL 00010000
  RADV 00000008
@@ -56,16 +56,16 @@
  RSRPD 00000000
  RAID 00000000
  CPUVEC 00000000
- TDFH 00000d14
- TDFT 00000d14
- TDFHS 00000d14
- TDFTS 00000d14
+ TDFH 00000f96
+ TDFT 00000f96
+ TDFHS 00000f96
+ TDFTS 00000f96
  TDFPC 00000000
- TDBAL 2f30a000
+ TDBAL 377ed000
  TDBAH 00000000
  TDLEN 00001000
- TDH 00000018
- TDT 00000018
+ TDH 00000075
+ TDT 00000075
  TIDV 00000008
  TXDCTL 01410000
  TADV 00000020
@@ -217,16 +217,16 @@
  MTA[127] 00000000
  RAL[0] 52411600
  RAH[0] 8000de50
- RAL[1] 00003333
+ RAL[1] 005e0001
  RAH[1] 8000fb00
- RAL[2] 52ff3333
- RAH[2] 8000de50
- RAL[3] 00003333
- RAH[3] 80000100
- RAL[4] 005e0001
+ RAL[2] 00003333
+ RAH[2] 8000fb00
+ RAL[3] 52ff3333
+ RAH[3] 8000de50
+ RAL[4] 00003333
  RAH[4] 80000100
- RAL[5] 00000000
- RAH[5] 00000000
+ RAL[5] 005e0001
+ RAH[5] 80000100
  RAL[6] 00000000
  RAH[6] 00000000
  RAL[7] 00000000
@@ -390,7 +390,7 @@
  GSCL_2 00000000
  GSCL_3 00000000
  GSCL_4 00000000
- FACTPS a1041046
+ FACTPS 21041046
  FWSM 00000000
  RETA[0] 00000017
  RETA[1] 000000e9
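The head/tail registers in the attached diff can be turned into ring occupancy figures, which is roughly the reading given in the reply (receive buffers being returned to hardware, transmit ring drained). The sketch below is an illustration under stated assumptions and not something from the thread: it assumes 16-byte descriptors, so a ring length of 0x1000 bytes means 256 entries, and it assumes the descriptors from head up to (but not including) tail are the ones currently handed to hardware. The names `DESC_SIZE`, `ring_entries`, `owned_by_hw` and `changed_bits` are hypothetical.

```python
DESC_SIZE = 16  # bytes per descriptor (assumption: 16-byte e1000 descriptors)


def ring_entries(length_bytes: int) -> int:
    """Number of descriptors in a ring of the given byte length."""
    return length_bytes // DESC_SIZE


def owned_by_hw(length_bytes: int, head: int, tail: int) -> int:
    """Descriptors between head and tail, i.e. still available to hardware."""
    return (tail - head) % ring_entries(length_bytes)


def changed_bits(old: int, new: int):
    """Bit positions that differ between two 32-bit register values."""
    diff = old ^ new
    return [bit for bit in range(32) if diff & (1 << bit)]


RDLEN = TDLEN = 0x1000  # from the logs above

# Receive ring, ethtool log: (RDH, RDT) before and after "ethtool -r eth0".
for label, rdh, rdt in (("rx before", 0xBB, 0xB9), ("rx after", 0x51, 0x4F)):
    print(label, owned_by_hw(RDLEN, rdh, rdt), "of", ring_entries(RDLEN))

# Transmit ring: head == tail means nothing is waiting to be sent.
for label, tdh, tdt in (("tx before", 0x18, 0x18), ("tx after", 0x75, 0x75)):
    print(label, owned_by_hw(TDLEN, tdh, tdt), "pending")

# FACTPS a1041046 -> 21041046: which bits flipped?
print("FACTPS changed bits:", changed_bits(0xA1041046, 0x21041046))
```

With these assumptions the receive ring has 254 of 256 descriptors available to hardware both before and after the reset, the transmit ring has nothing pending in either snapshot, and the FACTPS change is confined to bit 31.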