Message ID | 1481105116.578767.1450304180215.JavaMail.zimbra@xes-inc.com |
---|---|
State | Superseded |
Delegated to: | Jeff Kirsher |
Headers | show |
> From: Aaron Sierra [asierra@xes-inc.com] > Sent: Wednesday, December 16, 2015 2:16 PM > To: Kirsher, Jeffrey T; intel-wired-lan@lists.osuosl.org > Cc: Wyborny, Carolyn; Brandeburg, Jesse; Williams, Mitch A; Brown, Aaron F; Joe Schultz > Subject: [PATCH v3] igb: Add I210 cable fault detection to self test > > From: Joe Schultz <jschultz@xes-inc.com> > > Add an offline diagnostic test for the I210 internal PHY which checks > for cable faults and reports the distance along the cable where the > fault was detected. Fault types detected include open, short, and > cross-pair short. > > Signed-off-by: Joe Schultz <jschultz@xes-inc.com> > Signed-off-by: Aaron Sierra <asierra@xes-inc.com> > --- > v2 - account for changes made by this patch in dev-queue: > drivers/net: get rid of unnecessary initializations in .get_drvinfo() > v3 - fix uninitialized variable compile warning > - remove unneeded igb_cable_fault_test_prep() function > - don't add unused define to e1000_defines.h > - only run cable diagnostic if link test fails > > drivers/net/ethernet/intel/igb/e1000_defines.h | 12 +- > drivers/net/ethernet/intel/igb/igb_ethtool.c | 186 ++++++++++++++++++++++++- > 2 files changed, 192 insertions(+), 6 deletions(-) A question and a bug. Question first since it's short: Q. Was this version supposed to suppress running the cable diagnostics if the system has link? I am still seeing the cable fault output (complete with -1 for the fault distances) when I run ethtool -t against an i210 or i211. When I run diagnostics against a number of parts I get a "Bug: This version of the patch is causing (or exposing) a "BUG: unable to handle kernel paging request at ffffffffffffffff" followed by a Call Trace. It does not seem to occur with i210 or i211, but I consistently get the trace on 82575EB, 82576, i350 and i354. It does not occur every time I run the diagnostics, but pretty frequently. Generally if it shows up it does so within 5 or 6 iterations (though sometimes it takes many more.) The system hangs and becomes unusable, but the trace is captured to /var/log/messages. Here is a sample of the trace, in this case from an the 82575EB, the other traces look pretty similar: ----------------------------------------------------------------------------------------------------------------------------------- Jan 4 13:07:19 u1485 kernel: BUG: unable to handle kernel paging request at ffffffffffffffff Jan 4 13:07:19 u1485 kernel: IP: [<ffffffff81181998>] unlink_anon_vmas+0x78/0x190 Jan 4 13:07:19 u1485 kernel: PGD 1a0b067 PUD 1a0d067 PMD 0 Jan 4 13:07:19 u1485 kernel: Oops: 0000 [#1] SMP Jan 4 13:07:19 u1485 kernel: Modules linked in: igb dca ptp pps_core ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc nfsd lockd grace nfs_acl exportfs auth_rpcgss sunrpc autofs4 ipv6 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun uinput joydev sg e100 mii serio_raw iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass i2c_i801 lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp dm_mod(E) ext4(E) jbd2(E) mbcache(E) sd_mod(E) sr_mod(E) cdrom(E) pata_acpi(E) ata_generic(E) ata_piix(E) radeon(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) Jan 4 13:07:19 u1485 kernel: CPU: 3 PID: 9351 Comm: ethtool Tainted: G E 4.4.0-rc5_next_dev_2d9a87e-01243-g2d9a87e #13 Jan 4 13:07:19 u1485 kernel: Hardware name: Supermicro X7DW3/X7DWN+, BIOS 1.1 04/30/2008 Jan 4 13:07:19 u1485 kernel: task: ffff88005df5e280 ti: ffff88005df60000 task.ti: ffff88005df60000 Jan 4 13:07:19 u1485 kernel: RIP: 0010:[<ffffffff81181998>] [<ffffffff81181998>] unlink_anon_vmas+0x78/0x190 Jan 4 13:07:19 u1485 kernel: RSP: 0018:ffff88005df63c98 EFLAGS: 00010246 Jan 4 13:07:19 u1485 kernel: RAX: 0000000000000000 RBX: ffffffffffffffff RCX: 000000000000018e Jan 4 13:07:19 u1485 kernel: RDX: 000000359f18f000 RSI: ffff88007c3b7240 RDI: ffff88007b631970 Jan 4 13:07:19 u1485 kernel: RBP: ffff88005df63ce8 R08: 000000359f18f000 R09: ffff88007a80d408 Jan 4 13:07:19 u1485 kernel: R10: 0000000000000000 R11: ffff88007aecce18 R12: 0000000000000000 Jan 4 13:07:19 u1485 kernel: R13: ffffffffffffffef R14: ffff88007acfbd00 R15: ffff88007b6319e8 Jan 4 13:07:19 u1485 kernel: FS: 0000000000000000(0000) GS:ffff88007fcc0000(0000) knlGS:0000000000000000 Jan 4 13:07:19 u1485 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 4 13:07:19 u1485 kernel: CR2: ffffffffffffffff CR3: 000000007a978000 CR4: 00000000000006e0 Jan 4 13:07:19 u1485 kernel: Stack: Jan 4 13:07:19 u1485 kernel: ffff88005df63cb8 ffffffff815ba8c6 ffff88007b631970 ffff88007acfbd10 Jan 4 13:07:19 u1485 kernel: ffff88005df63ce8 ffff88007b631468 0000000000000000 ffff88005df63d58 Jan 4 13:07:19 u1485 kernel: 0000000000000000 000000359e600000 ffff88005df63d38 ffffffff811735e4 Jan 4 13:07:19 u1485 kernel: Call Trace: Jan 4 13:07:19 u1485 kernel: [<ffffffff815ba8c6>] ? down_write+0x16/0x50 Jan 4 13:07:19 u1485 kernel: [<ffffffff811735e4>] free_pgtables+0x84/0x100 Jan 4 13:07:19 u1485 kernel: [<ffffffff8117aa81>] exit_mmap+0xd1/0x140 Jan 4 13:07:19 u1485 kernel: [<ffffffff813766d5>] ? do_tty_write+0x125/0x1d0 Jan 4 13:07:19 u1485 kernel: [<ffffffff8105b86a>] mmput+0x6a/0x100 Jan 4 13:07:19 u1485 kernel: [<ffffffff8105ec54>] exit_mm+0x134/0x1c0 Jan 4 13:07:19 u1485 kernel: [<ffffffff815ba916>] ? down_read+0x16/0x30 Jan 4 13:07:19 u1485 kernel: [<ffffffff81060517>] do_exit+0x147/0x460 Jan 4 13:07:19 u1485 kernel: [<ffffffff81050271>] ? __do_page_fault+0x1a1/0x440 Jan 4 13:07:19 u1485 kernel: [<ffffffff81060881>] do_group_exit+0x51/0xb0 Jan 4 13:07:19 u1485 kernel: [<ffffffff810608f7>] SyS_exit_group+0x17/0x20 Jan 4 13:07:19 u1485 kernel: [<ffffffff815bc1d7>] entry_SYSCALL_64_fastpath+0x12/0x6a Jan 4 13:07:19 u1485 kernel: Code: 7f 9a d1 00 4c 89 f6 e8 57 b6 01 00 49 8d 45 10 49 8b 55 10 4c 39 f8 48 89 45 c8 74 46 4d 89 ee 4c 8d 6a f0 4c 89 e0 49 8b 5e 08 <4c> 8b 23 4c 39 e0 74 13 48 85 c0 0f 85 ca 00 00 00 49 8d 7c 24 Jan 4 13:07:19 u1485 kernel: RIP [<ffffffff81181998>] unlink_anon_vmas+0x78/0x190 Jan 4 13:07:19 u1485 kernel: RSP <ffff88005df63c98> Jan 4 13:07:19 u1485 kernel: CR2: ffffffffffffffff Jan 4 13:07:19 u1485 kernel: ---[ end trace 720ddd6818910144 ]--- Jan 4 13:07:19 u1485 kernel: Fixing recursive fault but reboot is needed!
----- Original Message ----- > From: "Aaron F Brown" <aaron.f.brown@intel.com> > Sent: Monday, January 4, 2016 5:56:31 PM > > > From: Aaron Sierra [asierra@xes-inc.com] > > Sent: Wednesday, December 16, 2015 2:16 PM > > To: Kirsher, Jeffrey T; intel-wired-lan@lists.osuosl.org > > Cc: Wyborny, Carolyn; Brandeburg, Jesse; Williams, Mitch A; Brown, Aaron F; > > Joe Schultz > > Subject: [PATCH v3] igb: Add I210 cable fault detection to self test > > > > From: Joe Schultz <jschultz@xes-inc.com> > > > > Add an offline diagnostic test for the I210 internal PHY which checks > > for cable faults and reports the distance along the cable where the > > fault was detected. Fault types detected include open, short, and > > cross-pair short. > > > > Signed-off-by: Joe Schultz <jschultz@xes-inc.com> > > Signed-off-by: Aaron Sierra <asierra@xes-inc.com> > > --- > > v2 - account for changes made by this patch in dev-queue: > > drivers/net: get rid of unnecessary initializations in .get_drvinfo() > > v3 - fix uninitialized variable compile warning > > - remove unneeded igb_cable_fault_test_prep() function > > - don't add unused define to e1000_defines.h > > - only run cable diagnostic if link test fails > > > > drivers/net/ethernet/intel/igb/e1000_defines.h | 12 +- > > drivers/net/ethernet/intel/igb/igb_ethtool.c | 186 > > ++++++++++++++++++++++++- > > 2 files changed, 192 insertions(+), 6 deletions(-) > > A question and a bug. Question first since it's short: > > Q. Was this version supposed to suppress running the cable diagnostics if the > system has link? I am still seeing the cable fault output (complete with -1 > for the fault distances) when I run ethtool -t against an i210 or i211. Aaron, This latest version should not *run* the cable diagnostics when link is present, but it will *report* diagnostic results (i.e. fault status of "false" and a length that suggests there's no fault). In the case of a fault, a length of 0 means there's a fault within the first meter, so it didn't seem appropriate to reuse that in the "no fault" case. > When I run diagnostics against a number of parts I get a "Bug: This version > of the patch is causing (or exposing) a "BUG: unable to handle kernel paging > request at ffffffffffffffff" followed by a Call Trace. It does not seem to > occur with i210 or i211, but I consistently get the trace on 82575EB, 82576, > i350 and i354. Interesting. I thought the only devices we had were I210, but I was able to find an 82576. I'll do some testing with it. -Aaron S.
diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 240902e..707e1ba 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -932,6 +932,7 @@ #define I347AT4_PCDL1 0x11 /* Pair 1 PHY Cable Diagnostics Length */ #define I347AT4_PCDL2 0x12 /* Pair 2 PHY Cable Diagnostics Length */ #define I347AT4_PCDL3 0x13 /* Pair 3 PHY Cable Diagnostics Length */ +#define I347AT4_PCDR 0x14 /* PHY Cable Diagnostics Results */ #define I347AT4_PCDC 0x15 /* PHY Cable Diagnostics Control */ #define I347AT4_PAGE_SELECT 0x16 @@ -952,7 +953,16 @@ #define I347AT4_PSCR_DOWNSHIFT_8X 0x7000 /* i347-AT4 PHY Cable Diagnostics Control */ -#define I347AT4_PCDC_CABLE_LENGTH_UNIT 0x0400 /* 0=cm 1=meters */ +#define I347AT4_PCDC_CABLE_LENGTH_UNIT 0x0400 /* 0=cm 1=meters */ +#define I347AT4_PCDC_CABLE_DIAG_STATUS 0x0800 +#define I347AT4_PCDC_DISABLE_CROSS_PAIR 0x2000 +#define I347AT4_PCDC_RUN_TEST 0x8000 + +/* i347-AT4 PHY Cable Diagnostics Results */ +#define I347AT4_PCDR_CABLE_OK 0x0001 /* No faults detected on pair */ +#define I347AT4_PCDR_CABLE_OPEN 0x0002 /* Open pair detected */ +#define I347AT4_PCDR_CABLE_SHORT 0x0003 /* Shorted pair detected */ +#define I347AT4_PCDR_CABLE_CROSS_SHORT 0x0004 /* Cross-pair short detected */ /* Marvell 1112 only registers */ #define M88E1112_VCT_DSP_DISTANCE 0x001A diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 1d329f1..11c8071 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -132,7 +132,28 @@ enum igb_diagnostics_results { TEST_EEP, TEST_IRQ, TEST_LOOP, - TEST_LINK + TEST_LINK, + /* I210 superset */ + TEST_FAULT_A, + TEST_FAULT_B, + TEST_FAULT_C, + TEST_FAULT_D, + TEST_LENGTH_A, + TEST_LENGTH_B, + TEST_LENGTH_C, + TEST_LENGTH_D, + TEST_OPEN_A, + TEST_OPEN_B, + TEST_OPEN_C, + TEST_OPEN_D, + TEST_SHORT_A, + TEST_SHORT_B, + TEST_SHORT_C, + TEST_SHORT_D, + TEST_CROSS_A, + TEST_CROSS_B, + TEST_CROSS_C, + TEST_CROSS_D }; static const char igb_gstrings_test[][ETH_GSTRING_LEN] = { @@ -142,7 +163,50 @@ static const char igb_gstrings_test[][ETH_GSTRING_LEN] = { [TEST_LOOP] = "Loopback test (offline)", [TEST_LINK] = "Link test (on/offline)" }; + +static const char igb_i210_gstrings_test[][ETH_GSTRING_LEN] = { + [TEST_REG] = "Register test (offline)", + [TEST_EEP] = "Eeprom test (offline)", + [TEST_IRQ] = "Interrupt test (offline)", + [TEST_LOOP] = "Loopback test (offline)", + [TEST_LINK] = "Link test (on/offline)", + [TEST_FAULT_A] = "Pair A cable fault (offline)", + [TEST_FAULT_B] = "Pair B cable fault (offline)", + [TEST_FAULT_C] = "Pair C cable fault (offline)", + [TEST_FAULT_D] = "Pair D cable fault (offline)", + [TEST_LENGTH_A] = "Pair A fault distance ", + [TEST_LENGTH_B] = "Pair B fault distance ", + [TEST_LENGTH_C] = "Pair C fault distance ", + [TEST_LENGTH_D] = "Pair D fault distance ", + [TEST_OPEN_A] = "Pair A fault open ", + [TEST_OPEN_B] = "Pair B fault open ", + [TEST_OPEN_C] = "Pair C fault open ", + [TEST_OPEN_D] = "Pair D fault open ", + [TEST_SHORT_A] = "Pair A fault intra-pair short ", + [TEST_SHORT_B] = "Pair B fault intra-pair short ", + [TEST_SHORT_C] = "Pair C fault intra-pair short ", + [TEST_SHORT_D] = "Pair D fault intra-pair short ", + [TEST_CROSS_A] = "Pair A fault inter-pair short ", + [TEST_CROSS_B] = "Pair B fault inter-pair short ", + [TEST_CROSS_C] = "Pair C fault inter-pair short ", + [TEST_CROSS_D] = "Pair D fault inter-pair short " +}; + #define IGB_TEST_LEN (sizeof(igb_gstrings_test) / ETH_GSTRING_LEN) +#define IGB_I210_TEST_LEN (sizeof(igb_i210_gstrings_test) / ETH_GSTRING_LEN) + +static inline bool igb_has_i210_cable_fault_test(struct igb_adapter *adapter) +{ + struct e1000_hw *hw = &adapter->hw; + + if (hw->phy.media_type != e1000_media_type_copper) + return false; + + if (hw->phy.id == I210_I_PHY_ID) + return true; + + return false; +} static int igb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd) { @@ -1983,6 +2047,97 @@ static int igb_link_test(struct igb_adapter *adapter, u64 *data) return *data; } +static int igb_cable_fault_test(struct igb_adapter *adapter, + struct ethtool_test *eth_test, u64 *data) { + struct e1000_hw *hw = &adapter->hw; + u16 old_pcdc, pcdc; + u16 pcdr; + u16 error_code = 0; + u32 timeout = 0; + s32 ret_val; + int i; + + ret_val = igb_write_phy_reg(hw, I347AT4_PAGE_SELECT, 0x7); + if (ret_val) + goto no_set_page; + + /* Save PCDC register and initiate immediate diagnostic */ + ret_val = igb_read_phy_reg(hw, I347AT4_PCDC, &pcdc); + if (ret_val) + goto no_pcdc; + + old_pcdc = pcdc; + pcdc &= ~I347AT4_PCDC_DISABLE_CROSS_PAIR; + pcdc |= I347AT4_PCDC_CABLE_LENGTH_UNIT | I347AT4_PCDC_RUN_TEST; + + ret_val = igb_write_phy_reg(hw, I347AT4_PCDC, pcdc); + if (ret_val) + goto done; + + /* Wait up to 1.5s for the results to be ready */ + do { + ret_val = igb_read_phy_reg(hw, I347AT4_PCDC, &pcdc); + if (ret_val || timeout == 1500) + break; + udelay(1000); + timeout++; + } while (pcdc & I347AT4_PCDC_CABLE_DIAG_STATUS); + + if (timeout >= 1500) + dev_warn(&adapter->pdev->dev, + "Cable fault test timed out. Results may be invalid"); + + ret_val = igb_read_phy_reg(hw, I347AT4_PCDR, &pcdr); + if (ret_val) + goto done; + + hw->phy.ops.get_cable_length(hw); + + /* Iterate over each cable pair */ + for (i = 0; i < 4; i++) { + data[TEST_LENGTH_A + i] = hw->phy.pair_length[i]; + + error_code = (pcdr >> (i * 4)) & 0xf; + switch (error_code) { + case I347AT4_PCDR_CABLE_OK: + data[TEST_FAULT_A + i] = 0; + data[TEST_LENGTH_A + i] = -1; + /* don't assign ret_val */ + break; + case I347AT4_PCDR_CABLE_OPEN: + data[TEST_FAULT_A + i] = 1; + data[TEST_OPEN_A + i] = 1; + ret_val = -1; + break; + case I347AT4_PCDR_CABLE_SHORT: + data[TEST_FAULT_A + i] = 1; + data[TEST_SHORT_A + i] = 1; + ret_val = -1; + break; + case I347AT4_PCDR_CABLE_CROSS_SHORT: + data[TEST_FAULT_A + i] = 1; + data[TEST_CROSS_A + i] = 1; + ret_val = -1; + break; + default: + data[TEST_FAULT_A + i] = -1; + data[TEST_LENGTH_A + i] = -1; + data[TEST_OPEN_A + i] = -1; + data[TEST_SHORT_A + i] = -1; + data[TEST_CROSS_A + i] = -1; + ret_val = -1; + } + } + +done: + /* Restore PCDC */ + igb_write_phy_reg(hw, I347AT4_PCDC, old_pcdc); +no_pcdc: + igb_write_phy_reg(hw, I347AT4_PAGE_SELECT, 0); +no_set_page: + return ret_val; +} + static void igb_diag_test(struct net_device *netdev, struct ethtool_test *eth_test, u64 *data) { @@ -1993,6 +2148,11 @@ static void igb_diag_test(struct net_device *netdev, set_bit(__IGB_TESTING, &adapter->state); + if (igb_has_i210_cable_fault_test(adapter)) { + memset(&data[TEST_FAULT_A], 0x0, + sizeof(u64) * (IGB_I210_TEST_LEN - IGB_TEST_LEN)); + } + /* can't do offline tests on media switching devices */ if (adapter->hw.dev_spec._82575.mas_capable) eth_test->flags &= ~ETH_TEST_FL_OFFLINE; @@ -2012,9 +2172,20 @@ static void igb_diag_test(struct net_device *netdev, /* Link test performed before hardware reset so autoneg doesn't * interfere with test result */ - if (igb_link_test(adapter, &data[TEST_LINK])) + if (igb_link_test(adapter, &data[TEST_LINK])) { eth_test->flags |= ETH_TEST_FL_FAILED; + /* Test for cable faults before the PHY gets shut off */ + if (igb_has_i210_cable_fault_test(adapter) && + igb_cable_fault_test(adapter, eth_test, data)) + eth_test->flags |= ETH_TEST_FL_FAILED; + } else { + data[TEST_LENGTH_A] = -1; + data[TEST_LENGTH_B] = -1; + data[TEST_LENGTH_C] = -1; + data[TEST_LENGTH_D] = -1; + } + if (if_running) /* indicate we're in test mode */ dev_close(netdev); @@ -2270,7 +2441,8 @@ static int igb_get_sset_count(struct net_device *netdev, int sset) case ETH_SS_STATS: return IGB_STATS_LEN; case ETH_SS_TEST: - return IGB_TEST_LEN; + return igb_has_i210_cable_fault_test(netdev_priv(netdev)) ? + IGB_I210_TEST_LEN : IGB_TEST_LEN; default: return -ENOTSUPP; } @@ -2340,8 +2512,12 @@ static void igb_get_strings(struct net_device *netdev, u32 stringset, u8 *data) switch (stringset) { case ETH_SS_TEST: - memcpy(data, *igb_gstrings_test, - IGB_TEST_LEN*ETH_GSTRING_LEN); + if (igb_has_i210_cable_fault_test(adapter)) + memcpy(data, *igb_i210_gstrings_test, + IGB_I210_TEST_LEN*ETH_GSTRING_LEN); + else + memcpy(data, *igb_gstrings_test, + IGB_TEST_LEN*ETH_GSTRING_LEN); break; case ETH_SS_STATS: for (i = 0; i < IGB_GLOBAL_STATS_LEN; i++) {