igb: reset the PHY before reading the PHY ID

Message ID 1589605144.308560.1480435436440.JavaMail.zimbra@xes-inc.com
State Accepted
Delegated to: Jeff Kirsher
Headers show

Commit Message

Aaron Sierra Nov. 29, 2016, 4:03 p.m.
Several people have reported firmware leaving the I210/I211 PHY's page
select register set to something other than the default of zero. This
causes the first accesses, PHY_IDx register reads, to access something
else, resulting in device probe failure:

    igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
    igb: Copyright (c) 2007-2014 Intel Corporation.
    igb: probe of 0000:01:00.0 failed with error -2

This problem began for them after a previous patch I submitted was
applied:

    commit 2a3cdead8b408351fa1e3079b220fa331480ffbc
    Author: Aaron Sierra <asierra@xes-inc.com>
    Date:   Tue Nov 3 12:37:09 2015 -0600

        igb: Remove GS40G specific defines/functions

I personally experienced this problem after attempting to PXE boot from
I210 devices using this firmware:

    Intel(R) Boot Agent GE v1.5.78
    Copyright (C) 1997-2014, Intel Corporation

Resetting the PHY before reading from it, ensures the page select
register is in its default state and doesn't make assumptions about
the PHY's register set before the PHY has been probed.

Cc: Matwey V. Kornilov <matwey@sai.msu.ru>
Cc: Chris Arges <carges@vectranetworks.com>
Cc: Jochen Henneberg <jh@henneberg-systemdesign.com>
Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
---
 drivers/net/ethernet/intel/igb/e1000_82575.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Matwey V. Kornilov Nov. 29, 2016, 5:56 p.m. | #1
2016-11-29 19:03 GMT+03:00 Aaron Sierra <asierra@xes-inc.com>:
> Several people have reported firmware leaving the I210/I211 PHY's page
> select register set to something other than the default of zero. This
> causes the first accesses, PHY_IDx register reads, to access something
> else, resulting in device probe failure:
>
>     igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
>     igb: Copyright (c) 2007-2014 Intel Corporation.
>     igb: probe of 0000:01:00.0 failed with error -2
>
> This problem began for them after a previous patch I submitted was
> applied:
>
>     commit 2a3cdead8b408351fa1e3079b220fa331480ffbc
>     Author: Aaron Sierra <asierra@xes-inc.com>
>     Date:   Tue Nov 3 12:37:09 2015 -0600
>
>         igb: Remove GS40G specific defines/functions
>
> I personally experienced this problem after attempting to PXE boot from
> I210 devices using this firmware:
>
>     Intel(R) Boot Agent GE v1.5.78
>     Copyright (C) 1997-2014, Intel Corporation

Indeed! I use Boot Agent GE v1.5.53. When I disable PCIe Boot ROM the
kernel works without this patch.
The patch fixes kernel even with PXE boot enabled.

>
> Resetting the PHY before reading from it, ensures the page select
> register is in its default state and doesn't make assumptions about
> the PHY's register set before the PHY has been probed.
>
> Cc: Matwey V. Kornilov <matwey@sai.msu.ru>

Tested-By: Matwey V. Kornilov <matwey@sai.msu.ru>

> Cc: Chris Arges <carges@vectranetworks.com>
> Cc: Jochen Henneberg <jh@henneberg-systemdesign.com>
> Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
> index a61447f..ee44398 100644
> --- a/drivers/net/ethernet/intel/igb/e1000_82575.c
> +++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
> @@ -245,6 +245,17 @@ static s32 igb_init_phy_params_82575(struct e1000_hw *hw)
>         hw->bus.func = (rd32(E1000_STATUS) & E1000_STATUS_FUNC_MASK) >>
>                         E1000_STATUS_FUNC_SHIFT;
>
> +       /* Make sure the PHY is in a good state. Several people have reported
> +        * firmware leaving the PHY's page select register set to something
> +        * other than the default of zero, which causes the PHY ID read to
> +        * access something other than the intended register.
> +        */
> +       ret_val = hw->phy.ops.reset(hw);
> +       if (ret_val) {
> +               hw_dbg("Error resetting the PHY.\n");
> +               goto out;
> +       }
> +
>         /* Set phy->phy_addr and phy->id. */
>         ret_val = igb_get_phy_id_82575(hw);
>         if (ret_val)
> --
> 1.9.1
>
Chris Arges Nov. 29, 2016, 9:54 p.m. | #2

Chris Arges Dec. 1, 2016, 3:13 a.m. | #3
On Tue, Nov 29, 2016 at 10:03:56AM -0600, Aaron Sierra wrote:
> Several people have reported firmware leaving the I210/I211 PHY's page
> select register set to something other than the default of zero. This
> causes the first accesses, PHY_IDx register reads, to access something
> else, resulting in device probe failure:
> 
>     igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
>     igb: Copyright (c) 2007-2014 Intel Corporation.
>     igb: probe of 0000:01:00.0 failed with error -2
> 
> This problem began for them after a previous patch I submitted was
> applied:
> 
>     commit 2a3cdead8b408351fa1e3079b220fa331480ffbc
>     Author: Aaron Sierra <asierra@xes-inc.com>
>     Date:   Tue Nov 3 12:37:09 2015 -0600
> 
>         igb: Remove GS40G specific defines/functions
> 
> I personally experienced this problem after attempting to PXE boot from
> I210 devices using this firmware:
> 
>     Intel(R) Boot Agent GE v1.5.78
>     Copyright (C) 1997-2014, Intel Corporation
> 
> Resetting the PHY before reading from it, ensures the page select
> register is in its default state and doesn't make assumptions about
> the PHY's register set before the PHY has been probed.
>

Hi,
I could not get this patch to fix the original issue I had.
The patch I submitted a while back fixed the issue for me:
https://patchwork.ozlabs.org/patch/690464/

Let me know if there is something else I can get check here. To reproduce
I have to fully power off the server which contains this NIC.
--chris

 
> Cc: Matwey V. Kornilov <matwey@sai.msu.ru>
> Cc: Chris Arges <carges@vectranetworks.com>
> Cc: Jochen Henneberg <jh@henneberg-systemdesign.com>
> Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
> index a61447f..ee44398 100644
> --- a/drivers/net/ethernet/intel/igb/e1000_82575.c
> +++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
> @@ -245,6 +245,17 @@ static s32 igb_init_phy_params_82575(struct e1000_hw *hw)
>  	hw->bus.func = (rd32(E1000_STATUS) & E1000_STATUS_FUNC_MASK) >>
>  			E1000_STATUS_FUNC_SHIFT;
>  
> +	/* Make sure the PHY is in a good state. Several people have reported
> +	 * firmware leaving the PHY's page select register set to something
> +	 * other than the default of zero, which causes the PHY ID read to
> +	 * access something other than the intended register.
> +	 */
> +	ret_val = hw->phy.ops.reset(hw);
> +	if (ret_val) {
> +		hw_dbg("Error resetting the PHY.\n");
> +		goto out;
> +	}
> +
>  	/* Set phy->phy_addr and phy->id. */
>  	ret_val = igb_get_phy_id_82575(hw);
>  	if (ret_val)
> -- 
> 1.9.1
Aaron Sierra Dec. 1, 2016, 4:19 p.m. | #4
----- Original Message -----
> From: "Chris Arges" <carges@vectranetworks.com>
> Sent: Wednesday, November 30, 2016 9:13:18 PM
>
> Hi,
> I could not get this patch to fix the original issue I had.
> The patch I submitted a while back fixed the issue for me:
> https://patchwork.ozlabs.org/patch/690464/
> 
> Let me know if there is something else I can get check here. To reproduce
> I have to fully power off the server which contains this NIC.
> --chris
> 

Chris,
Can you provide any more information about your environment than you
already have? I've only found that you're using an I210. Is it embedded
in the baseboard or an add-in device? Please share very verbose lspci
output for this device:

    lspci -vvv -n -s <[bus]:[device].[function]>

What kernel version did you use in your testing? Did my patch apply
cleanly or with "fuzz" reported by the patch command?

-Aaron S.
Chris Arges Dec. 1, 2016, 11:17 p.m. | #5

Matwey V. Kornilov Dec. 2, 2016, 8:35 p.m. | #6
2016-12-02 2:17 GMT+03:00 Chris Arges <carges@vectranetworks.com>:
> ________________________________________
> From: Aaron Sierra <asierra@xes-inc.com>
> Sent: Thursday, December 1, 2016 10:19 AM
> To: Chris Arges
> Cc: Jeffrey T Kirsher; Matwey V. Kornilov; Jochen Henneberg; intel-wired-lan
> Subject: Re: [PATCH] igb: reset the PHY before reading the PHY ID
>
> ----- Original Message -----
>> From: "Chris Arges" <carges@vectranetworks.com>
>> Sent: Wednesday, November 30, 2016 9:13:18 PM
>>
>> Hi,
>> I could not get this patch to fix the original issue I had.
>> The patch I submitted a while back fixed the issue for me:
>> https://patchwork.ozlabs.org/patch/690464/
>>
>> Let me know if there is something else I can get check here. To reproduce
>> I have to fully power off the server which contains this NIC.
>> --chris
>>
>
> Chris,
> Can you provide any more information about your environment than you
> already have? I've only found that you're using an I210. Is it embedded
> in the baseboard or an add-in device? Please share very verbose lspci
> output for this device:
>
>     lspci -vvv -n -s <[bus]:[device].[function]>
>
> What kernel version did you use in your testing? Did my patch apply
> cleanly or with "fuzz" reported by the patch command?
>
> -Aaron S.
>
> Aaron,
> I applied your patch cleanly against 4.4 (testing this on Ubuntu 16.04 fwiw) which I think didn't backport well. Using mainline 4.9rcX I was able to get your patch working. Would there be a recommended modification for a stable patch?
>

2a3cdead8b4083 ("igb: Remove GS40G specific defines/functions") has
been applied in 4.5 AFAIU. Why do you test fixing patch at 4.4?


> For completeness,
> $ lspci -vvv -n -s 07:00.0
> 07:00.0 0200: 8086:1533 (rev 03)
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 128 bytes
>         Interrupt: pin A routed to IRQ 16
>         Region 0: Memory at 91200000 (32-bit, non-prefetchable) [size=512K]
>         Region 2: I/O ports at 3000 [size=32]
>         Region 3: Memory at 91280000 (32-bit, non-prefetchable) [size=16K]
>         Capabilities: <access denied>
>         Kernel driver in use: igb
>         Kernel modules: igb
>
> Thanks,
> --chris
Aaron Sierra Dec. 2, 2016, 8:46 p.m. | #7
----- Original Message -----
> From: "Chris Arges" <carges@vectranetworks.com>
> Sent: Thursday, December 1, 2016 5:17:52 PM

>> ________________________________________
>> From: Aaron Sierra <asierra@xes-inc.com>
>> Sent: Thursday, December 1, 2016 10:19 AM
>> To: Chris Arges
>> Cc: Jeffrey T Kirsher; Matwey V. Kornilov; Jochen Henneberg; intel-wired-lan
>> Subject: Re: [PATCH] igb: reset the PHY before reading the PHY ID
>> 
>> Chris,
>> Can you provide any more information about your environment than you
>> already have? I've only found that you're using an I210. Is it embedded
>> in the baseboard or an add-in device? Please share very verbose lspci
>> output for this device:
>> 
>>    lspci -vvv -n -s <[bus]:[device].[function]>
>> 
>> What kernel version did you use in your testing? Did my patch apply
>> cleanly or with "fuzz" reported by the patch command?
>> 
>> -Aaron S.
> 
> Aaron,
> I applied your patch cleanly against 4.4 (testing this on Ubuntu 16.04 fwiw)
> which I think didn't backport well. Using mainline 4.9rcX I was able to get
> your patch working. Would there be a recommended modification for a stable
> patch?
> 
> For completeness,
> $ lspci -vvv -n -s 07:00.0
> 07:00.0 0200: 8086:1533 (rev 03)
>	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
>	SERR- FastB2B- DisINTx+
>	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
>	>SERR- <PERR- INTx-
>	Latency: 0, Cache Line Size: 128 bytes
>	Interrupt: pin A routed to IRQ 16
>	Region 0: Memory at 91200000 (32-bit, non-prefetchable) [size=512K]
>	Region 2: I/O ports at 3000 [size=32]
>	Region 3: Memory at 91280000 (32-bit, non-prefetchable) [size=16K]
>	Capabilities: <access denied>
>	Kernel driver in use: igb
>	Kernel modules: igb
> 

Chris,
Thanks for double-checking. You've got a copper NIC and I'm testing with a
"flashless copper" NIC (8086:157b).

The patch that caused things to break for you doesn't seem to be in linux-stable's
linux-4.4.y branch, so this patch shouldn't be needed there. Though it does apply
cleanly like you said.

I did another sanity check against linux-4.5.y and found that this patch applies
cleanly there and also still solves the problem for me.

-Aaron S.
Chris Arges Dec. 2, 2016, 9:26 p.m. | #8
On Fri, Dec 02, 2016 at 02:46:30PM -0600, Aaron Sierra wrote:
> ----- Original Message -----
> > From: "Chris Arges" <carges@vectranetworks.com>
> > Sent: Thursday, December 1, 2016 5:17:52 PM
> 
> >> ________________________________________
> >> From: Aaron Sierra <asierra@xes-inc.com>
> >> Sent: Thursday, December 1, 2016 10:19 AM
> >> To: Chris Arges
> >> Cc: Jeffrey T Kirsher; Matwey V. Kornilov; Jochen Henneberg; intel-wired-lan
> >> Subject: Re: [PATCH] igb: reset the PHY before reading the PHY ID
> >> 
> >> Chris,
> >> Can you provide any more information about your environment than you
> >> already have? I've only found that you're using an I210. Is it embedded
> >> in the baseboard or an add-in device? Please share very verbose lspci
> >> output for this device:
> >> 
> >>    lspci -vvv -n -s <[bus]:[device].[function]>
> >> 
> >> What kernel version did you use in your testing? Did my patch apply
> >> cleanly or with "fuzz" reported by the patch command?
> >> 
> >> -Aaron S.
> > 
> > Aaron,
> > I applied your patch cleanly against 4.4 (testing this on Ubuntu 16.04 fwiw)
> > which I think didn't backport well. Using mainline 4.9rcX I was able to get
> > your patch working. Would there be a recommended modification for a stable
> > patch?
> > 
> > For completeness,
> > $ lspci -vvv -n -s 07:00.0
> > 07:00.0 0200: 8086:1533 (rev 03)
> >	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
> >	SERR- FastB2B- DisINTx+
> >	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort-
> >	>SERR- <PERR- INTx-
> >	Latency: 0, Cache Line Size: 128 bytes
> >	Interrupt: pin A routed to IRQ 16
> >	Region 0: Memory at 91200000 (32-bit, non-prefetchable) [size=512K]
> >	Region 2: I/O ports at 3000 [size=32]
> >	Region 3: Memory at 91280000 (32-bit, non-prefetchable) [size=16K]
> >	Capabilities: <access denied>
> >	Kernel driver in use: igb
> >	Kernel modules: igb
> > 
> 
> Chris,
> Thanks for double-checking. You've got a copper NIC and I'm testing with a
> "flashless copper" NIC (8086:157b).
> 
> The patch that caused things to break for you doesn't seem to be in linux-stable's
> linux-4.4.y branch, so this patch shouldn't be needed there. Though it does apply
> cleanly like you said.
> 
> I did another sanity check against linux-4.5.y and found that this patch applies
> cleanly there and also still solves the problem for me.
> 
> -Aaron S.

Yes, sorry for the confusion. This patch works for me 4.5+ and I can verify it.
Tested-by: Chris J Arges <christopherarges@gmail.com>

--chris
Brown, Aaron F Dec. 7, 2016, 11:24 p.m. | #9
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> Behalf Of Aaron Sierra
> Sent: Tuesday, November 29, 2016 8:04 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: Chris Arges <carges@vectranetworks.com>; intel-wired-lan <intel-wired-
> lan@lists.osuosl.org>; Matwey V. Kornilov <matwey@sai.msu.ru>
> Subject: [Intel-wired-lan] [PATCH] igb: reset the PHY before reading the PHY
> ID
> 
> Several people have reported firmware leaving the I210/I211 PHY's page
> select register set to something other than the default of zero. This
> causes the first accesses, PHY_IDx register reads, to access something
> else, resulting in device probe failure:
> 
>     igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
>     igb: Copyright (c) 2007-2014 Intel Corporation.
>     igb: probe of 0000:01:00.0 failed with error -2
> 
> This problem began for them after a previous patch I submitted was
> applied:
> 
>     commit 2a3cdead8b408351fa1e3079b220fa331480ffbc
>     Author: Aaron Sierra <asierra@xes-inc.com>
>     Date:   Tue Nov 3 12:37:09 2015 -0600
> 
>         igb: Remove GS40G specific defines/functions
> 
> I personally experienced this problem after attempting to PXE boot from
> I210 devices using this firmware:
> 
>     Intel(R) Boot Agent GE v1.5.78
>     Copyright (C) 1997-2014, Intel Corporation
> 
> Resetting the PHY before reading from it, ensures the page select
> register is in its default state and doesn't make assumptions about
> the PHY's register set before the PHY has been probed.
> 
> Cc: Matwey V. Kornilov <matwey@sai.msu.ru>
> Cc: Chris Arges <carges@vectranetworks.com>
> Cc: Jochen Henneberg <jh@henneberg-systemdesign.com>
> Signed-off-by: Aaron Sierra <asierra@xes-inc.com>
> ---
>  drivers/net/ethernet/intel/igb/e1000_82575.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

Patch

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.c b/drivers/net/ethernet/intel/igb/e1000_82575.c
index a61447f..ee44398 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.c
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.c
@@ -245,6 +245,17 @@  static s32 igb_init_phy_params_82575(struct e1000_hw *hw)
 	hw->bus.func = (rd32(E1000_STATUS) & E1000_STATUS_FUNC_MASK) >>
 			E1000_STATUS_FUNC_SHIFT;
 
+	/* Make sure the PHY is in a good state. Several people have reported
+	 * firmware leaving the PHY's page select register set to something
+	 * other than the default of zero, which causes the PHY ID read to
+	 * access something other than the intended register.
+	 */
+	ret_val = hw->phy.ops.reset(hw);
+	if (ret_val) {
+		hw_dbg("Error resetting the PHY.\n");
+		goto out;
+	}
+
 	/* Set phy->phy_addr and phy->id. */
 	ret_val = igb_get_phy_id_82575(hw);
 	if (ret_val)