[RFC] e1000e: Remove Other from EIAC.

Message ID	20180118065054.29844-1-bpoirier@suse.com
State	RFC
Headers	show Return-Path: <intel-wired-lan-bounces@osuosl.org> From: Benjamin Poirier <bpoirier@suse.com> To: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Date: Thu, 18 Jan 2018 15:50:54 +0900 Message-Id: <20180118065054.29844-1-bpoirier@suse.com> Subject: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC. Precedence: list Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: intel-wired-lan-bounces@osuosl.org Sender: "Intel-wired-lan" <intel-wired-lan-bounces@osuosl.org>
Series	[RFC] e1000e: Remove Other from EIAC. \| expand [RFC] e1000e: Remove Other from EIAC.

Benjamin Poirier Jan. 18, 2018, 6:50 a.m. UTC

It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.

Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).

Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---

Jeff, I'm sending as RFC since it looks like a problem that should be fixed
in vmware. If you'd like to have the workaround in e1000e, I'll submit.

---
 drivers/net/ethernet/intel/e1000e/netdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Benjamin Poirier Jan. 18, 2018, 7:27 a.m. UTC | #1

On 2018/01/18 15:50, Benjamin Poirier wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> 
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).

vmware folks, please comment.

Benjamin Poirier Jan. 18, 2018, 7:41 a.m. UTC | #2

On 2018/01/18 15:50, Benjamin Poirier wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
                 (_INT_ASSERTED | _LSC)

> 
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> 
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>

Sasha Neftin Jan. 18, 2018, 11:59 a.m. UTC | #3

On 1/18/2018 08:50, Benjamin Poirier wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
>
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
>
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>
> Jeff, I'm sending as RFC since it looks like a problem that should be fixed
> in vmware. If you'd like to have the workaround in e1000e, I'll submit.
>
> ---
>   drivers/net/ethernet/intel/e1000e/netdev.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 9f18d39bdc8f..625a4c9a86a4 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1918,6 +1918,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>   	bool enable = true;
>   
>   	icr = er32(ICR);
> +	ew32(ICR, E1000_ICR_OTHER);
> +
>   	if (icr & E1000_ICR_RXO) {
>   		ew32(ICR, E1000_ICR_RXO);
>   		enable = false;
> @@ -2040,7 +2042,6 @@ static void e1000_configure_msix(struct e1000_adapter *adapter)
>   		       hw->hw_addr + E1000_EITR_82574(vector));
>   	else
>   		writel(1, hw->hw_addr + E1000_EITR_82574(vector));
> -	adapter->eiac_mask |= E1000_IMS_OTHER;
>   
>   	/* Cause Tx interrupts on every write back */
>   	ivar |= BIT(31);
> @@ -2265,7 +2266,7 @@ static void e1000_irq_enable(struct e1000_adapter *adapter)
>   
>   	if (adapter->msix_entries) {
>   		ew32(EIAC_82574, adapter->eiac_mask & E1000_EIAC_MASK_82574);
> -		ew32(IMS, adapter->eiac_mask | E1000_IMS_LSC);
> +		ew32(IMS, adapter->eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC);
>   	} else if (hw->mac.type >= e1000_pch_lpt) {
>   		ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER);
>   	} else {

Hello Benjamin,

Alexander Duyck has submitted follow patch 
https://patchwork.ozlabs.org/patch/862613/ addressed to 
"[ESXi][RHEL7.5]e1000 and e1000e NIC link can not up on VMWare 
ESXi"problem. Looks Red Hat happy with this solution. Let's them finish 
their tests and then decide about next step.
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 1/18/2018 08:50, Benjamin Poirier
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:20180118065054.29844-1-bpoirier@suse.com">
      <pre wrap="">It was reported that emulated e1000e devices in vmware esxi 6.5 Build
7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.

Some experimentation showed that this flaw in vmware e1000e emulation can
be worked around by not setting Other in EIAC. This is how it was before
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).

Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
Signed-off-by: Benjamin Poirier <a class="moz-txt-link-rfc2396E" href="mailto:bpoirier@suse.com">&lt;bpoirier@suse.com&gt;</a>
---

Jeff, I'm sending as RFC since it looks like a problem that should be fixed
in vmware. If you'd like to have the workaround in e1000e, I'll submit.

---
 drivers/net/ethernet/intel/e1000e/netdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..625a4c9a86a4 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1918,6 +1918,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 	bool enable = true;
 
 	icr = er32(ICR);
+	ew32(ICR, E1000_ICR_OTHER);
+
 	if (icr &amp; E1000_ICR_RXO) {
 		ew32(ICR, E1000_ICR_RXO);
 		enable = false;
@@ -2040,7 +2042,6 @@ static void e1000_configure_msix(struct e1000_adapter *adapter)
 		       hw-&gt;hw_addr + E1000_EITR_82574(vector));
 	else
 		writel(1, hw-&gt;hw_addr + E1000_EITR_82574(vector));
-	adapter-&gt;eiac_mask |= E1000_IMS_OTHER;
 
 	/* Cause Tx interrupts on every write back */
 	ivar |= BIT(31);
@@ -2265,7 +2266,7 @@ static void e1000_irq_enable(struct e1000_adapter *adapter)
 
 	if (adapter-&gt;msix_entries) {
 		ew32(EIAC_82574, adapter-&gt;eiac_mask &amp; E1000_EIAC_MASK_82574);
-		ew32(IMS, adapter-&gt;eiac_mask | E1000_IMS_LSC);
+		ew32(IMS, adapter-&gt;eiac_mask | E1000_IMS_OTHER | E1000_IMS_LSC);
 	} else if (hw-&gt;mac.type &gt;= e1000_pch_lpt) {
 		ew32(IMS, IMS_ENABLE_MASK | E1000_IMS_ECCER);
 	} else {
</pre>
    </blockquote>
    <p>Hello Benjamin,</p>
    <p>Alexander Duyck has submitted follow patch
      <a class="moz-txt-link-freetext" href="https://patchwork.ozlabs.org/patch/862613/">https://patchwork.ozlabs.org/patch/862613/</a> addressed to
      "[ESXi][RHEL7.5]e1000 and e1000e NIC link can not up on VMWare
      ESXi"<span style="color: rgb(0, 0, 0); font-family: &quot;DejaVu
        Sans&quot;, &quot;Liberation Sans&quot;, sans-serif; font-size:
        16.25px; font-style: normal; font-variant-ligatures: normal;
        font-variant-caps: normal; font-weight: 700; letter-spacing:
        normal; orphans: 2; text-align: start; text-indent: 0px;
        text-transform: none; white-space: normal; widows: 2;
        word-spacing: 0px; -webkit-text-stroke-width: 0px;
        background-color: rgb(208, 208, 208); text-decoration-style:
        initial; text-decoration-color: initial; display: inline
        !important; float: none;"></span> problem. Looks Red Hat happy
      with this solution. Let's them finish their tests and then decide
      about next step.<br>
    </p>
  </body>
</html>

Alexander H Duyck Jan. 18, 2018, 3:51 p.m. UTC | #4

On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.

Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
patch on was that the VMware code was sending _OTHER instead of _LSC
to trigger LSC events. As such in my version of the workaround I just
went through and did the testing if the _RXO bit was set, otherwise I
assumed that whatever event was received must have been meant to
trigger an _LSC type event since that worked in the past.

> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).

Did this actually set the _LSC bit or was it just giving you the
_OTHER bit? The ICR for that combination would come out 0x81000000.

> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---
>
> Jeff, I'm sending as RFC since it looks like a problem that should be fixed
> in vmware. If you'd like to have the workaround in e1000e, I'll submit.

I would appreciate it if you could review/test the patch I submitted
for the same issue. Specifically I would want to make certain that it
still addresses the original RXO interrupt burst issue your reported.

Thanks.

- Alex

Shrikrishna Khare Jan. 19, 2018, 2:42 a.m. UTC | #5

On Thu, 18 Jan 2018, Benjamin Poirier wrote:

> On 2018/01/18 15:50, Benjamin Poirier wrote:
> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> > 
> > Some experimentation showed that this flaw in vmware e1000e emulation can
> > be worked around by not setting Other in EIAC. This is how it was before
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> 
> vmware folks, please comment.

Thank you for bringing this to our attention.

Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which 
has the said patch), I could bring up e1000e interface (version: 3.2.6-k),
get dhcp address and even do large file downloads without difficulty.

Could you give us more pointers on how we may be able to reproduce this 
locally? Was there anything different with the configuration when the 
issue was observed? Is the issue consistently reproducible?

Thanks,
Shri

Benjamin Poirier Jan. 19, 2018, 5:36 a.m. UTC | #6

On 2018/01/18 18:42, Shrikrishna Khare wrote:
> 
> 
> On Thu, 18 Jan 2018, Benjamin Poirier wrote:
> 
> > On 2018/01/18 15:50, Benjamin Poirier wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> > > 
> > > Some experimentation showed that this flaw in vmware e1000e emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > 
> > vmware folks, please comment.
> 
> Thank you for bringing this to our attention.
> 
> Using the reported build (ESX 6.5, 7526125) and 4.15.0-rc8+ kernel (which 
> has the said patch), I could bring up e1000e interface (version: 3.2.6-k),
> get dhcp address and even do large file downloads without difficulty.
> 
> Could you give us more pointers on how we may be able to reproduce this 
> locally? Was there anything different with the configuration when the 
> issue was observed? Is the issue consistently reproducible?

It's consistently reproducible, however I noticed that once in a while
there is a genuine "Other" interrupt that comes in and triggers the link
status change. The problem is with interrupts that are triggered via a
write to ICS (such as in e1000e_trigger_lsc()). Can you reproduce a
problem if you do:
	ip link set ethX down
	ip link set ethX up

If you're building your own kernel, you can add the following patch and
cat /sys/kernel/debug/tracing/trace_pipe

For me it shows on v4.15-rc8:
           <...>-2578  [000] .... 83527.938321: e1000e_trigger_lsc: trigger_lsc
           <...>-2578  [000] d.h. 83527.938398: e1000_msix_other: icr 0x0

With the patch that I submitted, it shows:
         wickedd-1329  [002] .N..    20.123545: e1000e_trigger_lsc: trigger_lsc
          <idle>-0     [000] d.h.    20.123630: e1000_msix_other: icr 0x81000004
          <idle>-0     [000] d.h.    20.123654: e1000_msix_other: lsc
          <idle>-0     [000] d.h.    20.123676: e1000_msix_other: mod_timer


diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..16620ce840fc 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1918,22 +1918,29 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 	bool enable = true;
 
 	icr = er32(ICR);
+	trace_printk("icr 0x%x\n", icr);
+
 	if (icr & E1000_ICR_RXO) {
+		trace_printk("rxo\n");
 		ew32(ICR, E1000_ICR_RXO);
 		enable = false;
 		/* napi poll will re-enable Other, make sure it runs */
 		if (napi_schedule_prep(&adapter->napi)) {
+			trace_printk("napi schedule\n");
 			adapter->total_rx_bytes = 0;
 			adapter->total_rx_packets = 0;
 			__napi_schedule(&adapter->napi);
 		}
 	}
 	if (icr & E1000_ICR_LSC) {
+		trace_printk("lsc\n");
 		ew32(ICR, E1000_ICR_LSC);
 		hw->mac.get_link_status = true;
 		/* guard against interrupt when we're going down */
-		if (!test_bit(__E1000_DOWN, &adapter->state))
+		if (!test_bit(__E1000_DOWN, &adapter->state)) {
+			trace_printk("mod_timer\n");
 			mod_timer(&adapter->watchdog_timer, jiffies + 1);
+		}
 	}
 
 	if (enable && !test_bit(__E1000_DOWN, &adapter->state))
@@ -4221,6 +4228,8 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 
+	trace_printk("trigger_lsc\n");
+
 	if (adapter->msix_entries)
 		ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
 	else

Benjamin Poirier Jan. 19, 2018, 8:59 a.m. UTC | #7

On 2018/01/18 07:51, Alexander Duyck wrote:
> On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> 
> Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my

Yes. The numeric value is correct. I made a mistake when writing down
the flag names.

> patch on was that the VMware code was sending _OTHER instead of _LSC
> to trigger LSC events. As such in my version of the workaround I just

It's not so deterministic, sadly. In my tests, upon entry into
e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch
I've seen:
icr=0x0
icr=0x3d
	Reserved RXDMT0 Reserved LSC TXDW
icr=0x46
	RXO LSC TXQE
and someone else reported:
icr=0x3c
	Reserved RXDMT0 Reserved LSC

> went through and did the testing if the _RXO bit was set, otherwise I
> assumed that whatever event was received must have been meant to
> trigger an _LSC type event since that worked in the past.
> 
> > Some experimentation showed that this flaw in vmware e1000e emulation can
> > be worked around by not setting Other in EIAC. This is how it was before
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> 
> Did this actually set the _LSC bit or was it just giving you the
> _OTHER bit? The ICR for that combination would come out 0x81000000.

With my patch, after e1000e_trigger_lsc(), it results in icr=0x81000004
on real and emulated hardware.

IMO, the resulting icr read is cleaner than with your patch but it
depends on an undocumented quirk of the emulated vmware e1000e, so I
don't know which of the two workarounds is more desirable.

If you'd like to stick with your patch though, I think that you should
definitely rewrite it as:

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9f18d39bdc8f..68c0bcb8287f 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1928,7 +1928,12 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 			__napi_schedule(&adapter->napi);
 		}
 	}
-	if (icr & E1000_ICR_LSC) {
+	if (icr & E1000_ICR_LSC || !(icr & E1000_ICR_RXO)) {
+		/* We assume if the RXO bit is not set that this is a
+		 * link status change event. This is needed due to emulated
+		 * versions of the device that may not correctly populate
+		 * the LSC bit.
+		 */
 		ew32(ICR, E1000_ICR_LSC);
 		hw->mac.get_link_status = true;
 		/* guard against interrupt when we're going down */

> 
> > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > ---
> >
> > Jeff, I'm sending as RFC since it looks like a problem that should be fixed
> > in vmware. If you'd like to have the workaround in e1000e, I'll submit.
> 
> I would appreciate it if you could review/test the patch I submitted
> for the same issue. Specifically I would want to make certain that it
> still addresses the original RXO interrupt burst issue your reported.

I've tested both my patch and yours; they both allow link up on vmware;
link up on real 82574 and rxo mitigation on real 82574. I couldn't
conveniently test rxo on vmware.

> 
> Thanks.
> 
> - Alex
>

Benjamin Poirier Jan. 19, 2018, 1:36 p.m. UTC | #8

On 2018/01/19 17:59, Benjamin Poirier wrote:
> On 2018/01/18 07:51, Alexander Duyck wrote:
> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> > 
> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
> 
> Yes. The numeric value is correct. I made a mistake when writing down
> the flag names.
> 
> > patch on was that the VMware code was sending _OTHER instead of _LSC
> > to trigger LSC events. As such in my version of the workaround I just
> 
> It's not so deterministic, sadly. In my tests, upon entry into
> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch
> I've seen:
> icr=0x0
> icr=0x3d
> 	Reserved RXDMT0 Reserved LSC TXDW
> icr=0x46
> 	RXO LSC TXQE
> and someone else reported:
> icr=0x3c
> 	Reserved RXDMT0 Reserved LSC
> 
> > went through and did the testing if the _RXO bit was set, otherwise I
> > assumed that whatever event was received must have been meant to
> > trigger an _LSC type event since that worked in the past.
                               ^...

Re-reading that part, my thoughts are that it worked in the past, before
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1),
because (presumably) Other was not set in EIAC. It worked after
16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun
interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the
value of icr. If it had, it would've found a bogus value, which is
what's happening after 4aea7a5c5e94. I wonder if we're not just getting
some uninitialized value from the emulation code... which makes me kind
of uneasy about your approach of trying to make sense of the value.
Maybe Shrikrishna can clarify where the icr value is coming from when
Other is set in EIAC.

> > 
> > > Some experimentation showed that this flaw in vmware e1000e emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > 
> > Did this actually set the _LSC bit or was it just giving you the
> > _OTHER bit? The ICR for that combination would come out 0x81000000.
> 
> With my patch, after e1000e_trigger_lsc(), it results in icr=0x81000004
> on real and emulated hardware.
> 
> IMO, the resulting icr read is cleaner than with your patch but it
> depends on an undocumented quirk of the emulated vmware e1000e, so I
> don't know which of the two workarounds is more desirable.
> 
> If you'd like to stick with your patch though, I think that you should
> definitely rewrite it as:
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 9f18d39bdc8f..68c0bcb8287f 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1928,7 +1928,12 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
>  			__napi_schedule(&adapter->napi);
>  		}
>  	}
> -	if (icr & E1000_ICR_LSC) {
> +	if (icr & E1000_ICR_LSC || !(icr & E1000_ICR_RXO)) {
> +		/* We assume if the RXO bit is not set that this is a
> +		 * link status change event. This is needed due to emulated
> +		 * versions of the device that may not correctly populate
> +		 * the LSC bit.
> +		 */
>  		ew32(ICR, E1000_ICR_LSC);
>  		hw->mac.get_link_status = true;
>  		/* guard against interrupt when we're going down */
> 
> > 
> > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > > ---
> > >
> > > Jeff, I'm sending as RFC since it looks like a problem that should be fixed
> > > in vmware. If you'd like to have the workaround in e1000e, I'll submit.
> > 
> > I would appreciate it if you could review/test the patch I submitted
> > for the same issue. Specifically I would want to make certain that it
> > still addresses the original RXO interrupt burst issue your reported.
> 
> I've tested both my patch and yours; they both allow link up on vmware;
> link up on real 82574 and rxo mitigation on real 82574. I couldn't
> conveniently test rxo on vmware.
> 
> > 
> > Thanks.
> > 
> > - Alex
> >

Alexander H Duyck Jan. 19, 2018, 4:22 p.m. UTC | #9

On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier
<benjamin.poirier@gmail.com> wrote:
> On 2018/01/19 17:59, Benjamin Poirier wrote:
>> On 2018/01/18 07:51, Alexander Duyck wrote:
>> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
>> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
>> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
>> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
>> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
>> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
>> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
>> >
>> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
>>
>> Yes. The numeric value is correct. I made a mistake when writing down
>> the flag names.
>>
>> > patch on was that the VMware code was sending _OTHER instead of _LSC
>> > to trigger LSC events. As such in my version of the workaround I just
>>
>> It's not so deterministic, sadly. In my tests, upon entry into
>> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch
>> I've seen:
>> icr=0x0
>> icr=0x3d
>>       Reserved RXDMT0 Reserved LSC TXDW
>> icr=0x46
>>       RXO LSC TXQE
>> and someone else reported:
>> icr=0x3c
>>       Reserved RXDMT0 Reserved LSC
>>
>> > went through and did the testing if the _RXO bit was set, otherwise I
>> > assumed that whatever event was received must have been meant to
>> > trigger an _LSC type event since that worked in the past.
>                                ^...
>
> Re-reading that part, my thoughts are that it worked in the past, before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1),
> because (presumably) Other was not set in EIAC. It worked after
> 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun
> interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the
> value of icr. If it had, it would've found a bogus value, which is
> what's happening after 4aea7a5c5e94. I wonder if we're not just getting
> some uninitialized value from the emulation code... which makes me kind
> of uneasy about your approach of trying to make sense of the value.
> Maybe Shrikrishna can clarify where the icr value is coming from when
> Other is set in EIAC.

For now I still say we let my current patch go as-is in order to
address the immediate issue. We can follow-up and do a more refined
version of things once we get the final word from VMware on how this
actually works. If nothing else the current patch appears to resolve
the currently reported issue and is already submitted.

I'm of the mind that we need to cut down on the code thrash.  This
driver is supposed to have been in a "maintenance" mode for the last
year or so as there aren't being any new parts added is my
understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
accepted in the first place. I don't see any notes about it fixing any
bug or addressing any issue and it seems like that is the start of all
the issues we have been having recently with RXO triggering more
interrupts, various link issues, and this most recent VMware issue.

- Alex

Benjamin Poirier Jan. 19, 2018, 10:45 p.m. UTC | #10

On 2018/01/19 08:22, Alexander Duyck wrote:
> On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier
> <benjamin.poirier@gmail.com> wrote:
> > On 2018/01/19 17:59, Benjamin Poirier wrote:
> >> On 2018/01/18 07:51, Alexander Duyck wrote:
> >> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> >> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> >> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> >> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> >> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> >> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> >> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> >> >
> >> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my
> >>
> >> Yes. The numeric value is correct. I made a mistake when writing down
> >> the flag names.
> >>
> >> > patch on was that the VMware code was sending _OTHER instead of _LSC
> >> > to trigger LSC events. As such in my version of the workaround I just
> >>
> >> It's not so deterministic, sadly. In my tests, upon entry into
> >> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch
> >> I've seen:
> >> icr=0x0
> >> icr=0x3d
> >>       Reserved RXDMT0 Reserved LSC TXDW
> >> icr=0x46
> >>       RXO LSC TXQE
> >> and someone else reported:
> >> icr=0x3c
> >>       Reserved RXDMT0 Reserved LSC
> >>
> >> > went through and did the testing if the _RXO bit was set, otherwise I
> >> > assumed that whatever event was received must have been meant to
> >> > trigger an _LSC type event since that worked in the past.
> >                                ^...
> >
> > Re-reading that part, my thoughts are that it worked in the past, before
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1),
> > because (presumably) Other was not set in EIAC. It worked after
> > 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun
> > interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the
> > value of icr. If it had, it would've found a bogus value, which is
> > what's happening after 4aea7a5c5e94. I wonder if we're not just getting
> > some uninitialized value from the emulation code... which makes me kind
> > of uneasy about your approach of trying to make sense of the value.
> > Maybe Shrikrishna can clarify where the icr value is coming from when
> > Other is set in EIAC.
> 
> For now I still say we let my current patch go as-is in order to
> address the immediate issue. We can follow-up and do a more refined
> version of things once we get the final word from VMware on how this
> actually works. If nothing else the current patch appears to resolve
> the currently reported issue and is already submitted.
> 
> I'm of the mind that we need to cut down on the code thrash.  This
> driver is supposed to have been in a "maintenance" mode for the last
> year or so as there aren't being any new parts added is my
> understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> accepted in the first place. I don't see any notes about it fixing any
> bug or addressing any issue and it seems like that is the start of all
> the issues we have been having recently with RXO triggering more
> interrupts, various link issues, and this most recent VMware issue.

I'm sorry to say but you're the one who suggested that change:

http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html

On 2015/10/28 23:08, Alexander Duyck wrote:
> On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
[...]
> 
> I would argue your first patch probably didn't go far enough to remove dead
> code.  Specifically you should only ever get into this function if LSC is
> set.  There are no other causes that should trigger this.  As such you could
> probably remove the ICR read, and instead replace it with an ICR write of
> the LSC bit since OTHER is already cleared via EIAC.
>

Benjamin Poirier Jan. 19, 2018, 10:55 p.m. UTC | #11

On 2018/01/20 07:45, Benjamin Poirier wrote:
[...]
> > 
> > I'm of the mind that we need to cut down on the code thrash.  This
> > driver is supposed to have been in a "maintenance" mode for the last
> > year or so as there aren't being any new parts added is my
> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> > accepted in the first place. I don't see any notes about it fixing any
> > bug or addressing any issue and it seems like that is the start of all
> > the issues we have been having recently with RXO triggering more
> > interrupts, various link issues, and this most recent VMware issue.
> 
> I'm sorry to say but you're the one who suggested that change:
> 
> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
> 
> On 2015/10/28 23:08, Alexander Duyck wrote:
> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
> [...]
> > 
> > I would argue your first patch probably didn't go far enough to remove dead
> > code.  Specifically you should only ever get into this function if LSC is
> > set.  There are no other causes that should trigger this.  As such you could
> > probably remove the ICR read, and instead replace it with an ICR write of
> > the LSC bit since OTHER is already cleared via EIAC.
> > 

... The assumption that "There are no other causes that should trigger
this." turned out to be wrong and that caused the RXO problems. Clearing
OTHER via EIAC is causing the problems with vmware now. I don't think
you foresaw those problems back in 2015 and neither did I.

Alexander H Duyck Jan. 20, 2018, 5:21 p.m. UTC | #12

On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
<benjamin.poirier@gmail.com> wrote:
> On 2018/01/20 07:45, Benjamin Poirier wrote:
> [...]
>> >
>> > I'm of the mind that we need to cut down on the code thrash.  This
>> > driver is supposed to have been in a "maintenance" mode for the last
>> > year or so as there aren't being any new parts added is my
>> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
>> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
>> > accepted in the first place. I don't see any notes about it fixing any
>> > bug or addressing any issue and it seems like that is the start of all
>> > the issues we have been having recently with RXO triggering more
>> > interrupts, various link issues, and this most recent VMware issue.
>>
>> I'm sorry to say but you're the one who suggested that change:
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
>>
>> On 2015/10/28 23:08, Alexander Duyck wrote:
>> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
>> [...]
>> >
>> > I would argue your first patch probably didn't go far enough to remove dead
>> > code.  Specifically you should only ever get into this function if LSC is
>> > set.  There are no other causes that should trigger this.  As such you could
>> > probably remove the ICR read, and instead replace it with an ICR write of
>> > the LSC bit since OTHER is already cleared via EIAC.
>> >
>
> ... The assumption that "There are no other causes that should trigger
> this." turned out to be wrong and that caused the RXO problems. Clearing
> OTHER via EIAC is causing the problems with vmware now. I don't think
> you foresaw those problems back in 2015 and neither did I.

Well that explains why I felt like I was explaining things to a
younger version of myself. I was a bit more relaxed in terms of being
willing to make arbitrary changes a few years ago. I tend to be a bit
more conservative now, at least as far as having justifications as to
why I want to do things. With any change you end up taking on risk,
and so usually a patch has a justification as to why you are making
the change.

Unfortunately at the time I didn't have all the information and based
my suggestion on a bad assumption. I would guess at the time I was
thinking of doing general code cleanup. Other drivers such as igb work
this way, but it led us down the path we are on now where we are
having to make one fix after another. It is leading in the opposite
direction of maintainability as this is all becoming more complex.
Suggesting this was a bad decision on my part at the time. I'm only
human, I make mistakes.. :-)

With further review of the code I am seeing various other issues that
could still pop up as I am not certain we should even have the "other"
interrupt messing with the NAPI polling or packet accounting logic at
all. The question I would have at this point is if we revert
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
and all the related fixes for it, what do we end up with? It seems
like the code is slowly heading back in the direction of where it was
originally anyway since there have been a number of partial reverts.
I'm wondering what would happen if we were to just short-cut that and
audit the patches involved to see what we really need and don't.

Your patch as proposed is essentially another step in that direction.
I'm thinking we may want to drop my currently proposed fix for now and
instead look at going through and figure out what changes after that
first one are still really needed. It doesn't look like my fix will
make it for 4.15 anyway so we might as well focus on making certain to
have things as solid as possible by the time 4.16-rc1 rolls around.

Thanks.

- Alex

Benjamin Poirier Jan. 22, 2018, 7:12 a.m. UTC | #13

On 2018/01/20 09:21, Alexander Duyck wrote:
> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
> <benjamin.poirier@gmail.com> wrote:
> > On 2018/01/20 07:45, Benjamin Poirier wrote:
> > [...]
> >> >
> >> > I'm of the mind that we need to cut down on the code thrash.  This
> >> > driver is supposed to have been in a "maintenance" mode for the last
> >> > year or so as there aren't being any new parts added is my
> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
> >> > accepted in the first place. I don't see any notes about it fixing any
> >> > bug or addressing any issue and it seems like that is the start of all
> >> > the issues we have been having recently with RXO triggering more
> >> > interrupts, various link issues, and this most recent VMware issue.
> >>
> >> I'm sorry to say but you're the one who suggested that change:
> >>
> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
> >>
> >> On 2015/10/28 23:08, Alexander Duyck wrote:
> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
> >> [...]
> >> >
> >> > I would argue your first patch probably didn't go far enough to remove dead
> >> > code.  Specifically you should only ever get into this function if LSC is
> >> > set.  There are no other causes that should trigger this.  As such you could
> >> > probably remove the ICR read, and instead replace it with an ICR write of
> >> > the LSC bit since OTHER is already cleared via EIAC.
> >> >
> >
> > ... The assumption that "There are no other causes that should trigger
> > this." turned out to be wrong and that caused the RXO problems. Clearing
> > OTHER via EIAC is causing the problems with vmware now. I don't think
> > you foresaw those problems back in 2015 and neither did I.
> 
> Well that explains why I felt like I was explaining things to a
> younger version of myself. I was a bit more relaxed in terms of being
> willing to make arbitrary changes a few years ago. I tend to be a bit
> more conservative now, at least as far as having justifications as to
> why I want to do things. With any change you end up taking on risk,
> and so usually a patch has a justification as to why you are making
> the change.
> 
> Unfortunately at the time I didn't have all the information and based
> my suggestion on a bad assumption. I would guess at the time I was
> thinking of doing general code cleanup. Other drivers such as igb work
> this way, but it led us down the path we are on now where we are
> having to make one fix after another. It is leading in the opposite
> direction of maintainability as this is all becoming more complex.
> Suggesting this was a bad decision on my part at the time. I'm only
> human, I make mistakes.. :-)

Thanks for the introspection.

> 
> With further review of the code I am seeing various other issues that
> could still pop up as I am not certain we should even have the "other"
> interrupt messing with the NAPI polling or packet accounting logic at
> all. The question I would have at this point is if we revert
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
> and all the related fixes for it, what do we end up with?

The patch I submitted for the current vmware issue actually finishes
reverting commit 16ecba59bc33.

I believe the relevant commits to consider are:

16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)

19110cfbb34d e1000e: Separate signaling for link check/link up
             (v4.15-rc1)
4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)

4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return
	     value. (v4.15-rc8)

(submitted)  e1000e: Remove Other from EIAC.

commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of
16ecba59bc33 (ICR read). The submitted patch reverts the rest of
16ecba59bc33 (EIAC clearing of Other).

> It seems
> like the code is slowly heading back in the direction of where it was
> originally anyway since there have been a number of partial reverts.
> I'm wondering what would happen if we were to just short-cut that and
> audit the patches involved to see what we really need and don't.
> 
> Your patch as proposed is essentially another step in that direction.
> I'm thinking we may want to drop my currently proposed fix for now and
> instead look at going through and figure out what changes after that
> first one are still really needed.

If the patch that I submitted for the current vmware issue is merged,
the significant commits that are left are:

0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
	Fixes a problem in the irq disabling of the napi implementation.

19110cfbb34d e1000e: Separate signaling for link check/link up
             (v4.15-rc1)
	Fixes link flapping caused by a race condition in link
	detection. It was found because the Other interrupt was being
	triggered sort of spuriously by rxo.

4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
	Fixes Other interrupt bursts during sustained rxo conditions.

I think all of those are still needed, or if they're removed, they need
to be reimplemented differently.

> It doesn't look like my fix will
> make it for 4.15 anyway so we might as well focus on making certain to
> have things as solid as possible by the time 4.16-rc1 rolls around.
> 
> Thanks.
> 
> - Alex

Alexander H Duyck Jan. 22, 2018, 6:01 p.m. UTC | #14

On Sun, Jan 21, 2018 at 11:12 PM, Benjamin Poirier
<benjamin.poirier@gmail.com> wrote:
> On 2018/01/20 09:21, Alexander Duyck wrote:
>> On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
>> <benjamin.poirier@gmail.com> wrote:
>> > On 2018/01/20 07:45, Benjamin Poirier wrote:
>> > [...]
>> >> >
>> >> > I'm of the mind that we need to cut down on the code thrash.  This
>> >> > driver is supposed to have been in a "maintenance" mode for the last
>> >> > year or so as there aren't being any new parts added is my
>> >> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
>> >> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
>> >> > accepted in the first place. I don't see any notes about it fixing any
>> >> > bug or addressing any issue and it seems like that is the start of all
>> >> > the issues we have been having recently with RXO triggering more
>> >> > interrupts, various link issues, and this most recent VMware issue.
>> >>
>> >> I'm sorry to say but you're the one who suggested that change:
>> >>
>> >> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
>> >>
>> >> On 2015/10/28 23:08, Alexander Duyck wrote:
>> >> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
>> >> [...]
>> >> >
>> >> > I would argue your first patch probably didn't go far enough to remove dead
>> >> > code.  Specifically you should only ever get into this function if LSC is
>> >> > set.  There are no other causes that should trigger this.  As such you could
>> >> > probably remove the ICR read, and instead replace it with an ICR write of
>> >> > the LSC bit since OTHER is already cleared via EIAC.
>> >> >
>> >
>> > ... The assumption that "There are no other causes that should trigger
>> > this." turned out to be wrong and that caused the RXO problems. Clearing
>> > OTHER via EIAC is causing the problems with vmware now. I don't think
>> > you foresaw those problems back in 2015 and neither did I.
>>
>> Well that explains why I felt like I was explaining things to a
>> younger version of myself. I was a bit more relaxed in terms of being
>> willing to make arbitrary changes a few years ago. I tend to be a bit
>> more conservative now, at least as far as having justifications as to
>> why I want to do things. With any change you end up taking on risk,
>> and so usually a patch has a justification as to why you are making
>> the change.
>>
>> Unfortunately at the time I didn't have all the information and based
>> my suggestion on a bad assumption. I would guess at the time I was
>> thinking of doing general code cleanup. Other drivers such as igb work
>> this way, but it led us down the path we are on now where we are
>> having to make one fix after another. It is leading in the opposite
>> direction of maintainability as this is all becoming more complex.
>> Suggesting this was a bad decision on my part at the time. I'm only
>> human, I make mistakes.. :-)
>
> Thanks for the introspection.
>
>>
>> With further review of the code I am seeing various other issues that
>> could still pop up as I am not certain we should even have the "other"
>> interrupt messing with the NAPI polling or packet accounting logic at
>> all. The question I would have at this point is if we revert
>> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
>> and all the related fixes for it, what do we end up with?
>
> The patch I submitted for the current vmware issue actually finishes
> reverting commit 16ecba59bc33.
>
> I believe the relevant commits to consider are:
>
> 16ecba59bc33 e1000e: Do not read ICR in Other interrupt (v4.5-rc1)
> a61cfe4ffad7 e1000e: Do not write lsc to ics in msi-x mode (v4.5-rc1)
> 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
>
> 19110cfbb34d e1000e: Separate signaling for link check/link up
>              (v4.15-rc1)
> 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
>
> 4110e02eb45e e1000e: Fix e1000_check_for_copper_link_ich8lan return
>              value. (v4.15-rc8)
>
> (submitted)  e1000e: Remove Other from EIAC.
>
> commit 4aea7a5c5e94 essentially reverts a61cfe4ffad7 and part of
> 16ecba59bc33 (ICR read). The submitted patch reverts the rest of
> 16ecba59bc33 (EIAC clearing of Other).
>
>> It seems
>> like the code is slowly heading back in the direction of where it was
>> originally anyway since there have been a number of partial reverts.
>> I'm wondering what would happen if we were to just short-cut that and
>> audit the patches involved to see what we really need and don't.
>>
>> Your patch as proposed is essentially another step in that direction.
>> I'm thinking we may want to drop my currently proposed fix for now and
>> instead look at going through and figure out what changes after that
>> first one are still really needed.
>
> If the patch that I submitted for the current vmware issue is merged,
> the significant commits that are left are:
>
> 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
>         Fixes a problem in the irq disabling of the napi implementation.

This one I fully agree is still needed.

> 19110cfbb34d e1000e: Separate signaling for link check/link up
>              (v4.15-rc1)
>         Fixes link flapping caused by a race condition in link
>         detection. It was found because the Other interrupt was being
>         triggered sort of spuriously by rxo.

This one is somewhat iffy. I am not sure if the patch description
really matches what it is doing. It doesn't appear to do what it says
it is trying to do since clearing get_link_status will still trigger a
link up, it just takes an extra 2 seconds. I think there may be issues
if you aren't using autoneg, as I don't see how you are getting the
link to report up other than the fact that mac->get_link_status has
been cleared but we are reporting a pseduo-error. In addition it is
only really needed after the RXO problem was introduced which really
didn't exist until after we stopped checking for LSC. One interesting
test we may want to look at is to see if there is an additional delay
in a link coming up for a non-autoneg setup. If we find an additional
2 second delay then I would be even more confident that this patch has
a bug.

> 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
>         Fixes Other interrupt bursts during sustained rxo conditions.

So the RXO problem probably didn't exist until we stopped checking for
the OTHER and LSC bits in the "other" interrupt handler. Yes there
would be more "other" cause interrupts, but they shouldn't have been
causing much in the way of issues since the get_link_status value
never changed. Personally I would lean more toward the option of
reverting this patch and instead just focus on testing OTHER and LSC
as we originally were so that we don't risk messing up NAPI by messing
with ring state from a non-ring interrupt.

I will try to get to these later this week if you would like.
Unfortunately I don't have any of these devices in any of my
development systems so I have to go chase one down. Otherwise you are
free to take these on and tell me if I have made another flawed
assumption somewhere, but I am thinking the RXO issue goes away if we
get the original "other" interrupt routine back to where it was.

So the last bit in all this ends up being that because of 0a8047ac68e5
e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to
auto-clear interrupt causes anymore on ICR read. I am not certain what
the impact of this is. I would be interested in finding out if a cause
left set will trigger an interrupt storm or if it just goes quiet when
we just leave the value high. If it goes quiet then that in itself
might solve the RXO interrupt burst problem if we don't clear it.
Otherwise we need to make certain to clear all of the causes that can
trigger the "other" interrupt to fire regardless of if we service the
events or not.

Thanks.

- Alex

Benjamin Poirier Jan. 24, 2018, 8:35 a.m. UTC | #15

On 2018/01/22 10:01, Alexander Duyck wrote:
[...]
> >
> > If the patch that I submitted for the current vmware issue is merged,
> > the significant commits that are left are:
> >
> > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
> >         Fixes a problem in the irq disabling of the napi implementation.
> 
> This one I fully agree is still needed.
> 
> > 19110cfbb34d e1000e: Separate signaling for link check/link up
> >              (v4.15-rc1)
> >         Fixes link flapping caused by a race condition in link
> >         detection. It was found because the Other interrupt was being
> >         triggered sort of spuriously by rxo.
> 
> This one is somewhat iffy. I am not sure if the patch description
> really matches what it is doing. It doesn't appear to do what it says
> it is trying to do since clearing get_link_status will still trigger a
> link up, it just takes an extra 2 seconds. I think there may be issues
> if you aren't using autoneg, as I don't see how you are getting the
> link to report up other than the fact that mac->get_link_status has
> been cleared but we are reporting a pseduo-error. In addition it is
> only really needed after the RXO problem was introduced which really
> didn't exist until after we stopped checking for LSC. One interesting
> test we may want to look at is to see if there is an additional delay
> in a link coming up for a non-autoneg setup. If we find an additional
> 2 second delay then I would be even more confident that this patch has
> a bug.

It seems like you're right but I didn't look into this part of the problem in
detail yet. I'll get back to it.

> 
> > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
> >         Fixes Other interrupt bursts during sustained rxo conditions.
> 
> So the RXO problem probably didn't exist until we stopped checking for
> the OTHER and LSC bits in the "other" interrupt handler. Yes there
> would be more "other" cause interrupts, but they shouldn't have been
> causing much in the way of issues since the get_link_status value
> never changed. Personally I would lean more toward the option of

I agree. I tested rxo behavior on commit 4d432f67ff00 ("e1000e: Remove
unreachable code", v4.5-rc1) which is before any significant change in that
area. (I force rxo by adding mdelay(10) to e1000_clean_rx_irq and sending a
netperf UDP_STREAM from another host). In case of sustained rxo condition, we
get repeated Other interrupts. Handling these irqs is useless work that could
be avoided when the system is already overloaded but it doesn't lead to
misbehavior like the race condition described in the log of commit
19110cfbb34d ("e1000e: Separate signaling for link check/link up", v4.15-rc1).

However, I noticed something unexpected. It seems like reading ICR doesn't
clear every bit that's set in IAM, most notably not rxo. In a different test,
I was doing a single write of RXO | OTHER to ICS, then two subsequent reads of
icr gave 0x01000041. OTOH, writing a bit to ICS reliably clears it. So if you
want to remove RXO interrupt mitigation, you should at least add a write of
RXO to ICR, to clear it. On my system it reduced Other interrupts from
~17000/s to ~1700/s when using the mdelay testing approach.

> reverting this patch and instead just focus on testing OTHER and LSC
> as we originally were so that we don't risk messing up NAPI by messing
> with ring state from a non-ring interrupt.
> 
> I will try to get to these later this week if you would like.
> Unfortunately I don't have any of these devices in any of my
> development systems so I have to go chase one down. Otherwise you are
> free to take these on and tell me if I have made another flawed
> assumption somewhere, but I am thinking the RXO issue goes away if we
> get the original "other" interrupt routine back to where it was.
> 
> So the last bit in all this ends up being that because of 0a8047ac68e5
> e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to
> auto-clear interrupt causes anymore on ICR read. I am not certain what
> the impact of this is. I would be interested in finding out if a cause
> left set will trigger an interrupt storm or if it just goes quiet when
> we just leave the value high. If it goes quiet then that in itself
> might solve the RXO interrupt burst problem if we don't clear it.
> Otherwise we need to make certain to clear all of the causes that can
> trigger the "other" interrupt to fire regardless of if we service the
> events or not.

In MSI-X mode, as long as Other is not set in ICR, nothing will happen even if
the bits related to Other (LSC, RXO, MDAC, SRPD, ACK, MNG) are set. However,
that doesn't solve the rxo interrupt burst because an rxo condition in
hardware sets both RXO and Other, so it triggers an interrupt.

Alexander H Duyck Jan. 24, 2018, 4:01 p.m. UTC | #16

On Wed, Jan 24, 2018 at 12:35 AM, Benjamin Poirier
<benjamin.poirier@gmail.com> wrote:
> On 2018/01/22 10:01, Alexander Duyck wrote:
> [...]
>> >
>> > If the patch that I submitted for the current vmware issue is merged,
>> > the significant commits that are left are:
>> >
>> > 0a8047ac68e5 e1000e: Fix msi-x interrupt automask (v4.5-rc1)
>> >         Fixes a problem in the irq disabling of the napi implementation.
>>
>> This one I fully agree is still needed.
>>
>> > 19110cfbb34d e1000e: Separate signaling for link check/link up
>> >              (v4.15-rc1)
>> >         Fixes link flapping caused by a race condition in link
>> >         detection. It was found because the Other interrupt was being
>> >         triggered sort of spuriously by rxo.
>>
>> This one is somewhat iffy. I am not sure if the patch description
>> really matches what it is doing. It doesn't appear to do what it says
>> it is trying to do since clearing get_link_status will still trigger a
>> link up, it just takes an extra 2 seconds. I think there may be issues
>> if you aren't using autoneg, as I don't see how you are getting the
>> link to report up other than the fact that mac->get_link_status has
>> been cleared but we are reporting a pseduo-error. In addition it is
>> only really needed after the RXO problem was introduced which really
>> didn't exist until after we stopped checking for LSC. One interesting
>> test we may want to look at is to see if there is an additional delay
>> in a link coming up for a non-autoneg setup. If we find an additional
>> 2 second delay then I would be even more confident that this patch has
>> a bug.
>
> It seems like you're right but I didn't look into this part of the problem in
> detail yet. I'll get back to it.
>
>>
>> > 4aea7a5c5e94 e1000e: Avoid receiver overrun interrupt bursts (v4.15-rc1)
>> >         Fixes Other interrupt bursts during sustained rxo conditions.
>>
>> So the RXO problem probably didn't exist until we stopped checking for
>> the OTHER and LSC bits in the "other" interrupt handler. Yes there
>> would be more "other" cause interrupts, but they shouldn't have been
>> causing much in the way of issues since the get_link_status value
>> never changed. Personally I would lean more toward the option of
>
> I agree. I tested rxo behavior on commit 4d432f67ff00 ("e1000e: Remove
> unreachable code", v4.5-rc1) which is before any significant change in that
> area. (I force rxo by adding mdelay(10) to e1000_clean_rx_irq and sending a
> netperf UDP_STREAM from another host). In case of sustained rxo condition, we
> get repeated Other interrupts. Handling these irqs is useless work that could
> be avoided when the system is already overloaded but it doesn't lead to
> misbehavior like the race condition described in the log of commit
> 19110cfbb34d ("e1000e: Separate signaling for link check/link up", v4.15-rc1).
>
> However, I noticed something unexpected. It seems like reading ICR doesn't
> clear every bit that's set in IAM, most notably not rxo. In a different test,
> I was doing a single write of RXO | OTHER to ICS, then two subsequent reads of
> icr gave 0x01000041. OTOH, writing a bit to ICS reliably clears it. So if you
> want to remove RXO interrupt mitigation, you should at least add a write of
> RXO to ICR, to clear it. On my system it reduced Other interrupts from
> ~17000/s to ~1700/s when using the mdelay testing approach.

That makes sense. I would say we should probably just put together a
mask of all possible causes for "other" interrupts that we don't
actually care about and just clear them explicitly when we clear the
"other" bit.

>> reverting this patch and instead just focus on testing OTHER and LSC
>> as we originally were so that we don't risk messing up NAPI by messing
>> with ring state from a non-ring interrupt.
>>
>> I will try to get to these later this week if you would like.
>> Unfortunately I don't have any of these devices in any of my
>> development systems so I have to go chase one down. Otherwise you are
>> free to take these on and tell me if I have made another flawed
>> assumption somewhere, but I am thinking the RXO issue goes away if we
>> get the original "other" interrupt routine back to where it was.
>>
>> So the last bit in all this ends up being that because of 0a8047ac68e5
>> e1000e: Fix msi-x interrupt automask (v4.5-rc1) we don't seem to
>> auto-clear interrupt causes anymore on ICR read. I am not certain what
>> the impact of this is. I would be interested in finding out if a cause
>> left set will trigger an interrupt storm or if it just goes quiet when
>> we just leave the value high. If it goes quiet then that in itself
>> might solve the RXO interrupt burst problem if we don't clear it.
>> Otherwise we need to make certain to clear all of the causes that can
>> trigger the "other" interrupt to fire regardless of if we service the
>> events or not.
>
> In MSI-X mode, as long as Other is not set in ICR, nothing will happen even if
> the bits related to Other (LSC, RXO, MDAC, SRPD, ACK, MNG) are set. However,
> that doesn't solve the rxo interrupt burst because an rxo condition in
> hardware sets both RXO and Other, so it triggers an interrupt.

Right. I expect we will still have more interrupts. But by just
clearing it and moving on we at least resolve the issue without
introducing possible new ones.

I would say we could probably clear everything that is on that list
that is not LSC explicitly since we really don't care about those
events. If that changes then in the future we could avoid clearing
them until we capture an actual event.

Thanks.

- Alex

Alexander H Duyck Jan. 30, 2018, 7:46 p.m. UTC | #17

On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
>
> Some experimentation showed that this flaw in vmware e1000e emulation can
> be worked around by not setting Other in EIAC. This is how it was before
> 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
>
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> ---

Hi Benjamin,

How would you feel about resubmitting this patch for net?

We have some issues that have come up and it would be useful to have
this fixed in the kernel sooner rather than later. I would be okay
with us applying it for now while we work on coming up with a more
complete solution.

Thanks.

- Alex

Benjamin Poirier Jan. 31, 2018, 7:31 a.m. UTC | #18

On 2018/01/30 11:46, Alexander Duyck wrote:
> On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com> wrote:
> > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver
> > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware,
> > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> >
> > Some experimentation showed that this flaw in vmware e1000e emulation can
> > be worked around by not setting Other in EIAC. This is how it was before
> > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> >
> > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > ---
> 
> Hi Benjamin,
> 
> How would you feel about resubmitting this patch for net?
> 
> We have some issues that have come up and it would be useful to have
> this fixed in the kernel sooner rather than later. I would be okay
> with us applying it for now while we work on coming up with a more
> complete solution.

Ok, I've resent it in its original form. Once it's in mainline I'll
rebase the cleanups.

Brown, Aaron F Feb. 2, 2018, 4:29 a.m. UTC | #19

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Tuesday, January 30, 2018 11:31 PM
> To: Alexander Duyck <alexander.duyck@gmail.com>
> Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired-
> lan@lists.osuosl.org>; linux-kernel@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
> 
> On 2018/01/30 11:46, Alexander Duyck wrote:
> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier <bpoirier@suse.com>
> wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid
> receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574
> hardware,
> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> > >
> > > Some experimentation showed that this flaw in vmware e1000e
> emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > >
> > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > > ---
> >
> > Hi Benjamin,
> >
> > How would you feel about resubmitting this patch for net?
> >
> > We have some issues that have come up and it would be useful to have
> > this fixed in the kernel sooner rather than later. I would be okay
> > with us applying it for now while we work on coming up with a more
> > complete solution.
> 
> Ok, I've resent it in its original form. Once it's in mainline I'll
> rebase the cleanups.

Tested-by: Aaron Brown
<aaron.f.brown@intel.com>

> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Brown, Aaron F Feb. 2, 2018, 4:31 a.m. UTC | #20

> From: Brown, Aaron F
> Sent: Thursday, February 1, 2018 8:30 PM
> To: 'Benjamin Poirier' <bpoirier@suse.com>; Alexander Duyck
> <alexander.duyck@gmail.com>
> Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired-
> lan@lists.osuosl.org>; linux-kernel@vger.kernel.org
> Subject: RE: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
> 
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> > Behalf Of Benjamin Poirier
> > Sent: Tuesday, January 30, 2018 11:31 PM
> > To: Alexander Duyck <alexander.duyck@gmail.com>
> > Cc: Netdev <netdev@vger.kernel.org>; intel-wired-lan <intel-wired-
> > lan@lists.osuosl.org>; linux-kernel@vger.kernel.org
> > Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from
> EIAC.
> >
> > On 2018/01/30 11:46, Alexander Duyck wrote:
> > > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier
> <bpoirier@suse.com>
> > wrote:
> > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid
> > receiver
> > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in
> e1000_msix_other()
> > > > on emulated e1000e devices. In comparison, on real e1000e 82574
> > hardware,
> > > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation.
> > > >
> > > > Some experimentation showed that this flaw in vmware e1000e
> > emulation can
> > > > be worked around by not setting Other in EIAC. This is how it was
> before
> > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > > >
> > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt
> bursts")
> > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > > > ---
> > >
> > > Hi Benjamin,
> > >
> > > How would you feel about resubmitting this patch for net?
> > >
> > > We have some issues that have come up and it would be useful to have
> > > this fixed in the kernel sooner rather than later. I would be okay
> > > with us applying it for now while we work on coming up with a more
> > > complete solution.
> >
> > Ok, I've resent it in its original form. Once it's in mainline I'll
> > rebase the cleanups.
> 
> Tested-by: Aaron Brown
> <aaron.f.brown@intel.com>
Stupid line wrap ... been that kind of day.

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

> 
> > _______________________________________________
> > Intel-wired-lan mailing list
> > Intel-wired-lan@osuosl.org
> > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

[RFC] e1000e: Remove Other from EIAC.

Commit Message

Comments

Patch