[SRU,X,0/1] LP: #1802421 - i40e + iommu data corruption

Message ID 20181109011110.28717-1-daniel.axtens@canonical.com

Message

Daniel Axtens Nov. 9, 2018, 1:11 a.m. UTC
A user reports that using an i40e with intel_iommu=on with the Xenial
GA kernel causes data corruption. Using the Xenial HWE kernel or an
out-of-tree driver more recent than the version shipped with Xenial
solves the issue.
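
Whether a given machine matches the configuration described above can
be checked with something like the sketch below. It is not part of the
original report: the function names are hypothetical, and it assumes
the usual Linux locations /proc/cmdline and
/sys/class/net/<iface>/device/driver.

```python
import os


def iommu_forced_on(cmdline: str) -> bool:
    """Return True if the kernel command line contains intel_iommu=on."""
    return "intel_iommu=on" in cmdline.split()


def driver_name(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the driver bound to iface, or "" if it cannot be read."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The sysfs entry is a symlink whose last component is the driver name.
        return os.path.basename(os.readlink(link))
    except OSError:
        return ""


def is_affected_config(cmdline: str, driver: str) -> bool:
    """True when both conditions from the report hold: i40e and intel_iommu=on."""
    return iommu_forced_on(cmdline) and driver == "i40e"
```

On a live system the inputs would come from
open("/proc/cmdline").read() and driver_name("<iface>"). The kernel
version still has to be checked separately (for example with
uname -r), since only the Xenial GA 4.4 kernel is reported as affected.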

[Impact]
Corrupted data is returned from the network card intermittently. This
is often noticeable when using apt, as the checksums are verified. It
often leads to failure of apt operations. Where no checksums are
verified, this could lead to silent data corruption.

[Fix]
This was fixed at some point after 4.4. Testing identified b32bfa17246d
("i40e: Drop packet split receive routine"), which is part of a broader
refactor. Picking this patch alone is sufficient to fix the issue. My
theory is that enabling the IOMMU exposes a bug in the packet split
receive routine, so removing the routine is enough to prevent the
problem from occurring.

[Test]
A user tested a Xenial 4.4 kernel with this patch applied and it fixed
their issue - no data corruption was observed. (The test repeatedly
deletes the apt cache and then does apt update.)
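
A minimal sketch of such a test loop follows. The exact commands the
user ran are not given, so the two invocations below are assumptions,
as is treating apt's "Hash Sum mismatch" message as the sign of
corruption. The function name run_repro is hypothetical.

```python
import subprocess

# Assumed commands: the report only says the apt cache is deleted and
# "apt update" is run; these exact invocations are guesses.
CLEAR_CACHE = ["sh", "-c", "rm -rf /var/lib/apt/lists/*"]
UPDATE = ["apt-get", "update"]


def run_repro(iterations, run=subprocess.run):
    """Clear the apt lists and refetch them `iterations` times.

    Returns the 1-based numbers of the iterations that failed.
    """
    failures = []
    for i in range(1, iterations + 1):
        run(CLEAR_CACHE, check=True)
        result = run(
            UPDATE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        # apt can exit 0 while still reporting fetch problems, so the
        # output is scanned for the checksum error as well.
        if result.returncode != 0 or "Hash Sum mismatch" in result.stdout:
            failures.append(i)
    return failures
```

Run as root, run_repro(100) returning an empty list would correspond to
the "no data corruption was observed" result described above. The `run`
parameter exists only so the loop can be exercised without touching a
real system.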

[Regression Potential]
It's a messy change inside i40e, so the risk is that i40e will be
broken in some subtle way we haven't noticed, or will suffer a
performance regression. Neither has been observed so far.


Jesse Brandeburg (1):
  i40e: Drop packet split receive routine

 drivers/net/ethernet/intel/i40e/i40e.h        |   3 -
 .../net/ethernet/intel/i40e/i40e_debugfs.c    |   4 +-
 .../net/ethernet/intel/i40e/i40e_ethtool.c    |  19 --
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  49 +---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 244 +-----------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   7 -
 6 files changed, 10 insertions(+), 316 deletions(-)

Comments

Khalid Elmously Nov. 9, 2018, 4:07 a.m. UTC | #1
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Khalid Elmously Jan. 8, 2019, 6:07 a.m. UTC | #2
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Khalid Elmously Jan. 8, 2019, 6:08 a.m. UTC | #3