Message ID | 20181109011110.28717-1-daniel.axtens@canonical.com |
---|---|
Series | LP: #1802421 - i40e + iommu data corruption |

On 2018-11-09 12:11:09, Daniel Axtens wrote:
> A user reports that using an i40e with intel_iommu=on with the Xenial
> GA kernel causes data corruption. Using the Xenial HWE kernel or an
> out-of-tree driver more recent than the version shipped with Xenial
> solves the issue.
>
> [Impact]
> Corrupted data is returned from the network card intermittently. This
> is often noticeable when using apt, as the checksums are verified. It
> often leads to failure of apt operations. When there are no checksums
> done, this could lead to silent data corruption.
>
> [Fix]
> This was fixed somewhere post-4.4. Testing identified b32bfa17246d
> ("i40e: Drop packet split receive routine") which is part of a broader
> refactor. Picking this patch alone is sufficient to fix the issue. My
> theory is that iommu exposes an issue in the packet split receive
> routine and so removing it is sufficient to prevent the problem from
> occurring.
>
> [Test]
> A user tested a Xenial 4.4 kernel with this patch applied and it fixed
> their issue - no data corruption was observed. (The test repeatedly
> deletes the apt cache and then does apt update.)
>
> [Regression Potential]
> It's a messy change inside i40e, so the risk is that i40e will be
> broken in some subtle way we haven't noticed, or have performance
> issues. None of these have been observed so far.
>
>
> Jesse Brandeburg (1):
>   i40e: Drop packet split receive routine
>
>  drivers/net/ethernet/intel/i40e/i40e.h        |   3 -
>  .../net/ethernet/intel/i40e/i40e_debugfs.c    |   4 +-
>  .../net/ethernet/intel/i40e/i40e_ethtool.c    |  19 --
>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  49 +---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 244 +-----------------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   7 -
>  6 files changed, 10 insertions(+), 316 deletions(-)
>

Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
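
For anyone wanting to reproduce the [Test] step, here is a rough sketch of what such a loop might look like. It is not the reporter's actual script, which was not posted. The cache path (/var/lib/apt/lists), the use of apt-get rather than apt, the "Hash Sum mismatch" marker string, and the names stress_apt_update, run_cmd, CLEAR_CMD, UPDATE_CMD and MISMATCH_MARKER are all assumptions made for illustration. It needs to run as root on the affected machine.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# Assumed commands: the original message only says the test "deletes the
# apt cache and then does apt update"; the exact paths are a guess.
CLEAR_CMD = ["sh", "-c", "rm -rf /var/lib/apt/lists/*"]
UPDATE_CMD = ["apt-get", "update"]

# Assumed marker: the text apt is expected to print when a downloaded
# index fails checksum verification.
MISMATCH_MARKER = "Hash Sum mismatch"


def run_cmd(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run cmd and return (exit status, combined stdout and stderr)."""
    proc = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def stress_apt_update(
    iterations: int,
    run: Callable[[Sequence[str]], Tuple[int, str]] = run_cmd,
) -> int:
    """Clear the apt lists and refetch them `iterations` times.

    Returns the number of iterations in which the update either exited
    non-zero or reported a checksum mismatch.
    """
    failures = 0
    for _ in range(iterations):
        run(CLEAR_CMD)  # force every index to be downloaded again
        status, output = run(UPDATE_CMD)
        if status != 0 or MISMATCH_MARKER in output:
            failures += 1
    return failures


if __name__ == "__main__":
    # 100 iterations is an arbitrary choice; the corruption is intermittent.
    print("failed iterations:", stress_apt_update(100))
```

A non-zero count on the unpatched kernel and zero on the patched one would be consistent with the report above, though a clean run does not rule out corruption in traffic that carries no checksums.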