
[0/4,jammy/linux-azure] swiotlb patch needed for CVM

Message ID 20220505123949.12030-1-tim.gardner@canonical.com
Series: swiotlb patch needed for CVM

Message

Tim Gardner May 5, 2022, 12:39 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1971701

SRU Justification

[Impact]

[Azure][CVM] Include the swiotlb patch to improve disk/network performance

Description
As we discussed, there will be new CVM-supporting linux-azure kernels based on
v5.13 and v5.15. I'm requesting that the patch below be included in both kernels,
because it can significantly improve disk/network performance:

swiotlb: Split up single swiotlb lock: https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45

We have tested the patch against upstream 5.16-rc8.
Note that the patch is unlikely to land in the mainline kernel: the community is
trying to resolve the lock contention issue in the swiotlb code with a different
per-device, per-queue implementation, which will take quite some time to finalize.
Until that happens, we need this out-of-tree patch to achieve good disk/network
performance for CVM GA on Azure.

(Note that the v5.4-based linux-azure-cvm kernel does not need the patch, because it uses
a private bounce buffer implementation, drivers/hv/hv_bounce.c, which does not have
the I/O performance issue caused by lock contention in the mainline kernel's swiotlb code.)
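
For reference, here is a minimal userspace-style sketch of what "splitting up the
single swiotlb lock" means. It is only an illustration of the locking scheme, not
the actual patch, and every name in it (swiotlb_area, NUM_AREAS, area_of_cpu,
swiotlb_alloc_slot) is invented for this sketch:

/*
 * Minimal sketch of the "split up the single swiotlb lock" idea, for
 * illustration only.  This is not the actual patch: the real change lives in
 * the kernel's swiotlb code, and all names below are made up here.
 */
#include <pthread.h>
#include <stdbool.h>

#define NUM_AREAS      8      /* number of independently locked areas        */
#define SLOTS_PER_AREA 1024   /* bounce-buffer slots owned by a single area  */

struct swiotlb_area {
	pthread_mutex_t lock;           /* was: one global lock for all slots */
	bool used[SLOTS_PER_AREA];      /* allocation map for this area only  */
};

static struct swiotlb_area areas[NUM_AREAS];

static void swiotlb_init_areas(void)
{
	for (int i = 0; i < NUM_AREAS; i++)
		pthread_mutex_init(&areas[i].lock, NULL);
}

/* Each CPU has a "home" area, so concurrent allocations from different CPUs
 * normally take different locks instead of fighting over a single one. */
static int area_of_cpu(int cpu)
{
	return cpu % NUM_AREAS;
}

/* Allocate one slot: try the home area first, then fall back to the others
 * so one exhausted area does not fail the whole allocation. */
static int swiotlb_alloc_slot(int cpu)
{
	int home = area_of_cpu(cpu);

	for (int i = 0; i < NUM_AREAS; i++) {
		int idx = (home + i) % NUM_AREAS;
		struct swiotlb_area *a = &areas[idx];

		pthread_mutex_lock(&a->lock);
		for (int s = 0; s < SLOTS_PER_AREA; s++) {
			if (!a->used[s]) {
				a->used[s] = true;
				pthread_mutex_unlock(&a->lock);
				return idx * SLOTS_PER_AREA + s;
			}
		}
		pthread_mutex_unlock(&a->lock);
	}
	return -1;      /* no bounce-buffer space available anywhere */
}

The point is that parallel DMA mappings issued from different CPUs mostly take
different locks instead of serializing on one global spinlock, which is where the
disk/network throughput improvement comes from.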

[Test Case]

[Microsoft tested]

I tried the April-27 amd64 test kernel and it worked great for me:
1. The test kernel booted successfully with 256 virtual CPUs and 100 GB of memory.
2. The kernel worked when I changed the MTU of the NetVSC NIC.
3. The Hyper-V HeartBeat/TimeSync/ShutDown VMBus devices also worked as expected.
4. I ran some quick disk I/O and network stress tests and found no issues.

When I ran the above tests, I changed the low MMIO size to 3 GB (which is the setting
for a VM on Azure today) with "Set-VM decui-u2004-cvm -LowMemoryMappedIoSpace 3GB".

Our test team will do more testing, including performance tests. We expect the
performance of this v5.15 test kernel to be on par with the v5.4 linux-azure-cvm kernel.

[Where things could go wrong]

Networking could fail or continue to suffer from poor performance.

[Other Info]

SF: #00332721

Comments

Philip Cox May 11, 2022, 5:49 p.m. UTC | #1
Acked-By: Philip Cox <philip.cox@canonical.com>

Bartlomiej Zolnierkiewicz May 12, 2022, 10:28 a.m. UTC | #2
Acked-by: Bartlomiej Zolnierkiewicz <bartlomiej.zolnierkiewicz@canonical.com>

Tim Gardner May 12, 2022, 12:38 p.m. UTC | #3
Applied to jammy/linux-azure:master-next. Thanks.

-rtg
