mbox series

[SRU,jammy,0/1] performance: mm/percpu-internal.h: Re-layout pcpu_chunk to mitigate false sharing

Message ID 20240214180735.702533-1-philip.cox@canonical.com
Headers show
Series performance: mm/percpu-internal.h: Re-layout pcpu_chunk to mitigate false sharing | expand

Message

Philip Cox Feb. 14, 2024, 6:07 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2053152

SRU Justification:

[Impact]
Having 'base_addr' in the same cache line as 'free_bytes', 'chunk_md', and
'base_addr' causes a performance impact on x86_64 systems which can be
observed while running UnixBench/Excel throughput.

[Fix]
 In current pcpu_chunk layout, `base_addr' is in the same cache line with
`free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.
This patch moves `bound_map' up to `base_addr', to let `base_addr' locate 
in a new cacheline.

[Test Plan]
I have tested this patch, as has Intel.  This patch has been upstreamed since 6.5 as well.

[Where problems could occur]
The risk for regression is minimal as there should be no logic change to the code, and just
a struct is being ordered differently.  This change has been upstreamed for a while with no issues.

Comments

Stefan Bader Feb. 15, 2024, 8:50 a.m. UTC | #1
On 14.02.24 19:07, Philip Cox wrote:
> BugLink: https://bugs.launchpad.net/bugs/2053152
> 
> SRU Justification:
> 
> [Impact]
> Having 'base_addr' in the same cache line as 'free_bytes', 'chunk_md', and
> 'base_addr' causes a performance impact on x86_64 systems which can be
> observed while running UnixBench/Excel throughput.
> 
> [Fix]
>   In current pcpu_chunk layout, `base_addr' is in the same cache line with
> `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.
> This patch moves `bound_map' up to `base_addr', to let `base_addr' locate
> in a new cacheline.
> 
> [Test Plan]
> I have tested this patch, as has Intel.  This patch has been upstreamed since 6.5 as well.
> 
> [Where problems could occur]
> The risk for regression is minimal as there should be no logic change to the code, and just
> a struct is being ordered differently.  This change has been upstreamed for a while with no issues.
> 

Changed main task of bug report to fix-released since proposed patch is 
upstream as of v6.5-rc1 (which also means included in Mantic).

Acked-by: Stefan Bader <stefan.bader@canonical.com>
Andrei Gherzan Feb. 15, 2024, 1:08 p.m. UTC | #2
On 24/02/14 01:07PM, Philip Cox wrote:
> BugLink: https://bugs.launchpad.net/bugs/2053152
> 
> SRU Justification:
> 
> [Impact]
> Having 'base_addr' in the same cache line as 'free_bytes', 'chunk_md', and
> 'base_addr' causes a performance impact on x86_64 systems which can be
> observed while running UnixBench/Excel throughput.
> 
> [Fix]
>  In current pcpu_chunk layout, `base_addr' is in the same cache line with
> `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.
> This patch moves `bound_map' up to `base_addr', to let `base_addr' locate 
> in a new cacheline.
> 
> [Test Plan]
> I have tested this patch, as has Intel.  This patch has been upstreamed since 6.5 as well.
> 
> [Where problems could occur]
> The risk for regression is minimal as there should be no logic change to the code, and just
> a struct is being ordered differently.  This change has been upstreamed for a while with no issues.
> 
> -- 
> 
> 
> Yu Ma (1):
>   percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce
>     false sharing
> 
>  mm/percpu-internal.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com>
Stefan Bader Feb. 19, 2024, 1:42 p.m. UTC | #3
On 14.02.24 19:07, Philip Cox wrote:
> BugLink: https://bugs.launchpad.net/bugs/2053152
> 
> SRU Justification:
> 
> [Impact]
> Having 'base_addr' in the same cache line as 'free_bytes', 'chunk_md', and
> 'base_addr' causes a performance impact on x86_64 systems which can be
> observed while running UnixBench/Excel throughput.
> 
> [Fix]
>   In current pcpu_chunk layout, `base_addr' is in the same cache line with
> `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.
> This patch moves `bound_map' up to `base_addr', to let `base_addr' locate
> in a new cacheline.
> 
> [Test Plan]
> I have tested this patch, as has Intel.  This patch has been upstreamed since 6.5 as well.
> 
> [Where problems could occur]
> The risk for regression is minimal as there should be no logic change to the code, and just
> a struct is being ordered differently.  This change has been upstreamed for a while with no issues.
> 

Applied to jammy:linux/master-next. Thanks.

-Stefan
Philip Cox April 26, 2024, 2:52 p.m. UTC | #4
On Wed, 2024-02-14 at 13:07 -0500, Philip Cox wrote:
> BugLink: https://bugs.launchpad.net/bugs/2053152
> 
> SRU Justification:
> 
> [Impact]
> Having 'base_addr' in the same cache line as 'free_bytes',
> 'chunk_md', and
> 'base_addr' causes a performance impact on x86_64 systems which can
> be
> observed while running UnixBench/Excel throughput.
> 
> [Fix]
>  In current pcpu_chunk layout, `base_addr' is in the same cache line
> with
> `free_bytes' and `chunk_md', and `base_addr' is at the last 8 bytes.
> This patch moves `bound_map' up to `base_addr', to let `base_addr'
> locate 
> in a new cacheline.
> 
> [Test Plan]
> I have tested this patch, as has Intel.  This patch has been
> upstreamed since 6.5 as well.
> 
> [Where problems could occur]
> The risk for regression is minimal as there should be no logic change
> to the code, and just
> a struct is being ordered differently.  This change has been
> upstreamed for a while with no issues.
> 
> -- 
> 
> 
> Yu Ma (1):
>   percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to
> reduce
>     false sharing
> 
>  mm/percpu-internal.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>