Message ID | 20231122041548.3655374-1-hongtao.liu@intel.com |
---|---|
State | New |
Series | Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms |
Any comments?

On Wed, Nov 22, 2023 at 12:17 PM liuhongt <hongtao.liu@intel.com> wrote:
>
> From: "Zhang, Annita" <annita.zhang@intel.com>
>
> Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and
> m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the
> performance of -march=x86-64-v3/v4 with -mtune=generic set by
> default. One SPEC2017 benchmark 510.parest_r can improve greatly due
> to it. From the experiments, the single thread with -O2
> -march=x86-64-v3 can improve 26% on SPR, and 15% on Zen3. Meanwhile,
> it didn't cause notable regression in previous platforms including
> Cascade Lake and Ice Lake Server.
>
> On znver4, it looks like fadd(3 cycles) is still faster than fma(4
> cycles). So in theory, avoid_fma_chain should be also better for
> znver4. And according to [1], enable fma_chain is not a generic win on
> znver4?
>
> ----cut from [1]---------------
> I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in
> zen4 this flag may not be a win except for very specific benchmarks. I
> am still doing some more detailed testing here.
> -----cut end--------------
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog
>
>         * config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add
>         m_GENERIC.
> ---
>  gcc/config/i386/x86-tune.def | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 43fa9e8fd6d..a2e57e01550 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2
>  /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
>     smaller FMA chain.  */
>  DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
> -         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
> +         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
>
>  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
>     smaller FMA chain.  */
> --
> 2.31.1
>
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 43fa9e8fd6d..a2e57e01550 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2
 /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
    smaller FMA chain.  */
 DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
-	  | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
+	  | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
 
 /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
    smaller FMA chain.  */
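For readers unfamiliar with the tuning table, the selector in a DEF_TUNE entry is a bitmask of processors, and the tuning applies when the bit for the current -mtune target is set. The following is a simplified sketch of that idea only, not GCC's actual code: the enum, its ordering, and the helper names (cpu_bit, avoid_256fma_mask, tune_enabled) are hypothetical, and the mask lists only a subset of the processors in the real entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical processor indices; GCC's real list is much longer
   and ordered differently.  */
enum cpu_index
{
  CPU_GENERIC,
  CPU_ZNVER2,
  CPU_ZNVER3,
  CPU_SAPPHIRERAPIDS,
  CPU_ICELAKE_SERVER,
  CPU_COUNT
};

/* One bit per processor, in the spirit of the m_* macros.  */
uint64_t
cpu_bit (enum cpu_index cpu)
{
  assert (cpu < CPU_COUNT);
  return UINT64_C (1) << cpu;
}

/* Selector for the 256-bit FMA chain tuning, reduced to a subset of
   the processors named in the patch.  WITH_GENERIC models the state
   after the patch; zero models the state before it.  */
uint64_t
avoid_256fma_mask (int with_generic)
{
  uint64_t mask = cpu_bit (CPU_ZNVER2) | cpu_bit (CPU_ZNVER3)
		  | cpu_bit (CPU_SAPPHIRERAPIDS);
  if (with_generic)
    mask |= cpu_bit (CPU_GENERIC);
  return mask;
}

/* A tuning is active when the selected tune target's bit is in the mask.  */
int
tune_enabled (uint64_t mask, enum cpu_index tune)
{
  return (mask & cpu_bit (tune)) != 0;
}
```

Under this model, adding m_GENERIC changes the answer only for -mtune=generic; a target that was not in the mask, such as the hypothetical CPU_ICELAKE_SERVER here, is unaffected.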
From: "Zhang, Annita" <annita.zhang@intel.com>

Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and
m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the
performance of -march=x86-64-v3/v4, where -mtune=generic is the
default. One SPEC2017 benchmark, 510.parest_r, improves greatly with
it. In our experiments, single-thread performance with -O2
-march=x86-64-v3 improves by 26% on SPR and by 15% on Zen3. Meanwhile,
it didn't cause any notable regression on previous platforms, including
Cascade Lake and Ice Lake Server.

On znver4, it looks like fadd (3 cycles) is still faster than fma (4
cycles). So in theory, avoid_fma_chain should also be better for
znver4. And according to [1], enabling fma_chain is not a generic win on
znver4?

----cut from [1]---------------
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in
zen4 this flag may not be a win except for very specific benchmarks. I
am still doing some more detailed testing here.
-----cut end--------------

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog

	* config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add
	m_GENERIC.
---
 gcc/config/i386/x86-tune.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
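The latency argument in the message can be made concrete with a rough model. The sketch below is illustrative only and assumes the 3-cycle fadd and 4-cycle fma latencies quoted above; the function names are hypothetical, and the model ignores throughput limits, the start-up latency of the first multiply, and vectorization. In a reduction loop the accumulator is carried from one iteration to the next, so the critical path is the iteration count times the latency of whichever instruction updates the accumulator.

```c
#include <assert.h>
#include <stddef.h>

/* Latencies as quoted in the message for znver4; treat them as
   assumptions rather than measured values.  */
enum { FMA_LATENCY = 4, FADD_LATENCY = 3 };

/* A typical reduction that can become a tight FMA chain.  Whether the
   multiply and add are fused into one FMA is the compiler's decision
   and depends on the target, tuning, and flags such as -ffp-contract.  */
double
dot (const double *a, const double *b, size_t n)
{
  assert (a != NULL && b != NULL);
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc += a[i] * b[i];	/* acc is the loop-carried dependency.  */
  return acc;
}

/* Rough critical-path length, in cycles, of N dependent accumulator
   updates when each update costs LATENCY cycles.  With fused FMAs the
   whole FMA sits on the chain; with separate multiply and add, the
   multiplies do not depend on acc and only the add is on the chain.  */
unsigned long
chain_cycles (unsigned long n, unsigned long latency)
{
  return n * latency;
}
```

For 1000 iterations the model gives 4000 cycles for the fused chain and 3000 for the separate add, which is the sense in which avoiding the chain should help "in theory"; real gains depend on how much else in the loop competes for the same execution ports.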