
Enable inc/dec generation on Haswell+

Message ID 20171102135656.GE91830@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Nov. 2, 2017, 1:56 p.m. UTC
Hi,
Core2 used to have a quite large penalty for the partial flag register write
done by inc/dec.  This was improved on Sandybridge, where an extra merging
uop is produced instead, and improved further on Haswell, where there is no
extra uop unless an instruction reads both the flags written by inc/dec and
the carry flag they leave untouched.  For this reason we can use inc/dec
again on modern variants of Core.
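
To make the effect of the selector change concrete, here is a minimal sketch
of how a tuning mask decides whether a flag applies to a processor.  It is
not GCC's actual code: the bit values, the reduced set of processors, the
OLD_SELECTOR/NEW_SELECTOR names and the uses_incdec helper are all
hypothetical, and M_CORE_ALL is assumed to cover Core2 through Haswell.

```c
/* Hypothetical one-bit-per-processor masks (simplified; GCC derives the
   real m_* masks from its processor enumeration).  */
#define M_CORE2       (1u << 0)
#define M_NEHALEM     (1u << 1)
#define M_SANDYBRIDGE (1u << 2)
#define M_HASWELL     (1u << 3)
#define M_GENERIC     (1u << 4)
#define M_K8          (1u << 5)

/* Assumption: the "all Core" mask spans Core2 through Haswell.  */
#define M_CORE_ALL (M_CORE2 | M_NEHALEM | M_SANDYBRIDGE | M_HASWELL)

/* Before the patch: inc/dec disabled for every Core variant.  */
#define OLD_SELECTOR (~(M_CORE_ALL | M_GENERIC))

/* After the patch: Haswell is no longer in the excluded set.  */
#define NEW_SELECTOR (~(M_CORE2 | M_NEHALEM | M_SANDYBRIDGE | M_GENERIC))

/* A tuning flag is on when the target processor's bit is set in the
   selector.  Returns 1 if inc/dec would be used, 0 otherwise.  */
static int
uses_incdec (unsigned selector, unsigned cpu_bit)
{
  return (selector & cpu_bit) != 0;
}
```

With these definitions, uses_incdec (OLD_SELECTOR, M_HASWELL) is 0 and
uses_incdec (NEW_SELECTOR, M_HASWELL) is 1, while the result for Core2,
Nehalem, Sandybridge and generic tuning stays 0 in both cases.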

Bootstrapped/regtested on x86_64-linux and tested on Haswell with
spec2k/spec2k6, with no measurable performance impact.

Honza

	* x86-tune.def (X86_TUNE_USE_INCDEC): Enable for Haswell+.

Patch

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 254199)
+++ config/i386/x86-tune.def	(working copy)
@@ -220,10 +220,15 @@  DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall
    as "add mem, reg".  */
 DEF_TUNE (X86_TUNE_READ_MODIFY, "read_modify", ~(m_PENT | m_LAKEMONT | m_PPRO))
 
-/* X86_TUNE_USE_INCDEC: Enable use of inc/dec instructions.   */
+/* X86_TUNE_USE_INCDEC: Enable use of inc/dec instructions.
+
+   Core2 and Nehalem have a 7-cycle partial flag register stall.
+   Sandy Bridge and Ivy Bridge generate an extra uop.  On Haswell this extra
+   uop is emitted only when the values really need to be merged, which
+   GCC-generated code does not do.  */
 DEF_TUNE (X86_TUNE_USE_INCDEC, "use_incdec",
-          ~(m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT | m_INTEL
-	   |  m_KNL | m_KNM | m_GENERIC))
+          ~(m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
+	    | m_BONNELL | m_SILVERMONT | m_INTEL | m_KNL | m_KNM | m_GENERIC))
 
 /* X86_TUNE_INTEGER_DFMODE_MOVES: Enable if integer moves are preferred
    for DFmode copies */