[0/5,Arm] Add support for conditional instructions (CSEL, CSINC etc.) for Armv8.1-M Mainline

Message ID AM0PR08MB5121EFBFB74EC4BA33EC598D924A0@AM0PR08MB5121.eurprd08.prod.outlook.com

Omar Tahir Aug. 4, 2020, 4:07 p.m. UTC
Hi all,

This patch series provides support for the following instructions that were
added in Armv8.1-M Mainline [1]:
  - CSEL
  - CSET
  - CSETM
  - CSNEG
  - CSINV
  - CSINC
  - CINC

The patch series is organised as follows:
1) Modify default tuning for -march=armv8.1-m.main.
2) New macro, predicate and constraint. New pattern *thumb2_csinv that
   generates CSINV.
3) New pattern *thumb2_csinc that generates CSINC.
4) New pattern *thumb2_csneg that generates CSNEG.
5) New predicate, new constraints. New pattern *cmovsi_insn that generates
   CSEL, CSET, CSETM, CSINC and CSINV in specific cases.
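
For orientation, the source-level shapes these patterns target can be sketched
in C. This is an illustrative sketch only: the function names are hypothetical,
and which instruction GCC actually selects depends on the patterns in this
series, the optimisation level and the surrounding code. The comments name the
instruction whose architectural semantics match each shape.

```c
#include <stdint.h>

/* All function names below are hypothetical, chosen for illustration. */

/* cond ? a : b          -> shape computed by CSEL */
uint32_t select_plain(int cond, uint32_t a, uint32_t b) {
    return cond ? a : b;
}

/* cond ? a : b + 1      -> shape computed by CSINC */
uint32_t select_inc(int cond, uint32_t a, uint32_t b) {
    return cond ? a : b + 1u;
}

/* cond ? a : ~b         -> shape computed by CSINV */
uint32_t select_inv(int cond, uint32_t a, uint32_t b) {
    return cond ? a : (uint32_t)~b;
}

/* cond ? a : -b         -> shape computed by CSNEG */
uint32_t select_neg(int cond, uint32_t a, uint32_t b) {
    return cond ? a : 0u - b;
}

/* cond ? 1 : 0          -> shape computed by CSET */
uint32_t set_flag(int cond) {
    return cond ? 1u : 0u;
}

/* cond ? all-ones : 0   -> shape computed by CSETM */
uint32_t set_mask(int cond) {
    return cond ? 0xFFFFFFFFu : 0u;
}
```

Unsigned 32-bit types are used so that the increment and negation wrap in a
well-defined way, as the register operations do.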

CINV and CNEG aren't used as they are aliases for CSINV and CSNEG
respectively. There is one place CINC is used, as an optimisation in an
existing pattern.
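
The alias relationships can be illustrated with a small C model. This is a
sketch assuming the architectural definitions in [1], where each alias is the
base instruction with the condition inverted; the function names and the ZR
constant are hypothetical and are not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the zero register. */
static const uint32_t ZR = 0u;

/* Base instructions: Rd = cond ? Rn : f(Rm). */
uint32_t csel(bool cond, uint32_t rn, uint32_t rm)  { return cond ? rn : rm; }
uint32_t csinc(bool cond, uint32_t rn, uint32_t rm) { return cond ? rn : rm + 1u; }
uint32_t csinv(bool cond, uint32_t rn, uint32_t rm) { return cond ? rn : (uint32_t)~rm; }
uint32_t csneg(bool cond, uint32_t rn, uint32_t rm) { return cond ? rn : 0u - rm; }

/* Aliases: the base instruction with repeated (or zero) source
   operands and the condition inverted. */
uint32_t cset(bool cond)               { return csinc(!cond, ZR, ZR); }
uint32_t csetm(bool cond)              { return csinv(!cond, ZR, ZR); }
uint32_t cinc(bool cond, uint32_t rn)  { return csinc(!cond, rn, rn); }
uint32_t cinv(bool cond, uint32_t rn)  { return csinv(!cond, rn, rn); }
uint32_t cneg(bool cond, uint32_t rn)  { return csneg(!cond, rn, rn); }
```

Because every alias reduces to one of the four base operations, patterns for
the base instructions are enough to cover the aliased forms.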

Some existing patterns are modified to force the new patterns to be used
when appropriate and to prevent undesirable "optimisations". For example,
`if_then_else` insns are often split into `cond_exec` insns (see *compare_scc
in arm.md). This makes it harder to generate instructions like CSEL, so this
behaviour is disabled when targeting Armv8.1-M Mainline. The combine and ifcvt
passes also cause problems; for example, *thumb2_movcond can cause unwanted
combines. In some cases the define_insn is disabled, in others only
splitting is disabled.

Along with matching the obvious cases, some edge cases are exploited to
slightly optimise code generation. For example, R1 = CC ? 1 : R0 can use the
zero register to generate CSINC R1, R0, ZR with the condition inverted, since
CSINC yields its second source operand plus one when its condition fails.
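
The zero-register trick can be checked with a small C model. This is a sketch
assuming the architectural definition that CSINC Rd, Rn, Rm, cond yields Rn
when cond holds and Rm + 1 otherwise. Under that definition the constant 1
comes from ZR + 1 on the failing side, so the model places R0 in Rn, ZR in Rm,
and inverts the condition. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of CSINC Rd, Rn, Rm, cond: Rd = cond ? Rn : Rm + 1. */
static uint32_t model_csinc(bool cond, uint32_t rn, uint32_t rm) {
    return cond ? rn : rm + 1u;
}

/* The source-level expression from the text: R1 = CC ? 1 : R0. */
uint32_t one_or_r0(bool cc, uint32_t r0) {
    return cc ? 1u : r0;
}

/* The same value via CSINC: Rn = R0, Rm = ZR (modelled as 0),
   with the condition inverted so that ZR + 1 is produced when CC holds. */
uint32_t one_or_r0_csinc(bool cc, uint32_t r0) {
    return model_csinc(!cc, r0, 0u);
}
```

No separate move of the constant 1 into a register is needed, which is where
the saving comes from.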

There are a few cases where CSEL etc. could be used but doing so is more
cumbersome, so the default IT block implementation is kept (see
*thumb2_movsicc_insn alts 8-10). In general, however, code generated for
Armv8.1-M Mainline will see a large decrease in the number of IT blocks.

The entire patch series was regression tested together on arm-none-eabi and
arm-none-linux-gnueabi with no regressions, and showed a minor performance
improvement (-0.1% cycle count) on a proprietary benchmark.

Thanks,
Omar

[1] https://static.docs.arm.com/ddi0553/bf/DDI0553B_f_armv8m_arm.pdf