[00/32] Support multiple ABIs in the same translation unit

Message ID mptimpyzmf1.fsf@arm.com

Message

Richard Sandiford Sept. 11, 2019, 7:02 p.m. UTC
This series of patches introduces some classes and helpers for handling
multiple ABIs in the same translation unit.  At the moment "ABI" means
specifically the choice of call-clobbered registers, but I'm hoping the
structures could be used for other ABI properties in future.

The main point of the series is to use these ABI structures instead of
global information like regs_invalidated_by_call, call_used_or_fixed_regs
and targetm.hard_regno_call_part_clobbered.  This has the side effect
of making all passes take -fipa-ra into account (except sel-sched.c,
see its patch for details).

The series also makes -fipa-ra work for partially-clobbered registers too.
Previously, if the ABI said that only the upper bits of a register are
call-clobbered, we'd enforce that rule separately from the -fipa-ra
information and apply it even when -fipa-ra can prove that the registers
aren't modified.  It turns out that fixing this interacts badly with
vzeroupper on x86, so the series has a patch to fix that.
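
To make the before/after difference concrete, here is a toy model of the
decision.  It is only a sketch and not code from the series: ToyAbi,
clobbered_before and clobbered_after are hypothetical names, and the
-fipa-ra result is reduced to a plain set of registers that the callee
might modify.

```cpp
#include <bitset>
#include <cassert>

constexpr unsigned NUM_REGS = 32;       // hypothetical register file size
using RegSet = std::bitset<NUM_REGS>;

// Toy description of one ABI (illustrative only, not a GCC structure).
struct ToyAbi
{
  RegSet fully_clobbered;      // whole register is lost across a call
  RegSet partly_clobbered;     // only the low PRESERVED_BITS bits survive
  unsigned preserved_bits = 0;
};

// Old behaviour as described above: the partial-clobber rule is applied
// on its own, even if -fipa-ra proved that the callee leaves REGNO alone.
bool clobbered_before (const ToyAbi &abi, const RegSet &callee_modifies,
                       unsigned regno, unsigned value_bits)
{
  if (abi.partly_clobbered.test (regno) && value_bits > abi.preserved_bits)
    return true;
  return abi.fully_clobbered.test (regno) && callee_modifies.test (regno);
}

// New behaviour: the -fipa-ra information is consulted first, for both
// fully and partly clobbered registers.
bool clobbered_after (const ToyAbi &abi, const RegSet &callee_modifies,
                      unsigned regno, unsigned value_bits)
{
  if (!callee_modifies.test (regno))
    return false;
  if (abi.fully_clobbered.test (regno))
    return true;
  return abi.partly_clobbered.test (regno) && value_bits > abi.preserved_bits;
}

void check_examples ()
{
  ToyAbi abi;
  abi.partly_clobbered.set (8);
  abi.preserved_bits = 64;

  RegSet untouched;                     // callee provably modifies nothing
  assert (clobbered_before (abi, untouched, 8, 128));
  assert (!clobbered_after (abi, untouched, 8, 128));
}
```

In this model a 128-bit value in register 8 was treated as clobbered before
the series even though the callee never writes to the register; afterwards
it is treated as preserved.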

Another general knock-on change is that we now always use the equivalent
of regs_invalidated_by_call rather than call_used_reg_set when deciding
whether a register is clobbered.  Among other things, this means that
cselib no longer invalidates expressions involving the stack pointer
when processing a call, since calls are guaranteed to return with the
same stack pointer.

The main motivating case for the series is the AArch64 vector PCS
and the SVE PCS, which are variants of the base AArch64 ABI but are
interoperable with it.  (Specifically, vector PCS calls preserve the
low 128 bits of 16 vector registers rather than the usual low 64 bits
of 8 registers.  SVE PCS calls instead preserve the whole of those 16
vector registers.)  However, I realised later that we could also use
this for the tlsdesc ABI on SVE targets, which would remove the need
for CLOBBER_HIGH.  I have follow-on patches to do that.
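
As a rough illustration of how the three conventions differ, the sketch
below models how many low bits of a vector register a callee must preserve.
It is a toy model rather than GCC code: Pcs, preserved_low_bits and
survives_call are hypothetical names, the register numbers (v8-v15 for the
base PCS, v8-v23 for the other two) are taken from my reading of the
AArch64 procedure call standard rather than from the text above, and the
SVE vector length is passed in as an ordinary parameter.

```cpp
#include <cassert>

// Toy model of the three AArch64 calling conventions discussed above.
enum class Pcs { Base, Vector, Sve };

// Number of low bits of vector register REGNO that a callee preserves.
// SVE_BITS is the (assumed) SVE vector length in bits.
unsigned preserved_low_bits (Pcs pcs, unsigned regno, unsigned sve_bits)
{
  switch (pcs)
    {
    case Pcs::Base:
      return (regno >= 8 && regno <= 15) ? 64 : 0;
    case Pcs::Vector:
      return (regno >= 8 && regno <= 23) ? 128 : 0;
    case Pcs::Sve:
      return (regno >= 8 && regno <= 23) ? sve_bits : 0;
    }
  return 0;
}

// A VALUE_BITS-wide value in REGNO survives the call only if it fits
// entirely within the preserved low part of the register.
bool survives_call (Pcs pcs, unsigned regno, unsigned value_bits,
                    unsigned sve_bits = 128)
{
  return value_bits <= preserved_low_bits (pcs, regno, sve_bits);
}

void check_examples ()
{
  assert (survives_call (Pcs::Base, 8, 64));      // low half is preserved
  assert (!survives_call (Pcs::Base, 8, 128));    // upper half is lost
  assert (survives_call (Pcs::Vector, 16, 128));  // v16 is covered here
  assert (!survives_call (Pcs::Base, 16, 64));    // but not by the base PCS
}
```

The point of the series is that a single translation unit can contain calls
that need each of these answers, so the answer has to come from the ABI of
the individual call rather than from global state.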

I also think the new structures would be useful for targets that
implement interrupt-handler attributes.  At the moment, we compile
interrupt handlers pretty much like ordinary functions, using the
same optimisation heuristics as for ordinary functions, and then
account for the extra call-saved registers in the prologue and
epilogue code.  Hooks like TARGET_HARD_REGNO_SCRATCH_OK then
prevent later optimisers from introducing new uses of unprotected
call-saved registers.  If the interrupt handler ABI was described
directly, the middle-end code would work with it in the same way
as for ordinary functions, including taking it into account when
making optimisation decisions.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  There were
some extra guality.exp failures due to the extra optimisation,
but they look like acceptable regressions.

Also tested by compiling at least one target per CPU directory and
checking for no new warnings.  It's quite hard to test for accidental
code differences given the general optimisation changes above, so I
resorted to comparing the gcc.c-torture, gcc.dg and g++.dg code at
-O0 only.  This came back clean except on PRU.

The reason for the PRU differences is that the port defines
targetm.hard_regno_call_part_clobbered, but uses it to test whether
a multi-register value contains a mixture of fully-clobbered and
fully-preserved registers.  AFAICT the port doesn't actually have
individual registers that are partly clobbered, so it doesn't need
to define the hook.  (I can see how the documentation gave a misleading
impression though.  I've tried to improve it in one of the patches.)
The series moves away from testing hard_regno_call_part_clobbered
directly to testing cached information instead, and the way that the
cached information is calculated means that defining the hook the way
the PRU port does has no effect.  In other words, after the series we
treat it (rightly IMO) as having a "normal" ABI whereas before we didn't.
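
The sketch below shows one way a cache could be built so that a PRU-style
hook has no effect.  It is a toy model under stated assumptions and not the
code from the patches: ToyMode, build_clobber_cache, value_clobbered and
mixture_hook are hypothetical names, and the key assumption is that the hook
is only consulted for values that occupy a single register, with
multi-register values handled by checking each constituent register.

```cpp
#include <bitset>
#include <cassert>
#include <functional>
#include <vector>

constexpr unsigned NUM_REGS = 8;        // hypothetical register file size
using RegSet = std::bitset<NUM_REGS>;

// Toy "mode": just the number of consecutive registers a value occupies.
struct ToyMode { unsigned nregs; };

using PartClobberHook = std::function<bool (unsigned, ToyMode)>;

// Cache the registers that are fully or partly clobbered.  Assumption:
// the hook is only asked about single-register values.
RegSet build_clobber_cache (const RegSet &full,
                            const std::vector<ToyMode> &modes,
                            const PartClobberHook &hook)
{
  RegSet cache = full;
  for (ToyMode mode : modes)
    if (mode.nregs == 1)
      for (unsigned regno = 0; regno < NUM_REGS; ++regno)
        if (hook (regno, mode))
          cache.set (regno);
  return cache;
}

// A value is clobbered if any register it occupies is in the cache.
bool value_clobbered (const RegSet &cache, unsigned regno, ToyMode mode)
{
  for (unsigned r = regno; r < regno + mode.nregs && r < NUM_REGS; ++r)
    if (cache.test (r))
      return true;
  return false;
}

// PRU-style hook: true if the value spans a mixture of fully-clobbered
// and fully-preserved registers.  Always false for one-register values.
bool mixture_hook (const RegSet &full, unsigned regno, ToyMode mode)
{
  bool clobbered = false, preserved = false;
  for (unsigned r = regno; r < regno + mode.nregs && r < NUM_REGS; ++r)
    {
      if (full.test (r))
        clobbered = true;
      else
        preserved = true;
    }
  return clobbered && preserved;
}

void check_examples ()
{
  RegSet full;
  full.set (0).set (1).set (2).set (3);
  std::vector<ToyMode> modes = { {1}, {2}, {4} };
  PartClobberHook hook = [&full] (unsigned regno, ToyMode mode)
    { return mixture_hook (full, regno, mode); };

  RegSet cache = build_clobber_cache (full, modes, hook);
  assert (cache == full);                           // hook added nothing
  assert (value_clobbered (cache, 2, ToyMode{4}));  // spans r2..r5
  assert (!value_clobbered (cache, 4, ToyMode{2})); // r4..r5 preserved
}
```

In this model the mixed multi-register case is still detected, but through
the per-register check rather than through the hook, which is why defining
the hook this way changes nothing.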

Sorry for the long write-up.

Richard

Comments

Steven Bosscher Sept. 12, 2019, 8:42 p.m. UTC | #1
On Wednesday, September 11, 2019, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
>
> Sorry for the long write-up.
>
> Richard
>

*thanks* for the long write-up!

Ciao!
Steven

Dimitar Dimitrov Sept. 26, 2019, 7:24 p.m. UTC | #2
On Wed, 11 Sep 2019, 22:02:26 EEST Richard Sandiford wrote:
> The reason for the PRU differences is that the port defines
> targetm.hard_regno_call_part_clobbered, but uses it to test whether
> a multi-register value contains a mixture of fully-clobbered and
> fully-preserved registers.  AFAICT the port doesn't actually have
> individual registers that are partly clobbered, so it doesn't need
> to define the hook.  (I can see how the documentation gave a misleading
> impression though.  I've tried to improve it in one of the patches.)
> The series moves away from testing hard_regno_call_part_clobbered
> directly to testing cached information instead, and the way that the
> cached information is calculated means that defining the hook the way
> the PRU port does has no effect.  In other words, after the series we
> treat it (rightly IMO) as having a "normal" ABI whereas before we didn't.

You are correct. Port does not have partially clobbered HW registers. And
indeed I was worried about multi-register values.

PRU testsuite showed no regression from trunk with your patch set.

With your patch set, I tried to compare PRU assembly with and without defining 
the targetm.hard_regno_call_part_clobbered hook. There was much noise in 
compare-all-tests due to lto compiler ID strings, but after some filtering I 
think the output assembly was the same.

Thanks,
Dimitar

Richard Sandiford Sept. 27, 2019, 8:58 a.m. UTC | #3
Dimitar Dimitrov <dimitar@dinux.eu> writes:
> On Wed, 11 Sep 2019, 22:02:26 EEST Richard Sandiford wrote:
>> The reason for the PRU differences is that the port defines
>> targetm.hard_regno_call_part_clobbered, but uses it to test whether
>> a multi-register value contains a mixture of fully-clobbered and
>> fully-preserved registers.  AFAICT the port doesn't actually have
>> individual registers that are partly clobbered, so it doesn't need
>> to define the hook.  (I can see how the documentation gave a misleading
>> impression though.  I've tried to improve it in one of the patches.)
>> The series moves away from testing hard_regno_call_part_clobbered
>> directly to testing cached information instead, and the way that the
>> cached information is calculated means that defining the hook the way
>> the PRU port does has no effect.  In other words, after the series we
>> treat it (rightly IMO) as having a "normal" ABI whereas before we didn't.
> You are correct. Port does not have partially clobbered HW registers. And 
> indeed I was worried about multi-register values.
>
> PRU testsuite showed no regression from trunk with your patch set.
>
> With your patch set, I tried to compare PRU assembly with and without defining 
> the targetm.hard_regno_call_part_clobbered hook. There was much noise in 
> compare-all-tests due to lto compiler ID strings, but after some filtering I 
> think the output assembly was the same.

OK, great!  Thanks for the testing.

Richard