[0/5,amdgcn] Reduce register usage on AMD GCN

Message ID 0b37b07a-be6c-2ac6-c579-c7a522024419@codesourcery.com

Kwok Cheung Yeung Nov. 14, 2019, 3:28 p.m. UTC
Hello

Although GCN has a large register file, these registers are distributed 
among the threads (wavefronts) running on the same compute unit, so (up 
to a point) the fewer registers used in a kernel, the more kernels can 
run concurrently. While this is of limited use in trunk at the moment 
with only single-worker offloading, hopefully it will be of more use in 
the future.
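
As a rough illustration of the "up to a point" relationship above, here is
a minimal sketch of how per-wavefront register usage bounds the number of
wavefronts resident on one SIMD. The figures in it (256 VGPRs per SIMD lane,
an allocation granule of 4, at most 10 wavefront slots) are assumptions for
illustration and are not taken from this patch series; the function name is
hypothetical.

```python
def waves_per_simd(regs_per_wave, regs_per_simd, granule, max_waves=10):
    """Upper bound on resident wavefronts imposed by one register file."""
    # Registers are handed out in blocks, so round the request up
    # to the next multiple of the allocation granule.
    allocated = -(-regs_per_wave // granule) * granule
    # The hardware also has a fixed number of wavefront slots.
    return min(max_waves, regs_per_simd // allocated)


# Assumed hardware figures (illustrative, not from the patch series).
ASSUMED_VGPRS_PER_SIMD = 256
ASSUMED_VGPR_GRANULE = 4

for vgprs in (128, 64, 32, 24, 16):
    waves = waves_per_simd(vgprs, ASSUMED_VGPRS_PER_SIMD, ASSUMED_VGPR_GRANULE)
    print(f"{vgprs:3d} VGPRs per wavefront -> at most {waves} wavefronts")
```

Under these assumptions, going below 24 VGPRs buys nothing further, because
the wavefront slot limit becomes the constraint rather than the register
file.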

These patches free up some of the registers that were previously fixed,
and restrict the number of registers used in non-kernel functions to 64
SGPRs and 24 VGPRs, as opposed to 102 SGPRs and 64 VGPRs before. Kernels
can still use however many they need, but the minimum number of
registers a kernel must reserve is now reduced to the non-kernel limit.
(Kernels cannot in general know how many registers are used by the
functions they call, so they need to reserve the maximum number of
registers usable by the callees.)
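
The reservation rule described above can be sketched as follows. This is
only a model of the arithmetic, not the code in the patches: the names
RegBudget and kernel_reservation are hypothetical, and the example kernel
usage figures are made up. Only the two limits come from this message.

```python
from typing import NamedTuple


class RegBudget(NamedTuple):
    sgprs: int
    vgprs: int


# Non-kernel function limits quoted in this message.
OLD_CALLEE_LIMIT = RegBudget(sgprs=102, vgprs=64)
NEW_CALLEE_LIMIT = RegBudget(sgprs=64, vgprs=24)


def kernel_reservation(own, callee_limit):
    """Registers a kernel reserves: its own usage, but never less than
    the most that any function it calls might use."""
    return RegBudget(
        sgprs=max(own.sgprs, callee_limit.sgprs),
        vgprs=max(own.vgprs, callee_limit.vgprs),
    )


# Hypothetical kernels: one small, one that needs more than the new limit.
small = RegBudget(sgprs=20, vgprs=8)
large = RegBudget(sgprs=90, vgprs=40)

print(kernel_reservation(small, OLD_CALLEE_LIMIT))  # floor set by old limit
print(kernel_reservation(small, NEW_CALLEE_LIMIT))  # floor set by new limit
print(kernel_reservation(large, NEW_CALLEE_LIMIT))  # own usage dominates
```

In this model a small kernel's reservation drops from 102/64 to 64/24,
while a kernel that genuinely needs more registers is unaffected.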

These patches depend on the newlib patch 'Stash reent marker in upper
bits of s1 on AMD GCN' (recently committed as
d14714c690c0b11b0aa7e6d09c930a321eeac7f9), which frees up s[2:3].

Tested in standalone configuration on a gfx900 target. I have not yet
tested the offload configuration with trunk sources, as testsuite
support has not been committed yet; I will retest when this is done.
Internal offload testing (based on a branch of OG9) revealed a number of
regressions, but they are due to latent bugs exposed by the changes
rather than issues with this patchset. I have already posted fixes for
these in the following patches:

[PATCH] Support multiple registers for the frame pointer
[PATCH] [LRA] Do not use eliminable registers for spilling
[PATCH] Check suitability of spill register for mode
[PATCH] [GCN] Fix handling of VCC_CONDITIONAL_REG

Kwok