diff mbox

patch to solve recent SPEC2000 degradation

Message ID 4E54329B.5040701@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Aug. 23, 2011, 11:07 p.m. UTC
Starting Aug 20, there is about 1.5% degradation on x86 SPEC2000, 
please see http://vmakarov.fedorapeople.org/spec/.

   The reason for this was my patch 
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01623.html for solving code 
performance degradation on MIPS.

   Instead of using explicitly necessary number of registers, I used 
contains_reg_of_mode which also checks the number of necessary registers 
but also it checks that the register class can hold value of given 
mode.  This resulted in different register pressure classes (before the 
patch, they were GENERAL_REGS and FLOAT_REGS for x86.  They became only 
INT_FLOAT_REGS) because it became not costly to hold integer mode value 
in FLOAT_REGS.  The new register pressure class in own turn resulted in 
low register pressure and one region allocation in most cases instead of 
multiple region RA.  As a consequence, we got a big degradation on Intel 
32 bit targets.

   I did check code differences and did not find the code difference on 
my Aug 19 patch in 32-bit and 64-bit mode.  My mistake was in that I 
used a very small tests for which one region allocation was used with 
and without the patch.

   Sorry for any inconvenience.  It is always hard to fix performance 
bugs in IRA because it is very machine-dependent pass which is always 
used in GCC.

   The following patch should be the right patch.  It does not result in 
code difference on x86 and x86-64 SPEC2000 any more.  It still solves 
the mips code performance problem and was successfully bootstrapped on 
x86-64.

   Committed as rev. 178019.

2011-08-23  Vladimir Makarov <vmakarov@redhat.com>

         * ira.c (ira_init_register_move_cost): Check small subclasses
         through ira_reg_class_max_nregs and ira_available_class_regs.

Comments

Richard Sandiford Aug. 25, 2011, 9:57 a.m. UTC | #1
Vladimir Makarov <vmakarov@redhat.com> writes:
>    Instead of using explicitly necessary number of registers, I used 
> contains_reg_of_mode which also checks the number of necessary registers 
> but also it checks that the register class can hold value of given 
> mode.  This resulted in different register pressure classes (before the 
> patch, they were GENERAL_REGS and FLOAT_REGS for x86.  They became only 
> INT_FLOAT_REGS) because it became not costly to hold integer mode value 
> in FLOAT_REGS.  The new register pressure class in own turn resulted in 
> low register pressure and one region allocation in most cases instead of 
> multiple region RA.  As a consequence, we got a big degradation on Intel 
> 32 bit targets.

Sorry, I know I should be able to work this out, but could you explain
in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong?
The loop is calculating costs for moving values of mode MODE into and
out of CL1, and I wouldn't have expected those costs to have any meaning
if CL1 can't in fact store anything of mode MODE.

It just looked at first glance as though:

>        /* Some subclasses are to small to have enough registers to hold
>  	 a value of MODE.  Just ignore them.  */
> -      if (! contains_reg_of_mode[cl1][mode])
> +      if (ira_reg_class_max_nregs[cl1][mode] > ira_available_class_regs[cl1])
>  	continue;
>        COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]);
>        AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs);

expects CLASS_MAX_NREGS (CL1, MODE) to have a certain meaning even
if CL1 can't store values of mode MODE, whereas I'd assumed it
was undefined in that case.

Richard
Vladimir Makarov Aug. 25, 2011, 4:28 p.m. UTC | #2
On 08/25/2011 05:57 AM, Richard Sandiford wrote:
> Vladimir Makarov<vmakarov@redhat.com>  writes:
>>     Instead of using explicitly necessary number of registers, I used
>> contains_reg_of_mode which also checks the number of necessary registers
>> but also it checks that the register class can hold value of given
>> mode.  This resulted in different register pressure classes (before the
>> patch, they were GENERAL_REGS and FLOAT_REGS for x86.  They became only
>> INT_FLOAT_REGS) because it became not costly to hold integer mode value
>> in FLOAT_REGS.  The new register pressure class in own turn resulted in
>> low register pressure and one region allocation in most cases instead of
>> multiple region RA.  As a consequence, we got a big degradation on Intel
>> 32 bit targets.
> Sorry, I know I should be able to work this out, but could you explain
> in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong?
> The loop is calculating costs for moving values of mode MODE into and
> out of CL1, and I wouldn't have expected those costs to have any meaning
> if CL1 can't in fact store anything of mode MODE.
>
Here is x86 example.  For an integer mode it excludes FLOAT_REGS from 
updating max register move cost for two FLOAT_INT_REGS and the integer 
mode in the loop.  At the end of function in another loop where 
ira_register_move_cost is defined more accurately from 
ira_max_register_move_cost, it results in smaller ira_register_move_cost 
for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for 
FLOAT_INT_REGS and integer mode.  And the last results in one pressure 
class FLOAT_INT_REGS instead of GENERAL_REGS and FLOAT_REGS as it should be.

It is a very complicated area.  In previous versions of IRA, I tried to 
calculate cover classes.  It never worked because the right cover 
classes was too critical for a good code and therefore I introduced a 
new macro for definition of cover classes.

After introducing IRA without cover classes, I decided to calculate 
pressure classes because their accuracy were not critical (especially 
for targets with moderate or large size) and because the algorithm 
worked as I wanted (I checked a lot of targets but not all).

If we have more troubles with pressure classes calculation, I think we 
could reconsider the approach and define pressure classes through a 
macro/hook.
> It just looked at first glance as though:
>
>>         /* Some subclasses are to small to have enough registers to hold
>>   	 a value of MODE.  Just ignore them.  */
>> -      if (! contains_reg_of_mode[cl1][mode])
>> +      if (ira_reg_class_max_nregs[cl1][mode]>  ira_available_class_regs[cl1])
>>   	continue;
>>         COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]);
>>         AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs);
> expects CLASS_MAX_NREGS (CL1, MODE) to have a certain meaning even
> if CL1 can't store values of mode MODE, whereas I'd assumed it
> was undefined in that case.
>
It is a bit ambiguous area.  The documentation said that it must be not 
zero even if the register  can not hold the value.  The usual (and 
recommended) definition is based on relation of #bit in value of given 
mode and #bits in register.
Richard Sandiford Aug. 25, 2011, 6:07 p.m. UTC | #3
Vladimir Makarov <vmakarov@redhat.com> writes:
> On 08/25/2011 05:57 AM, Richard Sandiford wrote:
>> Vladimir Makarov<vmakarov@redhat.com>  writes:
>>>     Instead of using explicitly necessary number of registers, I used
>>> contains_reg_of_mode which also checks the number of necessary registers
>>> but also it checks that the register class can hold value of given
>>> mode.  This resulted in different register pressure classes (before the
>>> patch, they were GENERAL_REGS and FLOAT_REGS for x86.  They became only
>>> INT_FLOAT_REGS) because it became not costly to hold integer mode value
>>> in FLOAT_REGS.  The new register pressure class in own turn resulted in
>>> low register pressure and one region allocation in most cases instead of
>>> multiple region RA.  As a consequence, we got a big degradation on Intel
>>> 32 bit targets.
>> Sorry, I know I should be able to work this out, but could you explain
>> in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong?
>> The loop is calculating costs for moving values of mode MODE into and
>> out of CL1, and I wouldn't have expected those costs to have any meaning
>> if CL1 can't in fact store anything of mode MODE.
>>
> Here is x86 example.  For an integer mode it excludes FLOAT_REGS from 
> updating max register move cost for two FLOAT_INT_REGS and the integer 
> mode in the loop.  At the end of function in another loop where 
> ira_register_move_cost is defined more accurately from 
> ira_max_register_move_cost, it results in smaller ira_register_move_cost 
> for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for 
> FLOAT_INT_REGS and integer mode.

But isn't that correct though?  If FLOAT_REGS can't store integer modes,
and if FLOAT_INT_REGS is the union of FLOAT_REGS and INT_REGS, then doesn't
it follow that the move cost for FLOAT_INT_REGS should be the same
as for INT_REGS?

To be clear, I'm not disputing that the pressure class changes were
undesirable.  I'm just not sure why changing the move costs in this
way is the right way to get back to the desired pressure classes.

If FLOAT_REGS can't store integer modes, then it doesn't seem like we
should be taking the reported move cost for FLOAT_REGS and integer modes
into account.  I'm just not sure that those costs (or CLASS_MAX_NREGS)
are meaningful in this case.

Richard
Vladimir Makarov Aug. 25, 2011, 6:46 p.m. UTC | #4
On 08/25/2011 02:07 PM, Richard Sandiford wrote:
> Vladimir Makarov<vmakarov@redhat.com>  writes:
>> On 08/25/2011 05:57 AM, Richard Sandiford wrote:
>>> Vladimir Makarov<vmakarov@redhat.com>   writes:
>>>>      Instead of using explicitly necessary number of registers, I used
>>>> contains_reg_of_mode which also checks the number of necessary registers
>>>> but also it checks that the register class can hold value of given
>>>> mode.  This resulted in different register pressure classes (before the
>>>> patch, they were GENERAL_REGS and FLOAT_REGS for x86.  They became only
>>>> INT_FLOAT_REGS) because it became not costly to hold integer mode value
>>>> in FLOAT_REGS.  The new register pressure class in own turn resulted in
>>>> low register pressure and one region allocation in most cases instead of
>>>> multiple region RA.  As a consequence, we got a big degradation on Intel
>>>> 32 bit targets.
>>> Sorry, I know I should be able to work this out, but could you explain
>>> in a bit more detail why contains_reg_of_mode (CL1, MODE) was wrong?
>>> The loop is calculating costs for moving values of mode MODE into and
>>> out of CL1, and I wouldn't have expected those costs to have any meaning
>>> if CL1 can't in fact store anything of mode MODE.
>>>
>> Here is x86 example.  For an integer mode it excludes FLOAT_REGS from
>> updating max register move cost for two FLOAT_INT_REGS and the integer
>> mode in the loop.  At the end of function in another loop where
>> ira_register_move_cost is defined more accurately from
>> ira_max_register_move_cost, it results in smaller ira_register_move_cost
>> for two FLOAT_INT_REGS and integer modes than ira_memory_move_cost for
>> FLOAT_INT_REGS and integer mode.
> But isn't that correct though?  If FLOAT_REGS can't store integer modes,
> and if FLOAT_INT_REGS is the union of FLOAT_REGS and INT_REGS, then doesn't
> it follow that the move cost for FLOAT_INT_REGS should be the same
> as for INT_REGS?
>
I don't think it should be the same for the current algorithm of 
register pressure classes calculation, otherwise we will have 
FLOAT_INT_REGS as a single pressure class.

   Although the algorithm could be changed to follow your logic.  It 
might be not an easy task.  I spent a lot of time to write this code 
(and even much more time when I tried to write a code for cover classes 
calculation), so I'd rather avoid to this again.
> To be clear, I'm not disputing that the pressure class changes were
> undesirable.  I'm just not sure why changing the move costs in this
> way is the right way to get back to the desired pressure classes.
>
> If FLOAT_REGS can't store integer modes, then it doesn't seem like we
> should be taking the reported move cost for FLOAT_REGS and integer modes
> into account.  I'm just not sure that those costs (or CLASS_MAX_NREGS)
> are meaningful in this case.
>
There is a certain logic in your speculations.  But for some reasons, 
they are defined.  May be reload/old RA are/were based on these not 
meaningful costs.

Now I think I should try to redesign this pressure reg class 
calculation.  Again it will be not an easy task  because, unfortunately, 
target definitions for costs are a real mess.  They usually are not 
correctly defined for all classes and (even really possible) modes.

Richard, thanks for the discussion.  As usual it was always useful for me.
diff mbox

Patch

Index: ira.c
===================================================================
--- ira.c	(revision 177968)
+++ ira.c	(working copy)
@@ -1503,7 +1503,7 @@  ira_init_register_move_cost (enum machin
     {
       /* Some subclasses are to small to have enough registers to hold
 	 a value of MODE.  Just ignore them.  */
-      if (! contains_reg_of_mode[cl1][mode])
+      if (ira_reg_class_max_nregs[cl1][mode] > ira_available_class_regs[cl1])
 	continue;
       COPY_HARD_REG_SET (temp_hard_regset, reg_class_contents[cl1]);
       AND_COMPL_HARD_REG_SET (temp_hard_regset, no_unit_alloc_regs);