Patchwork Add explicit VIS intrinsics for addition and subtraction.

Submitter Eric Botcazou
Date Sept. 28, 2011, 10:38 p.m.
Message ID <201109290038.49451.ebotcazou@adacore.com>
Permalink /patch/116864/
State New

Comments

Eric Botcazou - Sept. 28, 2011, 10:38 p.m.
[Vlad, if you have a few minutes, would you mind having a look at the couple of 
questions at the end of the message?  Thanks in advance].

> No problem.

Here are the results of the investigation.  Pseudo 116 needs to be assigned a 
hard register.  It is used mostly in vector instructions so we would like it 
to be assigned a FP reg, but it is initialized in insn 2:

(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
        (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
     (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
        (nil)))

so it ends up being assigned the (integer) argument register %i0 instead.  It 
used to be assigned a FP reg as expected with the GCC 4.6.x series.


The register class preference discovery is OK:

    r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS, 
allocno GENERAL_OR_EXTRA_FP_REGS
    a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS

i.e. EXTRA_FP_REGS is "preferred"/"best".  Then it seems that this preference 
is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is 
handed down to the coloring stage.  By contrast, in the GCC 4.6 series, the 
cover_class of the allocno is EXTRA_FP_REGS.

The initial cost for %i0 is twice as high (24000) as the cost of FP regs.  But 
then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn 
2 above and then again by 12000 when process_regs_for_copy sees the same insn.
So, in the end, %i0 is given cost 0 and thus beats every other register.  This 
doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.
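The arithmetic above can be replayed with a small toy model. This is only a sketch: the constant and function names are hypothetical and do not correspond to IRA's actual data structures, and the FP register cost of 12000 is inferred from the statement that the cost of %i0 is "twice as high (24000)".

```c
/* Toy model of the cost adjustments described above.  All names are
   hypothetical; this is not code from ira-costs.c or ira-color.c.  */
enum
{
  FP_REG_COST = 12000,      /* inferred: half the initial cost of %i0 */
  I0_INITIAL_COST = 24000,  /* initial cost of %i0 */
  MOVE_REDUCTION = 12000    /* amount subtracted each time insn 2 is seen */
};

/* Cost of %i0 after insn 2 has been taken into account TIMES times.  */
static int
i0_cost_after (int times)
{
  return I0_INITIAL_COST - times * MOVE_REDUCTION;
}

/* Nonzero if %i0 is strictly cheaper than a FP register.  */
static int
i0_beats_fp_reg (int times)
{
  return i0_cost_after (times) < FP_REG_COST;
}
```

With a single reduction %i0 merely ties with the FP registers (12000 each); it is the second reduction that brings it to 0 and makes it strictly cheaper.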

This is at -O1.  At -O2, there is an extra pass at the discovery stage and it 
sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series, 
so a simple workaround is to compile the testcase at -O2 instead of -O (see the 
patch at the end).


Finally the couple of questions:

 1. Is it expected that the register class preference be dropped at -O1?

 2. Is it expected that a single insn be processed by 2 different mechanisms 
that independently halve the initial cost of a hard register?

David Miller - Oct. 13, 2011, 8:17 p.m.
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Thu, 29 Sep 2011 00:38:49 +0200

> [Vlad, if you have a few minutes, would you mind having a look at the couple of 
> questions at the end of the message?  Thanks in advance].

Vlad, ping?

Vladimir Makarov - Oct. 14, 2011, 2:48 p.m.
On 09/28/2011 06:38 PM, Eric Botcazou wrote:
> [Vlad, if you have a few minutes, would you mind having a look at the couple of
> questions at the end of the message?  Thanks in advance].
>
>> No problem.
> Here are the results of the investigation.  Pseudo 116 needs to be assigned a
> hard register.  It is used mostly in vector instructions so we would like it
> to be assigned a FP reg, but it is initialized in insn 2:
>
> (insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
>          (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
>       (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
>          (nil)))
>
> so it ends up being assigned the (integer) argument register %i0 instead.  It
> used to be assigned a FP reg as expected with the GCC 4.6.x series.
>
>
> The register class preference discovery is OK:
>
>      r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS,
> allocno GENERAL_OR_EXTRA_FP_REGS
>      a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
>
> i.e. EXTRA_FP_REGS is "preferred"/"best".  Then it seems that this preference
> is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is
> handed down to the coloring stage.  By contrast, in the GCC 4.6 series, the
> cover_class of the allocno is EXTRA_FP_REGS.
>
> The initial cost for %i0 is twice as high (24000) as the cost of FP regs.  But
> then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn
> 2 above and then again by 12000 when process_regs_for_copy sees the same insn.
> So, in the end, %i0 is given cost 0 and thus beats every other register.  This
> doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.
>
> This is at -O1.  At -O2, there is an extra pass at the discovery stage and it
> sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series,
> so a simple workaround is
>
> Index: gcc.target/sparc/combined-1.c
> ===================================================================
> --- gcc.target/sparc/combined-1.c       (revision 179316)
> +++ gcc.target/sparc/combined-1.c       (working copy)
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
> +/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
>   typedef short vec16 __attribute__((vector_size(8)));
>   typedef int vec32 __attribute__((vector_size(8)));
>
>
> Finally the couple of questions:
>
>   1. Is it expected that the register class preference be dropped at -O1?
>
>   2. Is it expected that a single insn be processed by 2 different mechanisms
> that independently halve the initial cost of a hard register?
>
>
Sorry for the delay with the answer.  I missed this email.

About the 1st question.  Before GCC 4.7, the only class (allocno class) 
used for coloring had to be a cover class.  So it was not possible to use 
GENERAL_OR_EXTRA_FP_REGS in GCC 4.6 and older versions.  Starting with 
GCC 4.7, the class used for coloring can be any class which is more 
profitable than memory.  However, the cost calculations are less accurate 
at -O1 because only one pass of cost calculation is used (it is a very 
expensive pass).  To get better cost evaluations, more passes would have 
to be used.  But again, we don't do more than 2 passes because even one 
pass is not cheap.

In brief, I don't see anything criminal in the class calculation being 
different for -O1 and -O2.

About the 2nd question.  It seems wrong to me.  I'd remove the function 
process_bb_node_for_hard_reg_moves and its call from 
setup_allocno_cover_class_and_costs because the function 
process_regs_for_copy is more accurate (it works with subregs).  
Although I might be missing something here.  There have been a lot of 
problems with, and tunings of, the cost calculation code.  Generated code 
*performance* (and even generation of *valid* code) is very sensitive to 
changes in ira-costs.c.  So even if such a change looks obvious, a lot of 
testing and benchmarking should be done.  I could do that, but it would 
take a week or two before committing such a change, if everything is OK.

Eric Botcazou - Oct. 15, 2011, 2:41 p.m.
> About the 1st question.  Before GCC 4.7, the only class (allocno class)
> used for coloring had to be a cover class.  So it was not possible to use
> GENERAL_OR_EXTRA_FP_REGS in GCC 4.6 and older versions.  Starting with
> GCC 4.7, the class used for coloring can be any class which is more
> profitable than memory.  However, the cost calculations are less accurate
> at -O1 because only one pass of cost calculation is used (it is a very
> expensive pass).  To get better cost evaluations, more passes would have
> to be used.  But again, we don't do more than 2 passes because even one
> pass is not cheap.
>
> In brief, I don't see anything criminal in the class calculation being
> different for -O1 and -O2.

Fine with me.  I'm going to apply the above patchlet then.

> About the 2nd question.  It seems wrong to me.  I'd remove the function
> process_bb_node_for_hard_reg_moves and its call from
> setup_allocno_cover_class_and_costs because the function
> process_regs_for_copy is more accurate (it works with subregs).
> Although I might be missing something here.  There have been a lot of
> problems with, and tunings of, the cost calculation code.  Generated code
> *performance* (and even generation of *valid* code) is very sensitive to
> changes in ira-costs.c.  So even if such a change looks obvious, a lot of
> testing and benchmarking should be done.  I could do that, but it would
> take a week or two before committing such a change, if everything is OK.

Understood.  I essentially wanted to bring this to your attention, since it 
looked like a small oddity to me.  This might be something to play with for 
future improvements, but it's your call of course.

Thanks for the detailed answer.

Patch

Index: gcc.target/sparc/combined-1.c
===================================================================
--- gcc.target/sparc/combined-1.c       (revision 179316)
+++ gcc.target/sparc/combined-1.c       (working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
 typedef short vec16 __attribute__((vector_size(8)));
 typedef int vec32 __attribute__((vector_size(8)));
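
For readers unfamiliar with the typedefs in the context lines above: they use GCC's vector_size attribute to declare 8-byte vectors of four shorts and two ints, and ordinary arithmetic operators on such types act element-wise. The sketch below only illustrates that; the function names are hypothetical and this is not the actual body of combined-1.c.

```c
/* Same typedefs as in the context lines of the patch.  */
typedef short vec16 __attribute__((vector_size(8)));
typedef int vec32 __attribute__((vector_size(8)));

/* Element-wise addition of four 16-bit lanes (hypothetical helper).  */
vec16
add16 (vec16 a, vec16 b)
{
  return a + b;
}

/* Element-wise subtraction of two 32-bit lanes (hypothetical helper).  */
vec32
sub32 (vec32 a, vec32 b)
{
  return a - b;
}
```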