Message ID | 201109290038.49451.ebotcazou@adacore.com |
---|---|
State | New |
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Thu, 29 Sep 2011 00:38:49 +0200

> [Vlad, if you have a few minutes, would you mind having a look at the couple of
> questions at the end of the message? Thanks in advance].

Vlad, ping?

On 09/28/2011 06:38 PM, Eric Botcazou wrote:
> [Vlad, if you have a few minutes, would you mind having a look at the couple of
> questions at the end of the message? Thanks in advance].
>
>> No problem.
>
> Here are the results of the investigation. Pseudo 116 needs to be assigned a
> hard register. It is used mostly in vector instructions so we would like it
> to be assigned a FP reg, but it is initialized in insn 2:
>
> (insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
>         (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
>      (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
>         (nil)))
>
> so it ends up being assigned the (integer) argument register %i0 instead. It
> used to be assigned a FP reg as expected with the GCC 4.6.x series.
>
>
> The register class preference discovery is OK:
>
>     r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS,
> allocno GENERAL_OR_EXTRA_FP_REGS
>     a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
>
> i.e. EXTRA_FP_REGS is "preferred"/"best". Then it seems that this preference
> is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is
> handed down to the coloring stage. By contrast, in the GCC 4.6 series, the
> cover_class of the allocno is EXTRA_FP_REGS.
>
> The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But
> then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn
> 2 above and then again by 12000 when process_regs_for_copy sees the same insn.
> So, in the end, %i0 is given cost 0 and thus beats every other register. This
> doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.
>
> This is at -O1. At -O2, there is an extra pass at the discovery stage and it
> sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series,
> so a simple workaround is
>
> Index: gcc.target/sparc/combined-1.c
> ===================================================================
> --- gcc.target/sparc/combined-1.c (revision 179316)
> +++ gcc.target/sparc/combined-1.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
> +/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
>  typedef short vec16 __attribute__((vector_size(8)));
>  typedef int vec32 __attribute__((vector_size(8)));
>
>
> Finally the couple of questions:
>
> 1. Is it expected that the register class preference be dropped at -O1?
>
> 2. Is it expected that a single insn be processed by 2 different mechanisms
> that independently halve the initial cost of a hard register?
>

Sorry for the delay with the answer. I missed this email.

About the 1st question. Before gcc4.7, the class (allocno class) used for coloring could only be a cover class. So it was not possible to use GENERAL_OR_EXTRA_FP_REGS in gcc4.6 and older versions. Starting with gcc4.7, the class used for coloring can be any class which is more profitable than memory. There is, however, some inaccuracy in the cost calculations at -O1 because only one pass of cost calculation is used (it is a very expensive pass). To get better cost evaluations, more passes should be used. But again, we don't do more than 2 passes because even one pass is not cheap.

In brief, I don't see anything criminal in the class calculation being different for -O1 and -O2.

About the 2nd question. It seems wrong to me. I'd remove the function process_bb_node_for_hard_reg_moves and its call from setup_allocno_cover_class_and_costs because the function process_regs_for_copy is more accurate (it works with subreg). Although I might be missing something here. There have been a lot of problems with, and tuning of, the cost calculation code. Generated code *performance* (and even generation of *valid* code) is very sensitive to changes in ira-costs.c. So even if such a change looks obvious, a lot of testing and benchmarking should be done. I could do that, but it will take a week or two before committing such a change, if everything is ok.

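To make the answer to the 1st question concrete, here is a toy model in C of why the width of the allocno class matters. It is only a sketch and not GCC code: every `toy_` name is hypothetical, the two-entry register table is invented for illustration, and the costs are the final figures quoted in the thread (0 for %i0 after both reductions, 12000 for an FP register). The coloring step is reduced to "pick the cheapest register that belongs to the allocno class".

```c
#include <assert.h>
#include <stddef.h>

/* Toy register classes encoded as bit sets (hypothetical, not GCC's
   representation).  The union class is the OR of the other two.  */
enum toy_class {
    TOY_GENERAL_REGS             = 1,
    TOY_EXTRA_FP_REGS            = 2,
    TOY_GENERAL_OR_EXTRA_FP_REGS = 3
};

struct toy_reg {
    const char *name;
    int cls;   /* smallest toy class containing the register */
    int cost;  /* final cost, taken from the figures in the thread */
};

/* Invented two-register table: index 0 is %i0, index 1 is an FP register.  */
static const struct toy_reg toy_regs[] = {
    { "%i0", TOY_GENERAL_REGS,  0     },
    { "%f0", TOY_EXTRA_FP_REGS, 12000 },
};

/* Return the index of the cheapest register contained in ALLOCNO_CLASS,
   or -1 if no register of the table belongs to that class.  */
int toy_pick_reg(int allocno_class)
{
    int best = -1;
    size_t n = sizeof toy_regs / sizeof toy_regs[0];

    assert(allocno_class >= TOY_GENERAL_REGS
           && allocno_class <= TOY_GENERAL_OR_EXTRA_FP_REGS);

    for (size_t i = 0; i < n; i++) {
        /* Skip registers that are outside the allocno class.  */
        if ((toy_regs[i].cls & allocno_class) != toy_regs[i].cls)
            continue;
        if (best < 0 || toy_regs[i].cost < toy_regs[best].cost)
            best = (int) i;
    }
    return best;
}
```

In this model, `toy_pick_reg(TOY_GENERAL_OR_EXTRA_FP_REGS)` selects %i0 because it is a candidate and has the lowest cost, whereas `toy_pick_reg(TOY_EXTRA_FP_REGS)` never considers %i0 at all, which corresponds to the 4.6 and -O2 behaviour described in the thread.
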
> About the 1st question. Before gcc4.7, the class (allocno class) used
> for coloring could only be a cover class. So it was not possible to use
> GENERAL_OR_EXTRA_FP_REGS in gcc4.6 and older versions. Starting with
> gcc4.7, the class used for coloring can be any class which is more
> profitable than memory. There is, however, some inaccuracy in the cost
> calculations at -O1 because only one pass of cost calculation is used
> (it is a very expensive pass). To get better cost evaluations, more
> passes should be used. But again, we don't do more than 2 passes because
> even one pass is not cheap.
>
> In brief, I don't see anything criminal in the class calculation being
> different for -O1 and -O2.

Fine with me. I'm going to apply the above patchlet then.

> About the 2nd question. It seems wrong to me. I'd remove the function
> process_bb_node_for_hard_reg_moves and its call from
> setup_allocno_cover_class_and_costs because the function
> process_regs_for_copy is more accurate (it works with subreg).
> Although I might be missing something here. There have been a lot of
> problems with, and tuning of, the cost calculation code. Generated code
> *performance* (and even generation of *valid* code) is very sensitive to
> changes in ira-costs.c. So even if such a change looks obvious, a lot of
> testing and benchmarking should be done. I could do that, but it will
> take a week or two before committing such a change, if everything is ok.

Understood. I essentially wanted to bring this to your attention, since it looked like a small oddity to me. This might be something to play with for future improvements, but it's your call of course.

Thanks for the detailed answer.

Index: gcc.target/sparc/combined-1.c
===================================================================
--- gcc.target/sparc/combined-1.c	(revision 179316)
+++ gcc.target/sparc/combined-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
 typedef short vec16 __attribute__((vector_size(8)));
 typedef int vec32 __attribute__((vector_size(8)));

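For readers less familiar with the GNU C vector extension used by the two typedefs in the test, the sketch below shows what they declare: 8-byte vectors holding four `short` lanes (the V4HI mode seen in the RTL dump) or two `int` lanes. The helper functions are hypothetical and are not taken from combined-1.c, whose body is not shown in this thread; the sketch also assumes a target where `short` is 2 bytes and `int` is 4 bytes, and a compiler that supports `vector_size` and vector subscripting (GCC or Clang).

```c
#include <assert.h>

typedef short vec16 __attribute__((vector_size(8)));  /* 4 x 16-bit lanes */
typedef int vec32 __attribute__((vector_size(8)));    /* 2 x 32-bit lanes */

/* Hypothetical helper: build a vec16 from four scalar lanes.  */
vec16 make_vec16(short a, short b, short c, short d)
{
    vec16 v = { a, b, c, d };
    return v;
}

/* Lane-wise addition; the GNU extension defines '+' on vector types.  */
vec16 add_vec16(vec16 x, vec16 y)
{
    return x + y;
}

/* Hypothetical helper: read lane I of a vec16.  */
short lane_vec16(vec16 v, int i)
{
    assert(i >= 0 && i < 4);
    return v[i];
}
```

On SPARC with -mvis, a lane-wise operation of this kind is what one would hope to see mapped onto VIS instructions working on FP registers, which is why the register class chosen for the vector pseudo matters for the test.
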
[Vlad, if you have a few minutes, would you mind having a look at the couple of questions at the end of the message? Thanks in advance].

> No problem.

Here are the results of the investigation. Pseudo 116 needs to be assigned a hard register. It is used mostly in vector instructions so we would like it to be assigned a FP reg, but it is initialized in insn 2:

(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
        (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
     (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
        (nil)))

so it ends up being assigned the (integer) argument register %i0 instead. It used to be assigned a FP reg as expected with the GCC 4.6.x series.

The register class preference discovery is OK:

    r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
    a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS

i.e. EXTRA_FP_REGS is "preferred"/"best". Then it seems that this preference is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is handed down to the coloring stage. By contrast, in the GCC 4.6 series, the cover_class of the allocno is EXTRA_FP_REGS.

The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn 2 above and then again by 12000 when process_regs_for_copy sees the same insn. So, in the end, %i0 is given cost 0 and thus beats every other register. This doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.

This is at -O1. At -O2, there is an extra pass at the discovery stage and it sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series, so a simple workaround is the patch in this message, which changes the dg-options of the test from -O to -O2.

Finally the couple of questions:

1. Is it expected that the register class preference be dropped at -O1?

2. Is it expected that a single insn be processed by 2 different mechanisms that independently halve the initial cost of a hard register?
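
The cost arithmetic described in the message can be written out as a small C sketch. This is a toy model and not the code in ira-costs.c: the function names are hypothetical, and the only figures used are the ones quoted above (initial cost 24000 for %i0, FP register cost of half that, and a reduction of 12000 per mechanism that sees insn 2).

```c
#include <assert.h>

/* Figures quoted in the message.  */
enum {
    INITIAL_I0_COST = 24000,  /* initial cost of %i0 */
    FP_REG_COST     = 12000,  /* %i0 is "twice as high" as an FP reg */
    COPY_REDUCTION  = 12000   /* amount subtracted each time insn 2 is seen */
};

/* Hypothetical helper: cost of %i0 after REDUCTIONS mechanisms have each
   subtracted COPY_REDUCTION for the same copy insn.  */
int i0_cost_after(int reductions)
{
    assert(reductions >= 0);
    return INITIAL_I0_COST - reductions * COPY_REDUCTION;
}

/* Return 1 if %i0 ends up strictly cheaper than an FP register.  */
int i0_beats_fp(int reductions)
{
    return i0_cost_after(reductions) < FP_REG_COST;
}
```

With a single reduction, `i0_cost_after(1)` is 12000, the same as an FP register, so %i0 is not strictly cheaper in this model. With both process_bb_node_for_hard_reg_moves and process_regs_for_copy applying the reduction, `i0_cost_after(2)` is 0 and %i0 wins outright, which is the behaviour the 2nd question is about.
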