Message ID | ada59c67-0b8e-5ceb-cdc5-0ee82d39bc45@redhat.com |
---|---|
State | New |
Headers | show |
Series | RFA: a x86 test modification | expand |
On 11/28/18 2:47 PM, Vladimir Makarov wrote: > The patch I committed today recently for > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207 > > creates a new regression for pr34256.c. 2 moves is expected but gcc > with the patch generates 3 moves. I think now RA generates the right code. > > We have the following code before RA > > (insn 7 6 13 2 (set (reg:V2SI 88) > (plus:V2SI (reg:V2SI 89 [ x ]) > (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2] <var_decl > 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3} > > (expr_list:REG_DEAD (reg:V2SI 89 [ x ]) > (nil))) > (insn 13 7 14 2 (set (reg/i:DI 0 ax) > (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal} > (expr_list:REG_DEAD (reg:V2SI 88) > > The test is expected to assign mmx reg to pseudo 88 but gcc with the > patch assigns memory to it. The cost of mmx to general reg move is 13, > while overall cost of mmx to mem and mem to general moves is 10. So IRA > now chooses memory for pseudo 88 according to the minimal cost. > > Now, if we want still assign mmx reg to the pseudo 88, we should change > the costs in machine-dependent x86 code. But I think it might create > other unexpected code generation. As mmx is basically not used nowadays > the test is not important, I just propose the following patch. > > Is it ok for the trunk? > > > > > pr88207-2.patch > > Index: testsuite/ChangeLog > =================================================================== > --- testsuite/ChangeLog (revision 266582) > +++ testsuite/ChangeLog (working copy) > @@ -1,3 +1,7 @@ > +2018-11-28 Vladimir Makarov <vmakarov@redhat.com> > + > + * gcc.target/i386/pr34256.c: Adjust the number of expected moves. OK. jeff
On Thu, Nov 29, 2018 at 12:43 AM Jeff Law <law@redhat.com> wrote: > > On 11/28/18 2:47 PM, Vladimir Makarov wrote: > > The patch I committed today recently for > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207 > > > > creates a new regression for pr34256.c. 2 moves is expected but gcc > > with the patch generates 3 moves. I think now RA generates the right code. > > > > We have the following code before RA > > > > (insn 7 6 13 2 (set (reg:V2SI 88) > > (plus:V2SI (reg:V2SI 89 [ x ]) > > (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2] <var_decl > > 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3} > > > > (expr_list:REG_DEAD (reg:V2SI 89 [ x ]) > > (nil))) > > (insn 13 7 14 2 (set (reg/i:DI 0 ax) > > (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal} > > (expr_list:REG_DEAD (reg:V2SI 88) > > > > The test is expected to assign mmx reg to pseudo 88 but gcc with the > > patch assigns memory to it. The cost of mmx to general reg move is 13, > > while overall cost of mmx to mem and mem to general moves is 10. So IRA > > now chooses memory for pseudo 88 according to the minimal cost. > > > > Now, if we want still assign mmx reg to the pseudo 88, we should change > > the costs in machine-dependent x86 code. But I think it might create > > other unexpected code generation. As mmx is basically not used nowadays > > the test is not important, I just propose the following patch. > > > > Is it ok for the trunk? The generated code is correct, in i386.c, secondary_memory_needed, we have: /* ??? This is a lie. We do have moves between mmx/general, and for mmx/sse2. But by saying we need secondary memory we discourage the register allocator from using the mmx registers unless needed. */ if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2)) return true; But I noticed that for -mtune=generic we still generate direct move. The cost of interunit MMX -> GR moves should always be higher than the cost of indirect move (due to secondary_memory_needed), c.f. the comment in ix86_register_move_cost: /* In case we require secondary memory, compute cost of the store followed by load. In order to avoid bad register allocation choices, we need for this to be *at least* as high as the symmetric MEMORY_MOVE_COST. */ IMO, the issue with -mtune=generic warrants some additional analysis. I'll look at target cost calculations. Thanks, Uros.
On Thu, Nov 29, 2018 at 8:03 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Thu, Nov 29, 2018 at 12:43 AM Jeff Law <law@redhat.com> wrote: > > > > On 11/28/18 2:47 PM, Vladimir Makarov wrote: > > > The patch I committed today recently for > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207 > > > > > > creates a new regression for pr34256.c. 2 moves is expected but gcc > > > with the patch generates 3 moves. I think now RA generates the right code. > > > > > > We have the following code before RA > > > > > > (insn 7 6 13 2 (set (reg:V2SI 88) > > > (plus:V2SI (reg:V2SI 89 [ x ]) > > > (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2] <var_decl > > > 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3} > > > > > > (expr_list:REG_DEAD (reg:V2SI 89 [ x ]) > > > (nil))) > > > (insn 13 7 14 2 (set (reg/i:DI 0 ax) > > > (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal} > > > (expr_list:REG_DEAD (reg:V2SI 88) > > > > > > The test is expected to assign mmx reg to pseudo 88 but gcc with the > > > patch assigns memory to it. The cost of mmx to general reg move is 13, > > > while overall cost of mmx to mem and mem to general moves is 10. So IRA > > > now chooses memory for pseudo 88 according to the minimal cost. > > > > > > Now, if we want still assign mmx reg to the pseudo 88, we should change > > > the costs in machine-dependent x86 code. But I think it might create > > > other unexpected code generation. As mmx is basically not used nowadays > > > the test is not important, I just propose the following patch. > > > > > > Is it ok for the trunk? > > The generated code is correct, in i386.c, secondary_memory_needed, we have: > > /* ??? This is a lie. We do have moves between mmx/general, and for > mmx/sse2. But by saying we need secondary memory we discourage the > register allocator from using the mmx registers unless needed. */ > if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2)) > return true; > > But I noticed that for -mtune=generic we still generate direct move. > The cost of interunit MMX -> GR moves should always be higher than the > cost of indirect move (due to secondary_memory_needed), c.f. the > comment in ix86_register_move_cost: > > /* In case we require secondary memory, compute cost of the store followed > by load. In order to avoid bad register allocation choices, we need > for this to be *at least* as high as the symmetric MEMORY_MOVE_COST. */ > > IMO, the issue with -mtune=generic warrants some additional analysis. > I'll look at target cost calculations. It looks that there are some issues how RA handles costs. For -mtune=intel, we have following costs: MMX->GR = 13 MMX->mem = 6 mem->GR = 4 The direct move has higher cost, so indirect move is emitted (which is correct). and for -mtune=generic, we have: MMX->GR = 13 MMX->mem = 6 mem->GR = 6 Although the direct move has slightly higher cost, direct move is emitted. According to the above comment in ix86_register_move_cost, we do have *at least* as high cost for direct move as for indirect move sequence (in fact, we will always have higher cost for direct move), but RA still emits direct move. While there are some target-dependent issues with handling of MMX regs (corrected by attached patch), the patch doesn't change the generated code. I suspect that since in failing -mtune=generic case we have costs that are very close to each other (12 vs 13), that perhaps some threshold in RA cost compares is off by one. Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index cef809f575fb..9c850678401c 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39528,7 +39528,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass, default: return 100; } - if (in) + if (in == 2) return MAX (ix86_cost->mmx_load [index], ix86_cost->mmx_store [index]); return in ? ix86_cost->mmx_load [index] : ix86_cost->mmx_store [index]; } diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index e60b2296ab6b..66da7bcd7e5d 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -208,9 +208,9 @@ ] (const_string "DI"))) (set (attr "preferred_for_speed") - (cond [(eq_attr "alternative" "10,15") + (cond [(eq_attr "alternative" "9,15") (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC") - (eq_attr "alternative" "11,16") + (eq_attr "alternative" "10,16") (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC") ] (symbol_ref "true")))])
Index: testsuite/ChangeLog =================================================================== --- testsuite/ChangeLog (revision 266582) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,7 @@ +2018-11-28 Vladimir Makarov <vmakarov@redhat.com> + + * gcc.target/i386/pr34256.c: Adjust the number of expected moves. + 2018-11-28 Marek Polacek <polacek@redhat.com> PR c++/88222 - ICE with bit-field with invalid type. Index: testsuite/gcc.target/i386/pr34256.c =================================================================== --- testsuite/gcc.target/i386/pr34256.c (revision 266155) +++ testsuite/gcc.target/i386/pr34256.c (working copy) @@ -10,5 +10,5 @@ unsigned long long foo(__m64 m) { return _mm_cvtm64_si64(_mm_add_pi32(x, y)); } -/* { dg-final { scan-assembler-times "mov" 2 { target { nonpic || pie_enabled } } } } */ -/* { dg-final { scan-assembler-times "mov" 4 { target { { ! nonpic } && { ! pie_enabled } } } } } */ +/* { dg-final { scan-assembler-times "mov" 3 { target { nonpic || pie_enabled } } } } */ +/* { dg-final { scan-assembler-times "mov" 5 { target { { ! nonpic } && { ! pie_enabled } } } } } */