diff mbox

Enable EBX for x86 in 32bits PIC code

Message ID 53FE3D46.8050904@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Aug. 27, 2014, 8:19 p.m. UTC
On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
> Hi,
>
> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float value is compared with float constant 1.0.  It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function.
>
> Here is a compilation string I use:
>
> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o       -DSPEC_CPU2000        read_arch.c
>
> In my final assembler comparison with 1.0 looks like:
>
> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length = 7]
>
> and %ebp here doesn't have a proper value.
>
> I'll try to make a smaller reproducer if these instructions don't help.

I've managed to reproduce it.  Although it would be better to send the 
patch as an attachment.

The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC. 
  Then in a region when a *new* pseudo used as PIC we rematerialize a 
constant which transformed in memory addressed through *original* PIC 
pseudo.

To solve the problem we should prevent such splitting and guarantee that 
PIC pseudo allocnos in different region gets the same hard reg.

The following patch should solve the problem.

Comments

Ilya Enkovich Aug. 28, 2014, 8:28 a.m. UTC | #1
2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
> On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
>> I do not have a small reproducer but the problem can be easily reproduced on
>> SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
>> value is compared with float constant 1.0.  It is inlined into read_arch
>> function and can be easily found in RTL dump of function read_arch as a
>> float comparison with 1.0 after the first call to strtod function.
>>
>> Here is a compilation string I use:
>>
>> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
>> -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
>> -DSPEC_CPU2000        read_arch.c
>>
>> In my final assembler comparison with 1.0 looks like:
>>
>> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length =
>> 7]
>>
>> and %ebp here doesn't have a proper value.
>>
>> I'll try to make a smaller reproducer if these instructions don't help.
>
>
> I've managed to reproduce it.  Although it would be better to send the patch
> as an attachment.
>
> The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
> Then in a region when a *new* pseudo used as PIC we rematerialize a constant
> which transformed in memory addressed through *original* PIC pseudo.
>
> To solve the problem we should prevent such splitting and guarantee that PIC
> pseudo allocnos in different region gets the same hard reg.
>
> The following patch should solve the problem.
>

Thanks for the patch! I'll try it and be back with results.

Ilya
>
Ilya Enkovich Aug. 29, 2014, 6:47 a.m. UTC | #2
2014-08-28 12:28 GMT+04:00 Ilya Enkovich <enkovich.gnu@gmail.com>:
> 2014-08-28 0:19 GMT+04:00 Vladimir Makarov <vmakarov@redhat.com>:
>> On 2014-08-26 5:42 PM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
>>> I do not have a small reproducer but the problem can be easily reproduced on
>>> SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
>>> value is compared with float constant 1.0.  It is inlined into read_arch
>>> function and can be easily found in RTL dump of function read_arch as a
>>> float comparison with 1.0 after the first call to strtod function.
>>>
>>> Here is a compilation string I use:
>>>
>>> gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
>>> -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
>>> -DSPEC_CPU2000        read_arch.c
>>>
>>> In my final assembler comparison with 1.0 looks like:
>>>
>>> comiss  .LC11@GOTOFF(%ebp), %xmm0       # 1101  *cmpisf_sse     [length =
>>> 7]
>>>
>>> and %ebp here doesn't have a proper value.
>>>
>>> I'll try to make a smaller reproducer if these instructions don't help.
>>
>>
>> I've managed to reproduce it.  Although it would be better to send the patch
>> as an attachment.
>>
>> The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
>> Then in a region when a *new* pseudo used as PIC we rematerialize a constant
>> which transformed in memory addressed through *original* PIC pseudo.
>>
>> To solve the problem we should prevent such splitting and guarantee that PIC
>> pseudo allocnos in different region gets the same hard reg.
>>
>> The following patch should solve the problem.
>>
>
> Thanks for the patch! I'll try it and be back with results.

Seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where double constant is
wrongly rematerialized.  I also see in ira dump that there is still a
copy of PIC reg created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
        (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 3 bx)
        (nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
        (reg:SI 127)) 90 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 127)
        (nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
        (mem/u/c:DF (plus:SI (reg:SI 138 [127])
                (const:SI (unspec:SI [
                            (symbol_ref/u:SI ("*.LC0") [flags 0x2])
                        ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
     (expr_list:REG_EQUIV (const_double:DF
2.9999999999999997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11])
        (nil)))

After reload we have new usage of r127 which is allocated to ecx which
actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
        (plus:SI (reg:SI 2 cx [127])
            (const:SI (unspec:SI [
                        (symbol_ref/u:SI ("*.LC0") [flags 0x2])
                    ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
     (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
        (nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
        (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
            (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
790 {*fop_df_comm_sse}
     (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
            (const_double:DF
2.9999999999999997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11]))
        (nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc

Thanks,
Ilya

>
> Ilya
>>
Vladimir Makarov Sept. 2, 2014, 2:29 p.m. UTC | #3
On 08/29/2014 02:47 AM, Ilya Enkovich wrote:
> Seems your patch doesn't cover all cases.  Attached is a modified
> patch (with your changes included) and a test where double constant is
> wrongly rematerialized.  I also see in ira dump that there is still a
> copy of PIC reg created:
>
> Initialization of original PIC reg:
> (insn 23 22 24 2 (set (reg:SI 127)
>         (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 3 bx)
>         (nil)))
> ...
> Copy is created:
> (insn 135 37 25 3 (set (reg:SI 138 [127])
>         (reg:SI 127)) 90 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 127)
>         (nil)))
> ...
> Copy is used:
> (insn 119 25 122 3 (set (reg:DF 134)
>         (mem/u/c:DF (plus:SI (reg:SI 138 [127])
>                 (const:SI (unspec:SI [
>                             (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                         ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
>      (expr_list:REG_EQUIV (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11])
>         (nil)))
>
> After reload we have new usage of r127 which is allocated to ecx which
> actually does not have any definition in this function at all.
>
> (insn 151 42 44 4 (set (reg:SI 0 ax [147])
>         (plus:SI (reg:SI 2 cx [127])
>             (const:SI (unspec:SI [
>                         (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>                     ] UNSPEC_GOTOFF)))) test.cc:44 213 {*leasi}
>      (expr_list:REG_EQUAL (symbol_ref/u:SI ("*.LC0") [flags 0x2])
>         (nil)))
> (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
>         (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>             (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
> 790 {*fop_df_comm_sse}
>      (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
>             (const_double:DF
> 2.9999999999999997371893933895137251965934410691261292e-4
> [0x0.9d495182a99308p-11]))
>         (nil)))
>
> Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
>
>
Ok, Ilya.  I'll look at the problem this week.
diff mbox

Patch

Index: ira-color.c
===================================================================
--- ira-color.c	(revision 214576)
+++ ira-color.c	(working copy)
@@ -3239,9 +3239,10 @@ 
 	  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
 	  ira_assert (bitmap_bit_p (subloop_node->all_allocnos,
 				    ALLOCNO_NUM (subloop_allocno)));
-	  if ((flag_ira_region == IRA_REGION_MIXED)
-	      && (loop_tree_node->reg_pressure[pclass]
-		  <= ira_class_hard_regs_num[pclass]))
+	  if ((flag_ira_region == IRA_REGION_MIXED
+	       && (loop_tree_node->reg_pressure[pclass]
+		   <= ira_class_hard_regs_num[pclass]))
+	      || regno == (int) REGNO (pic_offset_table_rtx))
 	    {
 	      if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
 		{
Index: ira-emit.c
===================================================================
--- ira-emit.c	(revision 214576)
+++ ira-emit.c	(working copy)
@@ -620,7 +620,8 @@ 
 		  /* don't create copies because reload can spill an
 		     allocno set by copy although the allocno will not
 		     get memory slot.  */
-		  || ira_equiv_no_lvalue_p (regno)))
+		  || ira_equiv_no_lvalue_p (regno)
+		  || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx)))
 	    continue;
 	  original_reg = allocno_emit_reg (allocno);
 	  if (parent_allocno == NULL