Message ID | 20121113143010.GA27142@gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Nov 13, 2012 at 3:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >> >>>> Since x32 runs in 64-bit mode, for address -0x40000300(%rax), hardware >> >>>> sign-extends displacement from 32-bits to 64-bits and adds it to %rax. >> >>>> But x32 wants 32-bit -0x40000300, not 64-bit -0x40000300. This patch >> >>>> uses 32-bit registers instead of 64-bit registers when displacement >> >>>> < -16*1024*1024. -16*1024*1024 is used instead of 0 so that we will >> >>>> still generate -16(%rsp) instead of -16(%esp). >> >>>> >> >>>> Tested it on Linux/x32. OK to install? >> >>> >> >>> This problem uncovers a bug in the middle-end, so I guess it would be >> >>> better to fix it there. >> >>> >> >>> Uros. >> >> >> >> The problem is it isn't safe to transform >> >> >> >> (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))) >> >> >> >> to >> >> >> >> (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >> >> >> >> when Y is negative and its absolute value is greater than FOO. However, >> >> we have to do this transformation since other parts of GCC depend on >> >> it. If we revert the fix for >> >> >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721 >> >> >> >> we will get >> >> >> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (internal compiler error) >> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (test for excess errors) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >> >> ps -finline-functions (internal compiler error) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >> >> ps -finline-functions (test for excess errors) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >> >> (internal compiler error) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >> >> (test for excess errors) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (internal compi >> >> ler error) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (test for exces >> >> s errors) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (internal compiler error) >> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (test for excess errors) >> >> FAIL: gcc.dg/Warray-bounds.c (internal compiler error) >> >> FAIL: gcc.dg/Warray-bounds.c (test for excess errors) >> >> >> >> since we generate pseudo registers to convert SImode to DImode >> >> after reload. Fixing it requires significant changes. >> >> >> >> This is only a problem for 64-bit register address since the symbolic >> >> address is 32-bit. Using 32-bit base/index registers will work around >> >> this issue. >> > >> > This address >> > >> > (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >> > >> > is OK for x32 as long as Y, which is encoded as 32-bit immediate, >> > is zero-extend from 32-bit to 64-bit. SImode address does it. >> > My patch optimizes it a little bit by using SImode address only >> > for Y < -16*1024*1024. >> >> I was wondering, why we operate with constant -16*1024*1024? Should we >> use 0x7FFFFFF instead? Since the MSB is always zero, we are safe. >> > > We can check 0x7FFFFFF, i.e., disp < 0. I use -16*1024*1024, which > is also used to check legitimate address displacements for PIC, to > reduce code sizes for small negative displacements. Or we can always > encode negative displacements with zero-extension, including -1(%rsp). > >> Please add a fat ??? comment, why we paper-over this issue and repost >> the latest patch. I got lost in all the versions :( >> > > Here is the updated patch. > > gcc/ > > 2012-11-13 Eric Botcazou <ebotcazou@adacore.com> > H.J. Lu <hongjiu.lu@intel.com> > > PR target/55142 > * config/i386/i386.c (legitimize_pic_address): Properly handle > REG + CONST. > (ix86_print_operand_address): For x32, zero-extend negative > displacement if it < -16*1024*1024. > > gcc/testsuite/ > > 2012-11-13 H.J. Lu <hongjiu.lu@intel.com> > > PR target/55142 > * gcc.target/i386/pr55142-1.c: New file. > * gcc.target/i386/pr55142-2.c: Likewise. OK, for mainline SVN (with the reservation that middle-end fix was not found to be a viable solution, so target fix is papering-over real issue. Let's wait for the next victim... ). Oh, and please fix a typo of mine in the one line above the patch hunk; the modifier for SI addresses should be 'k', not 'l'. BTW: Do we need this patch also for 4.7? x32 address-mode is long by default there, but the problem doesn't trigger. Thanks, Uros.
On Tue, Nov 13, 2012 at 10:03 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Tue, Nov 13, 2012 at 3:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > >>> >>>> Since x32 runs in 64-bit mode, for address -0x40000300(%rax), hardware >>> >>>> sign-extends displacement from 32-bits to 64-bits and adds it to %rax. >>> >>>> But x32 wants 32-bit -0x40000300, not 64-bit -0x40000300. This patch >>> >>>> uses 32-bit registers instead of 64-bit registers when displacement >>> >>>> < -16*1024*1024. -16*1024*1024 is used instead of 0 so that we will >>> >>>> still generate -16(%rsp) instead of -16(%esp). >>> >>>> >>> >>>> Tested it on Linux/x32. OK to install? >>> >>> >>> >>> This problem uncovers a bug in the middle-end, so I guess it would be >>> >>> better to fix it there. >>> >>> >>> >>> Uros. >>> >> >>> >> The problem is it isn't safe to transform >>> >> >>> >> (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))) >>> >> >>> >> to >>> >> >>> >> (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >>> >> >>> >> when Y is negative and its absolute value is greater than FOO. However, >>> >> we have to do this transformation since other parts of GCC depend on >>> >> it. If we revert the fix for >>> >> >>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721 >>> >> >>> >> we will get >>> >> >>> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (internal compiler error) >>> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (test for excess errors) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >>> >> ps -finline-functions (internal compiler error) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >>> >> ps -finline-functions (test for excess errors) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >>> >> (internal compiler error) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >>> >> (test for excess errors) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (internal compi >>> >> ler error) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (test for exces >>> >> s errors) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (internal compiler error) >>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (test for excess errors) >>> >> FAIL: gcc.dg/Warray-bounds.c (internal compiler error) >>> >> FAIL: gcc.dg/Warray-bounds.c (test for excess errors) >>> >> >>> >> since we generate pseudo registers to convert SImode to DImode >>> >> after reload. Fixing it requires significant changes. >>> >> >>> >> This is only a problem for 64-bit register address since the symbolic >>> >> address is 32-bit. Using 32-bit base/index registers will work around >>> >> this issue. >>> > >>> > This address >>> > >>> > (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >>> > >>> > is OK for x32 as long as Y, which is encoded as 32-bit immediate, >>> > is zero-extend from 32-bit to 64-bit. SImode address does it. >>> > My patch optimizes it a little bit by using SImode address only >>> > for Y < -16*1024*1024. >>> >>> I was wondering, why we operate with constant -16*1024*1024? Should we >>> use 0x7FFFFFF instead? Since the MSB is always zero, we are safe. >>> >> >> We can check 0x7FFFFFF, i.e., disp < 0. I use -16*1024*1024, which >> is also used to check legitimate address displacements for PIC, to >> reduce code sizes for small negative displacements. Or we can always >> encode negative displacements with zero-extension, including -1(%rsp). >> >>> Please add a fat ??? comment, why we paper-over this issue and repost >>> the latest patch. I got lost in all the versions :( >>> >> >> Here is the updated patch. >> >> gcc/ >> >> 2012-11-13 Eric Botcazou <ebotcazou@adacore.com> >> H.J. Lu <hongjiu.lu@intel.com> >> >> PR target/55142 >> * config/i386/i386.c (legitimize_pic_address): Properly handle >> REG + CONST. >> (ix86_print_operand_address): For x32, zero-extend negative >> displacement if it < -16*1024*1024. >> >> gcc/testsuite/ >> >> 2012-11-13 H.J. Lu <hongjiu.lu@intel.com> >> >> PR target/55142 >> * gcc.target/i386/pr55142-1.c: New file. >> * gcc.target/i386/pr55142-2.c: Likewise. > > OK, for mainline SVN (with the reservation that middle-end fix was not > found to be a viable solution, so target fix is papering-over real > issue. Let's wait for the next victim... ). That is true. > Oh, and please fix a typo of mine in the one line above the patch > hunk; the modifier for SI addresses should be 'k', not 'l'. Will do. > BTW: Do we need this patch also for 4.7? x32 address-mode is long by > default there, but the problem doesn't trigger. > This regression is triggered by revision 188008: http://gcc.gnu.org/ml/gcc-cvs/2012-05/msg00038.html aka the un-sign-extension of sizetype constants. We can backport it to 4.7 branch if we want to be on the safer side. Thanks.
On Tue, Nov 13, 2012 at 7:23 PM, H.J. Lu <hjl.tools@gmail.com> wrote: >>>> >>>> Since x32 runs in 64-bit mode, for address -0x40000300(%rax), hardware >>>> >>>> sign-extends displacement from 32-bits to 64-bits and adds it to %rax. >>>> >>>> But x32 wants 32-bit -0x40000300, not 64-bit -0x40000300. This patch >>>> >>>> uses 32-bit registers instead of 64-bit registers when displacement >>>> >>>> < -16*1024*1024. -16*1024*1024 is used instead of 0 so that we will >>>> >>>> still generate -16(%rsp) instead of -16(%esp). >>>> >>>> >>>> >>>> Tested it on Linux/x32. OK to install? >>>> >>> >>>> >>> This problem uncovers a bug in the middle-end, so I guess it would be >>>> >>> better to fix it there. >>>> >>> >>>> >>> Uros. >>>> >> >>>> >> The problem is it isn't safe to transform >>>> >> >>>> >> (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))) >>>> >> >>>> >> to >>>> >> >>>> >> (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >>>> >> >>>> >> when Y is negative and its absolute value is greater than FOO. However, >>>> >> we have to do this transformation since other parts of GCC depend on >>>> >> it. If we revert the fix for >>>> >> >>>> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721 >>>> >> >>>> >> we will get >>>> >> >>>> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (internal compiler error) >>>> >> FAIL: gcc.c-torture/compile/990523-1.c -O3 -g (test for excess errors) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >>>> >> ps -finline-functions (internal compiler error) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-all-loo >>>> >> ps -finline-functions (test for excess errors) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >>>> >> (internal compiler error) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer -funroll-loops >>>> >> (test for excess errors) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (internal compi >>>> >> ler error) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -fomit-frame-pointer (test for exces >>>> >> s errors) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (internal compiler error) >>>> >> FAIL: gcc.c-torture/compile/pr41634.c -O3 -g (test for excess errors) >>>> >> FAIL: gcc.dg/Warray-bounds.c (internal compiler error) >>>> >> FAIL: gcc.dg/Warray-bounds.c (test for excess errors) >>>> >> >>>> >> since we generate pseudo registers to convert SImode to DImode >>>> >> after reload. Fixing it requires significant changes. >>>> >> >>>> >> This is only a problem for 64-bit register address since the symbolic >>>> >> address is 32-bit. Using 32-bit base/index registers will work around >>>> >> this issue. >>>> > >>>> > This address >>>> > >>>> > (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) >>>> > >>>> > is OK for x32 as long as Y, which is encoded as 32-bit immediate, >>>> > is zero-extend from 32-bit to 64-bit. SImode address does it. >>>> > My patch optimizes it a little bit by using SImode address only >>>> > for Y < -16*1024*1024. >>>> >>>> I was wondering, why we operate with constant -16*1024*1024? Should we >>>> use 0x7FFFFFF instead? Since the MSB is always zero, we are safe. >>>> >>> >>> We can check 0x7FFFFFF, i.e., disp < 0. I use -16*1024*1024, which >>> is also used to check legitimate address displacements for PIC, to >>> reduce code sizes for small negative displacements. Or we can always >>> encode negative displacements with zero-extension, including -1(%rsp). >>> >>>> Please add a fat ??? comment, why we paper-over this issue and repost >>>> the latest patch. I got lost in all the versions :( >>>> >>> >>> Here is the updated patch. >>> >>> gcc/ >>> >>> 2012-11-13 Eric Botcazou <ebotcazou@adacore.com> >>> H.J. Lu <hongjiu.lu@intel.com> >>> >>> PR target/55142 >>> * config/i386/i386.c (legitimize_pic_address): Properly handle >>> REG + CONST. >>> (ix86_print_operand_address): For x32, zero-extend negative >>> displacement if it < -16*1024*1024. >>> >>> gcc/testsuite/ >>> >>> 2012-11-13 H.J. Lu <hongjiu.lu@intel.com> >>> >>> PR target/55142 >>> * gcc.target/i386/pr55142-1.c: New file. >>> * gcc.target/i386/pr55142-2.c: Likewise. >> >> OK, for mainline SVN (with the reservation that middle-end fix was not >> found to be a viable solution, so target fix is papering-over real >> issue. Let's wait for the next victim... ). > > That is true. > >> Oh, and please fix a typo of mine in the one line above the patch >> hunk; the modifier for SI addresses should be 'k', not 'l'. > > Will do. > >> BTW: Do we need this patch also for 4.7? x32 address-mode is long by >> default there, but the problem doesn't trigger. >> > > This regression is triggered by revision 188008: > > http://gcc.gnu.org/ml/gcc-cvs/2012-05/msg00038.html > > aka the un-sign-extension of sizetype constants. We can > backport it to 4.7 branch if we want to be on the safer side. Yes, please backport the patch to 4.7 after it lives a couple of days in mainline without problems. Thanks, Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 39ed32e..ee75d8e 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12195,7 +12195,6 @@ legitimize_pic_address (rtx orig, rtx reg) { rtx addr = orig; rtx new_rtx = orig; - rtx base; #if TARGET_MACHO if (TARGET_MACHO && !TARGET_64BIT) @@ -12400,20 +12399,33 @@ legitimize_pic_address (rtx orig, rtx reg) } else { - base = legitimize_pic_address (XEXP (addr, 0), reg); - new_rtx = legitimize_pic_address (XEXP (addr, 1), - base == reg ? NULL_RTX : reg); + rtx base = legitimize_pic_address (op0, reg); + enum machine_mode mode = GET_MODE (base); + new_rtx + = legitimize_pic_address (op1, base == reg ? NULL_RTX : reg); if (CONST_INT_P (new_rtx)) - new_rtx = plus_constant (Pmode, base, INTVAL (new_rtx)); + { + if (INTVAL (new_rtx) < -16*1024*1024 + || INTVAL (new_rtx) >= 16*1024*1024) + { + if (!x86_64_immediate_operand (new_rtx, mode)) + new_rtx = force_reg (mode, new_rtx); + new_rtx + = gen_rtx_PLUS (mode, force_reg (mode, base), new_rtx); + } + else + new_rtx = plus_constant (mode, base, INTVAL (new_rtx)); + } else { - if (GET_CODE (new_rtx) == PLUS && CONSTANT_P (XEXP (new_rtx, 1))) + if (GET_CODE (new_rtx) == PLUS + && CONSTANT_P (XEXP (new_rtx, 1))) { - base = gen_rtx_PLUS (Pmode, base, XEXP (new_rtx, 0)); + base = gen_rtx_PLUS (mode, base, XEXP (new_rtx, 0)); new_rtx = XEXP (new_rtx, 1); } - new_rtx = gen_rtx_PLUS (Pmode, base, new_rtx); + new_rtx = gen_rtx_PLUS (mode, base, new_rtx); } } } @@ -14504,6 +14516,29 @@ ix86_print_operand_address (FILE *file, rtx addr) gcc_assert (!code); code = 'l'; } + else if (code == 0 + && TARGET_X32 + && disp + && CONST_INT_P (disp) + && INTVAL (disp) < -16*1024*1024) + { + /* X32 runs in 64-bit mode, where displacement, DISP, in + address DISP(%r64), is encoded as 32-bit immediate sign- + extended from 32-bit to 64-bit. For -0x40000300(%r64), + address is %r64 + 0xffffffffbffffd00. When %r64 < + 0x40000300, like 0x37ffe064, address is 0xfffffffff7ffdd64, + which is invalid for x32. The correct address is %r64 + - 0x40000300 == 0xf7ffdd64. To properly encode + -0x40000300(%r64) for x32, we zero-extend negative + displacement by forcing addr32 prefix which truncates + 0xfffffffff7ffdd64 to 0xf7ffdd64. In theory, we should + zero-extend all negative displacements, including -1(%rsp). + However, for small negative displacements, sign-extension + won't cause overflow. We only zero-extend negative + displacements if they < -16*1024*1024, which is also used + to check legitimate address displacements for PIC. */ + code = 'k'; + } if (ASSEMBLER_DIALECT == ASM_ATT) { diff --git a/gcc/testsuite/gcc.target/i386/pr55142-1.c b/gcc/testsuite/gcc.target/i386/pr55142-1.c new file mode 100644 index 0000000..e6b5f12 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr55142-1.c @@ -0,0 +1,35 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-require-effective-target maybe_x32 } */ +/* { dg-require-effective-target fpic } */ +/* { dg-options "-O2 -mx32 -maddress-mode=long -fpic" } */ + +typedef int int32_t; +typedef unsigned int uint32_t; +typedef int32_t Elf32_Sword; +typedef struct +{ + Elf32_Sword d_tag; +} Elf32_Dyn; +struct link_map +{ + Elf32_Dyn *l_ld; + Elf32_Dyn *l_info[34]; +}; +extern struct link_map _dl_rtld_map __attribute__ ((visibility ("hidden"))); +static void elf_get_dynamic_info (struct link_map *l) +{ + Elf32_Dyn *dyn = l->l_ld; + Elf32_Dyn **info; + info = l->l_info; + while (dyn->d_tag != 0) + { + if ((uint32_t) (0x6ffffeff - dyn->d_tag) < 11) + info[0x6ffffeff - dyn->d_tag + 12] = dyn; + ++dyn; + } +} +void +foo (void) +{ + elf_get_dynamic_info (&_dl_rtld_map); +} diff --git a/gcc/testsuite/gcc.target/i386/pr55142-2.c b/gcc/testsuite/gcc.target/i386/pr55142-2.c new file mode 100644 index 0000000..598c524 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr55142-2.c @@ -0,0 +1,34 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-require-effective-target maybe_x32 } */ +/* { dg-require-effective-target fpic } */ +/* { dg-options "-O3 -mx32 -maddress-mode=long -fpic" } */ +/* { dg-final { scan-assembler-not "movl\[\\t \]*%edx,\[\\t \]*-1073742592\\(%r(.x|.i|.p|\[1-9\]*)\\)" } } */ + +typedef int int32_t; +typedef unsigned int uint32_t; +typedef uint32_t Elf32_Word; +typedef int32_t Elf32_Sword; +typedef uint32_t Elf32_Addr; +typedef struct { + Elf32_Sword d_tag; + union { + Elf32_Word d_val; + Elf32_Addr d_ptr; + } d_un; +} Elf32_Dyn; +struct link_map { + Elf32_Dyn *l_ld; + Elf32_Dyn *l_info[34 + 16 + 3 + 12 + 11]; +}; +void +elf_get_dynamic_info (struct link_map *l) +{ + Elf32_Dyn *dyn = l->l_ld; + Elf32_Dyn **info = l->l_info; + typedef Elf32_Word d_tag_utype; + while (dyn->d_tag != 0) { + if ((d_tag_utype) (0x6ffffeff - dyn->d_tag) < 11) + info[(0x6ffffeff - dyn->d_tag) + 34 + 16 + 3 + 12] = dyn; + ++dyn; + } +}