Message ID | 20150609125703.GA10247@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Tue, Jun 9, 2015 at 2:57 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Jun 09, 2015 at 02:32:07PM +0200, Uros Bizjak wrote: >> > - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); >> > + addr = XEXP (part[1][0], 0); >> > + if (TARGET_TLS_DIRECT_SEG_REFS) >> > + { >> > + struct ix86_address parts; >> > + int ok = ix86_decompose_address (addr, &parts); >> > + gcc_assert (ok); >> > + if (parts.seg == DEFAULT_TLS_SEG_REG) >> > + { >> > + /* It is not valid to use %gs: or %fs: in >> > + lea though, so we need to remove it from the >> > + address used for lea and add it to each individual >> > + memory loads instead. */ >> > + addr = copy_rtx (addr); >> > + rtx *x = &addr; >> > + while (GET_CODE (*x) == PLUS) >> >> Why not use RTX iterators here? IMO, it would be much more readable. > > Do you mean something like this? It is larger and don't see readability > advantages there at all (plus the 4.8/4.9 backports can't use that anyway). > But if you prefer it, I can retest it. I'm afraid that simple scan loop won't work correctly on x32. There are some issues with UNSPEC_TP for this target, so we have to generate zero_extend of SImode UNSPEC, e.g.: (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) as can be seen in get_thread_pointer to construct the address. It looks that your loop won't find the UNSPEC_TP tag in the above case. Uros.
On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: > I'm afraid that simple scan loop won't work correctly on x32. There > are some issues with UNSPEC_TP for this target, so we have to generate > zero_extend of SImode UNSPEC, e.g.: > > (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) > > as can be seen in get_thread_pointer to construct the address. It > looks that your loop won't find the UNSPEC_TP tag in the above case. You're right, for -m32 it would need to start with rtx *x = &addr; + while (GET_CODE (*x) == ZERO_EXTEND + || GET_CODE (*x) == AND + || GET_CODE (*x) == SUBREG) + x = &XEXP (*x, 0); to get at the PLUS. Now, with either the original patch with the above ammendment, or with the iterators, the question is what to do with the UNSPEC_TP for -m32. Is it ok to just use addr32 on the lea and use normal Pmode (== DImode) addressing for the memory reads (if I read the code well, that is what it does right now for the non-TLS ones: if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); )? Then we'd need to change tls_base mode to Pmode. Jakub
On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: >> I'm afraid that simple scan loop won't work correctly on x32. There >> are some issues with UNSPEC_TP for this target, so we have to generate >> zero_extend of SImode UNSPEC, e.g.: >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> as can be seen in get_thread_pointer to construct the address. It >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > You're right, for -m32 it would need to start with > rtx *x = &addr; > + while (GET_CODE (*x) == ZERO_EXTEND > + || GET_CODE (*x) == AND > + || GET_CODE (*x) == SUBREG) > + x = &XEXP (*x, 0); > to get at the PLUS. Now, with either the original patch with the above > ammendment, or with the iterators, the question is what to do > with the UNSPEC_TP for -m32. Is it ok to just use addr32 on the > lea and use normal Pmode (== DImode) addressing for the memory reads > (if I read the code well, that is what it does right now for the non-TLS > ones: > if (GET_MODE (base) != Pmode) > base = gen_rtx_REG (Pmode, REGNO (base)); > )? Then we'd need to change tls_base mode to Pmode. (I assume you referred to -mx32 above. Please also note that -mx32 uses Pmode == SImode by default). Yes, please emit zero-extended address load to a DImode register, and use this register with UNSPEC_TP to form final DImode address. The LEA that will be generated from address load will use DImode inputs due to %H operand modifier and will output to (implicitly zero-extended) SImode register. FYI, you just hit two special cases here: - addresses that mention %fs and %gs are always in DImode - LEA input operands are also always in DImode, but compiler already takes care of it. Uros.
On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: >> I'm afraid that simple scan loop won't work correctly on x32. There >> are some issues with UNSPEC_TP for this target, so we have to generate >> zero_extend of SImode UNSPEC, e.g.: >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> as can be seen in get_thread_pointer to construct the address. It >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > You're right, for -m32 it would need to start with > rtx *x = &addr; > + while (GET_CODE (*x) == ZERO_EXTEND > + || GET_CODE (*x) == AND > + || GET_CODE (*x) == SUBREG) > + x = &XEXP (*x, 0); Oh, you can use SImode_address_operand predicate here. Uros.
On Tue, Jun 09, 2015 at 04:12:33PM +0200, Uros Bizjak wrote: > On Tue, Jun 9, 2015 at 3:39 PM, Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Jun 09, 2015 at 03:21:55PM +0200, Uros Bizjak wrote: > >> I'm afraid that simple scan loop won't work correctly on x32. There > >> are some issues with UNSPEC_TP for this target, so we have to generate > >> zero_extend of SImode UNSPEC, e.g.: > >> > >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) > >> > >> as can be seen in get_thread_pointer to construct the address. It > >> looks that your loop won't find the UNSPEC_TP tag in the above case. > > > > You're right, for -m32 it would need to start with Yeah, I meant -mx32 (which I have no experience with nor spare time for). > > rtx *x = &addr; > > + while (GET_CODE (*x) == ZERO_EXTEND > > + || GET_CODE (*x) == AND > > + || GET_CODE (*x) == SUBREG) > > + x = &XEXP (*x, 0); > > Oh, you can use SImode_address_operand predicate here. Do I need to loop, or can there be just one SImode_address_operand code? Do you want to use the iterators (as in the second patch) or not (then is if (SImode_address_operand (addr, VOIDmode)) x = &XEXP (addr, 0); ok)? Is Pmode always SImode for -mx32, or depending on some switch or something? Would it be acceptable to just guard the changes in the patch with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost when to ZERO_EXTEND addr (if needed at all), etc. Jakub
On Tue, Jun 9, 2015 at 4:26 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> >> I'm afraid that simple scan loop won't work correctly on x32. There >> >> are some issues with UNSPEC_TP for this target, so we have to generate >> >> zero_extend of SImode UNSPEC, e.g.: >> >> >> >> (plus:DI (zero_extend:DI (unspec:SI [...] UNSPEC_TP) (reg:DI ...)) >> >> >> >> as can be seen in get_thread_pointer to construct the address. It >> >> looks that your loop won't find the UNSPEC_TP tag in the above case. >> > >> > You're right, for -m32 it would need to start with > > Yeah, I meant -mx32 (which I have no experience with nor spare time for). > >> > rtx *x = &addr; >> > + while (GET_CODE (*x) == ZERO_EXTEND >> > + || GET_CODE (*x) == AND >> > + || GET_CODE (*x) == SUBREG) >> > + x = &XEXP (*x, 0); >> >> Oh, you can use SImode_address_operand predicate here. > > Do I need to loop, or can there be just one SImode_address_operand IIRC, apart from the whole address, only UNSPEC_TP can be zero_extended. It is a hardware "feature" (== HW bug) that addr32 doesn't apply to segment registers. > code? Do you want to use the iterators (as in the second patch) or not > (then is > if (SImode_address_operand (addr, VOIDmode)) > x = &XEXP (addr, 0); > ok)? Is Pmode always SImode for -mx32, or depending on some switch or Nope, it depends on -maddress-mode switch, and can be SImode or DImode. > something? Would it be acceptable to just guard the changes in the patch > with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost > when to ZERO_EXTEND addr (if needed at all), etc. If you wish, I can take your patch and take if further. -mx32 is a delicate beast... Uros. > > Jakub
On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote: > > something? Would it be acceptable to just guard the changes in the patch > > with !TARGET_X32 and let H.J. deal with that target? I'm afraid I'm lost > > when to ZERO_EXTEND addr (if needed at all), etc. > > If you wish, I can take your patch and take if further. -mx32 is a > delicate beast... If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff now. Note that for -m64/-mx32 it will be much harder to create a reproducer, because to trigger the bug one has to convince the register allocator to allocate the lhs of the load in certain registers (not that hard), but also the index register (to be scaled, also not that hard) and also the register holding the tls symbol immediate. Wonder if one has to keep all but the two registers live across the load or something similar. Jakub
--- gcc/config/i386/i386.c.jj 2015-06-08 15:41:19.000000000 +0200 +++ gcc/config/i386/i386.c 2015-06-09 14:42:18.357849227 +0200 @@ -22866,7 +22866,7 @@ ix86_split_long_move (rtx operands[]) Do an lea to the last part and use only one colliding move. */ else if (collisions > 1) { - rtx base; + rtx base, addr, tls_base = NULL_RTX; collisions = 1; @@ -22877,10 +22877,48 @@ ix86_split_long_move (rtx operands[]) if (GET_MODE (base) != Pmode) base = gen_rtx_REG (Pmode, REGNO (base)); - emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0))); + addr = XEXP (part[1][0], 0); + if (TARGET_TLS_DIRECT_SEG_REFS) + { + struct ix86_address parts; + int ok = ix86_decompose_address (addr, &parts); + gcc_assert (ok); + if (parts.seg == DEFAULT_TLS_SEG_REG) + { + /* It is not valid to use %gs: or %fs: in + lea though, so we need to remove it from the + address used for lea and add it to each individual + memory loads instead. */ + addr = copy_rtx (addr); + subrtx_ptr_iterator::array_type array; + FOR_EACH_SUBRTX_PTR (iter, array, &addr, NONCONST) + { + rtx *x = *iter; + if (GET_CODE (*x) == PLUS) + { + for (i = 0; i < 2; i++) + if (GET_CODE (XEXP (*x, i)) == UNSPEC + && XINT (XEXP (*x, i), 1) == UNSPEC_TP) + { + tls_base = XEXP (*x, i); + *x = XEXP (*x, 1 - i); + break; + } + if (tls_base) + break; + } + } + gcc_assert (tls_base); + } + } + emit_insn (gen_rtx_SET (base, addr)); + if (tls_base) + base = gen_rtx_PLUS (GET_MODE (base), base, tls_base); part[1][0] = replace_equiv_address (part[1][0], base); for (i = 1; i < nparts; i++) { + if (tls_base) + base = copy_rtx (base); tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i); part[1][i] = replace_equiv_address (part[1][i], tmp); } --- gcc/testsuite/gcc.dg/tls/pr66470.c.jj 2015-06-09 11:59:05.543954781 +0200 +++ gcc/testsuite/gcc.dg/tls/pr66470.c 2015-06-09 11:58:43.000000000 +0200 @@ -0,0 +1,29 @@ +/* PR target/66470 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target tls } */ + +extern __thread unsigned long long a[10]; +extern __thread struct S { int a, b; } b[10]; + +unsigned long long +foo (long x) +{ + return a[x]; +} + +struct S +bar (long x) +{ + return b[x]; +} + +#ifdef __SIZEOF_INT128__ +extern __thread unsigned __int128 c[10]; + +unsigned __int128 +baz (long x) +{ + return c[x]; +} +#endif