Message ID | CAFULd4Yt89-2SXfd3EAQMyfXfVFh_uz5at+yDRBvKCQ9gp5Mzw@mail.gmail.com |
---|---|
State | New |
Headers | show |
Uros Bizjak <ubizjak@gmail.com> writes: > Hello! > > Currently, we use %rbx as REAL_PIC_OFFSET_TABLE_REGNUM on x86_64. > Since this register gets marked as fixed reg in > ix86_conditional_register_usage, we get into troubles with insns that > use %rbx (cmpxchg, cpuid). According to x86_64 psABI, we are free to > use any register, so attached patch changes %rbx with %r15 (also > following the example in the psABI). This patch has no implications on > small code model (that doesn't use REAL_PIC_OFFSET_TABLE_REGNUM > anyway), but on medium and large code model fixes usage of cpuid.h > (please see PR 55712 [1]) and avoids a pair of xchgs around cmpxchg or > cpuid instructions. So everyone who worked around this and use r15 now broke. It would be probably better to just teach the register allocator to spill those registers as needed, to handle some asm() statement. Not sure how hard that would be. -Andi
In the case of cpuid, the code is hardly performance sensitive, and probably runs only at startup. An alternative solution for the broken code here is to move the result from rbx to another register, and to save/restore rbx. Currently, this is the only place in libgcc and newlib affected by this problem. Leif Ekblad ----- Original Message ----- From: "Andi Kleen" <andi@firstfloor.org> To: "Uros Bizjak" <ubizjak@gmail.com> Cc: <gcc-patches@gcc.gnu.org>; "Richard Henderson" <rth@redhat.com>; "Jakub Jelinek" <jakub@redhat.com>; <jh@suse.cz>; "H.J. Lu" <hjl.tools@gmail.com> Sent: Monday, December 24, 2012 10:32 PM Subject: Re: [RFC PATCH, i386]: Use %r15 for REAL_PIC_OFFSET_TABLE_REGNUM on x86_64 > Uros Bizjak <ubizjak@gmail.com> writes: > >> Hello! >> >> Currently, we use %rbx as REAL_PIC_OFFSET_TABLE_REGNUM on x86_64. >> Since this register gets marked as fixed reg in >> ix86_conditional_register_usage, we get into troubles with insns that >> use %rbx (cmpxchg, cpuid). According to x86_64 psABI, we are free to >> use any register, so attached patch changes %rbx with %r15 (also >> following the example in the psABI). This patch has no implications on >> small code model (that doesn't use REAL_PIC_OFFSET_TABLE_REGNUM >> anyway), but on medium and large code model fixes usage of cpuid.h >> (please see PR 55712 [1]) and avoids a pair of xchgs around cmpxchg or >> cpuid instructions. > > So everyone who worked around this and use r15 now broke. > > It would be probably better to just teach the register allocator > to spill those registers as needed, to handle some asm() statement. > Not sure how hard that would be. > > -Andi > > -- > ak@linux.intel.com -- Speaking for myself only
On Monday 24 December 2012 17:26:47 Leif Ekblad wrote: > In the case of cpuid, the code is hardly performance sensitive, and > probably runs only at startup. An alternative solution for the broken code > here is to move the result from rbx to another register, and to > save/restore rbx. Currently, this is the only place in libgcc and newlib > affected by this problem. it's not a question of performance. i can't remember how many various projects i've had to tweak the inline asm code to work with __PIC__ (either because it's going into shared library code or it's being built as a PIE). Andi's point is now we have to redo all of that work a 2nd time and handle two different cases depending on gcc version ? it'd be a _lot_ better if gcc were intelligent and end users didn't have to code crap like stuffing %ebx somewhere temporarily. -mike
On Tue, Dec 25, 2012 at 6:54 AM, Mike Frysinger <vapier@gentoo.org> wrote: > On Monday 24 December 2012 17:26:47 Leif Ekblad wrote: >> In the case of cpuid, the code is hardly performance sensitive, and >> probably runs only at startup. An alternative solution for the broken code >> here is to move the result from rbx to another register, and to >> save/restore rbx. Currently, this is the only place in libgcc and newlib >> affected by this problem. > > it's not a question of performance. i can't remember how many various > projects i've had to tweak the inline asm code to work with __PIC__ (either > because it's going into shared library code or it's being built as a PIE). > Andi's point is now we have to redo all of that work a 2nd time and handle two > different cases depending on gcc version ? it'd be a _lot_ better if gcc were > intelligent and end users didn't have to code crap like stuffing %ebx somewhere > temporarily. Please note that we are not talking about 32bit code, where this would make a difference, but for 64bit targets with -mcmodel=medium and -mcmodel=large exclusively. The default x64_64 -mcmodel=small doesn't use PIC register, other code models are rarely used, so I sincerely doubt that any %rbx workarounds were needed in the past for x86_64. Uros.
On Tuesday 25 December 2012 14:12:09 Uros Bizjak wrote: > On Tue, Dec 25, 2012 at 6:54 AM, Mike Frysinger <vapier@gentoo.org> wrote: > > On Monday 24 December 2012 17:26:47 Leif Ekblad wrote: > >> In the case of cpuid, the code is hardly performance sensitive, and > >> probably runs only at startup. An alternative solution for the broken > >> code here is to move the result from rbx to another register, and to > >> save/restore rbx. Currently, this is the only place in libgcc and > >> newlib affected by this problem. > > > > it's not a question of performance. i can't remember how many various > > projects i've had to tweak the inline asm code to work with __PIC__ > > (either because it's going into shared library code or it's being built > > as a PIE). Andi's point is now we have to redo all of that work a 2nd > > time and handle two different cases depending on gcc version ? it'd be > > a _lot_ better if gcc were intelligent and end users didn't have to code > > crap like stuffing %ebx somewhere temporarily. > > Please note that we are not talking about 32bit code, where this would > make a difference, but for 64bit targets with -mcmodel=medium and > -mcmodel=large exclusively. The default x64_64 -mcmodel=small doesn't > use PIC register, other code models are rarely used, so I sincerely > doubt that any %rbx workarounds were needed in the past for x86_64. i'm aware. the comment still applies. you're breaking asm code that used to work because the gcc inline asm code isn't intelligent enough (currently) to transparently handle the PIC register for the user. -mike
On Tue, Dec 25, 2012 at 8:27 PM, Mike Frysinger <vapier@gentoo.org> wrote: >> >> In the case of cpuid, the code is hardly performance sensitive, and >> >> probably runs only at startup. An alternative solution for the broken >> >> code here is to move the result from rbx to another register, and to >> >> save/restore rbx. Currently, this is the only place in libgcc and >> >> newlib affected by this problem. >> > >> > it's not a question of performance. i can't remember how many various >> > projects i've had to tweak the inline asm code to work with __PIC__ >> > (either because it's going into shared library code or it's being built >> > as a PIE). Andi's point is now we have to redo all of that work a 2nd >> > time and handle two different cases depending on gcc version ? it'd be >> > a _lot_ better if gcc were intelligent and end users didn't have to code >> > crap like stuffing %ebx somewhere temporarily. >> >> Please note that we are not talking about 32bit code, where this would >> make a difference, but for 64bit targets with -mcmodel=medium and >> -mcmodel=large exclusively. The default x64_64 -mcmodel=small doesn't >> use PIC register, other code models are rarely used, so I sincerely >> doubt that any %rbx workarounds were needed in the past for x86_64. > > i'm aware. the comment still applies. you're breaking asm code that used to > work because the gcc inline asm code isn't intelligent enough (currently) to > transparently handle the PIC register for the user. Can you please post a real-world example, where using %r15 would break existing code? Uros.
> Can you please post a real-world example, where using %r15 would break > existing code? I used to run into problems like this when porting code to gcc from icc or VC. A lot of hyper optimized inline assembler snippets wants to use all registers and icc/VC support that. With gcc usually had to add some manual push/pops. In older gcc versions usually more than one because reload tended to error out otherwise. So by try and error used the non fixed registers, but let the compiler know about the others. This case would break. In 64bit it was less a problem than on 32bit, but could still happen. Admittedly medium is somewhat obscure and rarely used (and very slow), but someone could be still using it. -Andi
On Wed, Dec 26, 2012 at 9:16 PM, Andi Kleen <andi@firstfloor.org> wrote: >> Can you please post a real-world example, where using %r15 would break >> existing code? > > I used to run into problems like this when porting code to gcc from icc or VC. > A lot of hyper optimized inline assembler snippets wants to use all registers > and icc/VC support that. With gcc usually had to add some manual > push/pops. In older gcc versions usually more than one because reload > tended to error out otherwise. > > So by try and error used the non fixed registers, but let the compiler know > about the others. This case would break. > > In 64bit it was less a problem than on 32bit, but could still happen. > > Admittedly medium is somewhat obscure and rarely used (and very slow), > but someone could be still using it. The alternative approach is changing cpuid definition in cpuid.h (as in attached patch) to preserve %rbx, but we can't detect various code model settings there. Since the change depends on the definition of __PIC__, we unnecessary preserve %rbx also for -mcmodel=small. However, code that involves cpuid is rarely performance critical, so perhaps we can live with this tradeoff. IMO, this patch can be used on 4.7 branch, too. Uros.
On 12/27/2012 12:08 AM, Uros Bizjak wrote: > The alternative approach is changing cpuid definition in cpuid.h (as > in attached patch) to preserve %rbx, but we can't detect various code > model settings there. Since the change depends on the definition of > __PIC__, we unnecessary preserve %rbx also for -mcmodel=small. Certainly we can. We also control the preprocessor defines. All that's needed is that we create one for the code model. r~
Index: i386.h =================================================================== --- i386.h (revision 194703) +++ i386.h (working copy) @@ -1173,7 +1173,7 @@ the pic register when possible. The change is visible after the prologue has been emitted. */ -#define REAL_PIC_OFFSET_TABLE_REGNUM BX_REG +#define REAL_PIC_OFFSET_TABLE_REGNUM (TARGET_64BIT ? R15_REG : BX_REG) #define PIC_OFFSET_TABLE_REGNUM \ ((TARGET_64BIT && ix86_cmodel == CM_SMALL_PIC) \ Index: i386.md =================================================================== --- i386.md (revision 194703) +++ i386.md (working copy) @@ -301,6 +301,8 @@ (R11_REG 40) (R12_REG 41) (R13_REG 42) + (R14_REG 43) + (R15_REG 44) (XMM8_REG 45) (XMM9_REG 46) (XMM10_REG 47)