Message ID | 4C41BD52.5040905@codesourcery.com |
---|---|
State | New |
On Sat, Jul 17, 2010 at 7:25 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/17/2010 04:38 AM, H.J. Lu wrote:
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44970
>
> Apparently, the sse_prologue_save_insn is broken.
>

It is more than that. It failed to bootstrap on Linux/ia32 when
configured with

--enable-clocale=gnu --with-system-zlib --enable-shared
--with-demangler-in-ld --with-fpmath=sse

On 17 Jul 2010, at 16:03, H.J. Lu wrote:
> On Sat, Jul 17, 2010 at 7:25 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 07/17/2010 04:38 AM, H.J. Lu wrote:
>>> This caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44970
>>
>> Apparently, the sse_prologue_save_insn is broken.
>>
>
> It is more than that. It failed to bootstrap on Linux/ia32 when
> configured with
>
> --enable-clocale=gnu --with-system-zlib --enable-shared
> --with-demangler-in-ld --with-fpmath=sse

I think it breaks powerpc bootstrap too.

Iain

On 07/17/2010 05:03 PM, H.J. Lu wrote:
> It is more than that. It failed to bootstrap on Linux/ia32 when
> configured with
>
> --enable-clocale=gnu --with-system-zlib --enable-shared
> --with-demangler-in-ld --with-fpmath=sse

I can't seem to reproduce this. Is that the full command line?

Bernd

On Sat, Jul 17, 2010 at 8:59 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/17/2010 05:03 PM, H.J. Lu wrote:
>> It is more than that. It failed to bootstrap on Linux/ia32 when
>> configured with
>>
>> --enable-clocale=gnu --with-system-zlib --enable-shared
>> --with-demangler-in-ld --with-fpmath=sse
>
> I can't seem to reproduce this. Is that the full command line?
>
>

I used:

../src-trunk/configure \
  --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld -with-plugin-ld=ld.gold --enable-gold --with-fpmath=sse

on Fedora 12/ia32.

On 07/17/2010 06:14 PM, H.J. Lu wrote:
> On Sat, Jul 17, 2010 at 8:59 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 07/17/2010 05:03 PM, H.J. Lu wrote:
>>> It is more than that. It failed to bootstrap on Linux/ia32 when
>>> configured with
>>>
>>> --enable-clocale=gnu --with-system-zlib --enable-shared
>>> --with-demangler-in-ld --with-fpmath=sse
>>
>> I can't seem to reproduce this. Is that the full command line?
>>
>>
>
> I used:
>
> ../src-trunk/configure \
>   --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld -with-plugin-ld=ld.gold --enable-gold --with-fpmath=sse
>
> on Fedora 12/ia32.

I'm on Gentoo, without gold. I'm not sure whether that makes a
difference, but I'm not seeing these failures. I don't have access to
SPEC2k6 either. Can you isolate any testcases?

Bernd

On Sat, Jul 17, 2010 at 10:17 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/17/2010 06:14 PM, H.J. Lu wrote:
>> On Sat, Jul 17, 2010 at 8:59 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>> On 07/17/2010 05:03 PM, H.J. Lu wrote:
>>>> It is more than that. It failed to bootstrap on Linux/ia32 when
>>>> configured with
>>>>
>>>> --enable-clocale=gnu --with-system-zlib --enable-shared
>>>> --with-demangler-in-ld --with-fpmath=sse
>>>
>>> I can't seem to reproduce this. Is that the full command line?
>>>
>>>
>>
>> I used:
>>
>> ../src-trunk/configure \
>>   --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld -with-plugin-ld=ld.gold --enable-gold --with-fpmath=sse
>>
>> on Fedora 12/ia32.
>
> I'm on Gentoo, without gold - not sure whether that made a difference,

Is it possible for you to install Fedora 12/13? 64-bit Fedora is fine.
You can bootstrap 32-bit GCC on 64-bit Fedora.

> but I'm not seeing these failures. I don't have access to SPEC2k6
> either. Can you isolate any testcases?
>

The SPEC CPU 2006 failure is a run-time comparison failure. It won't be
easy to find a small testcase. The GCC bootstrap comparison failure and
the testsuite regressions are much easier to debug.

Thanks.

On Sat, Jul 17, 2010 at 8:59 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/17/2010 05:03 PM, H.J. Lu wrote:
>> It is more than that. It failed to bootstrap on Linux/ia32 when
>> configured with
>>
>> --enable-clocale=gnu --with-system-zlib --enable-shared
>> --with-demangler-in-ld --with-fpmath=sse
>
> I can't seem to reproduce this. Is that the full command line?
>

Many people have reported bootstrap failures on Linux/ia32. May I
suggest you try to reproduce it on a different Linux/ia32 OS? I know it
can be reproduced on Fedora 12/13.

Thanks.

On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
>         leaq    0(,%rax,4), %rcx
>         movl    $.L2, %eax
>         subq    %rcx, %rax
>         jmp     *%rax

I've often thought this was over-engineering in the x86_64 abi.
This jump table is trading memory bandwidth for unpredictability
in the branch target.

I've often wondered if we'd get better performance if we changed
to a simple comparison against zero.  I.e.

        test    %al,%al
        jz      1f
        // 8 xmm stores
1:

H.J., do you think you'd be able to measure performance on this?

r~

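To make the trade-off concrete, here is a small Python model of the two prologue strategies. It is only a sketch: the function names are hypothetical, and it assumes (per the x86-64 psABI, not this thread) that `%al` carries an upper bound on the number of vector registers used by the variadic call, and that the stores are laid out `%xmm7` first as in the assembly quoted in this thread.

```python
NUM_SSE_ARG_REGS = 8  # %xmm0..%xmm7 can carry floating-point arguments


def stores_with_computed_jump(al):
    """Registers saved when the prologue jumps past the first 8 - al stores."""
    if not 0 <= al <= NUM_SSE_ARG_REGS:
        raise ValueError("this model only covers %al in 0..8")
    # The stores run %xmm7, %xmm6, ..., %xmm0, so skipping 8 - al of them
    # leaves %xmm(al-1) down to %xmm0.
    return [f"xmm{i}" for i in reversed(range(al))]


def stores_with_zero_test(al):
    """Registers saved by the proposed test/jz form: all eight or none."""
    if al == 0:
        return []
    return [f"xmm{i}" for i in reversed(range(NUM_SSE_ARG_REGS))]
```

In this model, a call with two vector arguments costs two stores with the computed jump and eight with the zero test; that is the memory bandwidth being traded against a branch with only two possible targets.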
On Mon, Jul 19, 2010 at 8:41 AM, Richard Henderson <rth@redhat.com> wrote:
> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
>>         leaq    0(,%rax,4), %rcx
>>         movl    $.L2, %eax
>>         subq    %rcx, %rax
>>         jmp     *%rax
>
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.
>
> I've often wondered if we'd get better performance if we changed
> to a simple comparison against zero.  I.e.
>
>         test    %al,%al
>         jz      1f
>         // 8 xmm stores
> 1:
>
> H.J., do you think you'd be able to measure performance on this?
>

Sure.

Richard Henderson <rth@redhat.com> writes:

> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
>>         leaq    0(,%rax,4), %rcx
>>         movl    $.L2, %eax
>>         subq    %rcx, %rax
>>         jmp     *%rax
>
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.

The other problem with the jump is that if you misdeclare
the prototype and nothing correct is passed in %eax, then you
get a random jump somewhere, which tends to be hard to debug.
This has happened in the past when porting existing code.

Getting rid of that alone would be worth the change.
Varargs shouldn't be that time-critical anyway.

-Andi

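The failure mode described here can be modelled with a few lines of Python. This is a sketch under stated assumptions: the address of `.L2` is a made-up value, each store is taken to be 4 bytes long, and the helper names are hypothetical. The arithmetic follows the four quoted instructions, which use the whole of `%rax` rather than just `%al`.

```python
INSN_LEN = 4             # assumed length of each movaps store, in bytes
NUM_STORES = 8
MASK64 = (1 << 64) - 1   # registers wrap at 64 bits


def computed_jump_target(l2_addr, rax):
    """Address reached by leaq 0(,%rax,4),%rcx; movl $.L2,%eax; subq; jmp *%rax."""
    return (l2_addr - 4 * rax) & MASK64


def lands_in_store_block(l2_addr, rax):
    """True if the jump ends up on one of the stores or on .L2 itself."""
    start = l2_addr - NUM_STORES * INSN_LEN
    return start <= computed_jump_target(l2_addr, rax) <= l2_addr


def zero_test_target(fallthrough_addr, skip_addr, al):
    """The test/jz form has exactly two possible destinations."""
    return skip_addr if al == 0 else fallthrough_addr
```

With a hypothetical `.L2` at `0x400100`, a well-formed count such as 3 stays inside the block, while a leftover value such as `0x1234` sends the jump about 18 KiB before it. The zero-test form can only ever reach the first store or the label after the last one.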
On Mon, Jul 19, 2010 at 9:09 AM, Andi Kleen <andi@firstfloor.org> wrote:
> Richard Henderson <rth@redhat.com> writes:
>
>> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
>>>         leaq    0(,%rax,4), %rcx
>>>         movl    $.L2, %eax
>>>         subq    %rcx, %rax
>>>         jmp     *%rax
>>
>> I've often thought this was over-engineering in the x86_64 abi.
>> This jump table is trading memory bandwidth for unpredictability
>> in the branch target.
>
> The other problem with the jump is that if you misdeclare
> the prototype and nothing correct is passed in %eax

FWIW, we DON'T support varargs without a correct prototype
on x86-64. See the AVX support in the x86-64 psABI for details.

On 07/17/10 08:25, Bernd Schmidt wrote:
> On 07/17/2010 04:38 AM, H.J. Lu wrote:
>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44970
>>
> Apparently, the sse_prologue_save_insn is broken.
>
> diff -dru old/nest-stdar-1.s new/nest-stdar-1.s
> --- old/nest-stdar-1.s  2010-07-17 14:10:40.308605357 +0000
> +++ new/nest-stdar-1.s  2010-07-17 14:00:30.592312121 +0000
> @@ -9,25 +9,24 @@
>          subq    $48, %rsp
>          .cfi_def_cfa_offset 56
>          leaq    0(,%rax,4), %rcx
> -        leaq    39(%rsp), %rdx
>          movl    $.L2, %eax
>          subq    %rcx, %rax
>          jmp     *%rax
> -        movaps  %xmm7, -15(%rdx)
> -        movaps  %xmm6, -31(%rdx)
> -        movaps  %xmm5, -47(%rdx)
> -        movaps  %xmm4, -63(%rdx)
> -        movaps  %xmm3, -79(%rdx)
> -        movaps  %xmm2, -95(%rdx)
> -        movaps  %xmm1, -111(%rdx)
> -        movaps  %xmm0, -127(%rdx)
> +        movaps  %xmm7, 24(%rsp)
> +        movaps  %xmm6, 8(%rsp)
> +        movaps  %xmm5, -8(%rsp)
> +        movaps  %xmm4, -24(%rsp)
> +        movaps  %xmm3, -40(%rsp)
> +        movaps  %xmm2, -56(%rsp)
> +        movaps  %xmm1, -72(%rsp)
> +        movaps  %xmm0, -88(%rsp)
>
> It's implementing a crazy jump table, which requires that all insns have
> the same length, which in turn requires that no one modifies the address
> in the pattern.
>

Unreal. Any time I see such fragile code I want to cry -- then I find
out it's for varargs and I want to scream.

> I can fix this testcase with the patch below, but I'll leave it for the
> x86 maintainers to choose this fix or another.
>

Yeah, let's let the x86 maintainers fix this -- presumably they'll find
something that doesn't rely upon exact instruction lengths.

jeff

> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
> >         leaq    0(,%rax,4), %rcx
> >         movl    $.L2, %eax
> >         subq    %rcx, %rax
> >         jmp     *%rax
>
> I've often thought this was over-engineering in the x86_64 abi.
> This jump table is trading memory bandwidth for unpredictability
> in the branch target.
>
> I've often wondered if we'd get better performance if we changed
> to a simple comparison against zero.  I.e.
>
>         test    %al,%al
>         jz      1f
>         // 8 xmm stores
> 1:
>
> H.J., do you think you'd be able to measure performance on this?

The original problem was that early K8 chips had no way of effectively
storing an SSE register to memory without knowing its type, so the
stores in the prologue executed very slowly when reformatting happened.
The same reason was behind not having callee-saved/restored SSE regs.

On current chips this is not a big issue, so I do not care which way we
output. In fact I used to have a patch for doing the jz but lost it.
I think we might keep supporting both to get some checking that the ABI
is not terribly broken (i.e. that no other compiler just feeds rax with
a random value instead of the number of args).

Honza
>
>
> r~

On Mon, Jul 19, 2010 at 09:15:47AM -0700, H.J. Lu wrote:
> On Mon, Jul 19, 2010 at 9:09 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > Richard Henderson <rth@redhat.com> writes:
> >
> >> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
> >>>         leaq    0(,%rax,4), %rcx
> >>>         movl    $.L2, %eax
> >>>         subq    %rcx, %rax
> >>>         jmp     *%rax
> >>
> >> I've often thought this was over-engineering in the x86_64 abi.
> >> This jump table is trading memory bandwidth for unpredictability
> >> in the branch target.
> >
> > The other problem with the jump is that if you misdeclare
> > the prototype and nothing correct is passed in %eax
>
> FWIW, we DON'T support varargs without correct prototype
> on x86-64. See AVX support in x86-64 psABI for details.

Yes, it's the same on non-AVX. Still, if it goes wrong (and in
practice it sometimes does), a smooth landing is better than a jump
to a random place in the address space.

-Andi

On Mon, Jul 19, 2010 at 8:56 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Jul 19, 2010 at 8:41 AM, Richard Henderson <rth@redhat.com> wrote:
>> On 07/17/2010 07:25 AM, Bernd Schmidt wrote:
>>>         leaq    0(,%rax,4), %rcx
>>>         movl    $.L2, %eax
>>>         subq    %rcx, %rax
>>>         jmp     *%rax
>>
>> I've often thought this was over-engineering in the x86_64 abi.
>> This jump table is trading memory bandwidth for unpredictability
>> in the branch target.
>>
>> I've often wondered if we'd get better performance if we changed
>> to a simple comparison against zero.  I.e.
>>
>>         test    %al,%al
>>         jz      1f
>>         // 8 xmm stores
>> 1:
>>
>> H.J., do you think you'd be able to measure performance on this?
>>
>
> Sure.

Just to be clear: I can test the performance impact if there is a
patch, but I don't have time to create one in the near future.

diff -dru old/nest-stdar-1.s new/nest-stdar-1.s
--- old/nest-stdar-1.s  2010-07-17 14:10:40.308605357 +0000
+++ new/nest-stdar-1.s  2010-07-17 14:00:30.592312121 +0000
@@ -9,25 +9,24 @@
         subq    $48, %rsp
         .cfi_def_cfa_offset 56
         leaq    0(,%rax,4), %rcx
-        leaq    39(%rsp), %rdx
         movl    $.L2, %eax
         subq    %rcx, %rax
         jmp     *%rax
-        movaps  %xmm7, -15(%rdx)
-        movaps  %xmm6, -31(%rdx)
-        movaps  %xmm5, -47(%rdx)
-        movaps  %xmm4, -63(%rdx)
-        movaps  %xmm3, -79(%rdx)
-        movaps  %xmm2, -95(%rdx)
-        movaps  %xmm1, -111(%rdx)
-        movaps  %xmm0, -127(%rdx)
+        movaps  %xmm7, 24(%rsp)
+        movaps  %xmm6, 8(%rsp)
+        movaps  %xmm5, -8(%rsp)
+        movaps  %xmm4, -24(%rsp)
+        movaps  %xmm3, -40(%rsp)
+        movaps  %xmm2, -56(%rsp)
+        movaps  %xmm1, -72(%rsp)
+        movaps  %xmm0, -88(%rsp)

It's implementing a crazy jump table, which requires that all insns have
the same length, which in turn requires that no one modifies the address
in the pattern.

I can fix this testcase with the patch below, but I'll leave it for the
x86 maintainers to choose this fix or another.

Bernd
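
The dependence on instruction length can be checked with a short Python model. This is a sketch, not a description of what GCC does internally: the function names are hypothetical, and the byte lengths in the usage note are an assumption based on the usual x86-64 encoding (4 bytes for `movaps %xmmN, disp8(%rdx)`, 5 bytes for `movaps %xmmN, disp8(%rsp)`, since `%rsp` as a base register needs an extra SIB byte). The thread itself does not state these lengths.

```python
NUM_STORES = 8
JUMP_SCALE = 4  # hard-coded by: leaq 0(,%rax,4), %rcx


def landing_offset(al, insn_len):
    """Byte offset from the first store at which the computed jump lands."""
    return NUM_STORES * insn_len - JUMP_SCALE * al


def expected_offset(al, insn_len):
    """Byte offset of the store that should run first: skip 8 - al stores."""
    return (NUM_STORES - al) * insn_len


def jump_is_correct(al, insn_len):
    """True if the jump lands exactly on the intended store (or on .L2)."""
    return landing_offset(al, insn_len) == expected_offset(al, insn_len)
```

With 4-byte stores every count from 0 to 8 lands where intended. With 5-byte stores only a count of 0 does: for a count of 1, for example, the jump lands at offset 36 rather than 35, one byte into the final store.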