Message ID: CAGQ9bdzCThi2A12F6x6gpwwEDVQR0RT_n5NbKUSu3qZHE39Zuw@mail.gmail.com
State: New
From: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>
Date: Mon, 3 Dec 2012 22:18:56 +0400

> On Mon, Dec 3, 2012 at 10:02 PM, David Miller <davem@davemloft.net> wrote:
>> The only changes to libsanitizer is to put __sparc__ checks where
>> __powerpc__ checks exist in the unwind code.
>
> Like this?
>
> ===================================================================
> --- asan/asan_linux.cc	(revision 169136)
> +++ asan/asan_linux.cc	(working copy)
> @@ -158,7 +158,9 @@
>    stack->trace[0] = pc;
>    if (max_s > 1) {
>      stack->max_size = max_s;
> -#if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__)
> +#if defined(__arm__) || \
> +    defined(__powerpc__) || defined(__powerpc64__) || \
> +    defined(__sparc__)
>      _Unwind_Backtrace(Unwind_Trace, stack);
>      // Pop off the two ASAN functions from the backtrace.
>      stack->PopStackFrames(2);

Yes, that's perfect.

We could also add a __sparc__ block to sanitizer_stacktrace.cc:patch_pc().
The Sparc PC is actually 8 bytes after the caller's jump: Sparc has a delay
slot, so the place to return to is two instructions after the call/jump, and
instructions are all 4 bytes long.

> We either need to align the redzones by 32 always, or for some platforms.
> Either is fine for me.

I'm ambivalent as well.
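The delay-slot arithmetic described above can be sketched as follows. This is an illustrative standalone function, not the actual sanitizer_stacktrace.cc code: `patch_unwound_pc`, the `Arch` enum, and the non-SPARC constants are hypothetical, chosen only to contrast SPARC's behavior with other targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the PC patching discussed above. On SPARC a call
// stores the address of the call instruction itself, and execution resumes
// after the call plus its delay slot, i.e. two 4-byte instructions later,
// so the unwound PC sits 8 bytes past the call site.
enum Arch { kSparc, kPowerPC, kX86 };

uintptr_t patch_unwound_pc(uintptr_t pc, Arch arch) {
  switch (arch) {
    case kSparc:
      return pc - 8;  // call (4 bytes) + delay slot (4 bytes)
    case kPowerPC:
      return pc - 4;  // fixed 4-byte instructions, no delay slot
    default:
      return pc - 1;  // x86: variable-length; any byte inside the call works
  }
}
```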
On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned), although I suspect 32-byte aligned redzones should
> be ~2x faster.

Why?  The 32-byte realigning has significant cost, plus often one extra
register is eaten (the DRAP register), even bigger cost on non-i?86/x86_64
targets.

	Jakub
On Mon, Dec 3, 2012 at 10:31 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned), although I suspect 32-byte aligned redzones should
>> be ~2x faster.
>
> Why?  The 32-byte realigning has significant cost, plus often one
> extra register is eaten (the DRAP register), even bigger cost on
> non-i?86/x86_64 targets.

Maybe because my understanding of x86 is rather old (or plain wrong).
I tried a micro benchmark on a Xeon E5-2690 and unaligned stores are just
slightly more expensive (< 10%). I'll do more benchmarks with the actual
asan instrumentation tomorrow.

So, I guess we need to align the redzones conditionally for sparc, etc.

--kcc

> Jakub
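For background on why redzone alignment could matter at all (standard ASan design, not stated in this thread): ASan maps each 8 bytes of application memory to one shadow byte, so a 32-byte-aligned redzone maps to a 4-byte-aligned shadow address that can be poisoned with a single aligned 32-bit store, while an 8-byte-aligned redzone only guarantees byte-aligned shadow. A minimal sketch, assuming the conventional x86-64 shadow offset:

```cpp
#include <cassert>
#include <cstdint>

// ASan's standard shadow mapping: shadow = (addr >> 3) + offset.
// kShadowOffset is the typical x86-64 value; other targets differ.
constexpr uintptr_t kShadowOffset = 0x7fff8000ULL;

uintptr_t shadow_address(uintptr_t addr) {
  return (addr >> 3) + kShadowOffset;
}

// A 32-byte-aligned frame address maps to a 4-byte-aligned shadow address,
// so 32 bytes of redzone can be poisoned with one aligned 32-bit shadow
// store; an address that is only 8-byte-aligned does not have this property.
```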
On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned), although I suspect 32-byte aligned redzones should
> be ~2x faster.

If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
spot anything like that on e.g.

__attribute__((noinline, noclone)) void
foo (int *p)
{
  int i;
  for (i = 0; i < 32; i++)
    p[i] = 0x12345678;
}

int
main (int argc, const char **argv)
{
  char buf[1024];
  int *p = (int *) &buf[argc - 1];
  int i;
  __builtin_printf ("%p\n", p);
  for (i = 0; i < 100000000; i++)
    foo (p);
  return 0;
}

Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
arguments on a SandyBridge CPU.  I guess there could be penalties on page
boundaries, etc., but I think hot caches are the usual operation on the
stack.

	Jakub
I've committed a flag to the LLVM implementation to not realign the stack
(-mllvm -asan-realign-stack=0). On a Xeon W3690 I've measured no performance
difference (tried the C/C++ part of SPEC 2006). So, on x86 it's probably the
right thing to not realign the stack.

--kcc

On Mon, Dec 3, 2012 at 10:41 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned), although I suspect 32-byte aligned redzones should
>> be ~2x faster.
>
> If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
> spot anything like that on e.g.
>
> __attribute__((noinline, noclone)) void
> foo (int *p)
> {
>   int i;
>   for (i = 0; i < 32; i++)
>     p[i] = 0x12345678;
> }
>
> int
> main (int argc, const char **argv)
> {
>   char buf[1024];
>   int *p = (int *) &buf[argc - 1];
>   int i;
>   __builtin_printf ("%p\n", p);
>   for (i = 0; i < 100000000; i++)
>     foo (p);
>   return 0;
> }
>
> Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
> arguments on a SandyBridge CPU.  I guess there could be penalties on page
> boundaries, etc., but I think hot caches are the usual operation on the
> stack.
>
> 	Jakub
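The "align conditionally" conclusion could look roughly like the sketch below. This is purely illustrative and not GCC's or LLVM's actual code; the function names and the boolean parameter are hypothetical stand-ins for a per-target decision (x86 keeps 8-byte alignment since realignment showed no win, while targets such as sparc would require 32).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-target choice of redzone alignment, reflecting the
// thread's conclusion: no 32-byte stack realignment on x86, 32-byte
// redzones only where the target needs them.
uintptr_t asan_redzone_alignment(bool target_needs_32byte_redzones) {
  return target_needs_32byte_redzones ? 32 : 8;
}

// Round a frame or redzone size up to the chosen power-of-two alignment.
uintptr_t align_up(uintptr_t size, uintptr_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}
```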
===================================================================
--- asan/asan_linux.cc	(revision 169136)
+++ asan/asan_linux.cc	(working copy)
@@ -158,7 +158,9 @@
   stack->trace[0] = pc;
   if (max_s > 1) {
     stack->max_size = max_s;
-#if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__)
+#if defined(__arm__) || \
+    defined(__powerpc__) || defined(__powerpc64__) || \
+    defined(__sparc__)
     _Unwind_Backtrace(Unwind_Trace, stack);
     // Pop off the two ASAN functions from the backtrace.
     stack->PopStackFrames(2);