Patchwork Sparc ASAN

Submitter Konstantin Serebryany
Date Dec. 3, 2012, 6:18 p.m.
Message ID <CAGQ9bdzCThi2A12F6x6gpwwEDVQR0RT_n5NbKUSu3qZHE39Zuw@mail.gmail.com>
Permalink /patch/203410/
State New

Comments

Konstantin Serebryany - Dec. 3, 2012, 6:18 p.m.
On Mon, Dec 3, 2012 at 10:02 PM, David Miller <davem@davemloft.net> wrote:
> From: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>
> Date: Tue, 27 Nov 2012 18:12:00 +0400
>
>> On Wed, Nov 21, 2012 at 9:05 PM, Konstantin Serebryany
>> <konstantin.s.serebryany@gmail.com> wrote:
>>> On Wed, Nov 21, 2012 at 8:40 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>
>>>> Date: Wed, 21 Nov 2012 19:39:52 +0400
>>>>
>>>>> There are various other things that asan library does not support.
>>>>
>>>> I'm trying to understand why making the page size variable
>>>> is such a difficult endeavour.
>>>
>>> Maybe it's not *that* difficult.
>>> Looking at it carefully, the major problem I can see is that some
>>> other constants are defined based on this one.
>>> Give me a few days to resolve it...
>>> http://code.google.com/p/address-sanitizer/issues/detail?id=128
>>
>>  http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193849 removes the
>> kPageSize constant in favor of a function call.
>> Please give it a try.
>>
>> BTW, libsanitizer now has a usable process to quickly pull the upstream changes.
>> It should make it easier for us to iterate on platform-specific patches.
>
> So, with the hacks for unaligned accesses, Sparc seems to be working
> here.
>
> The only changes to libsanitizer are to put __sparc__ checks where
> __powerpc__ checks exist in the unwind code.

Like this?

>
> Are there any clear thoughts about what we should do in the end
> wrt. the stack ASAN alignment issues?  Do we plan to 32-byte
> align the stack variables or similar?  Otherwise we need to add
> some ugly code to do half-word/byte at a time accesses, as needed.

The LLVM implementation always used 32-byte alignment for stack redzones.
I never actually did any performance checking on x86 (32-byte aligned vs
8-byte aligned), although I suspect 32-byte aligned redzones should be
~2x faster.  8-byte aligned redzones trade some CPU for some stack memory
and probably slightly increase the chances of catching large (33+ bytes
off) buffer overflows.

For strict-alignment architectures, 8-byte aligned redzones are obviously
not an option.
We either need to align the redzones to 32 bytes always, or only on some platforms.
Either is fine with me.

--kcc
David Miller - Dec. 3, 2012, 6:29 p.m.
From: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>
Date: Mon, 3 Dec 2012 22:18:56 +0400

> On Mon, Dec 3, 2012 at 10:02 PM, David Miller <davem@davemloft.net> wrote:
>> The only changes to libsanitizer are to put __sparc__ checks where
>> __powerpc__ checks exist in the unwind code.
> 
> Like this?
> 
> ===================================================================
> --- asan/asan_linux.cc  (revision 169136)
> +++ asan/asan_linux.cc  (working copy)
> @@ -158,7 +158,9 @@
>    stack->trace[0] = pc;
>    if ((max_s) > 1) {
>      stack->max_size = max_s;
> -#if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__)
> +#if defined(__arm__) || \
> +    defined(__powerpc__) || defined(__powerpc64__) || \
> +    defined(__sparc__)
>      _Unwind_Backtrace(Unwind_Trace, stack);
>      // Pop off the two ASAN functions from the backtrace.
>      stack->PopStackFrames(2);

Yes, that's perfect.

We could also add a __sparc__ block to sanitizer_stacktrace.cc:patch_pc().
The Sparc PC is actually 8 bytes after the caller's jump: Sparc has a
delay slot, so the place to return to is two instructions after the
call/jump, and instructions are all 4 bytes long.

> We either need to align the redzones to 32 bytes always, or only on some platforms.
> Either is fine with me.

I'm ambivalent as well.
Jakub Jelinek - Dec. 3, 2012, 6:31 p.m.
On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned),
> although I suspect 32-byte aligned redzones should be ~2x faster.

Why?  The 32-byte realigning has significant cost, plus often one
extra register is eaten (the DRAP register), even bigger cost on
non-i?86/x86_64 targets.

	Jakub
Konstantin Serebryany - Dec. 3, 2012, 6:37 p.m.
On Mon, Dec 3, 2012 at 10:31 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned),
>> although I suspect 32-byte aligned redzones should be ~2x faster.
>
> Why?  The 32-byte realigning has significant cost, plus often one
> extra register is eaten (the DRAP register), even bigger cost on
> non-i?86/x86_64 targets.

Maybe because my understanding of x86 is rather old (or plain wrong).
I tried a microbenchmark on a Xeon E5-2690 and unaligned stores are
just slightly more expensive (< 10%).
I'll do more benchmarks with the actual asan instrumentation tomorrow.

So, I guess we need to align the redzones conditionally for sparc, etc.

--kcc

>
>         Jakub
Jakub Jelinek - Dec. 3, 2012, 6:41 p.m.
On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
> The LLVM implementation always used 32-byte alignment for stack redzones.
> I never actually did any performance checking on x86 (32-byte aligned
> vs 8-byte aligned),
> although I suspect 32-byte aligned redzones should be ~2x faster.

If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
spot anything like that on e.g.

__attribute__((noinline, noclone)) void
foo (int *p)
{
  int i;
  for (i = 0; i < 32; i++)
    p[i] = 0x12345678;
}

int
main (int argc, const char **argv)
{
  char buf[1024];
  int *p = (int *) &buf[argc - 1];
  int i;
  __builtin_printf ("%p\n", p);
  for (i = 0; i < 100000000; i++)
    foo (p);
  return 0;
}

Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
arguments on a SandyBridge CPU.  I guess there could be penalties on page
boundaries, etc., but I think hot caches are the usual case for accesses
to the stack.

	Jakub
Konstantin Serebryany - Dec. 4, 2012, 6:22 a.m.
I've committed a flag to the LLVM implementation to not realign the
stack (-mllvm -asan-realign-stack=0).
On a Xeon W3690 I measured no performance difference (tried the C/C++
part of SPEC2006).
So on x86 it's probably the right thing not to realign the stack.

--kcc

On Mon, Dec 3, 2012 at 10:41 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Dec 03, 2012 at 10:18:56PM +0400, Konstantin Serebryany wrote:
>> The LLVM implementation always used 32-byte alignment for stack redzones.
>> I never actually did any performance checking on x86 (32-byte aligned
>> vs 8-byte aligned),
>> although I suspect 32-byte aligned redzones should be ~2x faster.
>
> If the ~2x faster comes from unaligned vs. aligned integer stores, I can't
> spot anything like that on e.g.
>
> __attribute__((noinline, noclone)) void
> foo (int *p)
> {
>   int i;
>   for (i = 0; i < 32; i++)
>     p[i] = 0x12345678;
> }
>
> int
> main (int argc, const char **argv)
> {
>   char buf[1024];
>   int *p = (int *) &buf[argc - 1];
>   int i;
>   __builtin_printf ("%p\n", p);
>   for (i = 0; i < 100000000; i++)
>     foo (p);
>   return 0;
> }
>
> Time with zero arguments (i.e. argc 1) is the same as time with 1, 2 or 3
> arguments on a SandyBridge CPU.  I guess there could be penalties on page
> boundaries, etc., but I think hot caches are the usual case for accesses
> to the stack.
>
>         Jakub

Patch

===================================================================
--- asan/asan_linux.cc  (revision 169136)
+++ asan/asan_linux.cc  (working copy)
@@ -158,7 +158,9 @@ 
   stack->trace[0] = pc;
   if ((max_s) > 1) {
     stack->max_size = max_s;
-#if defined(__arm__) || defined(__powerpc__) || defined(__powerpc64__)
+#if defined(__arm__) || \
+    defined(__powerpc__) || defined(__powerpc64__) || \
+    defined(__sparc__)
     _Unwind_Backtrace(Unwind_Trace, stack);
     // Pop off the two ASAN functions from the backtrace.
     stack->PopStackFrames(2);