
Come up with --param asan-stack-small-redzone (PR sanitizer/81715).

Message ID e603cb5f-c598-f599-90df-33773a1bb357@suse.cz
State New

Commit Message

Martin Liška Sept. 25, 2018, 9:05 a.m. UTC
Hi.

As noted in PR81715, GCC emits unnecessarily big middle redzones for small variables.
This is analyzed in the following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28

For now I'm suggesting to shrink the space allocated for variables <= 16B (including the variable storage itself) to 32B.
LLVM is more aggressive: it allocates just 16B for variables <= 4B. Matching that would
require bigger refactoring of the code in asan.c and I would like to avoid that.

For detailed statistics of Linux Kernel, please take a look here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c29

I'm testing the patch now.

Thoughts?

gcc/ChangeLog:

2018-09-25  Martin Liska  <mliska@suse.cz>

	* cfgexpand.c (expand_stack_vars): Smaller middle redzones
	if requested.
	* doc/invoke.texi: New param.
	* params.def (PARAM_ASAN_STACK_SMALL_REDZONE): Likewise.
	* params.h (ASAN_STACK_SMALL_REDZONE): Likewise.

gcc/testsuite/ChangeLog:

2018-09-25  Martin Liska  <mliska@suse.cz>

	* c-c++-common/asan/asan-stack-small.c: New test.
---
 gcc/cfgexpand.c                               | 13 +++++++--
 gcc/doc/invoke.texi                           |  3 ++
 gcc/params.def                                |  6 ++++
 gcc/params.h                                  |  2 ++
 .../c-c++-common/asan/asan-stack-small.c      | 29 +++++++++++++++++++
 5 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/asan-stack-small.c

Comments

Jakub Jelinek Sept. 25, 2018, 9:24 a.m. UTC | #1
On Tue, Sep 25, 2018 at 11:05:30AM +0200, Martin Liška wrote:
> As requested in PR81715, GCC emits bigger middle redzones for small variables.
> It's analyzed in following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28

First of all, does LLVM use variable-sized red zones only for
automatic variables, or also for global/local statics, or for alloca?

Have you considered also making the red zones larger for very large
variables?

> For now I'm suggesting to shrink shadow memory for variables <= 16B to 32B (including variable storage).
> LLVM is more aggressive as they allocate just 16B of shadow memory for variables <= 4B. That would
> require bigger code refactoring in asan.c and I would like to avoid that.

What exactly would need changing to support the 12-15 bytes long red zones
for 4-1 bytes long automatic vars?
Just asan_emit_stack_protection or something other?

> +	      poly_uint64 size = stack_vars[i].size;
> +	      /* For small variables shrink middle redzone (including
> +	       * variable store) just to ASAN_RED_ZONE_SIZE.  */

We don't use this comment style (* at start of comment continuation lines).
Otherwise it looks reasonable, but I wouldn't stop here.

	Jakub
Martin Liška Sept. 25, 2018, 10:10 a.m. UTC | #2
On 9/25/18 11:24 AM, Jakub Jelinek wrote:
> On Tue, Sep 25, 2018 at 11:05:30AM +0200, Martin Liška wrote:
>> As requested in PR81715, GCC emits bigger middle redzones for small variables.
>> It's analyzed in following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28
> 
> First of all, does LLVM make the variable sized red zone size only for
> automatic variables, or also for global/local statics, or for alloca?

Yes, definitely for global variables, as seen here:

lib/Transforms/Instrumentation/AddressSanitizer.cpp:
  2122      Type *Ty = G->getValueType();
  2123      uint64_t SizeInBytes = DL.getTypeAllocSize(Ty);
  2124      uint64_t MinRZ = MinRedzoneSizeForGlobal();
  2125      // MinRZ <= RZ <= kMaxGlobalRedzone
  2126      // and trying to make RZ to be ~ 1/4 of SizeInBytes.
  2127      uint64_t RZ = std::max(
  2128          MinRZ, std::min(kMaxGlobalRedzone, (SizeInBytes / MinRZ / 4) * MinRZ));
  2129      uint64_t RightRedzoneSize = RZ;
  2130      // Round up to MinRZ
  2131      if (SizeInBytes % MinRZ) RightRedzoneSize += MinRZ - (SizeInBytes % MinRZ);
  2132      assert(((RightRedzoneSize + SizeInBytes) % MinRZ) == 0);
  2133      Type *RightRedZoneTy = ArrayType::get(IRB.getInt8Ty(), RightRedzoneSize);

So roughly to SizeInBytes / 4. Confirmed:

0x000001180a7c is located 4 bytes to the left of global variable 'b' defined in 'main.c:2:6' (0x1180a80) of size 4096
0x000001180a7c is located 1020 bytes to the right of global variable 'a' defined in 'main.c:1:6' (0x117f680) of size 4096

size == 4096, rz_size == 1024.
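That arithmetic can be checked with a short sketch (the constants are assumptions based on LLVM's usual defaults, MinRZ = 32 and kMaxGlobalRedzone = 1 << 18; they do not appear in this thread):

```python
# Sketch of LLVM's global-variable redzone sizing, transcribed from the
# AddressSanitizer.cpp excerpt above.  MIN_RZ and K_MAX_GLOBAL_REDZONE are
# assumed default values, not taken from this thread.
MIN_RZ = 32
K_MAX_GLOBAL_REDZONE = 1 << 18

def global_redzone_size(size_in_bytes):
    # MinRZ <= RZ <= kMaxGlobalRedzone, aiming for RZ ~ SizeInBytes / 4.
    rz = max(MIN_RZ,
             min(K_MAX_GLOBAL_REDZONE,
                 (size_in_bytes // MIN_RZ // 4) * MIN_RZ))
    right = rz
    # Round the variable + redzone total up to a multiple of MinRZ.
    if size_in_bytes % MIN_RZ:
        right += MIN_RZ - (size_in_bytes % MIN_RZ)
    assert (right + size_in_bytes) % MIN_RZ == 0
    return right

print(global_redzone_size(4096))  # 1024, matching the report above
```

For the 4096-byte global from the report, the formula indeed yields a 1024-byte right redzone.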

> 
> Have you considered also making the red zones larger for very large
> variables?

I was mainly focused on shrinking, as that's what limits usage of asan-stack in KASAN.
But I'm open to it. Would you follow what LLVM does, or do you have a specific idea of
how to grow them?

> 
>> For now I'm suggesting to shrink shadow memory for variables <= 16B to 32B (including variable storage).
>> LLVM is more aggressive as they allocate just 16B of shadow memory for variables <= 4B. That would
>> require bigger code refactoring in asan.c and I would like to avoid that.
> 
> What exactly would need changing to support the 12-15 bytes long red zones
> for 4-1 bytes long automatic vars?
> Just asan_emit_stack_protection or something other?

Primarily this function, which would need a generalization. Apart from that, we're
also doing alignment to ASAN_RED_ZONE_SIZE:

	      prev_offset = align_base (prev_offset,
					MAX (alignb, ASAN_RED_ZONE_SIZE),
					!FRAME_GROWS_DOWNWARD);

Maybe it has consequences I don't see right now?

> 
>> +	      poly_uint64 size = stack_vars[i].size;
>> +	      /* For small variables shrink middle redzone (including
>> +	       * variable store) just to ASAN_RED_ZONE_SIZE.  */
> 
> We don't use this comment style (* at start of comment continuation lines).

Sure, easy to fix.

> Otherwise it looks reasonable, but I wouldn't stop here.

First feedback is positive:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c30

It's questionable whether the handling of variables 1-4B wide is worth further changes.

Martin

> 
> 	Jakub
>
Martin Liška Sept. 25, 2018, 10:17 a.m. UTC | #3
On 9/25/18 12:10 PM, Martin Liška wrote:
> On 9/25/18 11:24 AM, Jakub Jelinek wrote:
>> On Tue, Sep 25, 2018 at 11:05:30AM +0200, Martin Liška wrote:
>>> As requested in PR81715, GCC emits bigger middle redzones for small variables.
>>> It's analyzed in following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28
>>
>> First of all, does LLVM make the variable sized red zone size only for
>> automatic variables, or also for global/local statics, or for alloca?
> 
> Yes, definitely for global variables, as seen here:
> 
> lib/Transforms/Instrumentation/AddressSanitizer.cpp:
>   2122      Type *Ty = G->getValueType();
>   2123      uint64_t SizeInBytes = DL.getTypeAllocSize(Ty);
>   2124      uint64_t MinRZ = MinRedzoneSizeForGlobal();
>   2125      // MinRZ <= RZ <= kMaxGlobalRedzone
>   2126      // and trying to make RZ to be ~ 1/4 of SizeInBytes.
>   2127      uint64_t RZ = std::max(
>   2128          MinRZ, std::min(kMaxGlobalRedzone, (SizeInBytes / MinRZ / 4) * MinRZ));
>   2129      uint64_t RightRedzoneSize = RZ;
>   2130      // Round up to MinRZ
>   2131      if (SizeInBytes % MinRZ) RightRedzoneSize += MinRZ - (SizeInBytes % MinRZ);
>   2132      assert(((RightRedzoneSize + SizeInBytes) % MinRZ) == 0);
>   2133      Type *RightRedZoneTy = ArrayType::get(IRB.getInt8Ty(), RightRedzoneSize);
> 
> So roughly to SizeInBytes / 4. Confirmed:
> 
> 0x000001180a7c is located 4 bytes to the left of global variable 'b' defined in 'main.c:2:6' (0x1180a80) of size 4096
> 0x000001180a7c is located 1020 bytes to the right of global variable 'a' defined in 'main.c:1:6' (0x117f680) of size 4096
> 
> size == 4096, rz_size == 1024.

...
For allocas, red zones are constant-sized, if I'm reading it correctly.

Martin

> 
>>
>> Have you considered also making the red zones larger for very large
>> variables?
> 
> I was mainly focused on shrinking as that's limiting usage of asan-stack in KASAN.
> But I'm open for it. Would you follow what LLVM does, or do you have a specific idea how
> to growth?
> 
>>
>>> For now I'm suggesting to shrink shadow memory for variables <= 16B to 32B (including variable storage).
>>> LLVM is more aggressive as they allocate just 16B of shadow memory for variables <= 4B. That would
>>> require bigger code refactoring in asan.c and I would like to avoid that.
>>
>> What exactly would need changing to support the 12-15 bytes long red zones
>> for 4-1 bytes long automatic vars?
>> Just asan_emit_stack_protection or something other?
> 
> Primarily this function, that would need a generalization. Apart from that we're
> also doing alignment to ASAN_RED_ZONE_SIZE:
> 
> 	      prev_offset = align_base (prev_offset,
> 					MAX (alignb, ASAN_RED_ZONE_SIZE),
> 					!FRAME_GROWS_DOWNWARD);
> 
> Maybe it has consequences I don't see right now?
> 
>>
>>> +	      poly_uint64 size = stack_vars[i].size;
>>> +	      /* For small variables shrink middle redzone (including
>>> +	       * variable store) just to ASAN_RED_ZONE_SIZE.  */
>>
>> We don't use this comment style (* at start of comment continuation lines).
> 
> Sure, easy to fix.
> 
>> Otherwise it looks reasonable, but I wouldn't stop here.
> 
> First feedback is positive:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c30
> 
> It's questionable whether handling of variables 1-4B wide worth further changes.
> 
> Martin
> 
>>
>> 	Jakub
>>
>
Jakub Jelinek Sept. 25, 2018, 10:40 a.m. UTC | #4
On Tue, Sep 25, 2018 at 12:10:42PM +0200, Martin Liška wrote:
> On 9/25/18 11:24 AM, Jakub Jelinek wrote:
> > On Tue, Sep 25, 2018 at 11:05:30AM +0200, Martin Liška wrote:
> >> As requested in PR81715, GCC emits bigger middle redzones for small variables.
> >> It's analyzed in following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28
> > 
> > First of all, does LLVM make the variable sized red zone size only for
> > automatic variables, or also for global/local statics, or for alloca?
> 
> Yes, definitely for global variables, as seen here:
> 
> lib/Transforms/Instrumentation/AddressSanitizer.cpp:
>   2122      Type *Ty = G->getValueType();
>   2123      uint64_t SizeInBytes = DL.getTypeAllocSize(Ty);
>   2124      uint64_t MinRZ = MinRedzoneSizeForGlobal();
>   2125      // MinRZ <= RZ <= kMaxGlobalRedzone
>   2126      // and trying to make RZ to be ~ 1/4 of SizeInBytes.
>   2127      uint64_t RZ = std::max(
>   2128          MinRZ, std::min(kMaxGlobalRedzone, (SizeInBytes / MinRZ / 4) * MinRZ));
>   2129      uint64_t RightRedzoneSize = RZ;
>   2130      // Round up to MinRZ
>   2131      if (SizeInBytes % MinRZ) RightRedzoneSize += MinRZ - (SizeInBytes % MinRZ);
>   2132      assert(((RightRedzoneSize + SizeInBytes) % MinRZ) == 0);
>   2133      Type *RightRedZoneTy = ArrayType::get(IRB.getInt8Ty(), RightRedzoneSize);
> 
> So roughly to SizeInBytes / 4. Confirmed:

Changing non-automatic vars will be certainly harder.  Let's do it later.

> > Have you considered also making the red zones larger for very large
> > variables?
> 
> I was mainly focused on shrinking as that's limiting usage of asan-stack in KASAN.
> But I'm open for it. Would you follow what LLVM does, or do you have a specific idea how
> to growth?

I don't know if they did some analysis before picking the current sizes; unless
you do some, I'd follow their numbers. It doesn't look totally
unreasonable: a compromise between not wasting too much on the more frequent
smaller vars and, for larger vars, catching even larger out-of-bounds accesses.

> > What exactly would need changing to support the 12-15 bytes long red zones
> > for 4-1 bytes long automatic vars?
> > Just asan_emit_stack_protection or something other?
> 
> Primarily this function, that would need a generalization. Apart from that we're
> also doing alignment to ASAN_RED_ZONE_SIZE:
> 
> 	      prev_offset = align_base (prev_offset,
> 					MAX (alignb, ASAN_RED_ZONE_SIZE),
> 					!FRAME_GROWS_DOWNWARD);
> 
> Maybe it has consequences I don't see right now?

Actually, I think even:
+                 && i != n - 1
in your patch isn't correct, vars could be last even if they aren't == n - 1
or vice versa, the sorting is done by many factors, vars can go into
multiple buckets based on predicate etc.
So, rather than trying to guess what is last here it should be left to
expand_used_vars for one side, and perhaps based on whether any vars were
placed at all on the other side (don't remember if asan supports anything
but FRAME_GROWS_DOWNWARD).

> First feedback is positive:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c30
> 
> It's questionable whether handling of variables 1-4B wide worth further changes.

I'd think the really small vars are quite common, isn't that the case? (Sure,
address-taken ones will be less common, but still not rare.)

	Jakub
Martin Liška Sept. 25, 2018, 3:26 p.m. UTC | #5
On 9/25/18 12:40 PM, Jakub Jelinek wrote:
> On Tue, Sep 25, 2018 at 12:10:42PM +0200, Martin Liška wrote:
>> On 9/25/18 11:24 AM, Jakub Jelinek wrote:
>>> On Tue, Sep 25, 2018 at 11:05:30AM +0200, Martin Liška wrote:
>>>> As requested in PR81715, GCC emits bigger middle redzones for small variables.
>>>> It's analyzed in following comment: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c28
>>>
>>> First of all, does LLVM make the variable sized red zone size only for
>>> automatic variables, or also for global/local statics, or for alloca?
>>
>> Yes, definitely for global variables, as seen here:
>>
>> lib/Transforms/Instrumentation/AddressSanitizer.cpp:
>>   2122      Type *Ty = G->getValueType();
>>   2123      uint64_t SizeInBytes = DL.getTypeAllocSize(Ty);
>>   2124      uint64_t MinRZ = MinRedzoneSizeForGlobal();
>>   2125      // MinRZ <= RZ <= kMaxGlobalRedzone
>>   2126      // and trying to make RZ to be ~ 1/4 of SizeInBytes.
>>   2127      uint64_t RZ = std::max(
>>   2128          MinRZ, std::min(kMaxGlobalRedzone, (SizeInBytes / MinRZ / 4) * MinRZ));
>>   2129      uint64_t RightRedzoneSize = RZ;
>>   2130      // Round up to MinRZ
>>   2131      if (SizeInBytes % MinRZ) RightRedzoneSize += MinRZ - (SizeInBytes % MinRZ);
>>   2132      assert(((RightRedzoneSize + SizeInBytes) % MinRZ) == 0);
>>   2133      Type *RightRedZoneTy = ArrayType::get(IRB.getInt8Ty(), RightRedzoneSize);
>>
>> So roughly to SizeInBytes / 4. Confirmed:
> 
> Changing non-automatic vars will be certainly harder.  Let's do it later.
> 
>>> Have you considered also making the red zones larger for very large
>>> variables?
>>
>> I was mainly focused on shrinking as that's limiting usage of asan-stack in KASAN.
>> But I'm open for it. Would you follow what LLVM does, or do you have a specific idea how
>> to growth?
> 
> Dunno if they've done some analysis before picking the current sizes, unless
> we you do some I'd follow their numbers, it doesn't look totally
> unreasonable, a compromise between not wasting too much for more frequent
> smaller vars and for larger vars catching even larger out of bound accesses.

Agree with that!

> 
>>> What exactly would need changing to support the 12-15 bytes long red zones
>>> for 4-1 bytes long automatic vars?
>>> Just asan_emit_stack_protection or something other?
>>
>> Primarily this function, that would need a generalization. Apart from that we're
>> also doing alignment to ASAN_RED_ZONE_SIZE:
>>
>> 	      prev_offset = align_base (prev_offset,
>> 					MAX (alignb, ASAN_RED_ZONE_SIZE),
>> 					!FRAME_GROWS_DOWNWARD);
>>
>> Maybe it has consequences I don't see right now?
> 
> Actually, I think even:
> +                 && i != n - 1                                                                                                                    
> in your patch isn't correct, vars could be last even if they aren't == n - 1
> or vice versa, the sorting is done by many factors, vars can go into
> multiple buckets based on predicate etc.
> So, rather than trying to guess what is last here it should be left to
> expand_used_vars for one side, and perhaps based on whether any vars were
> placed at all on the other side (don't remember if asan supports anything
> but FRAME_GROWS_DOWNWARD).

OK, it only affects the size of the right redzone, which can be bigger.

> 
>> First feedback is positive:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c30
>>
>> It's questionable whether handling of variables 1-4B wide worth further changes.
> 
> I'd think the really small vars are quite common, isn't that the case (sure,
> address taken ones will be less common, but still not rare).

So I decided to write the patch properly; I have a working version that survives
the asan.exp tests. The code that creates redzones is more generic and all stack
vars are now aligned just to ASAN_SHADOW_GRANULARITY.
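The shadow encoding this generalization relies on can be illustrated with a toy model (this is not the GCC implementation, and the 0xf2 magic byte is an assumed value for ASAN_STACK_MAGIC_MIDDLE): each shadow byte describes one 8-byte granule, with 0 meaning fully addressable and a value 1-7 meaning only that many leading bytes are addressable.

```python
# Toy model of ASan shadow encoding for one stack variable followed by a
# middle redzone.  STACK_MAGIC_MIDDLE = 0xf2 is an assumption, not taken
# from this thread.
SHADOW_GRANULARITY = 8
STACK_MAGIC_MIDDLE = 0xf2

def shadow_bytes(var_size, redzone_granules):
    """Shadow bytes for a variable of var_size bytes plus its redzone."""
    full, partial = divmod(var_size, SHADOW_GRANULARITY)
    encoding = [0] * full           # 0 = all 8 bytes of the granule valid
    if partial:
        encoding.append(partial)    # the "extra" byte: only 'partial' valid
    encoding += [STACK_MAGIC_MIDDLE] * redzone_granules
    return encoding

# A 5-byte variable followed by a 16-byte middle redzone:
print(shadow_bytes(5, 2))  # [5, 242, 242]
```

The trailing value 5 in the first shadow byte is what the patch's extra_shadow_byte emits for a variable whose size is not a multiple of the granularity.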

The only missing piece is how to implement asan_emit_redzone_payload more smartly.
That means doing memory stores of sizes 8, 4, 2 and 1 in order to reduce the number of insns.
Do we have similar code somewhere?

Do you like the generalization I did, in general?

Thanks,
Martin

> 
> 	Jakub
>
From eb1a81fb08b288a8a4e0b2c5a055931e027f233c Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 25 Sep 2018 10:54:37 +0200
Subject: [PATCH] First semi-working version.

---
 gcc/asan.c                                    | 90 ++++++++++---------
 gcc/asan.h                                    | 20 +++++
 gcc/cfgexpand.c                               | 10 +--
 .../c-c++-common/asan/asan-stack-small.c      | 28 ++++++
 4 files changed, 102 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/asan-stack-small.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 235e219479d..5b8ae77b0c6 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1158,15 +1158,24 @@ asan_pp_string (pretty_printer *pp)
 /* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
 
 static rtx
-asan_shadow_cst (unsigned char shadow_bytes[4])
+asan_emit_redzone_payload (rtx shadow_mem, unsigned int payload_size,
+			   unsigned char shadow_byte,
+			   unsigned extra_shadow_byte)
 {
-  int i;
-  unsigned HOST_WIDE_INT val = 0;
-  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
-  for (i = 0; i < 4; i++)
-    val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : i]
-	   << (BITS_PER_UNIT * i);
-  return gen_int_mode (val, SImode);
+  if (extra_shadow_byte)
+    {
+      emit_move_insn (shadow_mem, gen_int_mode (extra_shadow_byte, QImode));
+      shadow_mem = adjust_address (shadow_mem, VOIDmode, 1);
+      --payload_size;
+    }
+
+  for (unsigned i = 0; i < payload_size; i++)
+    {
+      emit_move_insn (shadow_mem, gen_int_mode (shadow_byte, QImode));
+      shadow_mem = adjust_address (shadow_mem, VOIDmode, 1);
+    }
+
+  return shadow_mem;
 }
 
 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
@@ -1256,7 +1265,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   rtx_code_label *lab;
   rtx_insn *insns;
   char buf[32];
-  unsigned char shadow_bytes[4];
   HOST_WIDE_INT base_offset = offsets[length - 1];
   HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
   HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
@@ -1398,7 +1406,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 		     + (base_align_bias >> ASAN_SHADOW_SHIFT));
   gcc_assert (asan_shadow_set != -1
 	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
-  shadow_mem = gen_rtx_MEM (SImode, shadow_base);
+  shadow_mem = gen_rtx_MEM (QImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
   if (STRICT_ALIGNMENT)
     set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
@@ -1408,39 +1416,39 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
       if (l == 2)
 	cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT;
       offset = offsets[l - 1];
-      if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1))
+
+      char extra_shadow_byte = 0;
+
+      /* If a red-zone is not aligned to ASAN_SHADOW_GRANULARITY then
+	 the previous stack variable has size % ASAN_SHADOW_GRANULARITY != 0.
+	 In that case we have to emit one extra byte that will describe
+	 how many bytes (out of ASAN_SHADOW_GRANULARITY) can be accessed.  */
+      if ((offset - base_offset) & (ASAN_SHADOW_GRANULARITY - 1))
 	{
-	  int i;
 	  HOST_WIDE_INT aoff
 	    = base_offset + ((offset - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (aoff - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = aoff;
-	  for (i = 0; i < 4; i++, aoff += ASAN_SHADOW_GRANULARITY)
-	    if (aoff < offset)
-	      {
-		if (aoff < offset - (HOST_WIDE_INT)ASAN_SHADOW_GRANULARITY + 1)
-		  shadow_bytes[i] = 0;
-		else
-		  shadow_bytes[i] = offset - aoff;
-	      }
-	    else
-	      shadow_bytes[i] = ASAN_STACK_MAGIC_MIDDLE;
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
+			     & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1));
+	  extra_shadow_byte = offset - aoff;
 	  offset = aoff;
 	}
-      while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE)
-	{
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (offset - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = offset;
-	  memset (shadow_bytes, cur_shadow_byte, 4);
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset += ASAN_RED_ZONE_SIZE;
-	}
+
+
+      /* Adjust shadow memory to the beginning of the variable's shadow
+	 (or one byte earlier).  */
+      shadow_mem = adjust_address (shadow_mem, VOIDmode,
+				   (offset - prev_offset)
+				   >> ASAN_SHADOW_SHIFT);
+
+      /* Calculate size of red zone payload.  */
+      unsigned int payload_size
+	= (offsets[l - 2] - offset) / ASAN_SHADOW_GRANULARITY;
+      offset += payload_size * ASAN_SHADOW_GRANULARITY;
+
+      /* Emit red zone content.  */
+      shadow_mem = asan_emit_redzone_payload (shadow_mem, payload_size,
+					      cur_shadow_byte, extra_shadow_byte);
+
+      prev_offset = offset;
       cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE;
     }
   do_pending_stack_adjust ();
@@ -1501,7 +1509,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   for (l = length; l; l -= 2)
     {
       offset = base_offset + ((offsets[l - 1] - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
+			     & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1));
       if (last_offset + last_size != offset)
 	{
 	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
@@ -1513,7 +1521,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	  last_size = 0;
 	}
       last_size += base_offset + ((offsets[l - 2] - base_offset)
-				  & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
+				  & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1))
 		   - offset;
 
       /* Unpoison shadow memory that corresponds to a variable that is 
@@ -1534,7 +1542,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 			   "%s (%" PRId64 " B)\n", n, size);
 		}
 
-		last_size += size & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
+		last_size += size & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1);
 	    }
 	}
     }
diff --git a/gcc/asan.h b/gcc/asan.h
index 2f431b4f938..c0736a0cb6f 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -102,6 +102,26 @@ asan_red_zone_size (unsigned int size)
   return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
 }
 
+/* Return how much stack space a variable occupies,
+   including the space for its redzone.  */
+
+static inline unsigned int
+asan_var_and_redzone_size (unsigned int size)
+{
+  if (size <= 4)
+    return 16;
+  else if (size <= 16)
+    return 32;
+  else if (size <= 128)
+    return 32 + size;
+  else if (size <= 512)
+    return 64 + size;
+  else if (size <= 4096)
+    return 128 + size;
+  else
+    return 256 + size;
+}
+
 extern bool set_asan_shadow_offset (const char *);
 
 extern void set_sanitized_sections (const char *);
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..e84c82599e6 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1125,13 +1125,13 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	      && stack_vars[i].size.is_constant ())
 	    {
 	      prev_offset = align_base (prev_offset,
-					MAX (alignb, ASAN_RED_ZONE_SIZE),
+					MAX (alignb, ASAN_SHADOW_GRANULARITY),
 					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
+	      poly_uint64 size =  asan_var_and_redzone_size (stack_vars[i].size.to_constant ());
 	      offset
-		= alloc_stack_frame_space (stack_vars[i].size
-					   + ASAN_RED_ZONE_SIZE,
-					   MAX (alignb, ASAN_RED_ZONE_SIZE));
+		= alloc_stack_frame_space (size,
+					   MAX (alignb, ASAN_SHADOW_GRANULARITY));
 
 	      data->asan_vec.safe_push (prev_offset);
 	      /* Allocating a constant amount of space from a constant
@@ -2254,7 +2254,7 @@ expand_used_vars (void)
 			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
 	  /* Allocating a constant amount of space from a constant
 	     starting offset must give a constant result.  */
-	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
+	  offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)
 		    .to_constant ());
 	  data.asan_vec.safe_push (prev_offset);
 	  data.asan_vec.safe_push (offset);
diff --git a/gcc/testsuite/c-c++-common/asan/asan-stack-small.c b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
new file mode 100644
index 00000000000..11a56b8db4c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+char *pa;
+char *pb;
+char *pc;
+
+void access (volatile char *ptr)
+{
+  *ptr = 'x';
+}
+
+int main (int argc, char **argv)
+{
+  char a;
+  char b;
+  char c;
+
+  pa = &a;
+  pb = &b;
+  pc = &c;
+
+  access (pb);
+  access (pc);
+  // access 'b' here
+  access (pa + 32);
+
+  return 0;
+}
Jakub Jelinek Sept. 25, 2018, 3:53 p.m. UTC | #6
On Tue, Sep 25, 2018 at 05:26:44PM +0200, Martin Liška wrote:
> The only missing piece is how to implement asan_emit_redzone_payload more smart.
> It means doing memory stores with 8,4,2,1 sizes in order to reduce # of insns.
> Do we have somewhere a similar code?

Yeah, that is a very important optimization.  I wasn't using DImode because
at least on x86_64 64-bit constants are quite expensive and on several other
targets even more so, so SImode was a compromise to get size of the prologue
under control and not very slow.  What I think we want is figure out ranges
of shadow bytes we want to initialize and the values we want to store there,
perhaps take also into account strict alignment vs. non-strict alignment,
and perform kind of store merging for it.  Given that 2 shadow bytes would
be only used for the very small variables (<=4 bytes in size, so <= 0.5
bytes of shadow), we'd just need a way to remember the 2 shadow bytes across
handling adjacent vars and store it together.
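The splitting step can be sketched as a greedy decomposition of a run of identical shadow bytes into naturally aligned 8/4/2/1-byte stores (a toy model of the idea only; the real work would happen on RTL):

```python
# Greedy split of a shadow-byte run into 8/4/2/1-byte stores, each
# naturally aligned; a toy model of the store merging described above.
def plan_stores(offset, length):
    """Return (offset, store_size) pairs covering [offset, offset+length)."""
    stores = []
    while length > 0:
        size = 8
        # Shrink until the store fits and is naturally aligned.
        while size > length or offset % size:
            size //= 2
        stores.append((offset, size))
        offset += size
        length -= size
    return stores

# A 13-byte run starting at shadow offset 3:
print(plan_stores(3, 13))  # [(3, 1), (4, 4), (8, 8)]
```

A real implementation would additionally take strict-alignment targets into account and carry the 2-byte shadow of adjacent very small variables across iterations, as described above.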

I think we want to introduce some define for the minimum red zone size and use
it instead of the granularity (the granularity is 8 bytes, but the minimum red
zone size, if we also count the very small variable's size into it, is 16 bytes).

> --- a/gcc/asan.h
> +++ b/gcc/asan.h
> @@ -102,6 +102,26 @@ asan_red_zone_size (unsigned int size)
>    return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
>  }
>  
> +/* Return how much a stack variable occupy on a stack
> +   including a space for redzone.  */
> +
> +static inline unsigned int
> +asan_var_and_redzone_size (unsigned int size)

The argument needs to be UHWI, otherwise you do the wrong thing for,
say, a 4GB + 4 bytes long variable.  Ditto the result.

> +{
> +  if (size <= 4)
> +    return 16;
> +  else if (size <= 16)
> +    return 32;
> +  else if (size <= 128)
> +    return 32 + size;
> +  else if (size <= 512)
> +    return 64 + size;
> +  else if (size <= 4096)
> +    return 128 + size;
> +  else
> +    return 256 + size;

I'd prefer size + const instead of const + size operand order.
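For reference, the sizing policy of the quoted function can be sketched as follows (using the size + const operand order suggested here; a model of the patch under review, not GCC code):

```python
# Sketch of the patch's asan_var_and_redzone_size policy: total stack
# footprint (variable + redzone) as a function of variable size.
def var_and_redzone_size(size):
    if size <= 4:
        return 16
    if size <= 16:
        return 32
    if size <= 128:
        return size + 32
    if size <= 512:
        return size + 64
    if size <= 4096:
        return size + 128
    return size + 256

print(var_and_redzone_size(1))     # 16: tiny var, minimal footprint
print(var_and_redzone_size(4096))  # 4224: redzone grows with the var
```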

> @@ -1125,13 +1125,13 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>  	      && stack_vars[i].size.is_constant ())
>  	    {
>  	      prev_offset = align_base (prev_offset,
> -					MAX (alignb, ASAN_RED_ZONE_SIZE),
> +					MAX (alignb, ASAN_SHADOW_GRANULARITY),

Use that ASAN_MIN_RED_ZONE_SIZE (16) here.

>  					!FRAME_GROWS_DOWNWARD);
>  	      tree repr_decl = NULL_TREE;
> +	      poly_uint64 size =  asan_var_and_redzone_size (stack_vars[i].size.to_constant ());

Too long line.  Two spaces instead of one.  Why poly_uint64?
Plus, perhaps if data->asan_vec is empty (i.e. when assigning the topmost
automatic variable in a frame), we should ensure that size is at least
2 * ASAN_RED_ZONE_SIZE (or just 1 * ASAN_RED_ZONE_SIZE). 

>  	      offset
> -		= alloc_stack_frame_space (stack_vars[i].size
> -					   + ASAN_RED_ZONE_SIZE,
> -					   MAX (alignb, ASAN_RED_ZONE_SIZE));
> +		= alloc_stack_frame_space (size,
> +					   MAX (alignb, ASAN_SHADOW_GRANULARITY));

Again, too long line and we want 16 instead of 8 here too.
>  
>  	      data->asan_vec.safe_push (prev_offset);
>  	      /* Allocating a constant amount of space from a constant
> @@ -2254,7 +2254,7 @@ expand_used_vars (void)
>  			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
>  	  /* Allocating a constant amount of space from a constant
>  	     starting offset must give a constant result.  */
> -	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
> +	  offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)

and here too.

	Jakub
Martin Liška Sept. 26, 2018, 9:33 a.m. UTC | #7
On 9/25/18 5:53 PM, Jakub Jelinek wrote:
> On Tue, Sep 25, 2018 at 05:26:44PM +0200, Martin Liška wrote:
>> The only missing piece is how to implement asan_emit_redzone_payload more smart.
>> It means doing memory stores with 8,4,2,1 sizes in order to reduce # of insns.
>> Do we have somewhere a similar code?
> 
> Yeah, that is a very important optimization.  I wasn't using DImode because
> at least on x86_64 64-bit constants are quite expensive and on several other
> targets even more so, so SImode was a compromise to get size of the prologue
> under control and not very slow.  What I think we want is figure out ranges

Ah, I remember you mentioning some time ago that 64-bit constants are expensive
(even on x86_64). Btw, it's what clang uses for the red zone instrumentation.

> of shadow bytes we want to initialize and the values we want to store there,
> perhaps take also into account strict alignment vs. non-strict alignment,
> and perform kind of store merging for it.  Given that 2 shadow bytes would
> be only used for the very small variables (<=4 bytes in size, so <= 0.5
> bytes of shadow), we'd just need a way to remember the 2 shadow bytes across
> handling adjacent vars and store it together.

Agreed; it's implemented in the next version of the patch.

> 
> I think we want to introduce some define for minimum red zone size and use
> it instead of the granularity (granularity is 8 bytes, but minimum red zone
> size if we count into it also the very small variable size is 16 bytes).
> 
>> --- a/gcc/asan.h
>> +++ b/gcc/asan.h
>> @@ -102,6 +102,26 @@ asan_red_zone_size (unsigned int size)
>>    return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
>>  }
>>  
>> +/* Return how much a stack variable occupy on a stack
>> +   including a space for redzone.  */
>> +
>> +static inline unsigned int
>> +asan_var_and_redzone_size (unsigned int size)
> 
> The argument needs to be UHWI, otherwise you do a wrong thing for
> say 4GB + 4 bytes long variable.  Ditto the result.
> 
>> +{
>> +  if (size <= 4)
>> +    return 16;
>> +  else if (size <= 16)
>> +    return 32;
>> +  else if (size <= 128)
>> +    return 32 + size;
>> +  else if (size <= 512)
>> +    return 64 + size;
>> +  else if (size <= 4096)
>> +    return 128 + size;
>> +  else
>> +    return 256 + size;
> 
> I'd prefer size + const instead of const + size operand order.
> 
>> @@ -1125,13 +1125,13 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
>>  	      && stack_vars[i].size.is_constant ())
>>  	    {
>>  	      prev_offset = align_base (prev_offset,
>> -					MAX (alignb, ASAN_RED_ZONE_SIZE),
>> +					MAX (alignb, ASAN_SHADOW_GRANULARITY),
> 
> Use that ASAN_MIN_RED_ZONE_SIZE (16) here.
> 
>>  					!FRAME_GROWS_DOWNWARD);
>>  	      tree repr_decl = NULL_TREE;
>> +	      poly_uint64 size =  asan_var_and_redzone_size (stack_vars[i].size.to_constant ());
> 
> Too long line.  Two spaces instead of one.  Why poly_uint64?
> Plus, perhaps if data->asan_vec is empty (i.e. when assigning the topmost
> automatic variable in a frame), we should ensure that size is at least
> 2 * ASAN_RED_ZONE_SIZE (or just 1 * ASAN_RED_ZONE_SIZE). 
> 
>>  	      offset
>> -		= alloc_stack_frame_space (stack_vars[i].size
>> -					   + ASAN_RED_ZONE_SIZE,
>> -					   MAX (alignb, ASAN_RED_ZONE_SIZE));
>> +		= alloc_stack_frame_space (size,
>> +					   MAX (alignb, ASAN_SHADOW_GRANULARITY));
> 
> Again, too long line and we want 16 instead of 8 here too.
>>  
>>  	      data->asan_vec.safe_push (prev_offset);
>>  	      /* Allocating a constant amount of space from a constant
>> @@ -2254,7 +2254,7 @@ expand_used_vars (void)
>>  			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
>>  	  /* Allocating a constant amount of space from a constant
>>  	     starting offset must give a constant result.  */
>> -	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
>> +	  offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)
> 
> and here too.
> 
> 	Jakub
> 

The rest is also implemented as requested.  I'm testing the Linux kernel now and will send
stats to the PR created for it.

Patch survives testing on x86_64-linux-gnu.

Martin
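
For reference, the size mapping used by the attached patch can be tabulated with a standalone sketch (a copy of the thresholds in asan_var_and_redzone_size below, not the GCC function itself):

```c
/* Total stack bytes reserved for a variable of SIZE bytes, red zone
   included: very small variables share a 16- or 32-byte slot, larger
   ones get a red zone that grows with the variable size.  */
static unsigned long long
var_and_redzone_size (unsigned long long size)
{
  if (size <= 4)
    return 16;
  else if (size <= 16)
    return 32;
  else if (size <= 128)
    return size + 32;
  else if (size <= 512)
    return size + 64;
  else if (size <= 4096)
    return size + 128;
  else
    return size + 256;
}
```

So a 1-byte variable now consumes 16 bytes of stack including its red zone, instead of a full ASAN_RED_ZONE_SIZE-aligned slot.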
From acbea3a2127a5eb19e23a202010847e098cf8ce8 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 25 Sep 2018 10:54:37 +0200
Subject: [PATCH] Make red zone size more flexible for stack variables (PR
 sanitizer/81715).

gcc/ChangeLog:

2018-09-26  Martin Liska  <mliska@suse.cz>

	PR sanitizer/81715
	* asan.c (asan_shadow_cst): Remove.
	(asan_emit_redzone_payload): New.
	(asan_emit_stack_protection): Make it more
	flexible to support arbitrary size of red zones.
	* asan.h (ASAN_MIN_RED_ZONE_SIZE): New.
	(asan_var_and_redzone_size): Likewise.
	* cfgexpand.c (expand_stack_vars): Reserve size
	for stack vars according to asan_var_and_redzone_size.
	(expand_used_vars): Make smaller offset based
	on ASAN_SHADOW_GRANULARITY.

gcc/testsuite/ChangeLog:

2018-09-26  Martin Liska  <mliska@suse.cz>

	PR sanitizer/81715
	* c-c++-common/asan/asan-stack-small.c: New test.
---
 gcc/asan.c                                    | 108 +++++++++++-------
 gcc/asan.h                                    |  25 ++++
 gcc/cfgexpand.c                               |  16 ++-
 .../c-c++-common/asan/asan-stack-small.c      |  28 +++++
 4 files changed, 132 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/asan-stack-small.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 235e219479d..6552ca66132 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1155,18 +1155,34 @@ asan_pp_string (pretty_printer *pp)
   return build1 (ADDR_EXPR, shadow_ptr_types[0], ret);
 }
 
-/* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
+/* Emit red zones payload that started at SHADOW_MEM address.
+   SHADOW_BYTES contains payload that should be stored.  */
 
 static rtx
-asan_shadow_cst (unsigned char shadow_bytes[4])
+asan_emit_redzone_payload (rtx shadow_mem,
+			   auto_vec<unsigned char> &shadow_bytes)
 {
-  int i;
-  unsigned HOST_WIDE_INT val = 0;
-  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
-  for (i = 0; i < 4; i++)
-    val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : i]
-	   << (BITS_PER_UNIT * i);
-  return gen_int_mode (val, SImode);
+  while (!shadow_bytes.is_empty ())
+    {
+      unsigned l = shadow_bytes.length ();
+      unsigned chunk = l >= 4 ? 4 : (l >= 2 ? 2 : 1);
+      machine_mode mode = chunk == 4 ? SImode : (chunk == 2 ? HImode : QImode);
+      shadow_mem = adjust_address (shadow_mem, mode, 0);
+
+      unsigned HOST_WIDE_INT val = 0;
+      gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+      for (unsigned i = 0; i < chunk; i++)
+	{
+	  unsigned char v = shadow_bytes[BYTES_BIG_ENDIAN ? chunk - 1 - i : i];
+	  val |= (unsigned HOST_WIDE_INT) v << (BITS_PER_UNIT * i);
+	}
+      rtx c = gen_int_mode (val, mode);
+      emit_move_insn (shadow_mem, c);
+      shadow_mem = adjust_address (shadow_mem, VOIDmode, chunk);
+      shadow_bytes.block_remove (0, chunk);
+    }
+
+  return shadow_mem;
 }
 
 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
@@ -1256,7 +1272,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   rtx_code_label *lab;
   rtx_insn *insns;
   char buf[32];
-  unsigned char shadow_bytes[4];
   HOST_WIDE_INT base_offset = offsets[length - 1];
   HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
   HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
@@ -1398,49 +1413,64 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 		     + (base_align_bias >> ASAN_SHADOW_SHIFT));
   gcc_assert (asan_shadow_set != -1
 	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
-  shadow_mem = gen_rtx_MEM (SImode, shadow_base);
+  shadow_mem = gen_rtx_MEM (QImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
   if (STRICT_ALIGNMENT)
     set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
   prev_offset = base_offset;
+
+  auto_vec<unsigned char> shadow_bytes (64);
   for (l = length; l; l -= 2)
     {
       if (l == 2)
 	cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT;
       offset = offsets[l - 1];
-      if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1))
+
+      bool merging_p = !shadow_bytes.is_empty ();
+      bool extra_byte = (offset - base_offset) & (ASAN_SHADOW_GRANULARITY - 1);
+      /* If a red zone is not aligned to ASAN_SHADOW_GRANULARITY, then
+	 the previous stack variable has size % ASAN_SHADOW_GRANULARITY != 0.
+	 In that case we have to emit one extra byte that describes
+	 how many bytes (out of ASAN_SHADOW_GRANULARITY) can be accessed.  */
+      if (extra_byte)
 	{
-	  int i;
 	  HOST_WIDE_INT aoff
 	    = base_offset + ((offset - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (aoff - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = aoff;
-	  for (i = 0; i < 4; i++, aoff += ASAN_SHADOW_GRANULARITY)
-	    if (aoff < offset)
-	      {
-		if (aoff < offset - (HOST_WIDE_INT)ASAN_SHADOW_GRANULARITY + 1)
-		  shadow_bytes[i] = 0;
-		else
-		  shadow_bytes[i] = offset - aoff;
-	      }
-	    else
-	      shadow_bytes[i] = ASAN_STACK_MAGIC_MIDDLE;
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
+			     & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1));
+	  shadow_bytes.quick_push (offset - aoff);
 	  offset = aoff;
 	}
-      while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE)
+
+      /* Adjust the shadow memory address to the beginning of this
+	 red zone (or one byte earlier).  */
+      if (!merging_p)
+	shadow_mem = adjust_address (shadow_mem, VOIDmode,
+				     (offset - prev_offset)
+				     >> ASAN_SHADOW_SHIFT);
+
+      if (extra_byte)
+	offset += ASAN_SHADOW_GRANULARITY;
+
+      /* Calculate size of red zone payload.  */
+      while (offset < offsets[l - 2])
 	{
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (offset - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
+	  shadow_bytes.quick_push (cur_shadow_byte);
+	  offset += ASAN_SHADOW_GRANULARITY;
+	}
+
+      /* Do simple store merging for 2 adjacent small variables that
+	 together need 4 shadow bytes in total for their red zones.  */
+      if (shadow_bytes.length () == 2
+	  && l >= 3
+	  && ((unsigned HOST_WIDE_INT)(offsets[l - 3] - offsets[l - 2])
+	      < ASAN_SHADOW_GRANULARITY))
+	;
+      else
+	{
+	  shadow_mem = asan_emit_redzone_payload (shadow_mem, shadow_bytes);
 	  prev_offset = offset;
-	  memset (shadow_bytes, cur_shadow_byte, 4);
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset += ASAN_RED_ZONE_SIZE;
 	}
+
       cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE;
     }
   do_pending_stack_adjust ();
@@ -1501,7 +1531,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   for (l = length; l; l -= 2)
     {
       offset = base_offset + ((offsets[l - 1] - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
+			     & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
       if (last_offset + last_size != offset)
 	{
 	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
@@ -1513,7 +1543,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	  last_size = 0;
 	}
       last_size += base_offset + ((offsets[l - 2] - base_offset)
-				  & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
+				  & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
 		   - offset;
 
       /* Unpoison shadow memory that corresponds to a variable that is 
@@ -1534,7 +1564,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 			   "%s (%" PRId64 " B)\n", n, size);
 		}
 
-		last_size += size & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
+		last_size += size & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
 	    }
 	}
     }
diff --git a/gcc/asan.h b/gcc/asan.h
index 2f431b4f938..e1b9b491e67 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -53,6 +53,11 @@ extern hash_set <tree> *asan_used_labels;
    up to 2 * ASAN_RED_ZONE_SIZE - 1 bytes.  */
 #define ASAN_RED_ZONE_SIZE	32
 
+/* Stack variables use more compact red zones.  The size also includes
+   the size of the variable itself.  */
+
+#define ASAN_MIN_RED_ZONE_SIZE	16
+
 /* Shadow memory values for stack protection.  Left is below protected vars,
    the first pointer in stack corresponding to that offset contains
    ASAN_STACK_FRAME_MAGIC word, the second pointer to a string describing
@@ -102,6 +107,26 @@ asan_red_zone_size (unsigned int size)
   return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
 }
 
+/* Return how much a stack variable occupies on the stack,
+   including space for its red zone.  */
+
+static inline unsigned HOST_WIDE_INT
+asan_var_and_redzone_size (unsigned HOST_WIDE_INT size)
+{
+  if (size <= 4)
+    return 16;
+  else if (size <= 16)
+    return 32;
+  else if (size <= 128)
+    return size + 32;
+  else if (size <= 512)
+    return size + 64;
+  else if (size <= 4096)
+    return size + 128;
+  else
+    return size + 256;
+}
+
 extern bool set_asan_shadow_offset (const char *);
 
 extern void set_sanitized_sections (const char *);
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..1a1abe1f6a2 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1125,13 +1125,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	      && stack_vars[i].size.is_constant ())
 	    {
 	      prev_offset = align_base (prev_offset,
-					MAX (alignb, ASAN_RED_ZONE_SIZE),
+					MAX (alignb, ASAN_MIN_RED_ZONE_SIZE),
 					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
-	      offset
-		= alloc_stack_frame_space (stack_vars[i].size
-					   + ASAN_RED_ZONE_SIZE,
-					   MAX (alignb, ASAN_RED_ZONE_SIZE));
+	      unsigned HOST_WIDE_INT size
+		= asan_var_and_redzone_size (stack_vars[i].size.to_constant ());
+	      if (data->asan_vec.is_empty ())
+		size = MAX (size, ASAN_RED_ZONE_SIZE);
+
+	      unsigned HOST_WIDE_INT alignment = MAX (alignb,
+						      ASAN_MIN_RED_ZONE_SIZE);
+	      offset = alloc_stack_frame_space (size, alignment);
 
 	      data->asan_vec.safe_push (prev_offset);
 	      /* Allocating a constant amount of space from a constant
@@ -2254,7 +2258,7 @@ expand_used_vars (void)
 			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
 	  /* Allocating a constant amount of space from a constant
 	     starting offset must give a constant result.  */
-	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
+	  offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)
 		    .to_constant ());
 	  data.asan_vec.safe_push (prev_offset);
 	  data.asan_vec.safe_push (offset);
diff --git a/gcc/testsuite/c-c++-common/asan/asan-stack-small.c b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
new file mode 100644
index 00000000000..11a56b8db4c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+char *pa;
+char *pb;
+char *pc;
+
+void access (volatile char *ptr)
+{
+  *ptr = 'x';
+}
+
+int main (int argc, char **argv)
+{
+  char a;
+  char b;
+  char c;
+
+  pa = &a;
+  pb = &b;
+  pc = &c;
+
+  access (pb);
+  access (pc);
+  // access 'b' here
+  access (pa + 32);
+
+  return 0;
+}
Martin Liška Oct. 9, 2018, 8:29 a.m. UTC | #8
PING^1

Martin Liška Oct. 23, 2018, 9:02 a.m. UTC | #9
PING^2

Martin Liška Nov. 13, 2018, 7:50 a.m. UTC | #10
PING^3

Jakub Jelinek Nov. 28, 2018, 11:41 a.m. UTC | #11
On Wed, Sep 26, 2018 at 11:33:25AM +0200, Martin Liška wrote:
> 2018-09-26  Martin Liska  <mliska@suse.cz>
> 
> 	PR sanitizer/81715
> 	* asan.c (asan_shadow_cst): Remove.
> 	(asan_emit_redzone_payload): New.
> 	(asan_emit_stack_protection): Make it more
> 	flexible to support arbitrary size of red zones.
> 	* asan.h (ASAN_MIN_RED_ZONE_SIZE): New.
> 	(asan_var_and_redzone_size): Likewise.
> 	* cfgexpand.c (expand_stack_vars): Reserve size
> 	for stack vars according to asan_var_and_redzone_size.
> 	(expand_used_vars): Make smaller offset based
> 	on ASAN_SHADOW_GRANULARITY.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-09-26  Martin Liska  <mliska@suse.cz>
> 
> 	PR sanitizer/81715
> 	* c-c++-common/asan/asan-stack-small.c: New test.

Sorry for the delay.  I had a quick look.  Using:
struct S { char a[32]; };
void bar (void *, void *, void *, void *);

int
foo (void)
{
  struct S a, b, c, d;
  bar (&a, &b, &c, &d);
  return 0;
}

int
baz (void)
{
  char a;
  short b;
  int c;
  long long d;
  bar (&a, &b, &c, &d);
  return 0;
}
-O2 -fsanitize=address, I see that foo is identical before/after your patch,
perfect.
Looking at baz, I see issues though:
.LC2:
        .string "4 32 1 4 a:15 48 2 4 b:16 64 4 4 c:17 80 8 4 d:18"
...
        movq    %rbx, %rbp
        movl    $-3580, %edx
        movl    $-3085, %ecx
        movq    $1102416563, (%rbx)
        shrq    $3, %rbp
        movq    $.LC2, 8(%rbx)
        leaq    48(%rbx), %rsi
        leaq    32(%rbx), %rdi
        movq    $.LASANPC1, 16(%rbx)
        movw    %dx, 2147450888(%rbp)
        leaq    64(%rbx), %rdx
        movw    %cx, 2147450891(%rbp)
        leaq    80(%rbx), %rcx
        movl    $-235802127, 2147450880(%rbp)
        movl    $-234687999, 2147450884(%rbp)
        movb    $-13, 2147450893(%rbp)
        call    bar
        cmpq    %rbx, %r12
        jne     .L15
        movq    $0, 2147450880(%rbp)
        xorl    %eax, %eax
        movl    $0, 2147450888(%rbp)
        movw    %ax, 2147450892(%rbp)
So, looks like the shadow at (%rbx>>3)+0x7fff8000 is:
/ 2147450880(%rbp)
|           / 2147450884(%rbp)
|           |           / 2147450888(%rbp)
|           |           |           / 2147450892(%rbp)
|           |           |           |
f1 f1 f1 f1 01 f2 02 f2 04 f2 00 f3 f3 f3

Two problems, it uses unconditionally unaligned stores, without
checking if the target supports them at all (in this case it does).
And, it doesn't check if it wouldn't be more efficient to use
32-bit stores.  It isn't that we want the gaps to have whatever
value the shadow memory contained before, we want it to be 0 and just rely
on it being zero from earlier.
Another question is if it wouldn't be worth to ensure the variable area is
always a multiple of 32 bytes (thus the corresponding shadow always multiple
of 4 bytes).  Then, for this testcase, you'd store:
$-235802127, 2147450880(%rbp)	// 0xf1f1f1f1
$-234687999, 2147450884(%rbp)   // 0xf202f201
$-218041852, 2147450888(%rbp)   // 0xf300f204
$-202116109, 2147450892(%rbp)   // 0xf3f3f3f3
For the single char/short/int/long var in a function case you'd still emit
just f1 f1 f1 f1 0[1240] f3 f3 f3
and the number of final f3 bytes would be from 3 to 6 (or 2 to 5), 1 is
probably too few.

Rather than special-casing the two very small adjacent vars case,
just use a rolling handling of the shadow bytes, if you fill all 4,
emit immediately, otherwise queue later and flush if the next shadow offset
is outside of the 4 bytes.  Either always use SImode stores, or check rtx
costs; if the 32-bit constants you'd store is as expensive or less than the
16-bit or 8-bit constant, use 32-bit store, otherwise use a 16-bit or 8-bit
one.  If you want to complicate it further, allow unaligned stores on
targets that do allow them, but use them with care, if you could use same
amount of aligned stores with the same cost of constants, prefer aligned
ones.

	Jakub
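
The 32-bit constants quoted in the assembly above can be sanity-checked by packing each aligned group of four shadow bytes little-endian, as a target like x86_64 would store them (an editorial sketch, not GCC code):

```c
#include <stdint.h>

/* Pack four shadow bytes into the SImode constant a little-endian
   target stores; e.g. the leading group f1 f1 f1 f1 becomes
   0xf1f1f1f1, i.e. -235802127 as a signed 32-bit value.  */
static uint32_t
pack_le32 (const unsigned char b[4])
{
  return (uint32_t) b[0] | (uint32_t) b[1] << 8
	 | (uint32_t) b[2] << 16 | (uint32_t) b[3] << 24;
}
```

This reproduces all four constants from the suggested 32-byte-aligned layout f1 f1 f1 f1 | 01 f2 02 f2 | 04 f2 00 f3 | f3 f3 f3 f3.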
Martin Liška Nov. 29, 2018, 3:03 p.m. UTC | #12
On 11/28/18 12:41 PM, Jakub Jelinek wrote:

Hi.

> 
> Two problems, it uses unconditionally unaligned stores, without
> checking if the target supports them at all (in this case it does).
> And, it doesn't check if it wouldn't be more efficient to use
> 32-bit stores. 

Ok, so now we have reduced the alignment of stack variables to ASAN_MIN_RED_ZONE_SIZE (16).
What options do we have when we emit the red zones? The only guarantee we have is
that in shadow memory we are aligned to at least 2 bytes. Should byte-based
emission be used for STRICT_ALIGNMENT targets?

>  It isn't that we want the gaps to have whatever
> value the shadow memory contained before, we want it to be 0 and just rely
> on it being zero from earlier.
> Another question is if it wouldn't be worth to ensure the variable area is
> always a multiple of 32 bytes (thus the corresponding shadow always multiple
> of 4 bytes).  Then, for this testcase, you'd store:
> $-235802127, 2147450880(%rbp)	// 0xf1f1f1f1
> $-234687999, 2147450884(%rbp)   // 0xf202f201
> $-218041852, 2147450888(%rbp)   // 0xf300f204
> $-202116109, 2147450892(%rbp)   // 0xf3f3f3f3
> For the single char/short/int/long var in a function case you'd still emit
> just f1 f1 f1 f1 0[1240] f3 f3 f3
> and the number of final f3 bytes would be from 3 to 6 (or 2 to 5), 1 is
> probably too few.

I like the idea; that can be achieved by setting the alignment of the last stack
variable to 32, right?
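As a sanity check, the decimal immediates quoted above are just the four shadow bytes packed little-endian into an SImode constant; a standalone sketch (pack_shadow is an invented name, not GCC code):

```c
#include <assert.h>
#include <stdint.h>

/* Pack four shadow bytes (lowest frame offset first) into the
   little-endian 32-bit immediate that the corresponding SImode
   store would use.  */
static int32_t
pack_shadow (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  uint32_t v = (uint32_t) b0 | (uint32_t) b1 << 8
               | (uint32_t) b2 << 16 | (uint32_t) b3 << 24;
  return (int32_t) v;
}
```

E.g. pack_shadow (0xf1, 0xf1, 0xf1, 0xf1) yields -235802127, matching the first constant quoted above.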

> 
> Rather than special-casing the two very small adjacent vars case,
> just use a rolling handling of the shadow bytes, if you fill all 4,
> emit immediately, otherwise queue later and flush if the next shadow offset
> is outside of the 4 bytes.

I rewrote it and I'm sending an untested patch (it does, however, survive the
asan.exp tests).  I must agree the code is now much more readable.

>  Either always use SImode stores, or check rtx

I'm now using only SImode stores, which should be the same as before the patch.
The complication we now have is that the guaranteed alignment is only 2 bytes in
shadow memory.

Martin

> costs; if the 32-bit constants you'd store is as expensive or less than the
> 16-bit or 8-bit constant, use 32-bit store, otherwise use a 16-bit or 8-bit
> one.  If you want to complicate it further, allow unaligned stores on
> targets that do allow them, but use them with care, if you could use same
> amount of aligned stores with the same cost of constants, prefer aligned
> ones.
> 
> 	Jakub
>
From 6088a945ccbe9bf9c49c8238fe923f63688a778f Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 25 Sep 2018 10:54:37 +0200
Subject: [PATCH] Make red zone size more flexible for stack variables (PR
 sanitizer/81715).

---
 gcc/asan.c                                    | 145 ++++++++++++------
 gcc/asan.h                                    |  25 +++
 gcc/cfgexpand.c                               |  16 +-
 .../c-c++-common/asan/asan-stack-small.c      |  28 ++++
 4 files changed, 162 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/asan-stack-small.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 45906bf8fee..af5b7264e6f 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1155,20 +1155,6 @@ asan_pp_string (pretty_printer *pp)
   return build1 (ADDR_EXPR, shadow_ptr_types[0], ret);
 }
 
-/* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
-
-static rtx
-asan_shadow_cst (unsigned char shadow_bytes[4])
-{
-  int i;
-  unsigned HOST_WIDE_INT val = 0;
-  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
-  for (i = 0; i < 4; i++)
-    val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : i]
-	   << (BITS_PER_UNIT * i);
-  return gen_int_mode (val, SImode);
-}
-
 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
    though.  */
 
@@ -1235,6 +1221,79 @@ shadow_mem_size (unsigned HOST_WIDE_INT size)
   return ROUND_UP (size, ASAN_SHADOW_GRANULARITY) / ASAN_SHADOW_GRANULARITY;
 }
 
+#define RZ_BUFFER_SIZE 4
+
+struct asan_redzone_buffer
+{
+  asan_redzone_buffer (rtx shadow_mem, HOST_WIDE_INT prev_offset):
+    m_shadow_mem (shadow_mem), m_prev_offset (prev_offset),
+    m_shadow_bytes (RZ_BUFFER_SIZE)
+  {}
+
+  void emit_redzone_byte (HOST_WIDE_INT offset, unsigned char value);
+  void flush_redzone_payload (void);
+
+  rtx m_shadow_mem;
+  HOST_WIDE_INT m_prev_offset;
+  auto_vec<unsigned char> m_shadow_bytes;
+};
+
+void
+asan_redzone_buffer::emit_redzone_byte (HOST_WIDE_INT offset,
+					unsigned char value)
+{
+  gcc_assert ((offset & (ASAN_SHADOW_GRANULARITY - 1)) == 0);
+  gcc_assert (offset >= m_prev_offset);
+
+  HOST_WIDE_INT off
+    = m_prev_offset + ASAN_SHADOW_GRANULARITY * m_shadow_bytes.length ();
+  if (off == offset)
+    {
+      /* Consecutive shadow memory byte.  */
+      m_shadow_bytes.safe_push (value);
+      if (m_shadow_bytes.length () == RZ_BUFFER_SIZE)
+	flush_redzone_payload ();
+    }
+  else
+    {
+      if (!m_shadow_bytes.is_empty ())
+	flush_redzone_payload ();
+
+      /* Adjust m_prev_offset and m_shadow_mem.  */
+      m_shadow_mem = adjust_address (m_shadow_mem, VOIDmode,
+				     (offset - m_prev_offset)
+				     >> ASAN_SHADOW_SHIFT);
+      m_prev_offset = offset;
+      m_shadow_bytes.safe_push (value);
+    }
+}
+
+void
+asan_redzone_buffer::flush_redzone_payload (void)
+{
+  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+
+  if (m_shadow_bytes.is_empty ())
+    return;
+
+  /* Fill it to RZ_BUFFER_SIZE bytes with zeros if needed.  */
+  unsigned l = m_shadow_bytes.length ();
+  for (unsigned i = 0; i < RZ_BUFFER_SIZE - l; i++)
+    m_shadow_bytes.safe_push (0);
+
+  unsigned HOST_WIDE_INT val = 0;
+  for (unsigned i = 0; i < RZ_BUFFER_SIZE; i++)
+    {
+      unsigned char v = m_shadow_bytes[BYTES_BIG_ENDIAN ? RZ_BUFFER_SIZE - 1 - i : i];
+      val |= (unsigned HOST_WIDE_INT)v << (BITS_PER_UNIT * i);
+    }
+
+  rtx c = gen_int_mode (val, SImode);
+  m_shadow_mem = adjust_address (m_shadow_mem, SImode, 0);
+  emit_move_insn (m_shadow_mem, c);
+  m_shadow_bytes.truncate (0);
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
    directly, epilogue sequence returned.  BASE is the register holding the
    stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -1256,7 +1315,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   rtx_code_label *lab;
   rtx_insn *insns;
   char buf[32];
-  unsigned char shadow_bytes[4];
   HOST_WIDE_INT base_offset = offsets[length - 1];
   HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
   HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
@@ -1416,51 +1474,46 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 		     + (base_align_bias >> ASAN_SHADOW_SHIFT));
   gcc_assert (asan_shadow_set != -1
 	      && (ASAN_RED_ZONE_SIZE >> ASAN_SHADOW_SHIFT) == 4);
-  shadow_mem = gen_rtx_MEM (SImode, shadow_base);
+  shadow_mem = gen_rtx_MEM (QImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
   if (STRICT_ALIGNMENT)
     set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
   prev_offset = base_offset;
+
+  asan_redzone_buffer rz_buffer (shadow_mem, prev_offset);
   for (l = length; l; l -= 2)
     {
       if (l == 2)
 	cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT;
       offset = offsets[l - 1];
-      if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1))
+
+      bool extra_byte = (offset - base_offset) & (ASAN_SHADOW_GRANULARITY - 1);
+      /* If a red-zone is not aligned to ASAN_SHADOW_GRANULARITY then
+	 the previous stack variable has size % ASAN_SHADOW_GRANULARITY != 0.
+	 In that case we have to emit one extra byte that will describe
+	 how many bytes (out of ASAN_SHADOW_GRANULARITY) can be accessed.  */
+      if (extra_byte)
 	{
-	  int i;
 	  HOST_WIDE_INT aoff
 	    = base_offset + ((offset - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (aoff - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = aoff;
-	  for (i = 0; i < 4; i++, aoff += ASAN_SHADOW_GRANULARITY)
-	    if (aoff < offset)
-	      {
-		if (aoff < offset - (HOST_WIDE_INT)ASAN_SHADOW_GRANULARITY + 1)
-		  shadow_bytes[i] = 0;
-		else
-		  shadow_bytes[i] = offset - aoff;
-	      }
-	    else
-	      shadow_bytes[i] = ASAN_STACK_MAGIC_MIDDLE;
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset = aoff;
+			     & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1));
+	  rz_buffer.emit_redzone_byte (aoff, offset - aoff);
+	  offset = aoff + ASAN_SHADOW_GRANULARITY;
 	}
-      while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE)
+
+      /* Calculate size of red zone payload.  */
+      while (offset < offsets[l - 2])
 	{
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (offset - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = offset;
-	  memset (shadow_bytes, cur_shadow_byte, 4);
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset += ASAN_RED_ZONE_SIZE;
+	  rz_buffer.emit_redzone_byte (offset, cur_shadow_byte);
+	  offset += ASAN_SHADOW_GRANULARITY;
 	}
+
       cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE;
     }
+
+  /* Flush red zone buffer.  */
+  rz_buffer.flush_redzone_payload ();
+
   do_pending_stack_adjust ();
 
   /* Construct epilogue sequence.  */
@@ -1519,7 +1572,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   for (l = length; l; l -= 2)
     {
       offset = base_offset + ((offsets[l - 1] - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
+			     & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
       if (last_offset + last_size != offset)
 	{
 	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
@@ -1531,7 +1584,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	  last_size = 0;
 	}
       last_size += base_offset + ((offsets[l - 2] - base_offset)
-				  & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
+				  & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
 		   - offset;
 
       /* Unpoison shadow memory that corresponds to a variable that is 
@@ -1552,7 +1605,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 			   "%s (%" PRId64 " B)\n", n, size);
 		}
 
-		last_size += size & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
+		last_size += size & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
 	    }
 	}
     }
diff --git a/gcc/asan.h b/gcc/asan.h
index 2f431b4f938..e1b9b491e67 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -53,6 +53,11 @@ extern hash_set <tree> *asan_used_labels;
    up to 2 * ASAN_RED_ZONE_SIZE - 1 bytes.  */
 #define ASAN_RED_ZONE_SIZE	32
 
+/* Stack variables use more compact red zones.  The size also includes
+   the size of the variable itself.  */
+
+#define ASAN_MIN_RED_ZONE_SIZE	16
+
 /* Shadow memory values for stack protection.  Left is below protected vars,
    the first pointer in stack corresponding to that offset contains
    ASAN_STACK_FRAME_MAGIC word, the second pointer to a string describing
@@ -102,6 +107,26 @@ asan_red_zone_size (unsigned int size)
   return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
 }
 
+/* Return how much a stack variable occupies on the stack,
+   including space for the red zone.  */
+
+static inline unsigned HOST_WIDE_INT
+asan_var_and_redzone_size (unsigned HOST_WIDE_INT size)
+{
+  if (size <= 4)
+    return 16;
+  else if (size <= 16)
+    return 32;
+  else if (size <= 128)
+    return size + 32;
+  else if (size <= 512)
+    return size + 64;
+  else if (size <= 4096)
+    return size + 128;
+  else
+    return size + 256;
+}
+
 extern bool set_asan_shadow_offset (const char *);
 
 extern void set_sanitized_sections (const char *);
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21bdcdaeaa3..a839391d298 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1125,13 +1125,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	      && stack_vars[i].size.is_constant ())
 	    {
 	      prev_offset = align_base (prev_offset,
-					MAX (alignb, ASAN_RED_ZONE_SIZE),
+					MAX (alignb, ASAN_MIN_RED_ZONE_SIZE),
 					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
-	      offset
-		= alloc_stack_frame_space (stack_vars[i].size
-					   + ASAN_RED_ZONE_SIZE,
-					   MAX (alignb, ASAN_RED_ZONE_SIZE));
+	      unsigned HOST_WIDE_INT size
+		= asan_var_and_redzone_size (stack_vars[i].size.to_constant ());
+	      if (data->asan_vec.is_empty ())
+		size = MAX (size, ASAN_RED_ZONE_SIZE);
+
+	      unsigned HOST_WIDE_INT alignment = MAX (alignb,
+						      ASAN_MIN_RED_ZONE_SIZE);
+	      offset = alloc_stack_frame_space (size, alignment);
 
 	      data->asan_vec.safe_push (prev_offset);
 	      /* Allocating a constant amount of space from a constant
@@ -2259,7 +2263,7 @@ expand_used_vars (void)
 			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
 	  /* Allocating a constant amount of space from a constant
 	     starting offset must give a constant result.  */
-	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
+	  offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)
 		    .to_constant ());
 	  data.asan_vec.safe_push (prev_offset);
 	  data.asan_vec.safe_push (offset);
diff --git a/gcc/testsuite/c-c++-common/asan/asan-stack-small.c b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
new file mode 100644
index 00000000000..11a56b8db4c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+char *pa;
+char *pb;
+char *pc;
+
+void access (volatile char *ptr)
+{
+  *ptr = 'x';
+}
+
+int main (int argc, char **argv)
+{
+  char a;
+  char b;
+  char c;
+
+  pa = &a;
+  pb = &b;
+  pc = &c;
+
+  access (pb);
+  access (pc);
+  // access 'b' here
+  access (pa + 32);
+
+  return 0;
+}
Jakub Jelinek Nov. 29, 2018, 3:17 p.m. UTC | #13
On Thu, Nov 29, 2018 at 04:03:42PM +0100, Martin Liška wrote:
> > Two problems, it uses unconditionally unaligned stores, without
> > checking if the target supports them at all (in this case it does).
> > And, it doesn't check if it wouldn't be more efficient to use
> > 32-bit stores. 
> 
> Ok, so now we reduced the alignment of stack variables to ASAN_MIN_RED_ZONE_SIZE (16).
> What options do wehave when we emit the red zones? The only guarantee we have is
> that in shadow memory we are aligned to at least 2. Should byte-based emission used
> for STRICT_ALIGNMENT targets?

While individual variables can have smaller alignment, why can't we, as before,
ensure the whole block of all the vars is 32-byte aligned and (see below) also
padded to 32 bytes?  Doesn't __asan_stack_malloc* align those as before and
if not doing use after return sanitization, has your patch changed anything
in how the whole block is aligned on the stack?  Of course, on non-strict
alignment targets we might not care that much.

> >  It isn't that we want the gaps to have whatever
> > value the shadow memory contained before, we want it to be 0 and just rely
> > on it being zero from earlier.
> > Another question is if it wouldn't be worth to ensure the variable area is
> > always a multiple of 32 bytes (thus the corresponding shadow always multiple
> > of 4 bytes).  Then, for this testcase, you'd store:
> > $-235802127, 2147450880(%rbp)	// 0xf1f1f1f1
> > $-234687999, 2147450884(%rbp)   // 0xf202f201
> > $-218041852, 2147450888(%rbp)   // 0xf300f204
> > $-202116109, 2147450892(%rbp)   // 0xf3f3f3f3
> > For the single char/short/int/long var in a function case you'd still emit
> > just f1 f1 f1 f1 0[1240] f3 f3 f3
> > and the number of final f3 bytes would be from 3 to 6 (or 2 to 5), 1 is
> > probably too few.

Here, the padding to 32 bytes.

	Jakub
Martin Liška Nov. 29, 2018, 4:37 p.m. UTC | #14
On 11/29/18 4:17 PM, Jakub Jelinek wrote:
> On Thu, Nov 29, 2018 at 04:03:42PM +0100, Martin Liška wrote:
>>> Two problems, it uses unconditionally unaligned stores, without
>>> checking if the target supports them at all (in this case it does).
>>> And, it doesn't check if it wouldn't be more efficient to use
>>> 32-bit stores. 
>>
>> Ok, so now we reduced the alignment of stack variables to ASAN_MIN_RED_ZONE_SIZE (16).
>> What options do wehave when we emit the red zones? The only guarantee we have is
>> that in shadow memory we are aligned to at least 2. Should byte-based emission used
>> for STRICT_ALIGNMENT targets?
> 
> While individual variables can have smaller alignment, why we can't as before ensure
> the whole block of all the vars is 32-byte aligned and (see below) also
> padded to 32 bytes?  Doesn't __asan_stack_malloc* align those as before and
> if not doing use after return sanitization, has your patch changed anything
> in how the whole block is aligned on the stack?  Of course, on non-strict
> alignment targets we might not care that much.

We can ensure the alignment and padding will be done to 32 bytes. However, the
current code jumps over the offsets and can end up doing a store to memory that
is only 1-byte aligned.  See the small demo:

$ cat asan-smaller.c
void bar (void *a, void *b, void *c, void *d, void *e, void *f, void *g)
{
  int *p = a;
  *p = 123;
}


int
baz (void)
{
  char a;
  char e[17];
  char e1[30];
  char e2[40];
  bar (&a, 0, 0, 0, &e[0], &e1[0], &e2[0]);
  return 0;
}

int main()
{
  baz();
}

$ gcc asan-smaller.c  -fsanitize=address  && ./a.out
flushing into: -256, values: f1f1f1f1
flushing into: -224, values: 01f20000
flushing into: -192, values: 01f2f2f2
flushing into: -160, values: f2f20000
flushing into: -120, values: 06f2f2f2
flushing into: -88, values: f2000000
flushing into: -40, values: f3f3f3f3
flushing into: -8, values: f3000000
=================================================================
==7793==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffdac0 at pc 0x0000004011a1 bp 0x7fffffffda30 sp 0x7fffffffda28
WRITE of size 4 at 0x7fffffffdac0 thread T0
    #0 0x4011a0 in bar (/home/marxin/Programming/testcases/a.out+0x4011a0)
    #1 0x401293 in baz (/home/marxin/Programming/testcases/a.out+0x401293)
    #2 0x40133b in main (/home/marxin/Programming/testcases/a.out+0x40133b)
    #3 0x7ffff7019fea in __libc_start_main ../csu/libc-start.c:308
    #4 0x401099 in _start (/home/marxin/Programming/testcases/a.out+0x401099)

Address 0x7fffffffdac0 is located in stack of thread T0 at offset 32 in frame
    #0 0x4011bd in baz (/home/marxin/Programming/testcases/a.out+0x4011bd)

  This frame has 4 object(s):
    [32, 33) 'a' (line 11) <== Memory access at offset 32 partially overflows this variable
    [48, 65) 'e' (line 12)
    [112, 142) 'e1' (line 13)
    [176, 216) 'e2' (line 14)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/home/marxin/Programming/testcases/a.out+0x4011a0) in bar
Shadow bytes around the buggy address:
  0x10007fff7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007fff7b50: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 01 f2 f2 f2
  0x10007fff7b60: f2 f2 00 00 00 06 f2 f2 f2 f2 00 00 00 00 00 f3
  0x10007fff7b70: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

As seen, 06f2f2f2 is a completely unaligned write.
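For reference, the shadow addresses in the dump above follow the standard ASan mapping, with 0x7fff8000 being the x86-64 default offset (the 2147450880 displacement seen in the earlier asm); a minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SHIFT  3              /* ASAN_SHADOW_SHIFT */
#define SHADOW_OFFSET 0x7fff8000ull  /* x86-64 default, 2147450880 */

/* Address of the shadow byte describing the 8-byte granule at ADDR.  */
static uint64_t
shadow_addr (uint64_t addr)
{
  return (addr >> SHADOW_SHIFT) + SHADOW_OFFSET;
}
```

The faulting address 0x7fffffffdac0 maps to shadow byte 0x10007fff7b58, the [01] entry in the dump.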

So the current code needs to learn that once we move to another variable, we
must start the offset at a 32-byte boundary.  Let me take a look.

Martin

> 
>>>  It isn't that we want the gaps to have whatever
>>> value the shadow memory contained before, we want it to be 0 and just rely
>>> on it being zero from earlier.
>>> Another question is if it wouldn't be worth to ensure the variable area is
>>> always a multiple of 32 bytes (thus the corresponding shadow always multiple
>>> of 4 bytes).  Then, for this testcase, you'd store:
>>> $-235802127, 2147450880(%rbp)	// 0xf1f1f1f1
>>> $-234687999, 2147450884(%rbp)   // 0xf202f201
>>> $-218041852, 2147450888(%rbp)   // 0xf300f204
>>> $-202116109, 2147450892(%rbp)   // 0xf3f3f3f3
>>> For the single char/short/int/long var in a function case you'd still emit
>>> just f1 f1 f1 f1 0[1240] f3 f3 f3
>>> and the number of final f3 bytes would be from 3 to 6 (or 2 to 5), 1 is
>>> probably too few.
> 
> Here, the padding to 32 bytes.
> 
> 	Jakub
>
Jakub Jelinek Nov. 29, 2018, 4:46 p.m. UTC | #15
On Thu, Nov 29, 2018 at 05:37:11PM +0100, Martin Liška wrote:
>   0x10007fff7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> =>0x10007fff7b50: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 01 f2 f2 f2
>   0x10007fff7b60: f2 f2 00 00 00 06 f2 f2 f2 f2 00 00 00 00 00 f3
>   0x10007fff7b70: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> as seen 06f2f2f2 is completely unaligned write.

Sure, but the whole frame section where the vars from one function are
is properly aligned.  So, you shouldn't store 0xf2f2f206, but 0xf2f20600
one byte earlier, and then 0x0000f2f2 after it.  Of course, if we decide
that we want to do solely SImode stores.  If we want to complicate it
through determining costs and comparing if we should do a 1 or 2 or 4 byte
store and if unaligned is possible and not very expensive consider even
those in some cases, the code might be more complex.
I guess for now I'd be fine with SImode stores only and if we get complaints,
we can address that incrementally (the question is if we'd want hooks or
what to determine it).
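Concretely, the split described above (storing 0xf2f20600 one byte earlier and 0x0000f2f2 after it, instead of one unaligned 0xf2f2f206) can be sketched as a standalone helper; split_aligned is an invented name for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Split a run of shadow bytes starting at shadow offset OFF into
   aligned little-endian 32-bit words, zero-padding both ends.
   Returns the number of words written to OUT; *OUT_OFF receives
   the aligned start offset.  */
static unsigned
split_aligned (long off, const uint8_t *bytes, unsigned n,
               uint32_t *out, long *out_off)
{
  long start = off & ~3L;            /* round down to word boundary */
  long end = (off + n + 3) & ~3L;    /* round up to word boundary */
  unsigned nwords = (unsigned) ((end - start) / 4);
  for (unsigned w = 0; w < nwords; w++)
    {
      uint32_t val = 0;
      for (unsigned b = 0; b < 4; b++)
        {
          long pos = start + w * 4 + b;
          uint8_t v = (pos >= off && pos < off + n) ? bytes[pos - off] : 0;
          val |= (uint32_t) v << (8 * b);
        }
      out[w] = val;
    }
  *out_off = start;
  return nwords;
}
```

Feeding it the "06 f2 f2 f2 f2" run starting one byte past a word boundary reproduces exactly the two aligned words suggested above.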

	Jakub
Martin Liška Nov. 30, 2018, 11:44 a.m. UTC | #16
On 11/29/18 5:46 PM, Jakub Jelinek wrote:
> On Thu, Nov 29, 2018 at 05:37:11PM +0100, Martin Liška wrote:
>>   0x10007fff7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> =>0x10007fff7b50: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 01 f2 f2 f2
>>   0x10007fff7b60: f2 f2 00 00 00 06 f2 f2 f2 f2 00 00 00 00 00 f3
>>   0x10007fff7b70: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>   0x10007fff7ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> as seen 06f2f2f2 is completely unaligned write.
> 
> Sure, but the whole frame section where the vars from one function are
> is properly aligned.  So, you shouldn't store 0xf2f2f206, but 0xf2f20600
> one byte earlier, and then 0x0000f2f2 after it.  Of course, if we decide
> that we want to do solely SImode stores.  If we want to complicate it
> through determining costs and comparing if we should do a 1 or 2 or 4 byte
> store and if unaligned is possible and not very expensive consider even
> those in some cases, the code might be more complex.
> I guess for now I'd be fine with SImode stores only and if we get complains,
> we can address that incrementally (the question is if we'd want hooks or
> what to determine it).
> 
> 	Jakub
> 

Ok, I'm sending an updated version of the patch.  I factored out the shadow
memory byte emission into a class; it is responsible for the underlying flushing
and guarantees that stores are 4-byte aligned (when the beginning of the stack
vars is properly aligned to ASAN_RED_ZONE_SIZE).

So far I have tested the patch on x86_64-linux-gnu and ppc64le-linux-gnu machines.
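For reference, the variable-plus-redzone sizing the patch introduces (asan_var_and_redzone_size, shown in the earlier asan.h hunk) can be restated standalone: small variables share a 16- or 32-byte slot with their redzone, and larger ones get a redzone growing in steps with the variable size.

```c
#include <assert.h>

/* Standalone restatement of the sizing rule from the patch's
   asan_var_and_redzone_size; sizes are in bytes and include the
   variable itself.  */
static unsigned long long
var_and_redzone_size (unsigned long long size)
{
  if (size <= 4)
    return 16;
  else if (size <= 16)
    return 32;
  else if (size <= 128)
    return size + 32;
  else if (size <= 512)
    return size + 64;
  else if (size <= 4096)
    return size + 128;
  else
    return size + 256;
}
```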

Martin
From 06b434ecf6e80bd1f99c1f48d41a72fd6fe1d1f6 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 25 Sep 2018 10:54:37 +0200
Subject: [PATCH] Make red zone size more flexible for stack variables (PR
 sanitizer/81715).

gcc/ChangeLog:

2018-11-30  Martin Liska  <mliska@suse.cz>

	PR sanitizer/81715
	* asan.c (asan_shadow_cst): Remove, partially transform
	into flush_redzone_payload.
	(RZ_BUFFER_SIZE): New.
	(struct asan_redzone_buffer): New.
	(asan_redzone_buffer::emit_redzone_byte): Likewise.
	(asan_redzone_buffer::flush_redzone_payload): Likewise.
	(asan_redzone_buffer::flush_if_full): Likewise.
	(asan_emit_stack_protection): Use asan_redzone_buffer class
	that is responsible for proper aligned stores and flushing
	of shadow memory payload.
	* asan.h (ASAN_MIN_RED_ZONE_SIZE): New.
	(asan_var_and_redzone_size): Likewise.
	* cfgexpand.c (expand_stack_vars): Use smaller alignment
	(ASAN_MIN_RED_ZONE_SIZE) in order to make shadow memory
	for automatic variables more compact.

gcc/testsuite/ChangeLog:

2018-11-30  Martin Liska  <mliska@suse.cz>

	PR sanitizer/81715
	* c-c++-common/asan/asan-stack-small.c: New test.
---
 gcc/asan.c                                    | 202 ++++++++++++++----
 gcc/asan.h                                    |  25 +++
 gcc/cfgexpand.c                               |  14 +-
 .../c-c++-common/asan/asan-stack-small.c      |  28 +++
 4 files changed, 219 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/asan-stack-small.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 45906bf8fee..5d1d1dec064 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1155,20 +1155,6 @@ asan_pp_string (pretty_printer *pp)
   return build1 (ADDR_EXPR, shadow_ptr_types[0], ret);
 }
 
-/* Return a CONST_INT representing 4 subsequent shadow memory bytes.  */
-
-static rtx
-asan_shadow_cst (unsigned char shadow_bytes[4])
-{
-  int i;
-  unsigned HOST_WIDE_INT val = 0;
-  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
-  for (i = 0; i < 4; i++)
-    val |= (unsigned HOST_WIDE_INT) shadow_bytes[BYTES_BIG_ENDIAN ? 3 - i : i]
-	   << (BITS_PER_UNIT * i);
-  return gen_int_mode (val, SImode);
-}
-
 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
    though.  */
 
@@ -1235,6 +1221,136 @@ shadow_mem_size (unsigned HOST_WIDE_INT size)
   return ROUND_UP (size, ASAN_SHADOW_GRANULARITY) / ASAN_SHADOW_GRANULARITY;
 }
 
+/* Always emit 4 bytes at a time.  */
+#define RZ_BUFFER_SIZE 4
+
+/* ASAN redzone buffer container that handles emission of shadow bytes.  */
+struct asan_redzone_buffer
+{
+  /* Constructor.  */
+  asan_redzone_buffer (rtx shadow_mem, HOST_WIDE_INT prev_offset):
+    m_shadow_mem (shadow_mem), m_prev_offset (prev_offset),
+    m_original_offset (prev_offset), m_shadow_bytes (RZ_BUFFER_SIZE)
+  {}
+
+  /* Emit VALUE shadow byte at a given OFFSET.  */
+  void emit_redzone_byte (HOST_WIDE_INT offset, unsigned char value);
+
+  /* Emit RTX emission of the content of the buffer.  */
+  void flush_redzone_payload (void);
+
+private:
+  /* Flush if the content of the buffer is full
+     (equal to RZ_BUFFER_SIZE).  */
+  void flush_if_full (void);
+
+  /* Memory where we last emitted a redzone payload.  */
+  rtx m_shadow_mem;
+
+  /* Relative offset where we last emitted a redzone payload.  */
+  HOST_WIDE_INT m_prev_offset;
+
+  /* Relative original offset.  Used for checking only.  */
+  HOST_WIDE_INT m_original_offset;
+
+public:
+  /* Buffer with redzone payload.  */
+  auto_vec<unsigned char> m_shadow_bytes;
+};
+
+/* Emit VALUE shadow byte at a given OFFSET.  */
+
+void
+asan_redzone_buffer::emit_redzone_byte (HOST_WIDE_INT offset,
+					unsigned char value)
+{
+  gcc_assert ((offset & (ASAN_SHADOW_GRANULARITY - 1)) == 0);
+  gcc_assert (offset >= m_prev_offset);
+
+  HOST_WIDE_INT off
+    = m_prev_offset + ASAN_SHADOW_GRANULARITY * m_shadow_bytes.length ();
+  if (off == offset)
+    {
+      /* Consecutive shadow memory byte.  */
+      m_shadow_bytes.safe_push (value);
+      flush_if_full ();
+    }
+  else
+    {
+      if (!m_shadow_bytes.is_empty ())
+	flush_redzone_payload ();
+
+      /* Maybe start earlier in order to use aligned store.  */
+      HOST_WIDE_INT align = (offset - m_prev_offset) % ASAN_RED_ZONE_SIZE;
+      if (align)
+	{
+	  offset -= align;
+	  for (unsigned i = 0; i < align / BITS_PER_UNIT; i++)
+	    m_shadow_bytes.safe_push (0);
+	}
+
+      /* Adjust m_prev_offset and m_shadow_mem.  */
+      HOST_WIDE_INT diff = offset - m_prev_offset;
+      m_shadow_mem = adjust_address (m_shadow_mem, VOIDmode,
+				     diff >> ASAN_SHADOW_SHIFT);
+      m_prev_offset = offset;
+      m_shadow_bytes.safe_push (value);
+      flush_if_full ();
+    }
+}
+
+/* Emit RTX emission of the content of the buffer.  */
+
+void
+asan_redzone_buffer::flush_redzone_payload (void)
+{
+  gcc_assert (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+
+  if (m_shadow_bytes.is_empty ())
+    return;
+
+  /* Be sure we always emit to an aligned address.  */
+  gcc_assert (((m_prev_offset - m_original_offset)
+	      & (ASAN_RED_ZONE_SIZE - 1)) == 0);
+
+  /* Fill it to RZ_BUFFER_SIZE bytes with zeros if needed.  */
+  unsigned l = m_shadow_bytes.length ();
+  for (unsigned i = 0; i < RZ_BUFFER_SIZE - l; i++)
+    m_shadow_bytes.safe_push (0);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file,
+	     "Flushing rzbuffer at offset %" PRId64 " with: ", m_prev_offset);
+
+  unsigned HOST_WIDE_INT val = 0;
+  for (unsigned i = 0; i < RZ_BUFFER_SIZE; i++)
+    {
+      unsigned char v
+	= m_shadow_bytes[BYTES_BIG_ENDIAN ? RZ_BUFFER_SIZE - i - 1 : i];
+      val |= (unsigned HOST_WIDE_INT)v << (BITS_PER_UNIT * i);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "%02x ", v);
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\n");
+
+  rtx c = gen_int_mode (val, SImode);
+  m_shadow_mem = adjust_address (m_shadow_mem, SImode, 0);
+  emit_move_insn (m_shadow_mem, c);
+  m_shadow_bytes.truncate (0);
+}
+
+/* Flush if the content of the buffer is full
+   (equal to RZ_BUFFER_SIZE).  */
+
+void
+asan_redzone_buffer::flush_if_full (void)
+{
+  if (m_shadow_bytes.length () == RZ_BUFFER_SIZE)
+    flush_redzone_payload ();
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
    directly, epilogue sequence returned.  BASE is the register holding the
    stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -1256,7 +1372,6 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   rtx_code_label *lab;
   rtx_insn *insns;
   char buf[32];
-  unsigned char shadow_bytes[4];
   HOST_WIDE_INT base_offset = offsets[length - 1];
   HOST_WIDE_INT base_align_bias = 0, offset, prev_offset;
   HOST_WIDE_INT asan_frame_size = offsets[0] - base_offset;
@@ -1421,46 +1536,43 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   if (STRICT_ALIGNMENT)
     set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
   prev_offset = base_offset;
+
+  asan_redzone_buffer rz_buffer (shadow_mem, prev_offset);
   for (l = length; l; l -= 2)
     {
       if (l == 2)
 	cur_shadow_byte = ASAN_STACK_MAGIC_RIGHT;
       offset = offsets[l - 1];
-      if ((offset - base_offset) & (ASAN_RED_ZONE_SIZE - 1))
+
+      bool extra_byte = (offset - base_offset) & (ASAN_SHADOW_GRANULARITY - 1);
+      /* If a red zone is not aligned to ASAN_SHADOW_GRANULARITY then
+	 the previous stack variable has size % ASAN_SHADOW_GRANULARITY != 0.
+	 In that case we have to emit one extra byte that will describe
+	 how many bytes (out of ASAN_SHADOW_GRANULARITY) can be accessed.  */
+      if (extra_byte)
 	{
-	  int i;
 	  HOST_WIDE_INT aoff
 	    = base_offset + ((offset - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (aoff - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = aoff;
-	  for (i = 0; i < 4; i++, aoff += ASAN_SHADOW_GRANULARITY)
-	    if (aoff < offset)
-	      {
-		if (aoff < offset - (HOST_WIDE_INT)ASAN_SHADOW_GRANULARITY + 1)
-		  shadow_bytes[i] = 0;
-		else
-		  shadow_bytes[i] = offset - aoff;
-	      }
-	    else
-	      shadow_bytes[i] = ASAN_STACK_MAGIC_MIDDLE;
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset = aoff;
+			     & ~(ASAN_SHADOW_GRANULARITY - HOST_WIDE_INT_1));
+	  rz_buffer.emit_redzone_byte (aoff, offset - aoff);
+	  offset = aoff + ASAN_SHADOW_GRANULARITY;
 	}
-      while (offset <= offsets[l - 2] - ASAN_RED_ZONE_SIZE)
+
+      /* Calculate size of red zone payload.  */
+      while (offset < offsets[l - 2])
 	{
-	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
-				       (offset - prev_offset)
-				       >> ASAN_SHADOW_SHIFT);
-	  prev_offset = offset;
-	  memset (shadow_bytes, cur_shadow_byte, 4);
-	  emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
-	  offset += ASAN_RED_ZONE_SIZE;
+	  rz_buffer.emit_redzone_byte (offset, cur_shadow_byte);
+	  offset += ASAN_SHADOW_GRANULARITY;
 	}
+
       cur_shadow_byte = ASAN_STACK_MAGIC_MIDDLE;
     }
+
+  /* As the automatic variables are aligned to ASAN_RED_ZONE_SIZE, the
+     number of emitted shadow bytes is a multiple of
+     ASAN_RED_ZONE_SIZE / ASAN_SHADOW_GRANULARITY and the buffer must
+     have been flushed here.  */
+  gcc_assert (rz_buffer.m_shadow_bytes.is_empty ());
+
   do_pending_stack_adjust ();
 
   /* Construct epilogue sequence.  */
@@ -1519,7 +1631,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   for (l = length; l; l -= 2)
     {
       offset = base_offset + ((offsets[l - 1] - base_offset)
-			     & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
+			     & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1));
       if (last_offset + last_size != offset)
 	{
 	  shadow_mem = adjust_address (shadow_mem, VOIDmode,
@@ -1531,7 +1643,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 	  last_size = 0;
 	}
       last_size += base_offset + ((offsets[l - 2] - base_offset)
-				  & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
+				  & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1))
 		   - offset;
 
       /* Unpoison shadow memory that corresponds to a variable that is 
@@ -1552,7 +1664,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 			   "%s (%" PRId64 " B)\n", n, size);
 		}
 
-		last_size += size & ~(ASAN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
+		last_size += size & ~(ASAN_MIN_RED_ZONE_SIZE - HOST_WIDE_INT_1);
 	    }
 	}
     }
diff --git a/gcc/asan.h b/gcc/asan.h
index 2f431b4f938..e1b9b491e67 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -53,6 +53,11 @@ extern hash_set <tree> *asan_used_labels;
    up to 2 * ASAN_RED_ZONE_SIZE - 1 bytes.  */
 #define ASAN_RED_ZONE_SIZE	32
 
+/* Stack variables use more compact red zones.  The size also includes
+   the size of the variable itself.  */
+
+#define ASAN_MIN_RED_ZONE_SIZE	16
+
 /* Shadow memory values for stack protection.  Left is below protected vars,
    the first pointer in stack corresponding to that offset contains
    ASAN_STACK_FRAME_MAGIC word, the second pointer to a string describing
@@ -102,6 +107,26 @@ asan_red_zone_size (unsigned int size)
   return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
 }
 
+/* Return how much a stack variable occupies on the stack,
+   including the space for its red zone.  */
+
+static inline unsigned HOST_WIDE_INT
+asan_var_and_redzone_size (unsigned HOST_WIDE_INT size)
+{
+  if (size <= 4)
+    return 16;
+  else if (size <= 16)
+    return 32;
+  else if (size <= 128)
+    return size + 32;
+  else if (size <= 512)
+    return size + 64;
+  else if (size <= 4096)
+    return size + 128;
+  else
+    return size + 256;
+}
+
 extern bool set_asan_shadow_offset (const char *);
 
 extern void set_sanitized_sections (const char *);
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21bdcdaeaa3..5e23bc242b9 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1125,13 +1125,17 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 	      && stack_vars[i].size.is_constant ())
 	    {
 	      prev_offset = align_base (prev_offset,
-					MAX (alignb, ASAN_RED_ZONE_SIZE),
+					MAX (alignb, ASAN_MIN_RED_ZONE_SIZE),
 					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
-	      offset
-		= alloc_stack_frame_space (stack_vars[i].size
-					   + ASAN_RED_ZONE_SIZE,
-					   MAX (alignb, ASAN_RED_ZONE_SIZE));
+	      unsigned HOST_WIDE_INT size
+		= asan_var_and_redzone_size (stack_vars[i].size.to_constant ());
+	      if (data->asan_vec.is_empty ())
+		size = MAX (size, ASAN_RED_ZONE_SIZE);
+
+	      unsigned HOST_WIDE_INT alignment = MAX (alignb,
+						      ASAN_MIN_RED_ZONE_SIZE);
+	      offset = alloc_stack_frame_space (size, alignment);
 
 	      data->asan_vec.safe_push (prev_offset);
 	      /* Allocating a constant amount of space from a constant
diff --git a/gcc/testsuite/c-c++-common/asan/asan-stack-small.c b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
new file mode 100644
index 00000000000..11a56b8db4c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+char *pa;
+char *pb;
+char *pc;
+
+void access (volatile char *ptr)
+{
+  *ptr = 'x';
+}
+
+int main (int argc, char **argv)
+{
+  char a;
+  char b;
+  char c;
+
+  pa = &a;
+  pb = &b;
+  pc = &c;
+
+  access (pb);
+  access (pc);
+  // access 'b' here
+  access (pa + 32);
+
+  return 0;
+}
Jakub Jelinek Nov. 30, 2018, noon UTC | #17
On Fri, Nov 30, 2018 at 12:44:04PM +0100, Martin Liška wrote:
> Ok, I'm sending updated version of the patch. I factored out the shadow memory
> byte emission into a class, it's responsible for underlying flushing and guarantees
> that stores are 4B aligned (when beginning of stack vars is properly aligned
> to ASAN_RED_ZONE_SIZE).
> 
> So far I tested the patch on x86_64-linux-gnu and ppc64le-linux-gnu machine.

Can you please do a bootstrap-asan too at least on the former to test it
some more?
Ok for trunk if that succeeds or doesn't regress compared to without this patch.

Thanks.

> 2018-11-30  Martin Liska  <mliska@suse.cz>
> 
> 	PR sanitizer/81715
> 	* asan.c (asan_shadow_cst): Remove, partially transform
> 	into flush_redzone_payload.
> 	(RZ_BUFFER_SIZE): New.
> 	(struct asan_redzone_buffer): New.
> 	(asan_redzone_buffer::emit_redzone_byte): Likewise.
> 	(asan_redzone_buffer::flush_redzone_payload): Likewise.
> 	(asan_redzone_buffer::flush_if_full): Likewise.
> 	(asan_emit_stack_protection): Use asan_redzone_buffer class
> 	that is responsible for proper aligned stores and flushing
> 	of shadow memory payload.
> 	* asan.h (ASAN_MIN_RED_ZONE_SIZE): New.
> 	(asan_var_and_redzone_size): Likewise.
> 	* cfgexpand.c (expand_stack_vars): Use smaller alignment
> 	(ASAN_MIN_RED_ZONE_SIZE) in order to make shadow memory
> 	for automatic variables more compact.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-30  Martin Liska  <mliska@suse.cz>
> 
> 	PR sanitizer/81715
> 	* c-c++-common/asan/asan-stack-small.c: New test.

	Jakub

Patch

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c735e9ca2a3 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1128,9 +1128,18 @@  expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 					MAX (alignb, ASAN_RED_ZONE_SIZE),
 					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
+	      poly_uint64 size = stack_vars[i].size;
+	      /* For small variables, shrink the middle redzone (including
+		 the variable's storage) to just ASAN_RED_ZONE_SIZE.  */
+	      if (ASAN_STACK_SMALL_REDZONE
+		  && i != n - 1
+		  && (stack_vars[i].size.to_constant ()
+		      <= (ASAN_RED_ZONE_SIZE / 2)))
+		;
+	      else
+		size += ASAN_RED_ZONE_SIZE;
 	      offset
-		= alloc_stack_frame_space (stack_vars[i].size
-					   + ASAN_RED_ZONE_SIZE,
+		= alloc_stack_frame_space (size,
 					   MAX (alignb, ASAN_RED_ZONE_SIZE));
 
 	      data->asan_vec.safe_push (prev_offset);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7ef4e7a449b..1c55887b9cd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11313,6 +11313,9 @@  Enable buffer overflow detection for stack objects.  This kind of
 protection is enabled by default when using @option{-fsanitize=address}.
 To disable stack protection use @option{--param asan-stack=0} option.
 
+@item asan-stack-small-redzone
+Emit a smaller middle redzone for variables of 16 bytes or smaller.
+
 @item asan-instrument-reads
 Enable buffer overflow detection for memory reads.  This kind of
 protection is enabled by default when using @option{-fsanitize=address}.
diff --git a/gcc/params.def b/gcc/params.def
index 9f0697327d4..f1fafb4002d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1201,6 +1201,12 @@  DEFPARAM (PARAM_ASAN_STACK,
          "Enable asan stack protection.",
          1, 0, 1)
 
+DEFPARAM (PARAM_ASAN_STACK_SMALL_REDZONE,
+	  "asan-stack-small-redzone",
+	  "Emit a smaller middle redzone for variables "
+	  "of 16 bytes or smaller.",
+	  0, 0, 1)
+
 DEFPARAM (PARAM_ASAN_PROTECT_ALLOCAS,
 	"asan-instrument-allocas",
 	"Enable asan allocas/VLAs protection.",
diff --git a/gcc/params.h b/gcc/params.h
index 8aa960a904e..42df40e66fd 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -236,6 +236,8 @@  extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_ALLOW_PACKED_STORE_DATA_RACES)
 #define ASAN_STACK \
   PARAM_VALUE (PARAM_ASAN_STACK)
+#define ASAN_STACK_SMALL_REDZONE \
+  PARAM_VALUE (PARAM_ASAN_STACK_SMALL_REDZONE)
 #define ASAN_PROTECT_ALLOCAS \
   PARAM_VALUE (PARAM_ASAN_PROTECT_ALLOCAS)
 #define ASAN_GLOBALS \
diff --git a/gcc/testsuite/c-c++-common/asan/asan-stack-small.c b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
new file mode 100644
index 00000000000..dc08edaad71
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/asan-stack-small.c
@@ -0,0 +1,29 @@ 
+/* { dg-do run } */
+/* { dg-options "--param asan-stack-small-redzone=1" } */
+
+char *pa;
+char *pb;
+char *pc;
+
+void access (volatile char *ptr)
+{
+  *ptr = 'x';
+}
+
+int main (int argc, char **argv)
+{
+  char a;
+  char b;
+  char c;
+
+  pa = &a;
+  pb = &b;
+  pc = &c;
+
+  access (pb);
+  access (pc);
+  // access 'b' here
+  access (pa + 32);
+
+  return 0;
+}