[5/X,libsanitizer,mid-end] Introduce stack variable handling for HWASAN

Message ID HE1PR0802MB22512B2A16096825F43EF7C8E0780@HE1PR0802MB2251.eurprd08.prod.outlook.com

Commit Message

Matthew Malcomson Nov. 7, 2019, 6:37 p.m. UTC
Handling stack variables consists of three parts.

1) Ensure HWASAN required alignment for stack variables

When tagging shadow memory, we need to ensure that each tag granule is
only used by one variable at a time.

This is done by ensuring that each tagged variable is aligned to the tag
granule size, and that the end of each variable falls on an alignment
boundary, so that no other data stored on the stack can share a granule
with it.

This patch ensures this by adding alignment requirements in
`align_local_variable` and by forcing all stack variable allocation to be
deferred, so that `expand_stack_vars` can ensure the stack pointer is
aligned before allocating any variable for the current frame.
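The alignment rule can be sketched as a small model (names are illustrative, not from the patch): with a 16-byte tag granule, both the start and the end of every tagged variable must land on a granule boundary, so that no shadow byte ever describes two objects.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE_SIZE 16ULL   /* mirrors HWASAN_TAG_GRANULE_SIZE */

/* Round OFFSET up to the next multiple of GRANULE_SIZE.  */
static uint64_t
granule_align_up (uint64_t offset)
{
  return (offset + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/* Two adjacent objects share a granule iff the granule containing the
   last byte of the first object also contains the first byte of the
   second.  Granule-aligning both ends makes this impossible.  */
static int
shares_granule (uint64_t end_a, uint64_t start_b)
{
  return (end_a - 1) / GRANULE_SIZE == start_b / GRANULE_SIZE;
}
```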

2) Put tags into each stack variable pointer

Make sure that every pointer to a stack variable includes a tag of some
sort on it.

The way tagging works is:
  1) For every new stack frame, a random tag is generated.
  2) A base register is formed from the stack pointer value and this
     random tag.
  3) References to stack variables are now formed with RTL describing an
     offset from this base in both tag and value.
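Steps 2 and 3 can be modelled in plain C as follows (a sketch of the scheme, not the GCC implementation; helper names are hypothetical). The frame base carries the random tag in the pointer's top byte, and each variable reference offsets the base in both value and tag:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56   /* the tag lives in the top byte of a 64-bit pointer */

/* Step 2: combine the stack pointer value with the per-frame random tag.  */
static uint64_t
make_tagged_base (uint64_t sp, uint8_t frame_tag)
{
  return (sp & ((1ULL << TAG_SHIFT) - 1))
	 | ((uint64_t) frame_tag << TAG_SHIFT);
}

/* Step 3: a reference to a stack variable offsets the base in both
   value (addr_offset) and tag (tag_offset, wrapping modulo 256).  */
static uint64_t
make_tagged_ref (uint64_t base, uint64_t addr_offset, uint8_t tag_offset)
{
  uint8_t tag = (uint8_t) ((base >> TAG_SHIFT) + tag_offset);
  uint64_t addr = (base + addr_offset) & ((1ULL << TAG_SHIFT) - 1);
  return addr | ((uint64_t) tag << TAG_SHIFT);
}
```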

The random tag generation is handled by a backend hook.  This hook
decides whether to introduce a random tag or to use the stack background,
based on the parameter hwasan-random-frame-tag.  Using the stack
background is necessary for testing and for bootstrap; during bootstrap
it avoids breaking the `configure` test program that determines the
stack direction.

Using the stack background means that every stack frame has the initial
tag of zero and variables are tagged with incrementing tags from 1,
which also makes debugging a bit easier.

The tag and value offsets are also handled by a backend hook.

This patch also adds some macros defining how the HWASAN shadow memory
is stored and how a tag is stored in a pointer.
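The constants below mirror HWASAN_SHIFT and HWASAN_TAG_SHIFT_SIZE from the patch; the helpers are an illustrative model (not the patch's RTL) of how a tag is stored in a pointer and how an address maps to its shadow byte:

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT 56          /* HWASAN_SHIFT: tag in the top 8 bits */
#define GRANULE_SHIFT 4   /* HWASAN_TAG_SHIFT_SIZE: granule = 1 << 4 bytes */

/* Shifting right by 56 bits leaves just the tag.  */
static uint8_t
pointer_tag (uint64_t p)
{
  return (uint8_t) (p >> SHIFT);
}

/* Masking off the top byte gives the real address.  */
static uint64_t
untagged_address (uint64_t p)
{
  return p & ((1ULL << SHIFT) - 1);
}

/* One shadow byte describes one 16-byte granule of real memory.  */
static uint64_t
shadow_index (uint64_t p)
{
  return untagged_address (p) >> GRANULE_SHIFT;
}
```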

3) For each stack variable, tag and untag the shadow stack on function
   prologue and epilogue.

On entry to each function we tag the relevant shadow memory region for
each stack variable, so that the shadow tag matches the tag added to
each pointer to that variable.

This is the first patch where we use the HWASAN shadow space, so we need
to emit into the binary the libhwasan initialisation code that creates
this shadow memory region.  This instrumentation is done in
`compile_file`.

When exiting a function we need to ensure the shadow stack for this
function has no remaining tags.  Without clearing the shadow memory for
this stack frame, later function calls could see false positives when
they check untagged areas (such as parameters passed on the stack)
against a shadow region with left-over tags.

Hence we ensure that the entire stack frame is cleared on function exit.
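The prologue/epilogue behaviour can be sketched with a toy shadow array (an assumed model: `tag_memory` stands in for libhwasan's `__hwasan_tag_memory`, writing one tag byte per 16-byte granule; the frame size here is arbitrary):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { GRANULE = 16, FRAME_GRANULES = 8 };
static uint8_t shadow_frame[FRAME_GRANULES];   /* shadow for one frame */

/* Write TAG into the shadow byte of every granule in [offset, offset+size).
   OFFSET and SIZE are granule-aligned by construction (see part 1).  */
static void
tag_memory (uint64_t offset, uint8_t tag, uint64_t size)
{
  memset (&shadow_frame[offset / GRANULE], tag, size / GRANULE);
}
```

On entry the prologue calls `tag_memory` once per variable with that variable's tag; on exit the epilogue makes a single call with the background tag (zero) spanning the whole frame, which is exactly the "clear the entire stack frame" step above.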

gcc/ChangeLog:

2019-11-07  Matthew Malcomson  <matthew.malcomson@arm.com>

	* asan.c (hwasan_record_base): New function.
	(hwasan_emit_untag_frame): New.
	(hwasan_increment_tag): New function.
	(hwasan_with_tag): New function.
	(hwasan_tag_init): New function.
	(initialize_sanitizer_builtins): Define new builtins.
	(ATTR_NOTHROW_LIST): New macro.
	(hwasan_current_tag): New.
	(hwasan_emit_prologue): New.
	(hwasan_create_untagged_base): New.
	(hwasan_finish_file): New.
	(hwasan_sanitize_stack_p): New.
	(memory_tagging_p): New.
	* asan.h (hwasan_record_base): New declaration.
	(hwasan_emit_untag_frame): New.
	(hwasan_increment_tag): New declaration.
	(hwasan_with_tag): New declaration.
	(hwasan_sanitize_stack_p): New declaration.
	(hwasan_tag_init): New declaration.
	(memory_tagging_p): New declaration.
	(HWASAN_TAG_SIZE): New macro.
	(HWASAN_TAG_GRANULE_SIZE): New macro.
	(HWASAN_SHIFT): New macro.
	(HWASAN_SHIFT_RTX): New macro.
	(HWASAN_STACK_BACKGROUND): New macro.
	(hwasan_finish_file): New.
	(hwasan_current_tag): New.
	(hwasan_create_untagged_base): New.
	(hwasan_emit_prologue): New.
	* cfgexpand.c (struct stack_vars_data): Add information to
	record hwasan variable stack offsets.
	(expand_stack_vars): Ensure variables are offset from a tagged
	base. Record offsets for hwasan. Ensure alignment.
	(expand_used_vars): Call function to emit prologue, and get
	untagging instructions for function exit.
	(align_local_variable): Ensure alignment.
	(defer_stack_allocation): Ensure all variables are deferred so
	they can be handled by `expand_stack_vars`.
	(expand_one_stack_var_at): Account for tags in
	variables when using HWASAN.
	(expand_one_stack_var_1): Pass new argument to
	expand_one_stack_var_at.
	(init_vars_expansion): Initialise hwasan internal variables when
	starting variable expansion.
	* doc/tm.texi (TARGET_MEMTAG_GENTAG): Document.
	* doc/tm.texi.in (TARGET_MEMTAG_GENTAG): Document.
	* explow.c (get_dynamic_stack_base): Parametrise stack vars RTX
	base.
	* explow.h (get_dynamic_stack_base): New declaration.
	* expr.c (force_operand): Use new addtag_force_operand hook.
	* target.def (TARGET_MEMTAG_GENTAG, TARGET_MEMTAG_ADDTAG,
	TARGET_MEMTAG_ADDTAG_FORCE_OPERAND): Introduce new hooks.
	* targhooks.c (default_memtag_gentag, default_memtag_addtag):
	New default hooks.
	* targhooks.h (default_memtag_gentag, default_memtag_addtag):
	Declare new default hooks.
	* builtin-types.def (BT_FN_VOID_PTR_UINT8_SIZE): New.
	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for HWASAN.
	* sanitizer.def (BUILT_IN_HWASAN_INIT): New.
	(BUILT_IN_HWASAN_TAG_MEM): New.
	* toplev.c (compile_file): Emit libhwasan initialisation.



###############     Attachment also inlined for ease of reply    ###############
diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
index 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541 100644
--- a/config/bootstrap-hwasan.mk
+++ b/config/bootstrap-hwasan.mk
@@ -1,7 +1,11 @@
 # This option enables -fsanitize=hwaddress for stage2 and stage3.
+# We need to disable random frame tags for bootstrap since the autoconf check
+# for which direction the stack is growing has UB that a random frame tag
+# breaks.  Running with a random frame tag gives approx. 50% chance of
+# bootstrap comparison diff in libiberty/alloca.c.
 
-STAGE2_CFLAGS += -fsanitize=hwaddress
-STAGE3_CFLAGS += -fsanitize=hwaddress
+STAGE2_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
+STAGE3_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
 POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
 		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
 		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
diff --git a/gcc/asan.h b/gcc/asan.h
index 7675f18a84ee3f187ba4cb40db0ce232f3958762..467231f8dad031a6176aeaddb9414f768b2af3fc 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -23,6 +23,18 @@ along with GCC; see the file COPYING3.  If not see
 
 extern void asan_function_start (void);
 extern void asan_finish_file (void);
+extern void hwasan_finish_file (void);
+extern void hwasan_record_base (rtx);
+extern uint8_t hwasan_current_tag ();
+extern void hwasan_increment_tag ();
+extern rtx hwasan_with_tag (rtx, poly_int64);
+extern void hwasan_tag_init ();
+extern rtx hwasan_create_untagged_base (rtx);
+extern rtx hwasan_extract_tag (rtx tagged_pointer);
+extern void hwasan_emit_prologue (rtx *, rtx *, poly_int64 *, uint8_t *, size_t);
+extern rtx_insn *hwasan_emit_untag_frame (rtx, rtx, rtx_insn *);
+extern bool memory_tagging_p (void);
+extern bool hwasan_sanitize_stack_p (void);
 extern rtx_insn *asan_emit_stack_protection (rtx, rtx, unsigned int,
 					     HOST_WIDE_INT *, tree *, int);
 extern rtx_insn *asan_emit_allocas_unpoison (rtx, rtx, rtx_insn *);
@@ -75,6 +87,31 @@ extern hash_set <tree> *asan_used_labels;
 
 #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE	"use after scope memory"
 
+/* NOTE: The values below define an ABI and are hard-coded to these values in
+   libhwasan, hence they can't be changed independently here.  */
+/* How many bits are used to store a tag in a pointer.
+   HWASAN uses the entire top byte of a pointer (i.e. 8 bits).  */
+#define HWASAN_TAG_SIZE 8
+/* Tag Granule of HWASAN shadow stack.
+   This is the size in real memory that each byte in the shadow memory refers
+   to.  I.e. if a variable is X bytes long in memory then its tag in shadow
+   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
+   Most variables will need to be aligned to this amount since two variables
+   that are neighbours in memory and share a tag granule would need to share
+   the same tag (the shared tag granule can only store one tag).  */
+#define HWASAN_TAG_SHIFT_SIZE 4
+#define HWASAN_TAG_GRANULE_SIZE (1ULL << HWASAN_TAG_SHIFT_SIZE)
+/* Define the tag for the stack background.
+   This defines what tag the stack pointer will be and hence what tag all
+   variables that are not given special tags are (e.g. spilled registers,
+   and parameters passed on the stack).  */
+#define HWASAN_STACK_BACKGROUND 0
+/* How many bits to shift in order to access the tag bits.
+   The tag is stored in the top 8 bits of a pointer hence shifting 56 bits will
+   leave just the tag.  */
+#define HWASAN_SHIFT 56
+#define HWASAN_SHIFT_RTX const_int_rtx[MAX_SAVED_CONST_INT + HWASAN_SHIFT]
+
 /* Various flags for Asan builtins.  */
 enum asan_check_flags
 {
diff --git a/gcc/asan.c b/gcc/asan.c
index a731bd490b4e78e916ae20fc9a0249c1fc04daa5..2e79d39785467651c352169dae4551a47d7b3613 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -257,6 +257,9 @@ hash_set<tree> *asan_handled_variables = NULL;
 
 hash_set <tree> *asan_used_labels = NULL;
 
+static uint8_t tag_offset = 0;
+static rtx hwasan_base_ptr = NULL_RTX;
+
 /* Sets shadow offset to value in string VAL.  */
 
 bool
@@ -1352,6 +1355,21 @@ asan_redzone_buffer::flush_if_full (void)
     flush_redzone_payload ();
 }
 
+/* Returns whether we are tagging pointers and checking those tags on memory
+   access.  This is true whether checking is done in software or hardware.  */
+bool
+memory_tagging_p ()
+{
+  return sanitize_flags_p (SANITIZE_HWADDRESS);
+}
+
+/* Are we tagging the stack?  */
+bool
+hwasan_sanitize_stack_p ()
+{
+  return (memory_tagging_p () && HWASAN_STACK);
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
    directly, epilogue sequence returned.  BASE is the register holding the
    stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -2884,6 +2902,11 @@ initialize_sanitizer_builtins (void)
     = build_function_type_list (void_type_node, uint64_type_node,
 				ptr_type_node, NULL_TREE);
 
+  tree BT_FN_VOID_PTR_UINT8_SIZE
+    = build_function_type_list (void_type_node, ptr_type_node,
+				unsigned_char_type_node, size_type_node,
+				NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
@@ -2934,6 +2957,8 @@ initialize_sanitizer_builtins (void)
 #define BT_FN_I16_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[4]
 #define BT_FN_I16_VPTR_I16_INT BT_FN_IX_VPTR_IX_INT[4]
 #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
+#undef ATTR_NOTHROW_LIST
+#define ATTR_NOTHROW_LIST ECF_NOTHROW
 #undef ATTR_NOTHROW_LEAF_LIST
 #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
 #undef ATTR_TMPURE_NOTHROW_LEAF_LIST
@@ -3684,4 +3709,189 @@ make_pass_asan_O0 (gcc::context *ctxt)
   return new pass_asan_O0 (ctxt);
 }
 
+void
+hwasan_record_base (rtx base)
+{
+  /* Initialise tag of the base register.
+     This has to be done as soon as the stack is getting expanded to ensure
+     anything emitted with `get_dynamic_stack_base` will use the value set here
+     instead of using a register without a value.
+     Especially note that RTL expansion of large aligned values does that.  */
+  targetm.memtag.gentag (base, virtual_stack_vars_rtx);
+  hwasan_base_ptr = base;
+}
+
+uint8_t
+hwasan_current_tag ()
+{
+  return tag_offset;
+}
+
+void
+hwasan_increment_tag ()
+{
+  uint8_t tag_bits = HWASAN_TAG_SIZE;
+  tag_offset = (tag_offset + 1) % (1 << tag_bits);
+}
+
+rtx
+hwasan_with_tag (rtx base, poly_int64 offset)
+{
+  gcc_assert (tag_offset < (1 << HWASAN_TAG_SIZE));
+  return targetm.memtag.addtag (base, offset, tag_offset);
+}
+
+/* Clear internal state for the next function.
+   This function is called before variables on the stack get expanded, in
+   `init_vars_expansion`.  */
+void
+hwasan_tag_init ()
+{
+  delete asan_used_labels;
+  asan_used_labels = NULL;
+
+  hwasan_base_ptr = NULL_RTX;
+  tag_offset = HWASAN_STACK_BACKGROUND + 1;
+}
+
+rtx
+hwasan_extract_tag (rtx tagged_pointer)
+{
+  rtx tag = expand_simple_binop (Pmode,
+				 LSHIFTRT,
+				 tagged_pointer,
+				 HWASAN_SHIFT_RTX,
+				 NULL_RTX,
+				 /* unsignedp = */0,
+				 OPTAB_DIRECT);
+  return gen_lowpart (QImode, tag);
+}
+
+void
+hwasan_emit_prologue (rtx *bases,
+		      rtx *untagged_bases,
+		      poly_int64 *offsets,
+		      uint8_t *tags,
+		      size_t length)
+{
+  /* We need untagged base pointers since libhwasan only accepts untagged
+     pointers in __hwasan_tag_memory.  We need the tagged base pointer to obtain
+     the base tag for an offset.  */
+  for (size_t i = 0; (i * 2) + 1 < length; i++)
+    {
+      poly_int64 start = offsets[i * 2];
+      poly_int64 end = offsets[(i * 2) + 1];
+
+      poly_int64 bot, top;
+      if (known_ge (start, end))
+	{
+	  top = start;
+	  bot = end;
+	}
+      else
+	{
+	  top = end;
+	  bot = start;
+	}
+      poly_int64 size = (top - bot);
+
+      /* Can't check that all poly_int64's are aligned, but still nice
+	 to check those that are compile-time constants.  */
+      HOST_WIDE_INT tmp;
+      if (top.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+      if (bot.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+      if (size.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+
+      rtx ret = init_one_libfunc ("__hwasan_tag_memory");
+      rtx base_tag = hwasan_extract_tag (bases[i]);
+      /* In the case of tag overflow we would want modulo wrapping -- which
+	 should be given from the `plus_constant` in QImode.  */
+      rtx tag = plus_constant (QImode, base_tag, tags[i]);
+      emit_library_call (ret,
+			 LCT_NORMAL,
+			 VOIDmode,
+			 plus_constant (ptr_mode, untagged_bases[i], bot),
+			 ptr_mode,
+			 tag,
+			 QImode,
+			 gen_int_mode (size, ptr_mode),
+			 ptr_mode);
+    }
+}
+
+rtx_insn *
+hwasan_emit_untag_frame (rtx dynamic, rtx vars, rtx_insn *before)
+{
+  if (before)
+    push_to_sequence (before);
+  else
+    start_sequence ();
+
+  dynamic = convert_memory_address (ptr_mode, dynamic);
+  vars = convert_memory_address (ptr_mode, vars);
+
+  rtx top_rtx;
+  rtx bot_rtx;
+  if (STACK_GROWS_DOWNWARD)
+    {
+      top_rtx = vars;
+      bot_rtx = dynamic;
+    }
+  else
+    {
+      top_rtx = dynamic;
+      bot_rtx = vars;
+    }
+
+  rtx size_rtx = expand_simple_binop (Pmode, MINUS, top_rtx, bot_rtx,
+				  NULL_RTX, /* unsignedp = */0, OPTAB_DIRECT);
+
+  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
+  emit_library_call (ret, LCT_NORMAL, VOIDmode,
+      bot_rtx, ptr_mode,
+      const0_rtx, QImode,
+      size_rtx, ptr_mode);
+
+  do_pending_stack_adjust ();
+  rtx_insn *insns = get_insns ();
+  end_sequence ();
+  return insns;
+}
+
+rtx
+hwasan_create_untagged_base (rtx orig_base)
+{
+  rtx untagged_base = gen_reg_rtx (Pmode);
+  rtx tag_mask = gen_int_mode ((1ULL << HWASAN_SHIFT) - 1, Pmode);
+  untagged_base = expand_binop (Pmode, and_optab,
+				orig_base, tag_mask,
+				untagged_base, true, OPTAB_DIRECT);
+  gcc_assert (untagged_base);
+  return untagged_base;
+}
+
+/* Needs to be GTY(()), because cgraph_build_static_cdtor may
+   invoke ggc_collect.  */
+static GTY(()) tree hwasan_ctor_statements;
+
+void
+hwasan_finish_file (void)
+{
+  /* Do not emit constructor initialisation for the kernel.
+     (the kernel has its own initialisation already).  */
+  if (flag_sanitize & SANITIZE_KERNEL_HWADDRESS)
+    return;
+
+  /* Avoid instrumenting code in the hwasan constructors/destructors.  */
+  flag_sanitize &= ~SANITIZE_HWADDRESS;
+  int priority = MAX_RESERVED_INIT_PRIORITY - 1;
+  tree fn = builtin_decl_implicit (BUILT_IN_HWASAN_INIT);
+  append_to_statement_list (build_call_expr (fn, 0), &hwasan_ctor_statements);
+  cgraph_build_static_cdtor ('I', hwasan_ctor_statements, priority);
+  flag_sanitize |= SANITIZE_HWADDRESS;
+}
+
 #include "gt-asan.h"
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index e5c9e063c480d1392b6c2b395ec9d029b6d94209..d05f597b6434f39fe95d4f28dd2ef3ed463dd925 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -625,6 +625,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_UINT32_UINT32_PTR,
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_SIZE_SIZE_PTR, BT_VOID, BT_SIZE, BT_SIZE,
 		     BT_PTR)
 DEF_FUNCTION_TYPE_3 (BT_FN_UINT_UINT_PTR_PTR, BT_UINT, BT_UINT, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_SIZE, BT_VOID, BT_PTR, BT_UINT8,
+		     BT_SIZE)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index d8233f5f760f8426e8ff85473ecda02aa6c2655b..3f621ffdbda0acf5949348882a7c1f3504634666 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -244,6 +244,7 @@ along with GCC; see the file COPYING3.  If not see
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
 	      (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
+				| SANITIZE_HWADDRESS \
 				| SANITIZE_UNDEFINED \
 				| SANITIZE_UNDEFINED_NONDEFAULT) \
 	       || flag_sanitize_coverage))
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index c34a53b526b50d49cd73ab5a5c383efc6da5a23e..8e1ea21621e350b3d8779b79b0d7f69d571caa08 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -378,7 +378,14 @@ align_local_variable (tree decl, bool really_expand)
       if (really_expand)
 	SET_DECL_ALIGN (decl, align);
     }
-  return align / BITS_PER_UNIT;
+
+  unsigned int ret_align = align / BITS_PER_UNIT;
+
+  if (hwasan_sanitize_stack_p ())
+    ret_align = ret_align > HWASAN_TAG_GRANULE_SIZE
+      ? ret_align
+      : HWASAN_TAG_GRANULE_SIZE;
+  return ret_align;
 }
 
 /* Align given offset BASE with ALIGN.  Truncate up if ALIGN_UP is true,
@@ -986,7 +993,7 @@ dump_stack_var_partition (void)
 
 static void
 expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
-			 poly_int64 offset)
+			 poly_int64 offset, rtx stack_base)
 {
   unsigned align;
   rtx x;
@@ -994,7 +1001,11 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
   /* If this fails, we've overflowed the stack frame.  Error nicely?  */
   gcc_assert (known_eq (offset, trunc_int_for_mode (offset, Pmode)));
 
-  x = plus_constant (Pmode, base, offset);
+  if (hwasan_sanitize_stack_p ())
+    x = hwasan_with_tag (base, offset);
+  else
+    x = plus_constant (Pmode, base, offset);
+
   x = gen_rtx_MEM (TREE_CODE (decl) == SSA_NAME
 		   ? TYPE_MODE (TREE_TYPE (decl))
 		   : DECL_MODE (SSAVAR (decl)), x);
@@ -1004,7 +1015,7 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
       /* Set alignment we actually gave this decl if it isn't an SSA name.
          If it is we generate stack slots only accidentally so it isn't as
 	 important, we'll simply use the alignment that is already set.  */
-      if (base == virtual_stack_vars_rtx)
+      if (base == stack_base)
 	offset -= frame_phase;
       align = known_alignment (offset);
       align *= BITS_PER_UNIT;
@@ -1030,9 +1041,19 @@ public:
      The vector is in reversed, highest offset pairs come first.  */
   auto_vec<HOST_WIDE_INT> asan_vec;
 
+  /* HWASAN records the poly_int64 so it can handle any stack variable.  */
+  auto_vec<poly_int64> hwasan_vec;
+  auto_vec<rtx> hwasan_untagged_base_vec;
+  auto_vec<rtx> hwasan_base_vec;
+
   /* Vector of partition representative decls in between the paddings.  */
   auto_vec<tree> asan_decl_vec;
 
+  /* Vector of tag offsets representing the tag for each stack variable.
+     Each offset determines the difference between the randomly generated
+     tag for the current frame and the tag for this stack variable.  */
+  auto_vec<uint8_t> hwasan_tag_vec;
+
   /* Base pseudo register for Address Sanitizer protected automatic vars.  */
   rtx asan_base;
 
@@ -1050,6 +1071,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   poly_uint64 large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_untagged_base = NULL;
   unsigned large_align = 0;
   bool large_allocation_done = false;
   tree decl;
@@ -1096,11 +1118,17 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	}
     }
 
+  if (hwasan_sanitize_stack_p () && data->asan_base == NULL)
+    {
+      data->asan_base = gen_reg_rtx (Pmode);
+      hwasan_record_base (data->asan_base);
+    }
+
   for (si = 0; si < n; ++si)
     {
       rtx base;
       unsigned base_align, alignb;
-      poly_int64 offset;
+      poly_int64 offset = 0;
 
       i = stack_vars_sorted[si];
 
@@ -1121,10 +1149,36 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
       if (pred && !pred (i))
 	continue;
 
+      base = hwasan_sanitize_stack_p ()
+	? data->asan_base
+	: virtual_stack_vars_rtx;
       alignb = stack_vars[i].alignb;
       if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
 	{
-	  base = virtual_stack_vars_rtx;
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* Allocate zero bytes to take advantage of the
+		 alloc_stack_frame_space logic of ensuring the stack is aligned
+		 despite having poly_int64's to deal with.
+
+		 There must be no tag granule "shared" between different
+		 objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
+		 chunk can have more than one object in it.
+
+		 We ensure this by forcing the end of the last bit of data to
+		 be aligned to HWASAN_TAG_GRANULE_SIZE bytes here, and setting
+		 the start of each variable to be aligned to
+		 HWASAN_TAG_GRANULE_SIZE bytes in `align_local_variable`.
+
+		 We can't align just one of the start or end, since there are
+		 untagged objects stored on the stack whose alignment we do
+		 not control, and these must not share a tag granule with a
+		 tagged variable.  */
+	      gcc_assert (stack_vars[i].alignb >= HWASAN_TAG_GRANULE_SIZE);
+	      offset = alloc_stack_frame_space (0, HWASAN_TAG_GRANULE_SIZE);
+	      data->hwasan_vec.safe_push (offset);
+	      data->hwasan_untagged_base_vec.safe_push (virtual_stack_vars_rtx);
+	    }
 	  /* ASAN description strings don't yet have a syntax for expressing
 	     polynomial offsets.  */
 	  HOST_WIDE_INT prev_offset;
@@ -1204,6 +1258,9 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	      offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 	      base_align = crtl->max_used_stack_slot_alignment;
 	    }
+
+	  if (hwasan_sanitize_stack_p ())
+	    data->hwasan_vec.safe_push (offset);
 	}
       else
 	{
@@ -1223,14 +1280,31 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	      loffset = alloc_stack_frame_space
 		(rtx_to_poly_int64 (large_allocsize),
 		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
-	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_base = get_dynamic_stack_base (loffset, large_align, base);
 	      large_allocation_done = true;
 	    }
-	  gcc_assert (large_base != NULL);
 
+	  gcc_assert (large_base != NULL);
 	  large_alloc = aligned_upper_bound (large_alloc, alignb);
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* A "large" alignment requirement is by definition greater
+		 than the alignment required for tags, so the start of this
+		 object is already granule-aligned.  */
+	      if (!large_untagged_base)
+		large_untagged_base = hwasan_create_untagged_base (large_base);
+	      data->hwasan_vec.safe_push (large_alloc);
+	      data->hwasan_untagged_base_vec.safe_push (large_untagged_base);
+	    }
 	  offset = large_alloc;
 	  large_alloc += stack_vars[i].size;
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* Ensure the end of the variable is also aligned correctly.  */
+	      poly_int64 align_again =
+		aligned_upper_bound (large_alloc, HWASAN_TAG_GRANULE_SIZE);
+	      data->hwasan_vec.safe_push (align_again);
+	    }
 
 	  base = large_base;
 	  base_align = large_align;
@@ -1242,7 +1316,21 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	{
 	  expand_one_stack_var_at (stack_vars[j].decl,
 				   base, base_align,
-				   offset);
+				   offset,
+				   hwasan_sanitize_stack_p ()
+				   ? data->asan_base
+				   : virtual_stack_vars_rtx);
+	}
+
+      if (hwasan_sanitize_stack_p ())
+	{
+	  /* Record the tag for this object in `data` so the prologue knows
+	     what tag to put in the shadow memory for this object.
+	     Then increment the tag so that the next object gets a different
+	     tag from this one.  */
+	  data->hwasan_base_vec.safe_push (base);
+	  data->hwasan_tag_vec.safe_push (hwasan_current_tag ());
+	  hwasan_increment_tag ();
 	}
     }
 
@@ -1339,7 +1427,8 @@ expand_one_stack_var_1 (tree var)
   offset = alloc_stack_frame_space (size, byte_align);
 
   expand_one_stack_var_at (var, virtual_stack_vars_rtx,
-			   crtl->max_used_stack_slot_alignment, offset);
+			   crtl->max_used_stack_slot_alignment, offset,
+			   virtual_stack_vars_rtx);
 }
 
 /* Wrapper for expand_one_stack_var_1 that checks SSA_NAMEs are
@@ -1552,8 +1641,13 @@ defer_stack_allocation (tree var, bool toplevel)
 
   /* If stack protection is enabled, *all* stack variables must be deferred,
      so that we can re-order the strings to the top of the frame.
-     Similarly for Address Sanitizer.  */
-  if (flag_stack_protect || asan_sanitize_stack_p ())
+     Similarly for Address Sanitizer.
+     When tagging memory we defer all stack variables so we can handle them in
+     one place (here meaning we ensure they are aligned and record information
+     on each variable's position in the stack).  */
+  if (flag_stack_protect
+      || asan_sanitize_stack_p ()
+      || hwasan_sanitize_stack_p ())
     return true;
 
   unsigned int align = TREE_CODE (var) == SSA_NAME
@@ -1938,6 +2032,8 @@ init_vars_expansion (void)
   /* Initialize local stack smashing state.  */
   has_protected_decls = false;
   has_short_buffer = false;
+  if (hwasan_sanitize_stack_p ())
+    hwasan_tag_init ();
 }
 
 /* Free up stack variable graph data.  */
@@ -2297,12 +2393,27 @@ expand_used_vars (void)
 	}
 
       expand_stack_vars (NULL, &data);
+
+      if (hwasan_sanitize_stack_p ())
+	hwasan_emit_prologue (data.hwasan_base_vec.address (),
+			      data.hwasan_untagged_base_vec.address (),
+			      data.hwasan_vec.address (),
+			      data.hwasan_tag_vec.address (),
+			      data.hwasan_vec.length ());
     }
 
   if (asan_sanitize_allocas_p () && cfun->calls_alloca)
     var_end_seq = asan_emit_allocas_unpoison (virtual_stack_dynamic_rtx,
 					      virtual_stack_vars_rtx,
 					      var_end_seq);
+  /* Here we clear tags for the entire frame of this function.
+     We need to clear tags of *something* if we have tagged either local
+     variables or alloca objects.  */
+  else if (hwasan_sanitize_stack_p ()
+	   && (cfun->calls_alloca || stack_vars_num > 0))
+    var_end_seq = hwasan_emit_untag_frame (virtual_stack_dynamic_rtx,
+					   virtual_stack_vars_rtx,
+					   var_end_seq);
 
   fini_vars_expansion ();
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e842b734c9c20253986880ba3622a8f692d3ca88..718d2e8aac56553bdc30c592fe70fa10aa911736 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2982,6 +2982,31 @@ A target hook which lets a backend compute the set of pressure classes to  be us
 True if backend architecture naturally supports ignoring the top byte of pointers.  This feature means that -fsanitize=hwaddress can work.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG (rtx @var{base}, poly_int64 @var{addr_offset}, uint8_t @var{tag_offset})
+Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by TAG_OFFSET.
+The resulting RTX must either be a valid memory address or be able to get
+put into an operand with force_operand.  If overridden the more common case
+is that we force this into an operand using the backend hook
+"addtag_force_operand" that is called in force_operand.
+
+It is expected that "addtag_force_operand" recognises the RTX generated
+by "addtag" and emits code to force that RTX into an operand.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG_FORCE_OPERAND (rtx @var{oper}, rtx @var{target})
+If the RTL expression OPER is of the form generated by targetm.memtag.addtag,
+then emit instructions to move the value into an operand (i.e. for
+force_operand).
+TARGET is an RTX suggestion of where to generate the value.
+This hook is most often implemented by emitting instructions to put the
+expression into a pseudo register, then returning that pseudo register.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_MEMTAG_GENTAG (rtx @var{base}, rtx @var{untagged})
+Set the value of BASE to UNTAGGED combined with a random tag.
+This function is used to generate a tagged base for the current stack frame.
+@end deftypefn
+
 @node Stack and Calling
 @section Stack Layout and Calling Conventions
 @cindex calling conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6136ac1a5fe0c0980d5b5123b67b102a1b1e0bcc..3d3761dbc097e6f73f4f4f937aa713b93872a4a2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2379,6 +2379,12 @@ in the reload pass.
 
 @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES
 
+@hook TARGET_MEMTAG_ADDTAG
+
+@hook TARGET_MEMTAG_ADDTAG_FORCE_OPERAND
+
+@hook TARGET_MEMTAG_GENTAG
+
 @node Stack and Calling
 @section Stack Layout and Calling Conventions
 @cindex calling conventions
diff --git a/gcc/explow.h b/gcc/explow.h
index 5110ad82d6a024fda1d3a3eaf80de40c5e6ad3b6..333948e0c69a1b1132e9a1d06707dc63f1226262 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -102,7 +102,7 @@ extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned,
 extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
 
 /* Returns the address of the dynamic stack space without allocating it.  */
-extern rtx get_dynamic_stack_base (poly_int64, unsigned);
+extern rtx get_dynamic_stack_base (poly_int64, unsigned, rtx);
 
 /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
 extern rtx align_dynamic_address (rtx, unsigned);
diff --git a/gcc/explow.c b/gcc/explow.c
index 7eb854bca4a6dcc5b15e5c42df1a5e88a19f2464..2728f7a4b1ee8ed0d20287b716a3d0ad5a97d84b 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1577,10 +1577,14 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
    OFFSET is the offset of the area into the virtual stack vars area.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
-   of memory.  */
+   of memory.
+
+   BASE is the rtx of the base of this virtual stack vars area.
+   The only time this is not `virtual_stack_vars_rtx` is when tagging pointers
+   on the stack.  */
 
 rtx
-get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
+get_dynamic_stack_base (poly_int64 offset, unsigned required_align, rtx base)
 {
   rtx target;
 
@@ -1588,7 +1592,7 @@ get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
   target = gen_reg_rtx (Pmode);
-  emit_move_insn (target, virtual_stack_vars_rtx);
+  emit_move_insn (target, base);
   target = expand_binop (Pmode, add_optab, target,
 			 gen_int_mode (offset, Pmode),
 			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
diff --git a/gcc/expr.c b/gcc/expr.c
index 476c6865f20828fc68f455e70d4874eaabd9d08d..24d011e698af0dbf3635ba5f9d8275376a124bf4 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7500,6 +7500,13 @@ force_operand (rtx value, rtx target)
       return subtarget;
     }
 
+  if (targetm.memtag.addtag_force_operand)
+    {
+      rtx ret = targetm.memtag.addtag_force_operand (value, target);
+      if (ret)
+	return ret;
+    }
+
   if (ARITHMETIC_P (value))
     {
       op2 = XEXP (value, 1);
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 374d15007d868363d9b4fbf467e1e462abbca61a..7bd50715f24a2cb154b578e2abdea4e8fcdb2107 100644
--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -180,6 +180,12 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_COMPARE, "__sanitizer_ptr_cmp",
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_SUBTRACT, "__sanitizer_ptr_sub",
 		      BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
 
+/* Hardware Address Sanitizer.  */
+DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_INIT, "__hwasan_init",
+		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_TAG_MEM, "__hwasan_tag_memory",
+		      BT_FN_VOID_PTR_UINT8_SIZE, ATTR_NOTHROW_LIST)
+
 /* Thread Sanitizer */
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT, "__tsan_init", 
 		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/target.def b/gcc/target.def
index ad16a151b6af9b1ee13918c8f2980280d75b1d90..3c533acbe3965cdb0870621e364a009353f72c2e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6751,6 +6751,37 @@ DEFHOOK
  pointers.  This feature means that -fsanitize=hwaddress can work.",
  bool, (), default_memtag_can_tag_addresses)
 
+DEFHOOK
+(addtag,
+ "Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by\
+ TAG_OFFSET.\n\
+The resulting RTX must either be a valid memory address or be able to be\n\
+put into an operand with force_operand.  If this hook is overridden, the\n\
+more common case is that we force the result into an operand using the\n\
+backend hook \"addtag_force_operand\", which is called from force_operand.\n\
+\n\
+It is expected that \"addtag_force_operand\" recognises the RTX\n\
+generated by \"addtag\" and emits code to force that RTX into an operand.",
+rtx, (rtx base, poly_int64 addr_offset, uint8_t tag_offset),
+default_memtag_addtag)
+
+DEFHOOK
+(addtag_force_operand,
+ "If the RTL expression OPER is of the form generated by targetm.memtag.addtag,\n\
+then emit instructions to move the value into an operand (i.e. for\n\
+force_operand).\n\
+TARGET is an RTX suggestion of where to generate the value.\n\
+This hook is most often implemented by emitting instructions to put the\n\
+expression into a pseudo register, then returning that pseudo register.",
+rtx, (rtx oper, rtx target), NULL)
+
+DEFHOOK
+(gentag,
+ "Set BASE to the value of UNTAGGED combined with a random tag.\n\
+This hook is used to generate a tagged base for the current stack frame.",
+  void, (rtx base, rtx untagged),
+  default_memtag_gentag)
+
 HOOK_VECTOR_END (memtag)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 94e865259f35e46e26f6b4763c5e2f9dc9ed1b83..b0e32102acacdf7a64f1e3d314a966d1d3f062c7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -283,4 +283,6 @@ extern bool speculation_safe_value_not_needed (bool);
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 
 extern bool default_memtag_can_tag_addresses ();
+extern void default_memtag_gentag (rtx, rtx);
+extern rtx default_memtag_addtag (rtx, poly_int64, uint8_t);
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6e9e877c32e2c8705056bdf0ce2b1b6f125d93c3..cd9f98fc800d7232ead50b03f951364f76c01adc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "varasm.h"
 #include "flags.h"
 #include "explow.h"
+#include "expmed.h"
 #include "calls.h"
 #include "expr.h"
 #include "output.h"
@@ -84,6 +85,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "sbitmap.h"
 #include "function-abi.h"
+#include "attribs.h"
+#include "asan.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2371,4 +2374,78 @@ default_memtag_can_tag_addresses ()
   return false;
 }
 
+void
+default_memtag_gentag (rtx base, rtx untagged)
+{
+  gcc_assert (HWASAN_STACK);
+  if (HWASAN_RANDOM_FRAME_TAG)
+    {
+      rtx temp = gen_reg_rtx (QImode);
+      rtx ret = init_one_libfunc ("__hwasan_generate_tag");
+      rtx new_tag = emit_library_call_value (ret, temp, LCT_NORMAL, QImode);
+      emit_move_insn (base, untagged);
+      /* We know that `base` is not the stack pointer, since we never want to
+	 put a randomly generated tag into the stack pointer.  Hence we can
+	 use `store_bit_field`, which on aarch64 generates a `bfi` that cannot
+	 act on the stack pointer.  */
+      store_bit_field (base, 8, 56, 0, 0, QImode, new_tag, false);
+    }
+  else
+    {
+      /* NOTE: The kernel API does not have __hwasan_generate_tag exposed.
+	 In the future we may add an option to emit random tags with inline
+	 instrumentation instead of function calls.  This would be the same
+	 between the kernel and userland.  */
+      emit_move_insn (base, untagged);
+    }
+}
+
+rtx
+default_memtag_addtag (rtx base, poly_int64 offset, uint8_t tag_offset)
+{
+  /* Need to look into what the most efficient code sequence is.
+     This is a code sequence that would be emitted *many* times, so we
+     want it as small as possible.
+
+     If the tag offset is greater than (1 << 7) then the most efficient
+     sequence here would give UB from signed integer overflow in the
+     poly_int64.  Hence in that case we emit the slightly less efficient
+     sequence.
+
+     There are two places where tag overflow is a question:
+       - Tagging the shadow stack.
+	  (both tagging and untagging).
+       - Tagging addressable pointers.
+
+     We need to ensure both behaviours are the same (i.e. that the tag that
+     ends up in a pointer after "overflowing" the tag bits with a tag addition
+     is the same that ends up in the shadow space).
+
+     The aim is that the behaviour of tag addition should follow modulo
+     wrapping in both instances.
+
+     The libhwasan code doesn't have any path that increments a pointer's tag,
+     which means it has no opinion on what happens when a tag increment
+     overflows (and hence we can choose our own behaviour).  */
+
+  if (tag_offset < (1 << 7))
+    {
+      offset += ((uint64_t)tag_offset << HWASAN_SHIFT);
+      return plus_constant (Pmode, base, offset);
+    }
+  else
+    {
+      /* This is the fallback, it would be nice if it had less instructions,
+	 but we can look for cleverer ways later.  */
+      uint64_t tag_mask = ~(0xFFUL << HWASAN_SHIFT);
+      rtx untagged_base = gen_rtx_AND (Pmode, GEN_INT (tag_mask), base);
+      rtx new_addr = plus_constant (Pmode, untagged_base, offset);
+
+      rtx original_tag_value = gen_rtx_LSHIFTRT (Pmode, base, GEN_INT (HWASAN_SHIFT));
+      rtx new_tag_value = plus_constant (Pmode, original_tag_value, tag_offset);
+      rtx new_tag = gen_rtx_ASHIFT (Pmode, new_tag_value, GEN_INT (HWASAN_SHIFT));
+      return gen_rtx_IOR (Pmode, new_addr, new_tag);
+    }
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/toplev.c b/gcc/toplev.c
index ab67384249a3437ac37f42f741ed516884677f9f..7bd75548d2aebb3415ac85ec40ad25e5ca794094 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -508,6 +508,9 @@ compile_file (void)
       if (flag_sanitize & SANITIZE_THREAD)
 	tsan_finish_file ();
 
+      if (flag_sanitize & SANITIZE_HWADDRESS)
+	hwasan_finish_file ();
+
       omp_finish_file ();
 
       hsa_output_brig ();

Comments

Martin Liška Nov. 20, 2019, 2:02 p.m. UTC | #1
On 11/7/19 7:37 PM, Matthew Malcomson wrote:
> Handling stack variables has three features.
> 
> 1) Ensure HWASAN required alignment for stack variables
> 
> When tagging shadow memory, we need to ensure that each tag granule is
> only used by one variable at a time.
> 
> This is done by ensuring that each tagged variable is aligned to the tag
> granule representation size and also by ensuring that the end of each
> variable has an alignment boundary between it and the start of any
> other data stored on the stack.
> 
> This patch ensures that by adding alignment requirements in
> `align_local_variable` and forcing all stack variable allocation to be
> deferred so that `expand_stack_vars` can ensure the stack pointer is
> aligned before allocating any variable for the current frame.
> 
> 2) Put tags into each stack variable pointer
> 
> Make sure that every pointer to a stack variable includes a tag of some
> sort on it.
> 
> The way tagging works is:
>    1) For every new stack frame, a random tag is generated.
>    2) A base register is formed from the stack pointer value and this
>       random tag.
>    3) References to stack variables are now formed with RTL describing an
>       offset from this base in both tag and value.
> 
> The random tag generation is handled by a backend hook.  This hook
> decides whether to introduce a random tag or use the stack background
> based on the parameter hwasan-random-frame-tag.  Using the stack
> background is necessary for testing and bootstrap.  It is necessary
> during bootstrap to avoid breaking the `configure` test program for
> determining stack direction.
> 
> Using the stack background means that every stack frame has the initial
> tag of zero and variables are tagged with incrementing tags from 1,
> which also makes debugging a bit easier.
> 
> The tag&value offsets are also handled by a backend hook.
> 
> This patch also adds some macros defining how the HWASAN shadow memory
> is stored and how a tag is stored in a pointer.
> 
> 3) For each stack variable, tag and untag the shadow stack on function
>     prologue and epilogue.
> 
> On entry to each function we tag the relevant shadow stack region for
> each stack variable so that the tag matches the tag added to each
> pointer for that variable.
> 
> This is the first patch where we use the HWASAN shadow space, so we need
> to add in the libhwasan initialisation code that creates this shadow
> memory region into the binary we produce.  This instrumentation is done
> in `compile_file`.
> 
> When exiting a function we need to ensure the shadow stack for this
> function has no remaining tag.  Without clearing the shadow stack area
> for this stack frame, later function calls could get false positives
> when those later function calls check untagged areas (such as parameters
> passed on the stack) against a shadow stack area with left-over tag.
> 
> Hence we ensure that the entire stack frame is cleared on function exit.
> 
> gcc/ChangeLog:
> 
> 2019-11-07  Matthew Malcomson  <matthew.malcomson@arm.com>
> 
> 	* asan.c (hwasan_record_base): New function.
> 	(hwasan_emit_untag_frame): New.
> 	(hwasan_increment_tag): New function.
> 	(hwasan_with_tag): New function.
> 	(hwasan_tag_init): New function.
> 	(initialize_sanitizer_builtins): Define new builtins.
> 	(ATTR_NOTHROW_LIST): New macro.
> 	(hwasan_current_tag): New.
> 	(hwasan_emit_prologue): New.
> 	(hwasan_create_untagged_base): New.
> 	(hwasan_finish_file): New.
> 	(hwasan_sanitize_stack_p): New.
> 	(memory_tagging_p): New.
> 	* asan.h (hwasan_record_base): New declaration.
> 	(hwasan_emit_untag_frame): New.
> 	(hwasan_increment_tag): New declaration.
> 	(hwasan_with_tag): New declaration.
> 	(hwasan_sanitize_stack_p): New declaration.
> 	(hwasan_tag_init): New declaration.
> 	(memory_tagging_p): New declaration.
> 	(HWASAN_TAG_SIZE): New macro.
> 	(HWASAN_TAG_GRANULE_SIZE): New macro.
> 	(HWASAN_SHIFT): New macro.
> 	(HWASAN_SHIFT_RTX): New macro.
> 	(HWASAN_STACK_BACKGROUND): New macro.
> 	(hwasan_finish_file): New.
> 	(hwasan_current_tag): New.
> 	(hwasan_create_untagged_base): New.
> 	(hwasan_emit_prologue): New.
> 	* cfgexpand.c (struct stack_vars_data): Add information to
> 	record hwasan variable stack offsets.
> 	(expand_stack_vars): Ensure variables are offset from a tagged
> 	base. Record offsets for hwasan. Ensure alignment.
> 	(expand_used_vars): Call function to emit prologue, and get
> 	untagging instructions for function exit.
> 	(align_local_variable): Ensure alignment.
> 	(defer_stack_allocation): Ensure all variables are deferred so
> 	they can be handled by `expand_stack_vars`.
> 	(expand_one_stack_var_at): Account for tags in
> 	variables when using HWASAN.
> 	(expand_one_stack_var_1): Pass new argument to
> 	expand_one_stack_var_at.
> 	(init_vars_expansion): Initialise hwasan internal variables when
> 	starting variable expansion.
> 	* doc/tm.texi (TARGET_MEMTAG_GENTAG): Document.
> 	* doc/tm.texi.in (TARGET_MEMTAG_GENTAG): Document.
> 	* explow.c (get_dynamic_stack_base): Parametrise stack vars RTX
> 	base.
> 	* explow.h (get_dynamic_stack_base): New declaration.
> 	* expr.c (force_operand): Use new addtag_force_operand hook.
> 	* target.def (TARGET_MEMTAG_GENTAG, TARGET_MEMTAG_ADDTAG,
> 	TARGET_MEMTAG_ADDTAG_FORCE_OPERAND): Introduce new hooks.
> 	* targhooks.c (default_memtag_gentag, default_memtag_addtag):
> 	New default hooks.
> 	* targhooks.h (default_memtag_gentag, default_memtag_addtag):
> 	Declare new default hooks.
> 	* builtin-types.def (BT_FN_VOID_PTR_UINT8_SIZE): New.
> 	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for HWASAN.
> 	* sanitizer.def (BUILT_IN_HWASAN_INIT): New.
> 	(BUILT_IN_HWASAN_TAG_MEM): New.
> 	* toplev.c (compile_file): Emit libhwasan initialisation.
> 
> 
> 
> ###############     Attachment also inlined for ease of reply    ###############
> 
> 
> diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
> index 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541 100644
> --- a/config/bootstrap-hwasan.mk
> +++ b/config/bootstrap-hwasan.mk
> @@ -1,7 +1,11 @@
>   # This option enables -fsanitize=hwaddress for stage2 and stage3.
> +# We need to disable random frame tags for bootstrap since the autoconf check
> +# for which direction the stack is growing has UB that a random frame tag
> +# breaks.  Running with a random frame tag gives approx. 50% chance of
> +# bootstrap comparison diff in libiberty/alloca.c.

Here I would like to see what exactly the problem is.  I would expect ASAN to
have exactly the same problem?  Can you please isolate it and file a bug?  I bet
a configure script should not expose undefined behavior.

>   
> -STAGE2_CFLAGS += -fsanitize=hwaddress
> -STAGE3_CFLAGS += -fsanitize=hwaddress
> +STAGE2_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
> +STAGE3_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
>   POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
>   		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
>   		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
> diff --git a/gcc/asan.h b/gcc/asan.h
> index 7675f18a84ee3f187ba4cb40db0ce232f3958762..467231f8dad031a6176aeaddb9414f768b2af3fc 100644
> --- a/gcc/asan.h
> +++ b/gcc/asan.h
> @@ -23,6 +23,18 @@ along with GCC; see the file COPYING3.  If not see
>   
>   extern void asan_function_start (void);
>   extern void asan_finish_file (void);
> +extern void hwasan_finish_file (void);
> +extern void hwasan_record_base (rtx);
> +extern uint8_t hwasan_current_tag ();
> +extern void hwasan_increment_tag ();
> +extern rtx hwasan_with_tag (rtx, poly_int64);
> +extern void hwasan_tag_init ();
> +extern rtx hwasan_create_untagged_base (rtx);
> +extern rtx hwasan_extract_tag (rtx tagged_pointer);
> +extern void hwasan_emit_prologue (rtx *, rtx *, poly_int64 *, uint8_t *, size_t);
> +extern rtx_insn *hwasan_emit_untag_frame (rtx, rtx, rtx_insn *);
> +extern bool memory_tagging_p (void);
> +extern bool hwasan_sanitize_stack_p (void);
>   extern rtx_insn *asan_emit_stack_protection (rtx, rtx, unsigned int,
>   					     HOST_WIDE_INT *, tree *, int);
>   extern rtx_insn *asan_emit_allocas_unpoison (rtx, rtx, rtx_insn *);
> @@ -75,6 +87,31 @@ extern hash_set <tree> *asan_used_labels;
>   
>   #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE	"use after scope memory"
>   
> +/* NOTE: The values below define an ABI and are hard-coded to these values in
> +   libhwasan, hence they can't be changed independently here.  */
> +/* How many bits are used to store a tag in a pointer.
> +   HWASAN uses the entire top byte of a pointer (i.e. 8 bits).  */
> +#define HWASAN_TAG_SIZE 8
> +/* Tag Granule of HWASAN shadow stack.
> +   This is the size in real memory that each byte in the shadow memory refers
> +   to.  I.e. if a variable is X bytes long in memory then its tag in shadow
> +   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
> +   Most variables will need to be aligned to this amount since two variables
> +   that are neighbours in memory and share a tag granule would need to share
> +   the same tag (the shared tag granule can only store one tag).  */
> +#define HWASAN_TAG_SHIFT_SIZE 4
> +#define HWASAN_TAG_GRANULE_SIZE (1ULL << HWASAN_TAG_SHIFT_SIZE)
> +/* Define the tag for the stack background.
> +   This defines what tag the stack pointer will be and hence what tag all
> +   variables that are not given special tags are (e.g. spilled registers,
> +   and parameters passed on the stack).  */
> +#define HWASAN_STACK_BACKGROUND 0
> +/* How many bits to shift in order to access the tag bits.
> +   The tag is stored in the top 8 bits of a pointer hence shifting 56 bits will
> +   leave just the tag.  */
> +#define HWASAN_SHIFT 56
> +#define HWASAN_SHIFT_RTX const_int_rtx[MAX_SAVED_CONST_INT + HWASAN_SHIFT]
> +
>   /* Various flags for Asan builtins.  */
>   enum asan_check_flags
>   {
> diff --git a/gcc/asan.c b/gcc/asan.c
> index a731bd490b4e78e916ae20fc9a0249c1fc04daa5..2e79d39785467651c352169dae4551a47d7b3613 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -257,6 +257,9 @@ hash_set<tree> *asan_handled_variables = NULL;
>   
>   hash_set <tree> *asan_used_labels = NULL;
>   
> +static uint8_t tag_offset = 0;
> +static rtx hwasan_base_ptr = NULL_RTX;
> +
>   /* Sets shadow offset to value in string VAL.  */
>   
>   bool
> @@ -1352,6 +1355,21 @@ asan_redzone_buffer::flush_if_full (void)
>       flush_redzone_payload ();
>   }
>   
> +/* Returns whether we are tagging pointers and checking those tags on memory
> +   access.  This is true whether the checking is done in software or in hardware.  */
> +bool
> +memory_tagging_p ()

This one is a very commonly used function and I don't like the name much. I would
prefer something like hwasan_p or sanitize_hwasan_p. Something which will have
hwasan in its name ;)

> +{
> +    return sanitize_flags_p (SANITIZE_HWADDRESS);
> +}
> +
> +/* Are we tagging the stack?  */
> +bool
> +hwasan_sanitize_stack_p ()
> +{
> +  return (memory_tagging_p () && HWASAN_STACK);
> +}
> +
>   /* Insert code to protect stack vars.  The prologue sequence should be emitted
>      directly, epilogue sequence returned.  BASE is the register holding the
>      stack base, against which OFFSETS array offsets are relative to, OFFSETS
> @@ -2884,6 +2902,11 @@ initialize_sanitizer_builtins (void)
>       = build_function_type_list (void_type_node, uint64_type_node,
>   				ptr_type_node, NULL_TREE);
>   
> +  tree BT_FN_VOID_PTR_UINT8_SIZE
> +    = build_function_type_list (void_type_node, ptr_type_node,
> +				unsigned_char_type_node, size_type_node,
> +				NULL_TREE);
> +
>     tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
>     tree BT_FN_IX_CONST_VPTR_INT[5];
>     tree BT_FN_IX_VPTR_IX_INT[5];
> @@ -2934,6 +2957,8 @@ initialize_sanitizer_builtins (void)
>   #define BT_FN_I16_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[4]
>   #define BT_FN_I16_VPTR_I16_INT BT_FN_IX_VPTR_IX_INT[4]
>   #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
> +#undef ATTR_NOTHROW_LIST
> +#define ATTR_NOTHROW_LIST ECF_NOTHROW
>   #undef ATTR_NOTHROW_LEAF_LIST
>   #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
>   #undef ATTR_TMPURE_NOTHROW_LEAF_LIST
> @@ -3684,4 +3709,189 @@ make_pass_asan_O0 (gcc::context *ctxt)
>     return new pass_asan_O0 (ctxt);
>   }

Here you miss a function comment. There are actually multiple functions that
are missing a comment.

>   
> +void
> +hwasan_record_base (rtx base)
> +{
> +  /* Initialise tag of the base register.
> +     This has to be done as soon as the stack is getting expanded to ensure
> +     anything emitted with `get_dynamic_stack_base` will use the value set here
> +     instead of using a register without a value.
> +     Especially note that RTL expansion of large aligned values does that.  */
> +  targetm.memtag.gentag (base, virtual_stack_vars_rtx);
> +  hwasan_base_ptr = base;
> +}
> +
> +uint8_t
> +hwasan_current_tag ()
> +{
> +  return tag_offset;
> +}
> +
> +void
> +hwasan_increment_tag ()
> +{
> +  uint8_t tag_bits = HWASAN_TAG_SIZE;
> +  tag_offset = (tag_offset + 1) % (1 << tag_bits);

I know HWASAN_TAG_SIZE is quite a fixed value, but maybe you would
like to add a static check (STATIC_ASSERT) that a tag of
HWASAN_TAG_SIZE bits can fit in sizeof (tag_offset)?

> +}
> +
> +rtx
> +hwasan_with_tag (rtx base, poly_int64 offset)
> +{
> +  gcc_assert (tag_offset < (1 << HWASAN_TAG_SIZE));
> +  return targetm.memtag.addtag (base, offset, tag_offset);
> +}
> +
> +/* Clear internal state for the next function.
> +   This function is called before variables on the stack get expanded, in
> +   `init_vars_expansion`.  */
> +void
> +hwasan_tag_init ()
> +{
> +  delete asan_used_labels;
> +  asan_used_labels = NULL;
> +
> +  hwasan_base_ptr = NULL_RTX;
> +  tag_offset = HWASAN_STACK_BACKGROUND + 1;
> +}
> +
> +rtx
> +hwasan_extract_tag (rtx tagged_pointer)
> +{
> +  rtx tag = expand_simple_binop (Pmode,
> +				 LSHIFTRT,
> +				 tagged_pointer,
> +				 HWASAN_SHIFT_RTX,
> +				 NULL_RTX,
> +				 /* unsignedp = */0,
> +				 OPTAB_DIRECT);
> +  return gen_lowpart (QImode, tag);
> +}
> +
> +void
> +hwasan_emit_prologue (rtx *bases,
> +		      rtx *untagged_bases,
> +		      poly_int64 *offsets,
> +		      uint8_t *tags,
> +		      size_t length)
> +{
> +  /* We need untagged base pointers since libhwasan only accepts untagged
> +    pointers in __hwasan_tag_memory.  We need the tagged base pointer to obtain
> +    the base tag for an offset.  */
> +  for (size_t i = 0; (i * 2) + 1 < length; i++)
> +    {
> +      poly_int64 start = offsets[i * 2];
> +      poly_int64 end = offsets[(i * 2) + 1];
> +
> +      poly_int64 bot, top;
> +      if (known_ge (start, end))
> +	{
> +	  top = start;
> +	  bot = end;
> +	}
> +      else
> +	{
> +	  top = end;
> +	  bot = start;
> +	}
> +      poly_int64 size = (top - bot);
> +
> +      /* Can't check that all poly_int64's are aligned, but still nice
> +	 to check those that are compile-time constants.  */
> +      HOST_WIDE_INT tmp;
> +      if (top.is_constant (&tmp))
> +	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +      if (bot.is_constant (&tmp))
> +	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +      if (size.is_constant (&tmp))
> +	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +
> +      rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +      rtx base_tag = hwasan_extract_tag (bases[i]);
> +      /* In the case of tag overflow we would want modulo wrapping -- which
> +	 the `plus_constant` in QImode should provide.  */
> +      rtx tag = plus_constant (QImode, base_tag, tags[i]);
> +      emit_library_call (ret,
> +			 LCT_NORMAL,
> +			 VOIDmode,
> +			 plus_constant (ptr_mode, untagged_bases[i], bot),
> +			 ptr_mode,
> +			 tag,
> +			 QImode,
> +			 gen_int_mode (size, ptr_mode),
> +			 ptr_mode);
> +    }
> +}
> +
> +rtx_insn *
> +hwasan_emit_untag_frame (rtx dynamic, rtx vars, rtx_insn *before)
> +{
> +  if (before)
> +    push_to_sequence (before);
> +  else
> +    start_sequence ();
> +
> +  dynamic = convert_memory_address (ptr_mode, dynamic);
> +  vars = convert_memory_address (ptr_mode, vars);
> +
> +  rtx top_rtx;
> +  rtx bot_rtx;
> +  if (STACK_GROWS_DOWNWARD)
> +    {
> +      top_rtx = vars;
> +      bot_rtx = dynamic;
> +    }
> +  else
> +    {
> +      top_rtx = dynamic;
> +      bot_rtx = vars;
> +    }
> +
> +  rtx size_rtx = expand_simple_binop (Pmode, MINUS, top_rtx, bot_rtx,
> +				  NULL_RTX, /* unsignedp = */0, OPTAB_DIRECT);
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +      bot_rtx, ptr_mode,
> +      const0_rtx, QImode,
> +      size_rtx, ptr_mode);
> +
> +  do_pending_stack_adjust ();
> +  rtx_insn *insns = get_insns ();
> +  end_sequence ();
> +  return insns;
> +}
> +
> +rtx
> +hwasan_create_untagged_base (rtx orig_base)
> +{
> +  rtx untagged_base = gen_reg_rtx (Pmode);
> +  rtx tag_mask = gen_int_mode ((1ULL << HWASAN_SHIFT) - 1, Pmode);
> +  untagged_base = expand_binop (Pmode, and_optab,
> +				orig_base, tag_mask,
> +				untagged_base, true, OPTAB_DIRECT);
> +  gcc_assert (untagged_base);
> +  return untagged_base;
> +}
> +
> +/* Needs to be GTY(()), because cgraph_build_static_cdtor may
> +   invoke ggc_collect.  */
> +static GTY(()) tree hwasan_ctor_statements;
> +
> +void
> +hwasan_finish_file (void)
> +{
> +  /* Do not emit constructor initialisation for the kernel.
> +     (the kernel has its own initialisation already).  */
> +  if (flag_sanitize & SANITIZE_KERNEL_HWADDRESS)
> +    return;
> +
> +  /* Avoid instrumenting code in the hwasan constructors/destructors.  */
> +  flag_sanitize &= ~SANITIZE_HWADDRESS;
> +  int priority = MAX_RESERVED_INIT_PRIORITY - 1;
> +  tree fn = builtin_decl_implicit (BUILT_IN_HWASAN_INIT);
> +  append_to_statement_list (build_call_expr (fn, 0), &hwasan_ctor_statements);
> +  cgraph_build_static_cdtor ('I', hwasan_ctor_statements, priority);
> +  flag_sanitize |= SANITIZE_HWADDRESS;
> +}
> +
>   #include "gt-asan.h"
> diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
> index e5c9e063c480d1392b6c2b395ec9d029b6d94209..d05f597b6434f39fe95d4f28dd2ef3ed463dd925 100644
> --- a/gcc/builtin-types.def
> +++ b/gcc/builtin-types.def
> @@ -625,6 +625,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_UINT32_UINT32_PTR,
>   DEF_FUNCTION_TYPE_3 (BT_FN_VOID_SIZE_SIZE_PTR, BT_VOID, BT_SIZE, BT_SIZE,
>   		     BT_PTR)
>   DEF_FUNCTION_TYPE_3 (BT_FN_UINT_UINT_PTR_PTR, BT_UINT, BT_UINT, BT_PTR, BT_PTR)
> +DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_SIZE, BT_VOID, BT_PTR, BT_UINT8,
> +		     BT_SIZE)
>   
>   DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
>   		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index d8233f5f760f8426e8ff85473ecda02aa6c2655b..3f621ffdbda0acf5949348882a7c1f3504634666 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -244,6 +244,7 @@ along with GCC; see the file COPYING3.  If not see
>     DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
>   	       true, true, true, ATTRS, true, \
>   	      (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
> +				| SANITIZE_HWADDRESS \
>   				| SANITIZE_UNDEFINED \
>   				| SANITIZE_UNDEFINED_NONDEFAULT) \
>   	       || flag_sanitize_coverage))
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index c34a53b526b50d49cd73ab5a5c383efc6da5a23e..8e1ea21621e350b3d8779b79b0d7f69d571caa08 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -378,7 +378,14 @@ align_local_variable (tree decl, bool really_expand)
>         if (really_expand)
>   	SET_DECL_ALIGN (decl, align);
>       }
> -  return align / BITS_PER_UNIT;
> +
> +  unsigned int ret_align = align / BITS_PER_UNIT;
> +
> +  if (hwasan_sanitize_stack_p ())
> +    ret_align = ret_align > HWASAN_TAG_GRANULE_SIZE
> +      ? ret_align
> +      : HWASAN_TAG_GRANULE_SIZE;

This can be simplified into something like:
if (hwasan_sanitize_stack_p ())
   ret_align = MAX (HWASAN_TAG_GRANULE_SIZE, ret_align);

Martin

> +  return ret_align;
>   }
>   
>   /* Align given offset BASE with ALIGN.  Truncate up if ALIGN_UP is true,
> @@ -986,7 +993,7 @@ dump_stack_var_partition (void)
>   
>   static void
>   expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
> -			 poly_int64 offset)
> +			 poly_int64 offset, rtx stack_base)
>   {
>     unsigned align;
>     rtx x;
> @@ -994,7 +1001,11 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
>     /* If this fails, we've overflowed the stack frame.  Error nicely?  */
>     gcc_assert (known_eq (offset, trunc_int_for_mode (offset, Pmode)));
>   
> -  x = plus_constant (Pmode, base, offset);
> +  if (hwasan_sanitize_stack_p ())
> +    x = hwasan_with_tag (base, offset);
> +  else
> +    x = plus_constant (Pmode, base, offset);
> +
>     x = gen_rtx_MEM (TREE_CODE (decl) == SSA_NAME
>   		   ? TYPE_MODE (TREE_TYPE (decl))
>   		   : DECL_MODE (SSAVAR (decl)), x);
> @@ -1004,7 +1015,7 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
>         /* Set alignment we actually gave this decl if it isn't an SSA name.
>            If it is we generate stack slots only accidentally so it isn't as
>   	 important, we'll simply use the alignment that is already set.  */
> -      if (base == virtual_stack_vars_rtx)
> +      if (base == stack_base)
>   	offset -= frame_phase;
>         align = known_alignment (offset);
>         align *= BITS_PER_UNIT;
> @@ -1030,9 +1041,19 @@ public:
>        The vector is in reversed, highest offset pairs come first.  */
>     auto_vec<HOST_WIDE_INT> asan_vec;
>   
> +  /* HWASAN records the poly_int64 so it can handle any stack variable.  */
> +  auto_vec<poly_int64> hwasan_vec;
> +  auto_vec<rtx> hwasan_untagged_base_vec;
> +  auto_vec<rtx> hwasan_base_vec;
> +
>     /* Vector of partition representative decls in between the paddings.  */
>     auto_vec<tree> asan_decl_vec;
>   
> +  /* Vector of tag offsets representing the tag for each stack variable.
> +     Each offset determines the difference between the randomly generated
> +     tag for the current frame and the tag for this stack variable.  */
> +  auto_vec<uint8_t> hwasan_tag_vec;
> +
>     /* Base pseudo register for Address Sanitizer protected automatic vars.  */
>     rtx asan_base;
>   
> @@ -1050,6 +1071,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>     size_t si, i, j, n = stack_vars_num;
>     poly_uint64 large_size = 0, large_alloc = 0;
>     rtx large_base = NULL;
> +  rtx large_untagged_base = NULL;
>     unsigned large_align = 0;
>     bool large_allocation_done = false;
>     tree decl;
> @@ -1096,11 +1118,17 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>   	}
>       }
>   
> +  if (hwasan_sanitize_stack_p () && data->asan_base == NULL)
> +    {
> +      data->asan_base = gen_reg_rtx (Pmode);
> +      hwasan_record_base (data->asan_base);
> +    }
> +
>     for (si = 0; si < n; ++si)
>       {
>         rtx base;
>         unsigned base_align, alignb;
> -      poly_int64 offset;
> +      poly_int64 offset = 0;
>   
>         i = stack_vars_sorted[si];
>   
> @@ -1121,10 +1149,36 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>         if (pred && !pred (i))
>   	continue;
>   
> +      base = hwasan_sanitize_stack_p ()
> +	? data->asan_base
> +	: virtual_stack_vars_rtx;
>         alignb = stack_vars[i].alignb;
>         if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
>   	{
> -	  base = virtual_stack_vars_rtx;
> +	  if (hwasan_sanitize_stack_p ())
> +	    {
> +	      /* Allocate zero bytes to take advantage of the
> +		 alloc_stack_frame_space logic of ensuring the stack is aligned
> +		 despite having poly_int64's to deal with.
> +
> +		 There must be no tag granule "shared" between different
> +		 objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
> +		 chunk can have more than one object in it.
> +
> +		 We ensure this by forcing the end of the last bit of data to
> +		 be aligned to HWASAN_TAG_GRANULE_SIZE bytes here, and setting
> +		 the start of each variable to be aligned to
> +		 HWASAN_TAG_GRANULE_SIZE bytes in `align_local_variable`.
> +
> +		 We can't align just one of the start or end, since other
> +		 data whose alignment we do not control is stored on the
> +		 stack, and such untagged data must not share a tag granule
> +		 with a tagged variable.  */
> +	      gcc_assert (stack_vars[i].alignb >= HWASAN_TAG_GRANULE_SIZE);
> +	      offset = alloc_stack_frame_space (0, HWASAN_TAG_GRANULE_SIZE);
> +	      data->hwasan_vec.safe_push (offset);
> +	      data->hwasan_untagged_base_vec.safe_push (virtual_stack_vars_rtx);
> +	    }
>   	  /* ASAN description strings don't yet have a syntax for expressing
>   	     polynomial offsets.  */
>   	  HOST_WIDE_INT prev_offset;
> @@ -1204,6 +1258,9 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>   	      offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
>   	      base_align = crtl->max_used_stack_slot_alignment;
>   	    }
> +
> +	  if (hwasan_sanitize_stack_p ())
> +	    data->hwasan_vec.safe_push (offset);
>   	}
>         else
>   	{
> @@ -1223,14 +1280,31 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>   	      loffset = alloc_stack_frame_space
>   		(rtx_to_poly_int64 (large_allocsize),
>   		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
> -	      large_base = get_dynamic_stack_base (loffset, large_align);
> +	      large_base = get_dynamic_stack_base (loffset, large_align, base);
>   	      large_allocation_done = true;
>   	    }
> -	  gcc_assert (large_base != NULL);
>   
> +	  gcc_assert (large_base != NULL);
>   	  large_alloc = aligned_upper_bound (large_alloc, alignb);
> +	  if (hwasan_sanitize_stack_p ())
> +	    {
> +	      /* An object with a "large" alignment requirement is, by
> +		 definition, aligned more strictly than the required
> +		 alignment for tags.  */
> +	      if (!large_untagged_base)
> +		large_untagged_base = hwasan_create_untagged_base (large_base);
> +	      data->hwasan_vec.safe_push (large_alloc);
> +	      data->hwasan_untagged_base_vec.safe_push (large_untagged_base);
> +	    }
>   	  offset = large_alloc;
>   	  large_alloc += stack_vars[i].size;
> +	  if (hwasan_sanitize_stack_p ())
> +	    {
> +	      /* Ensure the end of the variable is also aligned correctly.  */
> +	      poly_int64 align_again
> +		= aligned_upper_bound (large_alloc, HWASAN_TAG_GRANULE_SIZE);
> +	      data->hwasan_vec.safe_push (align_again);
> +	    }
>   
>   	  base = large_base;
>   	  base_align = large_align;
> @@ -1242,7 +1316,21 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>   	{
>   	  expand_one_stack_var_at (stack_vars[j].decl,
>   				   base, base_align,
> -				   offset);
> +				   offset,
> +				   hwasan_sanitize_stack_p ()
> +				   ? data->asan_base
> +				   : virtual_stack_vars_rtx);
> +	}
> +
> +      if (hwasan_sanitize_stack_p ())
> +	{
> +	  /* Record the tag for this object in `data` so that the prologue
> +	     code emitted in cfgexpand.c knows what tag to put in the shadow
> +	     memory for this variable.  Then increment the tag so that the
> +	     next object gets a different tag to this object.  */
> +	  data->hwasan_base_vec.safe_push (base);
> +	  data->hwasan_tag_vec.safe_push (hwasan_current_tag ());
> +	  hwasan_increment_tag ();
>   	}
>       }
>   
> @@ -1339,7 +1427,8 @@ expand_one_stack_var_1 (tree var)
>     offset = alloc_stack_frame_space (size, byte_align);
>   
>     expand_one_stack_var_at (var, virtual_stack_vars_rtx,
> -			   crtl->max_used_stack_slot_alignment, offset);
> +			   crtl->max_used_stack_slot_alignment, offset,
> +			   virtual_stack_vars_rtx);
>   }
>   
>   /* Wrapper for expand_one_stack_var_1 that checks SSA_NAMEs are
> @@ -1552,8 +1641,13 @@ defer_stack_allocation (tree var, bool toplevel)
>   
>     /* If stack protection is enabled, *all* stack variables must be deferred,
>        so that we can re-order the strings to the top of the frame.
> -     Similarly for Address Sanitizer.  */
> -  if (flag_stack_protect || asan_sanitize_stack_p ())
> +     Similarly for Address Sanitizer.
> +     When tagging memory we defer all stack variables so we can handle them in
> +     one place (handle here meaning ensure they are aligned and record
> +     information on each variable's position in the stack).  */
> +  if (flag_stack_protect
> +      || asan_sanitize_stack_p ()
> +      || hwasan_sanitize_stack_p ())
>       return true;
>   
>     unsigned int align = TREE_CODE (var) == SSA_NAME
> @@ -1938,6 +2032,8 @@ init_vars_expansion (void)
>     /* Initialize local stack smashing state.  */
>     has_protected_decls = false;
>     has_short_buffer = false;
> +  if (hwasan_sanitize_stack_p ())
> +    hwasan_tag_init ();
>   }
>   
>   /* Free up stack variable graph data.  */
> @@ -2297,12 +2393,27 @@ expand_used_vars (void)
>   	}
>   
>         expand_stack_vars (NULL, &data);
> +
> +      if (hwasan_sanitize_stack_p ())
> +	hwasan_emit_prologue (data.hwasan_base_vec.address (),
> +			      data.hwasan_untagged_base_vec.address (),
> +			      data.hwasan_vec.address (),
> +			      data.hwasan_tag_vec.address (),
> +			      data.hwasan_vec.length ());
>       }
>   
>     if (asan_sanitize_allocas_p () && cfun->calls_alloca)
>       var_end_seq = asan_emit_allocas_unpoison (virtual_stack_dynamic_rtx,
>   					      virtual_stack_vars_rtx,
>   					      var_end_seq);
> +  /* Here we clear tags from the entire frame of this function.
> +     We need to clear tags of *something* if we have tagged either local
> +     variables or alloca objects.  */
> +  else if (hwasan_sanitize_stack_p ()
> +	   && (cfun->calls_alloca || stack_vars_num > 0))
> +    var_end_seq = hwasan_emit_untag_frame (virtual_stack_dynamic_rtx,
> +					   virtual_stack_vars_rtx,
> +					   var_end_seq);
>   
>     fini_vars_expansion ();
>   
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index e842b734c9c20253986880ba3622a8f692d3ca88..718d2e8aac56553bdc30c592fe70fa10aa911736 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -2982,6 +2982,31 @@ A target hook which lets a backend compute the set of pressure classes to  be us
>   True if backend architecture naturally supports ignoring the top byte of pointers.  This feature means that -fsanitize=hwaddress can work.
>   @end deftypefn
>   
> +@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG (rtx @var{base}, poly_int64 @var{addr_offset}, uint8_t @var{tag_offset})
> +Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by TAG_OFFSET.
> +The resulting RTX must either be a valid memory address or be convertible
> +into an operand with force_operand.  If overridden, the more common case
> +is that we force this into an operand using the backend hook
> +"addtag_force_operand" that is called in force_operand.
> +
> +It is expected that "addtag_force_operand" recognises the RTX
> +generated by "addtag" and emits code to force that RTX into an operand.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG_FORCE_OPERAND (rtx @var{oper}, rtx @var{target})
> +If the RTL expression OPER is of the form generated by targetm.memtag.addtag,
> +then emit instructions to move the value into an operand (i.e. for
> +force_operand).
> +TARGET is an RTX suggestion of where to generate the value.
> +This hook is most often implemented by emitting instructions to put the
> +expression into a pseudo register, then returning that pseudo register.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} void TARGET_MEMTAG_GENTAG (rtx @var{base}, rtx @var{untagged})
> +Set the BASE argument to UNTAGGED with some random tag.
> +This function is used to generate a tagged base for the current stack frame.
> +@end deftypefn
> +
>   @node Stack and Calling
>   @section Stack Layout and Calling Conventions
>   @cindex calling conventions
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 6136ac1a5fe0c0980d5b5123b67b102a1b1e0bcc..3d3761dbc097e6f73f4f4f937aa713b93872a4a2 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -2379,6 +2379,12 @@ in the reload pass.
>   
>   @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES
>   
> +@hook TARGET_MEMTAG_ADDTAG
> +
> +@hook TARGET_MEMTAG_ADDTAG_FORCE_OPERAND
> +
> +@hook TARGET_MEMTAG_GENTAG
> +
>   @node Stack and Calling
>   @section Stack Layout and Calling Conventions
>   @cindex calling conventions
> diff --git a/gcc/explow.h b/gcc/explow.h
> index 5110ad82d6a024fda1d3a3eaf80de40c5e6ad3b6..333948e0c69a1b1132e9a1d06707dc63f1226262 100644
> --- a/gcc/explow.h
> +++ b/gcc/explow.h
> @@ -102,7 +102,7 @@ extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned,
>   extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
>   
>   /* Returns the address of the dynamic stack space without allocating it.  */
> -extern rtx get_dynamic_stack_base (poly_int64, unsigned);
> +extern rtx get_dynamic_stack_base (poly_int64, unsigned, rtx);
>   
>   /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
>   extern rtx align_dynamic_address (rtx, unsigned);
> diff --git a/gcc/explow.c b/gcc/explow.c
> index 7eb854bca4a6dcc5b15e5c42df1a5e88a19f2464..2728f7a4b1ee8ed0d20287b716a3d0ad5a97d84b 100644
> --- a/gcc/explow.c
> +++ b/gcc/explow.c
> @@ -1577,10 +1577,14 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
>      OFFSET is the offset of the area into the virtual stack vars area.
>   
>      REQUIRED_ALIGN is the alignment (in bits) required for the region
> -   of memory.  */
> +   of memory.
> +
> +   BASE is the rtx of the base of this virtual stack vars area.
> +   The only time this is not `virtual_stack_vars_rtx` is when tagging pointers
> +   on the stack.  */
>   
>   rtx
> -get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
> +get_dynamic_stack_base (poly_int64 offset, unsigned required_align, rtx base)
>   {
>     rtx target;
>   
> @@ -1588,7 +1592,7 @@ get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
>       crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
>   
>     target = gen_reg_rtx (Pmode);
> -  emit_move_insn (target, virtual_stack_vars_rtx);
> +  emit_move_insn (target, base);
>     target = expand_binop (Pmode, add_optab, target,
>   			 gen_int_mode (offset, Pmode),
>   			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 476c6865f20828fc68f455e70d4874eaabd9d08d..24d011e698af0dbf3635ba5f9d8275376a124bf4 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -7500,6 +7500,13 @@ force_operand (rtx value, rtx target)
>         return subtarget;
>       }
>   
> +  if (targetm.memtag.addtag_force_operand)
> +    {
> +      rtx ret = targetm.memtag.addtag_force_operand (value, target);
> +      if (ret)
> +	return ret;
> +    }
> +
>     if (ARITHMETIC_P (value))
>       {
>         op2 = XEXP (value, 1);
> diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
> index 374d15007d868363d9b4fbf467e1e462abbca61a..7bd50715f24a2cb154b578e2abdea4e8fcdb2107 100644
> --- a/gcc/sanitizer.def
> +++ b/gcc/sanitizer.def
> @@ -180,6 +180,12 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_COMPARE, "__sanitizer_ptr_cmp",
>   DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_SUBTRACT, "__sanitizer_ptr_sub",
>   		      BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
>   
> +/* Hardware Address Sanitizer.  */
> +DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_INIT, "__hwasan_init",
> +		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_TAG_MEM, "__hwasan_tag_memory",
> +		      BT_FN_VOID_PTR_UINT8_SIZE, ATTR_NOTHROW_LIST)
> +
>   /* Thread Sanitizer */
>   DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT, "__tsan_init",
>   		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
> diff --git a/gcc/target.def b/gcc/target.def
> index ad16a151b6af9b1ee13918c8f2980280d75b1d90..3c533acbe3965cdb0870621e364a009353f72c2e 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6751,6 +6751,37 @@ DEFHOOK
>    pointers.  This feature means that -fsanitize=hwaddress can work.",
>    bool, (), default_memtag_can_tag_addresses)
>   
> +DEFHOOK
> +(addtag,
> + "Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by\
> + TAG_OFFSET.\n\
> +The resulting RTX must either be a valid memory address or be convertible\n\
> +into an operand with force_operand.  If overridden, the more common case\n\
> +is that we force this into an operand using the backend hook\n\
> +\"addtag_force_operand\" that is called in force_operand.\n\
> +\n\
> +It is expected that \"addtag_force_operand\" recognises the RTX\n\
> +generated by \"addtag\" and emits code to force that RTX into an operand.",
> +rtx, (rtx base, poly_int64 addr_offset, uint8_t tag_offset),
> +default_memtag_addtag)
> +
> +DEFHOOK
> +(addtag_force_operand,
> + "If the RTL expression OPER is of the form generated by targetm.memtag.addtag,\n\
> +then emit instructions to move the value into an operand (i.e. for\n\
> +force_operand).\n\
> +TARGET is an RTX suggestion of where to generate the value.\n\
> +This hook is most often implemented by emitting instructions to put the\n\
> +expression into a pseudo register, then returning that pseudo register.",
> +rtx, (rtx oper, rtx target), NULL)
> +
> +DEFHOOK
> +(gentag,
> + "Set the BASE argument to UNTAGGED with some random tag.\n\
> +This function is used to generate a tagged base for the current stack frame.",
> +  void, (rtx base, rtx untagged),
> +  default_memtag_gentag)
> +
>   HOOK_VECTOR_END (memtag)
>   #undef HOOK_PREFIX
>   #define HOOK_PREFIX "TARGET_"
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 94e865259f35e46e26f6b4763c5e2f9dc9ed1b83..b0e32102acacdf7a64f1e3d314a966d1d3f062c7 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -283,4 +283,6 @@ extern bool speculation_safe_value_not_needed (bool);
>   extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
>   
>   extern bool default_memtag_can_tag_addresses ();
> +extern void default_memtag_gentag (rtx, rtx);
> +extern rtx default_memtag_addtag (rtx, poly_int64, uint8_t);
>   #endif /* GCC_TARGHOOKS_H */
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 6e9e877c32e2c8705056bdf0ce2b1b6f125d93c3..cd9f98fc800d7232ead50b03f951364f76c01adc 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "varasm.h"
>   #include "flags.h"
>   #include "explow.h"
> +#include "expmed.h"
>   #include "calls.h"
>   #include "expr.h"
>   #include "output.h"
> @@ -84,6 +85,8 @@ along with GCC; see the file COPYING3.  If not see
>   #include "langhooks.h"
>   #include "sbitmap.h"
>   #include "function-abi.h"
> +#include "attribs.h"
> +#include "asan.h"
>   
>   bool
>   default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
> @@ -2371,4 +2374,78 @@ default_memtag_can_tag_addresses ()
>     return false;
>   }
>   
> +void
> +default_memtag_gentag (rtx base, rtx untagged)
> +{
> +  gcc_assert (HWASAN_STACK);

This can be STATIC_ASSERT.

> +  if (HWASAN_RANDOM_FRAME_TAG)
> +    {
> +      rtx temp = gen_reg_rtx (QImode);
> +      rtx ret = init_one_libfunc ("__hwasan_generate_tag");
> +      rtx new_tag = emit_library_call_value (ret, temp, LCT_NORMAL, QImode);
> +      emit_move_insn (base, untagged);
> +      /* We know that `base` is not the stack pointer, since we never want
> +	 to put a randomly generated tag into the stack pointer.  Hence we
> +	 can use `store_bit_field`, which on aarch64 generates a `bfi` that
> +	 cannot act on the stack pointer.  */
> +      store_bit_field (base, 8, 56, 0, 0, QImode, new_tag, false);
> +    }
> +  else
> +    {
> +      /* NOTE: The kernel API does not have __hwasan_generate_tag exposed.
> +	 In the future we may add the option emit random tags with inline
> +	 instrumentation instead of function calls.  This would be the same
> +	 between the kernel and userland.  */
> +      emit_move_insn (base, untagged);
> +    }
> +}
> +
> +rtx
> +default_memtag_addtag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  /* Need to look into what the most efficient code sequence is.
> +     This is a code sequence that would be emitted *many* times, so we
> +     want it as small as possible.
> +
> +     If the tag offset is greater than (1 << 7) then the most efficient
> +     sequence here would give UB from signed integer overflow in the
> +     poly_int64.  Hence in that case we emit the slightly less efficient
> +     sequence.
> +
> +     There are two places where tag overflow is a question:
> +       - Tagging the shadow stack.
> +	  (both tagging and untagging).
> +       - Tagging addressable pointers.
> +
> +     We need to ensure both behaviours are the same (i.e. that the tag that
> +     ends up in a pointer after "overflowing" the tag bits with a tag addition
> +     is the same that ends up in the shadow space).
> +
> +     The aim is that the behaviour of tag addition should follow modulo
> +     wrapping in both instances.
> +
> +     The libhwasan code doesn't have any path that increments a pointer's tag,
> +     which means it has no opinion on what happens when a tag increment
> +     overflows (and hence we can choose our own behaviour).  */
> +
> +  if (tag_offset < (1 << 7))
> +    {
> +      offset += ((uint64_t)tag_offset << HWASAN_SHIFT);
> +      return plus_constant (Pmode, base, offset);
> +    }
> +  else
> +    {
> +      /* This is the fallback, it would be nice if it had less instructions,
> +	 but we can look for cleverer ways later.  */
> +      uint64_t tag_mask = ~(0xFFULL << HWASAN_SHIFT);
> +      rtx untagged_base = gen_rtx_AND (Pmode, GEN_INT (tag_mask), base);
> +      rtx new_addr = plus_constant (Pmode, untagged_base, offset);
> +
> +      rtx original_tag_value = gen_rtx_LSHIFTRT (Pmode, base, GEN_INT (HWASAN_SHIFT));
> +      rtx new_tag_value = plus_constant (Pmode, original_tag_value, tag_offset);
> +      rtx new_tag = gen_rtx_ASHIFT (Pmode, new_tag_value, GEN_INT (HWASAN_SHIFT));
> +      return gen_rtx_IOR (Pmode, new_addr, new_tag);
> +    }
> +}
> +
>   #include "gt-targhooks.h"
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index ab67384249a3437ac37f42f741ed516884677f9f..7bd75548d2aebb3415ac85ec40ad25e5ca794094 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -508,6 +508,9 @@ compile_file (void)
>         if (flag_sanitize & SANITIZE_THREAD)
>   	tsan_finish_file ();
>   
> +      if (flag_sanitize & SANITIZE_HWADDRESS)
> +	hwasan_finish_file ();
> +
>         omp_finish_file ();
>   
>         hsa_output_brig ();
>
Matthew Malcomson Nov. 20, 2019, 2:37 p.m. UTC | #2
Hi Martin,

Thanks for the review,
I'll get working on your comments now, but since I really enjoyed 
finding this bug in ./configure when I hit it I thought I'd answer this 
right away.


On 20/11/2019 14:02, Martin Liška wrote:
> On 11/7/19 7:37 PM, Matthew Malcomson wrote:
>>
>> diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
>> index 
>> 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541 
>> 100644
>> --- a/config/bootstrap-hwasan.mk
>> +++ b/config/bootstrap-hwasan.mk
>> @@ -1,7 +1,11 @@
>>   # This option enables -fsanitize=hwaddress for stage2 and stage3.
>> +# We need to disable random frame tags for bootstrap since the 
>> autoconf check
>> +# for which direction the stack is growing has UB that a random frame 
>> tag
>> +# breaks.  Running with a random frame tag gives approx. 50% chance of
>> +# bootstrap comparison diff in libiberty/alloca.c.
> 
> Here I would like to see what's exactly the problem. I would expect ASAN 
> will
> have exactly the same problem? Can you please isolate it and file a bug. 
> I bet
> a configure script should not expose an undefined behavior.
> 

The configure problem is this snippet below:


find_stack_direction ()
{
   static char *addr = 0;
   auto char dummy;
   if (addr == 0)
     {
       addr = &dummy;
       return find_stack_direction ();
     }
   else
     return (&dummy > addr) ? 1 : -1;
}
main ()
{
   exit (find_stack_direction() < 0);
}


configure uses this to determine the direction that the stack grows.

`find_stack_direction` compares the address of two different objects and 
uses that to make a decision.

With HWASAN random frame tags the answer to the comparison is mostly 
determined by what random tag was assigned to the object in each frame, 
rather than the memory layout of the stack -- which means this configure 
test program can end up getting different answers on different runs.

This is not a problem for ASAN since ASAN does not store tags in the 
pointers of variables.


You're right -- I should file a bug on that for configure.

For reference the UB clause in the standard is 6.5.8 #5 (relational 
operators) where there's a sentence at the end saying "In all other 
cases, the behaviour is undefined".  Essentially, this program is 
comparing the address of two different objects on the stack, and that's 
not allowed.
Martin Liška Nov. 20, 2019, 2:46 p.m. UTC | #3
On 11/20/19 3:37 PM, Matthew Malcomson wrote:
> Hi Martin,
> 
> Thanks for the review,

You're welcome.

> I'll get working on your comments now, but since I really enjoyed
> finding this bug in ./configure when I hit it I thought I'd answer this
> right away.

Heh :)

> 
> 
> On 20/11/2019 14:02, Martin Liška wrote:
>> On 11/7/19 7:37 PM, Matthew Malcomson wrote:
>>>
>>> diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
>>> index
>>> 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541
>>> 100644
>>> --- a/config/bootstrap-hwasan.mk
>>> +++ b/config/bootstrap-hwasan.mk
>>> @@ -1,7 +1,11 @@
>>>    # This option enables -fsanitize=hwaddress for stage2 and stage3.
>>> +# We need to disable random frame tags for bootstrap since the
>>> autoconf check
>>> +# for which direction the stack is growing has UB that a random frame
>>> tag
>>> +# breaks.  Running with a random frame tag gives approx. 50% chance of
>>> +# bootstrap comparison diff in libiberty/alloca.c.
>>
>> Here I would like to see what's exactly the problem. I would expect ASAN
>> will
>> have exactly the same problem? Can you please isolate it and file a bug.
>> I bet
>> a configure script should not expose an undefined behavior.
>>
> 
> The configure problem is this snippet below:
> 
> 
> find_stack_direction ()
> {
>     static char *addr = 0;
>     auto char dummy;
>     if (addr == 0)
>       {
>         addr = &dummy;
>         return find_stack_direction ();
>       }
>     else
>       return (&dummy > addr) ? 1 : -1;
> }
> main ()
> {
>     exit (find_stack_direction() < 0);
> }
> 
> 
> configure uses this to determine the direction that the stack grows.
> 
> `find_stack_direction` compares the address of two different objects and
> uses that to make a decision.
> 
> With HWASAN random frame tags the answer to the comparison is mostly
> determined by what random tag was assigned to the object in each frame,
> rather than the memory layout of the stack -- which means this configure
> test program can end up getting different answers on different runs.
> 
> This is not a problem for ASAN since ASAN does not store tags in the
> pointers of variables.
> 
> 
> You're right -- I should file a bug on that for configure.
> 
> For reference the UB clause in the standard is 6.5.8 #5 (relational
> operators) where there's a sentence at the end saying "In all other
> cases, the behaviour is undefined".  Essentially, this program is
> comparing the address of two different objects on the stack, and that's
> not allowed.

Well, to be honest, this is quite a cute violation of the standard. I would
have written exactly the same code for the stack direction detection. I
understand that a top byte (a.k.a. tag) will introduce the randomness.

Do you have an idea how can we rewrite the check?
Thanks,
Martin
Matthew Malcomson Nov. 20, 2019, 2:53 p.m. UTC | #4
On 20/11/2019 14:46, Martin Liška wrote:
> On 11/20/19 3:37 PM, Matthew Malcomson wrote:
>> Hi Martin,
>>
>> Thanks for the review,
> 
> You're welcome.
> 
>> I'll get working on your comments now, but since I really enjoyed
>> finding this bug in ./configure when I hit it I thought I'd answer this
>> right away.
> 
> Heh :)
> 
>>
>>
>> On 20/11/2019 14:02, Martin Liška wrote:
>>> On 11/7/19 7:37 PM, Matthew Malcomson wrote:
>>>>
>>>> diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
>>>> index
>>>> 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541 
>>>>
>>>> 100644
>>>> --- a/config/bootstrap-hwasan.mk
>>>> +++ b/config/bootstrap-hwasan.mk
>>>> @@ -1,7 +1,11 @@
>>>>    # This option enables -fsanitize=hwaddress for stage2 and stage3.
>>>> +# We need to disable random frame tags for bootstrap since the
>>>> autoconf check
>>>> +# for which direction the stack is growing has UB that a random frame
>>>> tag
>>>> +# breaks.  Running with a random frame tag gives approx. 50% chance of
>>>> +# bootstrap comparison diff in libiberty/alloca.c.
>>>
>>> Here I would like to see what's exactly the problem. I would expect ASAN
>>> will
>>> have exactly the same problem? Can you please isolate it and file a bug.
>>> I bet
>>> a configure script should not expose an undefined behavior.
>>>
>>
>> The configure problem is this snippet below:
>>
>>
>> find_stack_direction ()
>> {
>>     static char *addr = 0;
>>     auto char dummy;
>>     if (addr == 0)
>>       {
>>         addr = &dummy;
>>         return find_stack_direction ();
>>       }
>>     else
>>       return (&dummy > addr) ? 1 : -1;
>> }
>> main ()
>> {
>>     exit (find_stack_direction() < 0);
>> }
>>
>>
>> configure uses this to determine the direction that the stack grows.
>>
>> `find_stack_direction` compares the address of two different objects and
>> uses that to make a decision.
>>
>> With HWASAN random frame tags the answer to the comparison is mostly
>> determined by what random tag was assigned to the object in each frame,
>> rather than the memory layout of the stack -- which means this configure
>> test program can end up getting different answers on different runs.
>>
>> This is not a problem for ASAN since ASAN does not store tags in the
>> pointers of variables.
>>
>>
>> You're right -- I should file a bug on that for configure.
>>
>> For reference the UB clause in the standard is 6.5.8 #5 (relational
>> operators) where there's a sentence at the end saying "In all other
>> cases, the behaviour is undefined".  Essentially, this program is
>> comparing the address of two different objects on the stack, and that's
>> not allowed.
> 
> Well, to be honest, this is quite a cute violation of the standard. I would
> have written exactly the same code for the stack direction detection. I
> understand that a top byte (a.k.a. tag) will introduce the randomness.
> 
> Do you have an idea how can we rewrite the check?
> Thanks,
> Martin
> 

I don't have much of a plan.

The most promising lead I have is that libiberty/alloca.c has a similar 
functionality but with macros to account for a special case.

Instead of just using '&' it uses a macro `ADDRESS_FUNCTION`.
I can use that macro to ensure the libiberty/alloca.c function could
handle tags, but I'm not sure that architecture-specific conditions will
neatly fit into autoconf.
Joseph Myers Nov. 20, 2019, 6:06 p.m. UTC | #5
On Wed, 20 Nov 2019, Matthew Malcomson wrote:

> I don't have much of a plan.
> 
> The most promising lead I have is that libiberty/alloca.c has a similar 
> functionality but with macros to account for a special case.

The comment in libiberty/aclocal.m4 is:

# We always want a C version of alloca() compiled into libiberty,
# because native-compiler support for the real alloca is so !@#$%
# unreliable that GCC has decided to use it only when being compiled
# by GCC.  This is the part of AC_FUNC_ALLOCA that calculates the
# information alloca.c needs.

This is the sort of thing that was relevant when GCC was built on lots of 
proprietary Unix systems with their system C compilers.  Most of those 
proprietary Unix systems are long obsolete and are no longer supported by 
GCC.  On the limited remaining set of host systems supported by GCC, there 
are a limited number of C++ compilers used to build most of the host code 
in GCC that is C++, and presumably a limited number of accompanying C 
compilers used to build libiberty.  Now, libiberty is used by binutils 
more of which is written in C, but I doubt that expands the range of 
relevant host C compilers; the set of host OSes used nowadays is simply 
much smaller than it was when this code was written.

So I'd suggest either completely eliminating C alloca from libiberty, or 
at least not building it at all when building with a compiler that defines 
__GNUC__.

Patch
diff mbox series

diff --git a/config/bootstrap-hwasan.mk b/config/bootstrap-hwasan.mk
index 4f60bed3fd6e98b47a3a38aea6eba2a7c320da25..91989f4bb1db6ccff564383777757b896645e541 100644
--- a/config/bootstrap-hwasan.mk
+++ b/config/bootstrap-hwasan.mk
@@ -1,7 +1,11 @@ 
 # This option enables -fsanitize=hwaddress for stage2 and stage3.
+# We need to disable random frame tags for bootstrap since the autoconf check
+# for which direction the stack is growing has UB that a random frame tag
+# breaks.  Running with a random frame tag gives approx. 50% chance of
+# bootstrap comparison diff in libiberty/alloca.c.
 
-STAGE2_CFLAGS += -fsanitize=hwaddress
-STAGE3_CFLAGS += -fsanitize=hwaddress
+STAGE2_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
+STAGE3_CFLAGS += -fsanitize=hwaddress --param hwasan-random-frame-tag=0
 POSTSTAGE1_LDFLAGS += -fsanitize=hwaddress -static-libhwasan \
 		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/ \
 		      -B$$r/prev-$(TARGET_SUBDIR)/libsanitizer/hwasan/ \
diff --git a/gcc/asan.h b/gcc/asan.h
index 7675f18a84ee3f187ba4cb40db0ce232f3958762..467231f8dad031a6176aeaddb9414f768b2af3fc 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -23,6 +23,18 @@  along with GCC; see the file COPYING3.  If not see
 
 extern void asan_function_start (void);
 extern void asan_finish_file (void);
+extern void hwasan_finish_file (void);
+extern void hwasan_record_base (rtx);
+extern uint8_t hwasan_current_tag ();
+extern void hwasan_increment_tag ();
+extern rtx hwasan_with_tag (rtx, poly_int64);
+extern void hwasan_tag_init ();
+extern rtx hwasan_create_untagged_base (rtx);
+extern rtx hwasan_extract_tag (rtx tagged_pointer);
+extern void hwasan_emit_prologue (rtx *, rtx *, poly_int64 *, uint8_t *, size_t);
+extern rtx_insn *hwasan_emit_untag_frame (rtx, rtx, rtx_insn *);
+extern bool memory_tagging_p (void);
+extern bool hwasan_sanitize_stack_p (void);
 extern rtx_insn *asan_emit_stack_protection (rtx, rtx, unsigned int,
 					     HOST_WIDE_INT *, tree *, int);
 extern rtx_insn *asan_emit_allocas_unpoison (rtx, rtx, rtx_insn *);
@@ -75,6 +87,31 @@  extern hash_set <tree> *asan_used_labels;
 
 #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE	"use after scope memory"
 
+/* NOTE: The values below define an ABI and are hard-coded to these values in
+   libhwasan, hence they can't be changed independently here.  */
+/* How many bits are used to store a tag in a pointer.
+   HWASAN uses the entire top byte of a pointer (i.e. 8 bits).  */
+#define HWASAN_TAG_SIZE 8
+/* Tag Granule of HWASAN shadow stack.
+   This is the size in real memory that each byte in the shadow memory refers
+   to.  I.e. if a variable is X bytes long in memory then it's tag in shadow
+   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
+   Most variables will need to be aligned to this amount since two variables
+   that are neighbours in memory and share a tag granule would need to share
+   the same tag (the shared tag granule can only store one tag).  */
+#define HWASAN_TAG_SHIFT_SIZE 4
+#define HWASAN_TAG_GRANULE_SIZE (1ULL << HWASAN_TAG_SHIFT_SIZE)
+/* Define the tag for the stack background.
+   This defines what tag the stack pointer will have, and hence what tag all
+   data not given a specific tag will have (e.g. spilled registers and
+   parameters passed on the stack).  */
+#define HWASAN_STACK_BACKGROUND 0
+/* How many bits to shift in order to access the tag bits.
+   The tag is stored in the top 8 bits of a pointer, so shifting right by 56
+   bits leaves just the tag.  */
+#define HWASAN_SHIFT 56
+#define HWASAN_SHIFT_RTX const_int_rtx[MAX_SAVED_CONST_INT + HWASAN_SHIFT]
+
 /* Various flags for Asan builtins.  */
 enum asan_check_flags
 {
diff --git a/gcc/asan.c b/gcc/asan.c
index a731bd490b4e78e916ae20fc9a0249c1fc04daa5..2e79d39785467651c352169dae4551a47d7b3613 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -257,6 +257,9 @@  hash_set<tree> *asan_handled_variables = NULL;
 
 hash_set <tree> *asan_used_labels = NULL;
 
+static uint8_t tag_offset = 0;
+static rtx hwasan_base_ptr = NULL_RTX;
+
 /* Sets shadow offset to value in string VAL.  */
 
 bool
@@ -1352,6 +1355,21 @@  asan_redzone_buffer::flush_if_full (void)
     flush_redzone_payload ();
 }
 
+/* Returns whether we are tagging pointers and checking those tags on memory
+   access.  This is true whether checking happens in software or in hardware.  */
+bool
+memory_tagging_p ()
+{
+  return sanitize_flags_p (SANITIZE_HWADDRESS);
+}
+
+/* Are we tagging the stack?  */
+bool
+hwasan_sanitize_stack_p ()
+{
+  return (memory_tagging_p () && HWASAN_STACK);
+}
+
 /* Insert code to protect stack vars.  The prologue sequence should be emitted
    directly, epilogue sequence returned.  BASE is the register holding the
    stack base, against which OFFSETS array offsets are relative to, OFFSETS
@@ -2884,6 +2902,11 @@  initialize_sanitizer_builtins (void)
     = build_function_type_list (void_type_node, uint64_type_node,
 				ptr_type_node, NULL_TREE);
 
+  tree BT_FN_VOID_PTR_UINT8_SIZE
+    = build_function_type_list (void_type_node, ptr_type_node,
+				unsigned_char_type_node, size_type_node,
+				NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
@@ -2934,6 +2957,8 @@  initialize_sanitizer_builtins (void)
 #define BT_FN_I16_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[4]
 #define BT_FN_I16_VPTR_I16_INT BT_FN_IX_VPTR_IX_INT[4]
 #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4]
+#undef ATTR_NOTHROW_LIST
+#define ATTR_NOTHROW_LIST ECF_NOTHROW
 #undef ATTR_NOTHROW_LEAF_LIST
 #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
 #undef ATTR_TMPURE_NOTHROW_LEAF_LIST
@@ -3684,4 +3709,189 @@  make_pass_asan_O0 (gcc::context *ctxt)
   return new pass_asan_O0 (ctxt);
 }
 
+void
+hwasan_record_base (rtx base)
+{
+  /* Initialise the tag of the base register.
+     This has to be done as soon as the stack starts being expanded, to ensure
+     anything emitted with `get_dynamic_stack_base` uses the value set here
+     rather than an uninitialised register.
+     In particular, RTL expansion of large aligned values relies on this.  */
+  targetm.memtag.gentag (base, virtual_stack_vars_rtx);
+  hwasan_base_ptr = base;
+}
+
+uint8_t
+hwasan_current_tag ()
+{
+  return tag_offset;
+}
+
+void
+hwasan_increment_tag ()
+{
+  uint8_t tag_bits = HWASAN_TAG_SIZE;
+  tag_offset = (tag_offset + 1) % (1 << tag_bits);
+}
+
+rtx
+hwasan_with_tag (rtx base, poly_int64 offset)
+{
+  gcc_assert (tag_offset < (1 << HWASAN_TAG_SIZE));
+  return targetm.memtag.addtag (base, offset, tag_offset);
+}
+
+/* Clear internal state for the next function.
+   This function is called before variables on the stack get expanded, in
+   `init_vars_expansion`.  */
+void
+hwasan_tag_init ()
+{
+  delete asan_used_labels;
+  asan_used_labels = NULL;
+
+  hwasan_base_ptr = NULL_RTX;
+  tag_offset = HWASAN_STACK_BACKGROUND + 1;
+}
+
+rtx
+hwasan_extract_tag (rtx tagged_pointer)
+{
+  rtx tag = expand_simple_binop (Pmode,
+				 LSHIFTRT,
+				 tagged_pointer,
+				 HWASAN_SHIFT_RTX,
+				 NULL_RTX,
+				 /* unsignedp = */0,
+				 OPTAB_DIRECT);
+  return gen_lowpart (QImode, tag);
+}
+
+void
+hwasan_emit_prologue (rtx *bases,
+		      rtx *untagged_bases,
+		      poly_int64 *offsets,
+		      uint8_t *tags,
+		      size_t length)
+{
+  /* We need untagged base pointers since libhwasan only accepts untagged
+     pointers in __hwasan_tag_memory.  We need the tagged base pointer to
+     obtain the base tag for an offset.  */
+  for (size_t i = 0; (i * 2) + 1 < length; i++)
+    {
+      poly_int64 start = offsets[i * 2];
+      poly_int64 end = offsets[(i * 2) + 1];
+
+      poly_int64 bot, top;
+      if (known_ge (start, end))
+	{
+	  top = start;
+	  bot = end;
+	}
+      else
+	{
+	  top = end;
+	  bot = start;
+	}
+      poly_int64 size = (top - bot);
+
+      /* Can't check that all poly_int64's are aligned, but still nice
+	 to check those that are compile-time constants.  */
+      HOST_WIDE_INT tmp;
+      if (top.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+      if (bot.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+      if (size.is_constant (&tmp))
+	gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
+
+      rtx ret = init_one_libfunc ("__hwasan_tag_memory");
+      rtx base_tag = hwasan_extract_tag (bases[i]);
+      /* In the case of tag overflow we want modulo wrapping, which the
+	 `plus_constant` in QImode provides.  */
+      rtx tag = plus_constant (QImode, base_tag, tags[i]);
+      emit_library_call (ret,
+			 LCT_NORMAL,
+			 VOIDmode,
+			 plus_constant (ptr_mode, untagged_bases[i], bot),
+			 ptr_mode,
+			 tag,
+			 QImode,
+			 gen_int_mode (size, ptr_mode),
+			 ptr_mode);
+    }
+}
+
+rtx_insn *
+hwasan_emit_untag_frame (rtx dynamic, rtx vars, rtx_insn *before)
+{
+  if (before)
+    push_to_sequence (before);
+  else
+    start_sequence ();
+
+  dynamic = convert_memory_address (ptr_mode, dynamic);
+  vars = convert_memory_address (ptr_mode, vars);
+
+  rtx top_rtx;
+  rtx bot_rtx;
+  if (STACK_GROWS_DOWNWARD)
+    {
+      top_rtx = vars;
+      bot_rtx = dynamic;
+    }
+  else
+    {
+      top_rtx = dynamic;
+      bot_rtx = vars;
+    }
+
+  rtx size_rtx = expand_simple_binop (Pmode, MINUS, top_rtx, bot_rtx,
+				      NULL_RTX, /* unsignedp = */0,
+				      OPTAB_DIRECT);
+
+  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
+  emit_library_call (ret, LCT_NORMAL, VOIDmode,
+		     bot_rtx, ptr_mode,
+		     const0_rtx, QImode,
+		     size_rtx, ptr_mode);
+
+  do_pending_stack_adjust ();
+  rtx_insn *insns = get_insns ();
+  end_sequence ();
+  return insns;
+}
+
+rtx
+hwasan_create_untagged_base (rtx orig_base)
+{
+  rtx untagged_base = gen_reg_rtx (Pmode);
+  rtx tag_mask = gen_int_mode ((1ULL << HWASAN_SHIFT) - 1, Pmode);
+  untagged_base = expand_binop (Pmode, and_optab,
+				orig_base, tag_mask,
+				untagged_base, true, OPTAB_DIRECT);
+  gcc_assert (untagged_base);
+  return untagged_base;
+}
+
+/* Needs to be GTY(()), because cgraph_build_static_cdtor may
+   invoke ggc_collect.  */
+static GTY(()) tree hwasan_ctor_statements;
+
+void
+hwasan_finish_file (void)
+{
+  /* Do not emit constructor initialisation for the kernel.
+     (the kernel has its own initialisation already).  */
+  if (flag_sanitize & SANITIZE_KERNEL_HWADDRESS)
+    return;
+
+  /* Avoid instrumenting code in the hwasan constructors/destructors.  */
+  flag_sanitize &= ~SANITIZE_HWADDRESS;
+  int priority = MAX_RESERVED_INIT_PRIORITY - 1;
+  tree fn = builtin_decl_implicit (BUILT_IN_HWASAN_INIT);
+  append_to_statement_list (build_call_expr (fn, 0), &hwasan_ctor_statements);
+  cgraph_build_static_cdtor ('I', hwasan_ctor_statements, priority);
+  flag_sanitize |= SANITIZE_HWADDRESS;
+}
+
 #include "gt-asan.h"
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index e5c9e063c480d1392b6c2b395ec9d029b6d94209..d05f597b6434f39fe95d4f28dd2ef3ed463dd925 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -625,6 +625,8 @@  DEF_FUNCTION_TYPE_3 (BT_FN_VOID_UINT32_UINT32_PTR,
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_SIZE_SIZE_PTR, BT_VOID, BT_SIZE, BT_SIZE,
 		     BT_PTR)
 DEF_FUNCTION_TYPE_3 (BT_FN_UINT_UINT_PTR_PTR, BT_UINT, BT_UINT, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_SIZE, BT_VOID, BT_PTR, BT_UINT8,
+		     BT_SIZE)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 		     BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index d8233f5f760f8426e8ff85473ecda02aa6c2655b..3f621ffdbda0acf5949348882a7c1f3504634666 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -244,6 +244,7 @@  along with GCC; see the file COPYING3.  If not see
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
 	      (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \
+				| SANITIZE_HWADDRESS \
 				| SANITIZE_UNDEFINED \
 				| SANITIZE_UNDEFINED_NONDEFAULT) \
 	       || flag_sanitize_coverage))
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index c34a53b526b50d49cd73ab5a5c383efc6da5a23e..8e1ea21621e350b3d8779b79b0d7f69d571caa08 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -378,7 +378,14 @@  align_local_variable (tree decl, bool really_expand)
       if (really_expand)
 	SET_DECL_ALIGN (decl, align);
     }
-  return align / BITS_PER_UNIT;
+
+  unsigned int ret_align = align / BITS_PER_UNIT;
+
+  if (hwasan_sanitize_stack_p ())
+    ret_align = MAX (ret_align, (unsigned) HWASAN_TAG_GRANULE_SIZE);
+  return ret_align;
 }
 
 /* Align given offset BASE with ALIGN.  Truncate up if ALIGN_UP is true,
@@ -986,7 +993,7 @@  dump_stack_var_partition (void)
 
 static void
 expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
-			 poly_int64 offset)
+			 poly_int64 offset, rtx stack_base)
 {
   unsigned align;
   rtx x;
@@ -994,7 +1001,11 @@  expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
   /* If this fails, we've overflowed the stack frame.  Error nicely?  */
   gcc_assert (known_eq (offset, trunc_int_for_mode (offset, Pmode)));
 
-  x = plus_constant (Pmode, base, offset);
+  if (hwasan_sanitize_stack_p ())
+    x = hwasan_with_tag (base, offset);
+  else
+    x = plus_constant (Pmode, base, offset);
+
   x = gen_rtx_MEM (TREE_CODE (decl) == SSA_NAME
 		   ? TYPE_MODE (TREE_TYPE (decl))
 		   : DECL_MODE (SSAVAR (decl)), x);
@@ -1004,7 +1015,7 @@  expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
       /* Set alignment we actually gave this decl if it isn't an SSA name.
          If it is we generate stack slots only accidentally so it isn't as
 	 important, we'll simply use the alignment that is already set.  */
-      if (base == virtual_stack_vars_rtx)
+      if (base == stack_base)
 	offset -= frame_phase;
       align = known_alignment (offset);
       align *= BITS_PER_UNIT;
@@ -1030,9 +1041,19 @@  public:
      The vector is in reversed, highest offset pairs come first.  */
   auto_vec<HOST_WIDE_INT> asan_vec;
 
+  /* Vector of offsets for HWASAN-protected variables.  These are recorded
+     as poly_int64 so that stack variables of any size can be handled.  */
+  auto_vec<poly_int64> hwasan_vec;
+  auto_vec<rtx> hwasan_untagged_base_vec;
+  auto_vec<rtx> hwasan_base_vec;
+
   /* Vector of partition representative decls in between the paddings.  */
   auto_vec<tree> asan_decl_vec;
 
+  /* Vector of tag offsets representing the tag for each stack variable.
+     Each offset determines the difference between the randomly generated
+     tag for the current frame and the tag for this stack variable.  */
+  auto_vec<uint8_t> hwasan_tag_vec;
+
   /* Base pseudo register for Address Sanitizer protected automatic vars.  */
   rtx asan_base;
 
@@ -1050,6 +1071,7 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
   size_t si, i, j, n = stack_vars_num;
   poly_uint64 large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
+  rtx large_untagged_base = NULL;
   unsigned large_align = 0;
   bool large_allocation_done = false;
   tree decl;
@@ -1096,11 +1118,17 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	}
     }
 
+  if (hwasan_sanitize_stack_p () && data->asan_base == NULL)
+    {
+      data->asan_base = gen_reg_rtx (Pmode);
+      hwasan_record_base (data->asan_base);
+    }
+
   for (si = 0; si < n; ++si)
     {
       rtx base;
       unsigned base_align, alignb;
-      poly_int64 offset;
+      poly_int64 offset = 0;
 
       i = stack_vars_sorted[si];
 
@@ -1121,10 +1149,36 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
       if (pred && !pred (i))
 	continue;
 
+      base = hwasan_sanitize_stack_p ()
+	? data->asan_base
+	: virtual_stack_vars_rtx;
       alignb = stack_vars[i].alignb;
       if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
 	{
-	  base = virtual_stack_vars_rtx;
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* Allocate zero bytes to take advantage of the
+		 alloc_stack_frame_space logic that ensures the stack is
+		 aligned even when dealing with poly_int64 sizes.
+
+		 There must be no tag granule "shared" between different
+		 objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
+		 chunk can have more than one object in it.
+
+		 We ensure this by forcing the end of the last bit of data to
+		 be aligned to HWASAN_TAG_GRANULE_SIZE bytes here, and setting
+		 the start of each variable to be aligned to
+		 HWASAN_TAG_GRANULE_SIZE bytes in `align_local_variable`.
+
+		 We can't align just one of the start or end, since the stack
+		 also holds untagged objects whose alignment we do not
+		 control, and these can't share a tag granule with a tagged
+		 variable.  */
+	      gcc_assert (stack_vars[i].alignb >= HWASAN_TAG_GRANULE_SIZE);
+	      offset = alloc_stack_frame_space (0, HWASAN_TAG_GRANULE_SIZE);
+	      data->hwasan_vec.safe_push (offset);
+	      data->hwasan_untagged_base_vec.safe_push (virtual_stack_vars_rtx);
+	    }
 	  /* ASAN description strings don't yet have a syntax for expressing
 	     polynomial offsets.  */
 	  HOST_WIDE_INT prev_offset;
@@ -1204,6 +1258,9 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	      offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
 	      base_align = crtl->max_used_stack_slot_alignment;
 	    }
+
+	  if (hwasan_sanitize_stack_p ())
+	    data->hwasan_vec.safe_push (offset);
 	}
       else
 	{
@@ -1223,14 +1280,31 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	      loffset = alloc_stack_frame_space
 		(rtx_to_poly_int64 (large_allocsize),
 		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
-	      large_base = get_dynamic_stack_base (loffset, large_align);
+	      large_base = get_dynamic_stack_base (loffset, large_align, base);
 	      large_allocation_done = true;
 	    }
-	  gcc_assert (large_base != NULL);
 
+	  gcc_assert (large_base != NULL);
 	  large_alloc = aligned_upper_bound (large_alloc, alignb);
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* An object with a "large" alignment requirement is already
+		 aligned more strictly than the tag granule requires, so its
+		 start needs no extra alignment.  */
+	      if (!large_untagged_base)
+		large_untagged_base = hwasan_create_untagged_base (large_base);
+	      data->hwasan_vec.safe_push (large_alloc);
+	      data->hwasan_untagged_base_vec.safe_push (large_untagged_base);
+	    }
 	  offset = large_alloc;
 	  large_alloc += stack_vars[i].size;
+	  if (hwasan_sanitize_stack_p ())
+	    {
+	      /* Ensure the end of the variable is also aligned correctly.  */
+	      poly_int64 align_again
+		= aligned_upper_bound (large_alloc, HWASAN_TAG_GRANULE_SIZE);
+	      data->hwasan_vec.safe_push (align_again);
+	    }
 
 	  base = large_base;
 	  base_align = large_align;
@@ -1242,7 +1316,21 @@  expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
 	{
 	  expand_one_stack_var_at (stack_vars[j].decl,
 				   base, base_align,
-				   offset);
+				   offset,
+				   hwasan_sanitize_stack_p ()
+				   ? data->asan_base
+				   : virtual_stack_vars_rtx);
+	}
+
+      if (hwasan_sanitize_stack_p ())
+	{
+	  /* Record the tag for this object in `data` so the prologue emitted
+	     from cfgexpand.c knows what tag to put in the shadow memory.
+	     Then increment the tag so that the next object gets a different
+	     tag from this object.  */
+	  data->hwasan_base_vec.safe_push (base);
+	  data->hwasan_tag_vec.safe_push (hwasan_current_tag ());
+	  hwasan_increment_tag ();
 	}
     }
 
@@ -1339,7 +1427,8 @@  expand_one_stack_var_1 (tree var)
   offset = alloc_stack_frame_space (size, byte_align);
 
   expand_one_stack_var_at (var, virtual_stack_vars_rtx,
-			   crtl->max_used_stack_slot_alignment, offset);
+			   crtl->max_used_stack_slot_alignment, offset,
+			   virtual_stack_vars_rtx);
 }
 
 /* Wrapper for expand_one_stack_var_1 that checks SSA_NAMEs are
@@ -1552,8 +1641,13 @@  defer_stack_allocation (tree var, bool toplevel)
 
   /* If stack protection is enabled, *all* stack variables must be deferred,
      so that we can re-order the strings to the top of the frame.
-     Similarly for Address Sanitizer.  */
-  if (flag_stack_protect || asan_sanitize_stack_p ())
+     Similarly for Address Sanitizer.
+     When tagging memory we defer all stack variables so we can handle them in
+     one place (i.e. ensure they are aligned and record each variable's
+     position in the stack).  */
+  if (flag_stack_protect
+      || asan_sanitize_stack_p ()
+      || hwasan_sanitize_stack_p ())
     return true;
 
   unsigned int align = TREE_CODE (var) == SSA_NAME
@@ -1938,6 +2032,8 @@  init_vars_expansion (void)
   /* Initialize local stack smashing state.  */
   has_protected_decls = false;
   has_short_buffer = false;
+  if (hwasan_sanitize_stack_p ())
+    hwasan_tag_init ();
 }
 
 /* Free up stack variable graph data.  */
@@ -2297,12 +2393,27 @@  expand_used_vars (void)
 	}
 
       expand_stack_vars (NULL, &data);
+
+      if (hwasan_sanitize_stack_p ())
+	hwasan_emit_prologue (data.hwasan_base_vec.address (),
+			      data.hwasan_untagged_base_vec.address (),
+			      data.hwasan_vec.address (),
+			      data.hwasan_tag_vec.address (),
+			      data.hwasan_vec.length ());
     }
 
   if (asan_sanitize_allocas_p () && cfun->calls_alloca)
     var_end_seq = asan_emit_allocas_unpoison (virtual_stack_dynamic_rtx,
 					      virtual_stack_vars_rtx,
 					      var_end_seq);
+  /* Here we clear tags for the entire frame of this function.
+     We need to clear the tags of *something* if we have tagged either local
+     variables or alloca objects.  */
+  else if (hwasan_sanitize_stack_p ()
+	   && (cfun->calls_alloca || stack_vars_num > 0))
+    var_end_seq = hwasan_emit_untag_frame (virtual_stack_dynamic_rtx,
+					   virtual_stack_vars_rtx,
+					   var_end_seq);
 
   fini_vars_expansion ();
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e842b734c9c20253986880ba3622a8f692d3ca88..718d2e8aac56553bdc30c592fe70fa10aa911736 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2982,6 +2982,31 @@  A target hook which lets a backend compute the set of pressure classes to  be us
 True if backend architecture naturally supports ignoring the top byte of pointers.  This feature means that -fsanitize=hwaddress can work.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG (rtx @var{base}, poly_int64 @var{addr_offset}, uint8_t @var{tag_offset})
+Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by
+TAG_OFFSET.
+The resulting RTX must either be a valid memory address or be able to be
+put into an operand with force_operand.  If this hook is overridden, the
+more common case is that the result is forced into an operand using the
+backend hook "addtag_force_operand", which is called from force_operand.
+
+It is expected that "addtag_force_operand" recognises the RTX
+generated by "addtag" and emits code to force that RTX into an operand.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_MEMTAG_ADDTAG_FORCE_OPERAND (rtx @var{oper}, rtx @var{target})
+If the RTL expression OPER is of the form generated by targetm.memtag.addtag,
+then emit instructions to move the value into an operand (i.e. for
+force_operand).
+TARGET is an RTX suggestion of where to generate the value.
+This hook is most often implemented by emitting instructions to put the
+expression into a pseudo register, then returning that pseudo register.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_MEMTAG_GENTAG (rtx @var{base}, rtx @var{untagged})
+Set the BASE register to the value of UNTAGGED combined with a random tag.
+This hook is used to generate a tagged base for the current stack frame.
+@end deftypefn
+
 @node Stack and Calling
 @section Stack Layout and Calling Conventions
 @cindex calling conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 6136ac1a5fe0c0980d5b5123b67b102a1b1e0bcc..3d3761dbc097e6f73f4f4f937aa713b93872a4a2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2379,6 +2379,12 @@  in the reload pass.
 
 @hook TARGET_MEMTAG_CAN_TAG_ADDRESSES
 
+@hook TARGET_MEMTAG_ADDTAG
+
+@hook TARGET_MEMTAG_ADDTAG_FORCE_OPERAND
+
+@hook TARGET_MEMTAG_GENTAG
+
 @node Stack and Calling
 @section Stack Layout and Calling Conventions
 @cindex calling conventions
diff --git a/gcc/explow.h b/gcc/explow.h
index 5110ad82d6a024fda1d3a3eaf80de40c5e6ad3b6..333948e0c69a1b1132e9a1d06707dc63f1226262 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -102,7 +102,7 @@  extern rtx allocate_dynamic_stack_space (rtx, unsigned, unsigned,
 extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
 
 /* Returns the address of the dynamic stack space without allocating it.  */
-extern rtx get_dynamic_stack_base (poly_int64, unsigned);
+extern rtx get_dynamic_stack_base (poly_int64, unsigned, rtx);
 
 /* Return an rtx doing runtime alignment to REQUIRED_ALIGN on TARGET.  */
 extern rtx align_dynamic_address (rtx, unsigned);
diff --git a/gcc/explow.c b/gcc/explow.c
index 7eb854bca4a6dcc5b15e5c42df1a5e88a19f2464..2728f7a4b1ee8ed0d20287b716a3d0ad5a97d84b 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1577,10 +1577,14 @@  allocate_dynamic_stack_space (rtx size, unsigned size_align,
    OFFSET is the offset of the area into the virtual stack vars area.
 
    REQUIRED_ALIGN is the alignment (in bits) required for the region
-   of memory.  */
+   of memory.
+
+   BASE is the rtx of the base of this virtual stack vars area.
+   The only time this is not `virtual_stack_vars_rtx` is when tagging pointers
+   on the stack.  */
 
 rtx
-get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
+get_dynamic_stack_base (poly_int64 offset, unsigned required_align, rtx base)
 {
   rtx target;
 
@@ -1588,7 +1592,7 @@  get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
     crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
   target = gen_reg_rtx (Pmode);
-  emit_move_insn (target, virtual_stack_vars_rtx);
+  emit_move_insn (target, base);
   target = expand_binop (Pmode, add_optab, target,
 			 gen_int_mode (offset, Pmode),
 			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
diff --git a/gcc/expr.c b/gcc/expr.c
index 476c6865f20828fc68f455e70d4874eaabd9d08d..24d011e698af0dbf3635ba5f9d8275376a124bf4 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7500,6 +7500,13 @@  force_operand (rtx value, rtx target)
       return subtarget;
     }
 
+  if (targetm.memtag.addtag_force_operand)
+    {
+      rtx ret = targetm.memtag.addtag_force_operand (value, target);
+      if (ret)
+	return ret;
+    }
+
   if (ARITHMETIC_P (value))
     {
       op2 = XEXP (value, 1);
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 374d15007d868363d9b4fbf467e1e462abbca61a..7bd50715f24a2cb154b578e2abdea4e8fcdb2107 100644
--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -180,6 +180,12 @@  DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_COMPARE, "__sanitizer_ptr_cmp",
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_POINTER_SUBTRACT, "__sanitizer_ptr_sub",
 		      BT_FN_VOID_PTR_PTRMODE, ATTR_NOTHROW_LEAF_LIST)
 
+/* Hardware Address Sanitizer.  */
+DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_INIT, "__hwasan_init",
+		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_HWASAN_TAG_MEM, "__hwasan_tag_memory",
+		      BT_FN_VOID_PTR_UINT8_SIZE, ATTR_NOTHROW_LIST)
+
 /* Thread Sanitizer */
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT, "__tsan_init", 
 		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/target.def b/gcc/target.def
index ad16a151b6af9b1ee13918c8f2980280d75b1d90..3c533acbe3965cdb0870621e364a009353f72c2e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6751,6 +6751,37 @@  DEFHOOK
  pointers.  This feature means that -fsanitize=hwaddress can work.",
  bool, (), default_memtag_can_tag_addresses)
 
+DEFHOOK
+(addtag,
+ "Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by\
+ TAG_OFFSET.\n\
+The resulting RTX must either be a valid memory address or be able to be\n\
+put into an operand with force_operand.  If this hook is overridden, the\n\
+more common case is that the result is forced into an operand using the\n\
+backend hook \"addtag_force_operand\", which is called from force_operand.\n\
+\n\
+It is expected that \"addtag_force_operand\" recognises the RTX\n\
+generated by \"addtag\" and emits code to force that RTX into an operand.",
+rtx, (rtx base, poly_int64 addr_offset, uint8_t tag_offset),
+default_memtag_addtag)
+
+DEFHOOK
+(addtag_force_operand,
+ "If the RTL expression OPER is of the form generated by targetm.memtag.addtag,\n\
+then emit instructions to move the value into an operand (i.e. for\n\
+force_operand).\n\
+TARGET is an RTX suggestion of where to generate the value.\n\
+This hook is most often implemented by emitting instructions to put the\n\
+expression into a pseudo register, then returning that pseudo register.",
+rtx, (rtx oper, rtx target), NULL)
+
+DEFHOOK
+(gentag,
+ "Set the BASE register to the value of UNTAGGED combined with a random\
+ tag.\n\
+This hook is used to generate a tagged base for the current stack frame.",
+  void, (rtx base, rtx untagged),
+  default_memtag_gentag)
+
 HOOK_VECTOR_END (memtag)
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 94e865259f35e46e26f6b4763c5e2f9dc9ed1b83..b0e32102acacdf7a64f1e3d314a966d1d3f062c7 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -283,4 +283,6 @@  extern bool speculation_safe_value_not_needed (bool);
 extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx);
 
 extern bool default_memtag_can_tag_addresses ();
+extern void default_memtag_gentag (rtx, rtx);
+extern rtx default_memtag_addtag (rtx, poly_int64, uint8_t);
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 6e9e877c32e2c8705056bdf0ce2b1b6f125d93c3..cd9f98fc800d7232ead50b03f951364f76c01adc 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -70,6 +70,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "varasm.h"
 #include "flags.h"
 #include "explow.h"
+#include "expmed.h"
 #include "calls.h"
 #include "expr.h"
 #include "output.h"
@@ -84,6 +85,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "sbitmap.h"
 #include "function-abi.h"
+#include "attribs.h"
+#include "asan.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2371,4 +2374,78 @@  default_memtag_can_tag_addresses ()
   return false;
 }
 
+void
+default_memtag_gentag (rtx base, rtx untagged)
+{
+  gcc_assert (HWASAN_STACK);
+  if (HWASAN_RANDOM_FRAME_TAG)
+    {
+      rtx temp = gen_reg_rtx (QImode);
+      rtx ret = init_one_libfunc ("__hwasan_generate_tag");
+      rtx new_tag = emit_library_call_value (ret, temp, LCT_NORMAL, QImode);
+      emit_move_insn (base, untagged);
+      /* We know that `base` is not the stack pointer, since we never want to
+	 put a randomly generated tag into the stack pointer.  Hence we can
+	 use `store_bit_field`, which on aarch64 generates a `bfi` that cannot
+	 act on the stack pointer.  */
+      store_bit_field (base, HWASAN_TAG_SIZE, HWASAN_SHIFT, 0, 0, QImode,
+		       new_tag, false);
+    }
+  else
+    {
+      /* NOTE: The kernel API does not have __hwasan_generate_tag exposed.
+	 In the future we may add the option emit random tags with inline
+	 instrumentation instead of function calls.  This would be the same
+	 between the kernel and userland.  */
+      emit_move_insn (base, untagged);
+    }
+}
+
+rtx
+default_memtag_addtag (rtx base, poly_int64 offset, uint8_t tag_offset)
+{
+  /* Need to look into what the most efficient code sequence is.
+     This is a code sequence that would be emitted *many* times, so we
+     want it as small as possible.
+
+     If the tag offset is (1 << 7) or greater, then the most efficient
+     sequence here would give UB from signed integer overflow in the
+     poly_int64.  Hence in that case we emit the slightly less efficient
+     sequence.
+
+     There are two places where tag overflow is a question:
+       - Tagging the shadow stack.
+	  (both tagging and untagging).
+       - Tagging addressable pointers.
+
+     We need to ensure both behaviours are the same (i.e. that the tag that
+     ends up in a pointer after "overflowing" the tag bits with a tag addition
+     is the same that ends up in the shadow space).
+
+     The aim is that the behaviour of tag addition should follow modulo
+     wrapping in both instances.
+
+     The libhwasan code doesn't have any path that increments a pointer's tag,
+     which means it has no opinion on what happens when a tag increment
+     overflows (and hence we can choose our own behaviour).  */
+
+  if (tag_offset < (1 << 7))
+    {
+      offset += ((uint64_t)tag_offset << HWASAN_SHIFT);
+      return plus_constant (Pmode, base, offset);
+    }
+  else
+    {
+      /* This is the fallback, it would be nice if it had less instructions,
+	 but we can look for cleverer ways later.  */
+      uint64_t tag_mask = ~(0xFFULL << HWASAN_SHIFT);
+      rtx untagged_base = gen_rtx_AND (Pmode, GEN_INT (tag_mask), base);
+      rtx new_addr = plus_constant (Pmode, untagged_base, offset);
+
+      rtx original_tag_value
+	= gen_rtx_LSHIFTRT (Pmode, base, GEN_INT (HWASAN_SHIFT));
+      rtx new_tag_value = plus_constant (Pmode, original_tag_value, tag_offset);
+      rtx new_tag
+	= gen_rtx_ASHIFT (Pmode, new_tag_value, GEN_INT (HWASAN_SHIFT));
+      return gen_rtx_IOR (Pmode, new_addr, new_tag);
+    }
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/toplev.c b/gcc/toplev.c
index ab67384249a3437ac37f42f741ed516884677f9f..7bd75548d2aebb3415ac85ec40ad25e5ca794094 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -508,6 +508,9 @@  compile_file (void)
       if (flag_sanitize & SANITIZE_THREAD)
 	tsan_finish_file ();
 
+      if (flag_sanitize & SANITIZE_HWADDRESS)
+	hwasan_finish_file ();
+
       omp_finish_file ();
 
       hsa_output_brig ();