From patchwork Tue Feb 12 14:19:37 2013
X-Patchwork-Submitter: Dodji Seketeli
X-Patchwork-Id: 219864
From: Dodji Seketeli
To: Jakub Jelinek
Cc: GCC Patches, Kostya Serebryany, Dmitry Vyukov
Subject: Re: [PATCH 2/2] [asan] Avoid instrumenting duplicated memory access
 in the same basic block
References: <87mwvtxalf.fsf@redhat.com> <87ehh5xabo.fsf@redhat.com>
 <20130129150022.GS4385@tucnak.redhat.com>
Date: Tue, 12 Feb 2013 15:19:37 +0100
In-Reply-To: <20130129150022.GS4385@tucnak.redhat.com> (Jakub Jelinek's
 message of "Tue, 29 Jan 2013 16:00:22 +0100")
Message-ID: <87obfpfwcm.fsf@redhat.com>
Hello,

Following this email review and offline discussions we had, I rewrote
the patch to do away with the C++ idioms toward which I tend to lean
naturally :-)

> > @@ -212,6 +213,159 @@ alias_set_type asan_shadow_set = -1;
> >     alias set is used for all shadow memory accesses.  */
> >  static GTY(()) tree shadow_ptr_types[2];
> >
> > +/* Hashtable support for memory references used by gimple
> > +   statements.  */
> > +
> > +/* This type represents a reference to a memory region.  */
> > +struct __attribute__ ((visibility ("hidden"))) mem_ref
>
> This is wrong, you aren't checking whether visibility or even hidden
> visibility is supported by the compiler.
>
> The C++ way I think would be to use anonymous namespace,
> but I don't see anything like that used in gcc/*.c yet, so
> perhaps just name it asan_mem_ref and be done with that.

Done.

> > +{
> > +  /* The expression of the beginning of the memory region.  */
> > +  tree start;
> > +  /* The expression representing the length of the region.  */
> > +  tree len;
>
> Do you really need to store len as tree?  Wouldn't it be better
> just to store the access size in bytes as integer (1, 2, 4, 8 or 16
> bytes) into say unsigned char; variable?

Actually, this len member wasn't meant to represent the size of an
access to one datum but rather, for builtin functions like memset,
the size of the entire memory region that was accessed.  In this new
version of the patch, asan_mem_ref carries the access size, not the
length.  The latter is passed as a free-form variable to the functions
that need it.

> > +struct __attribute__((visibility ("hidden"))) mem_ref_hasher
> > +  : typed_delete_remove <mem_ref>
>
> Likewise.

Done.

> > +static hash_table <mem_ref_hasher>&
>
> Not sure about agreed formatting for references, but grep '>&' doesn't
> show anything, while '> &' shows some hits.

This is now gone.

> > +get_mem_ref_hash_table ()
> > +{
> > +  static hash_table <mem_ref_hasher> ht;
> > +
> > +  if (!ht.is_created ())
> > +    ht.create (10);
> > +
> > +  return ht;
>
> The above is all indented too much.

Fixed.

> > +  hash_table <mem_ref_hasher> ht = get_mem_ref_hash_table ();
> > +  mem_ref **slot = ht.find_slot (&ref, INSERT);
> > +  gcc_assert (*slot == NULL);
> > +  *slot = new mem_ref (ref);
>
> Wouldn't it be better to use pool_alloc/pool_free here instead of
> new/delete?

Done.

> >      case BUILT_IN_STRLEN:
> > -      return instrument_strlen_call (iter);
> > +      source0 = gimple_call_arg (call, 0);
>
> Reminds me that we should replace all uses of is_gimple_builtin_call
> in asan.c/tsan.c with gimple_call_builtin_p (stmt, BUILT_IN_NORMAL),
> and probably nuke is_gimple_builtin_call,

Done.  For tsan.c, I propose to post a follow-up patch later.

> otherwise no verification
> whether K&R unprototyped size_t strlen (); etc. aren't used
> say with x = strlen ();.  Right now gimple_call_arg here is unsafe,
> we haven't checked whether it has at least one argument.  But if we
> ignore calls which don't match the builtin prototype, we'd be safe.

I see.  Thanks.
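To make that point concrete, here is a sketch (hypothetical code, not
part of the patch) of the kind of call that
gimple_call_builtin_p (stmt, BUILT_IN_NORMAL) rejects because the call
does not match the builtin's prototype:

    /* K&R-style declaration: unprototyped, yet still recognized as
       the strlen builtin.  */
    __SIZE_TYPE__ strlen ();

    __SIZE_TYPE__
    f (void)
    {
      /* No argument at all: gimple_call_arg (call, 0) on this call
         would be out of bounds, so such calls must be ignored.  */
      return strlen ();
    }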
> > +static int
> > +test1 ()
> > +{
> > +  /* __builtin___asan_report_store1 called 1 time here to instrument
> > +     the initialization.  */
> > +  char foo[4] = {1};
> > +
> > +  /* __builtin___asan_report_store1 called 2 times here to instrument
> > +     the store to the memory region of tab.  */
> > +  __builtin_memset (tab, 3, sizeof (tab));
> > +
> > +  /* There is no instrumentation for the two memset calls below.  */
> > +  __builtin_memset (tab, 4, sizeof (tab));
> > +  __builtin_memset (tab, 5, sizeof (tab));
>
> If you don't instrument the above two, it shows something bad.

Why?  The size of the memory region that is accessed doesn't change.
It's still sizeof (tab), so it's redundant with the first access via
__builtin_memset (tab, 3, sizeof (tab)).  What am I missing?

> I'd say for calls which test two bytes (first and last) you actually
> should enter into the hash table two records, one that &tab[0] with size 1
> byte has been instrumented, another one that &tab[3] (resp. 4, resp. 5)
> with size 1 byte has been instrumented, and e.g. if the first access has
> been instrumented already, but second access hasn't (or vice versa), just
> emit instrumentation for the one which is missing.

Yeah, I thought about this at first but then I settled for something
simple to begin with.  This updated patch does what you are suggesting
now.
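Concretely, with the two-records scheme (this is essentially what the
new no-redundant-instrumentation-2.c test below exercises):

    char tab[5];
    /* First memset: records the 1-byte references at &tab[0] and at
       the end of the 5-byte region; both bytes get instrumented.  */
    __builtin_memset (tab, 1, sizeof (tab));
    /* Second, overlapping memset: &tab[0] is already in the hash
       table, so only the end of this smaller 3-byte region still
       needs to be instrumented.  */
    __builtin_memset (tab, 1, 3);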
> For future improvements (but 4.9 material likely, after we lower
> each instrumentation just to a single builtin first), different access
> sizes and/or different is_store should just mean we (at least in the same bb
> and with no intervening calls that could never return) could upgrade
> the previous instrumentation to cover a wider access size, or load into
> store (or just ignore loads vs. stores?).

Yeah, following the discussion we had in the audit trail of Bug
sanitizer/55309, this patch now ignores the difference between
load/store, as asan@llvm does.  I haven't done widening of accesses,
though.  But I keep the idea for 4.9, thanks.

> > +/*
>
> I wouldn't start the comment on a different line.
>
> > +  { dg-final { scan-tree-dump-times "__builtin___asan_report_store1" 7 "instrumented stores"} }

Fixed.

> Space between " and }, only one space between } }.

Done.

> > +
> > +  { dg-final { scan-tree-dump-times "__builtin___asan_report_load" 4 "instrumented loads"} }
>
> and just close */ at the end of the above line (or use /* */ on each
> line separately).

Done.

On Tue, Jan 29, 2013 at 04:00:22PM +0100, Jakub Jelinek wrote:
> > @@ -1464,23 +1726,39 @@ transform_statements (void)
> >
> >    FOR_EACH_BB (bb)
> >      {
> > +      get_mem_ref_hash_table ().dispose ();
> > +
> >        if (bb->index >= saved_last_basic_block) continue;
> >        for (i = gsi_start_bb (bb); !gsi_end_p (i);)
>
> > +          if (is_gimple_call (s))
> > +            get_mem_ref_hash_table ().dispose ();
> > +
>
> Two more things.  IMHO the hash table should be a global var,
> not a static inside get_mem_ref_hash_table (), because otherwise
> all these 3 calls just create an empty hash table and immediately
> destroy it.  And, at BB boundaries and for calls, you shouldn't
> use .dispose (), but .empty () method.
> So my preference would be
>   if (asan_mem_ref_ht.is_created ())
>     asan_mem_ref_ht.empty ();
> (for the two above occurrences, resp. .dispose () for the one below).

Done.
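The resulting usage pattern in transform_statements thus boils down to
the following sketch (details elided; see the full hunk at the end of
the patch):

    FOR_EACH_BB (bb)
      {
        /* Forget the references seen in the previous BB; this keeps
           the table allocated.  */
        empty_mem_ref_hash_table ();
        ...
        /* Within a BB, an uninstrumented call invalidates what we
           know, so forget again.  */
        if (is_gimple_call (s))
          empty_mem_ref_hash_table ();
      }

    /* Release the table and the alloc pool once the pass is done.  */
    free_mem_ref_resources ();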
Bootstrapped and tested against trunk on x86_64-unknown-linux-gnu.

gcc/
	* Makefile.in (asan.o): Add new dependencies on hash-table.h and
	alloc-pool.h.
	* asan.c (struct asan_mem_ref, struct asan_mem_ref_hasher): New
	types.
	(asan_mem_ref_init, asan_mem_ref_get_end, get_mem_ref_hash_table)
	(has_stmt_been_instrumented_p, empty_mem_ref_hash_table)
	(free_mem_ref_resources, has_mem_ref_been_instrumented)
	(update_mem_ref_hash_table, get_mem_ref_of_assignment): New
	functions.
	(get_mem_refs_of_builtin_call): Extract from
	instrument_builtin_call and tweak a little bit to make it fit with
	the new signature.
	(instrument_builtin_call): Use the new
	get_mem_refs_of_builtin_call.  Use gimple_call_builtin_p instead
	of is_gimple_builtin_call.
	(instrument_derefs, instrument_mem_region_access): Insert the
	instrumented memory reference into the hash table.
	(maybe_instrument_assignment): Renamed instrument_assignment into
	this, and change it to advance the iterator when instrumentation
	actually happened and return true in that case.  This makes it
	homogeneous with maybe_instrument_call, and thus gives a chance to
	callers to be more 'regular'.
	(transform_statements): Clear the memory reference hash table
	whenever we enter a new BB, when we cross a function call, or when
	we are done transforming statements.  Use
	maybe_instrument_assignment instead of instrument_assignment.  No
	more need to special case the assignment case and advance the
	iterator after calling it; it's now handled just like
	maybe_instrument_call.  Update comment.

gcc/testsuite/
	* c-c++-common/asan/no-redundant-instrumentation-1.c: New test.
	* c-c++-common/asan/no-redundant-instrumentation-2.c: Likewise.
	* c-c++-common/asan/no-redundant-instrumentation-3.c: Likewise.
	* c-c++-common/asan/inc.c: Likewise.
---
 gcc/Makefile.in                            |    3 +-
 gcc/asan.c                                 | 1258 +++++++++++++-------
 gcc/testsuite/c-c++-common/asan/inc.c      |   20 +
 .../asan/no-redundant-instrumentation-1.c  |   67 ++
 .../asan/no-redundant-instrumentation-2.c  |   26 +
 .../asan/no-redundant-instrumentation-3.c  |   18 +
 6 files changed, 982 insertions(+), 410 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/inc.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-2.c
 create mode 100644 gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 375d5f5..f3bb168 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2226,7 +2226,8 @@ stor-layout.o : stor-layout.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
 asan.o : asan.c asan.h $(CONFIG_H) $(SYSTEM_H) $(GIMPLE_H) \
    output.h coretypes.h $(GIMPLE_PRETTY_PRINT_H) \
    tree-iterator.h $(TREE_FLOW_H) $(TREE_PASS_H) \
-   $(TARGET_H) $(EXPR_H) $(OPTABS_H) $(TM_P_H) langhooks.h
+   $(TARGET_H) $(EXPR_H) $(OPTABS_H) $(TM_P_H) langhooks.h \
+   $(HASH_TABLE_H) alloc-pool.h
 tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TREE_INLINE_H) \
    $(GIMPLE_H) $(DIAGNOSTIC_H) langhooks.h \
    $(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(CGRAPH_H) $(GGC_H) \
diff --git a/gcc/asan.c b/gcc/asan.c
index f05e36c..3cb9511 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "tm_p.h"
 #include "langhooks.h"
+#include "hash-table.h"
+#include "alloc-pool.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
    with <2x slowdown on average.
@@ -212,6 +214,620 @@ alias_set_type asan_shadow_set = -1;
    alias set is used for all shadow memory accesses.  */
 static GTY(()) tree shadow_ptr_types[2];
 
+/* Hashtable support for memory references used by gimple
+   statements.  */
+
+/* This type represents a reference to a memory region.  */
+struct asan_mem_ref
+{
+  /* The expression of the beginning of the memory region.  */
+  tree start;
+
+  /* The size of the access (can be 1, 2, 4, 8, 16 for now).  */
+  char access_size;
+};
+
+static alloc_pool asan_mem_ref_alloc_pool;
+
+/* This creates the alloc pool used to store the instances of
+   asan_mem_ref that are stored in the hash table asan_mem_ref_ht.  */
+
+static alloc_pool
+asan_mem_ref_get_alloc_pool ()
+{
+  if (asan_mem_ref_alloc_pool == NULL)
+    asan_mem_ref_alloc_pool = create_alloc_pool ("asan_mem_ref",
+						 sizeof (asan_mem_ref),
+						 10);
+  return asan_mem_ref_alloc_pool;
+}
+
+/* Initializes an instance of asan_mem_ref.  */
+
+static void
+asan_mem_ref_init (asan_mem_ref *ref, tree start, char access_size)
+{
+  ref->start = start;
+  ref->access_size = access_size;
+}
+
+/* Allocates memory for an instance of asan_mem_ref in the memory
+   pool returned by asan_mem_ref_get_alloc_pool and initializes it.
+   START is the address of (or the expression pointing to) the
+   beginning of the memory reference.  ACCESS_SIZE is the size of the
+   access to the referenced memory.  */
+
+static asan_mem_ref*
+asan_mem_ref_new (tree start, char access_size)
+{
+  asan_mem_ref *ref =
+    (asan_mem_ref *) pool_alloc (asan_mem_ref_get_alloc_pool ());
+
+  asan_mem_ref_init (ref, start, access_size);
+  return ref;
+}
+
+/* This builds and returns a pointer to the end of the memory region
+   that starts at START and has length LEN.  */
+
+tree
+asan_mem_ref_get_end (tree start, tree len)
+{
+  if (len == NULL_TREE || integer_zerop (len))
+    return start;
+
+  return fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (start), start, len);
+}
+
+/* Return a tree expression that represents the end of the referenced
+   memory region.  Beware that this function can actually build a new
+   tree expression.  */
+
+tree
+asan_mem_ref_get_end (const asan_mem_ref *ref, tree len)
+{
+  return asan_mem_ref_get_end (ref->start, len);
+}
+
+struct asan_mem_ref_hasher
+  : typed_noop_remove <asan_mem_ref>
+{
+  typedef asan_mem_ref value_type;
+  typedef asan_mem_ref compare_type;
+
+  static inline hashval_t hash (const value_type *);
+  static inline bool equal (const value_type *, const compare_type *);
+};
+
+/* Hash a memory reference.  */
+
+inline hashval_t
+asan_mem_ref_hasher::hash (const asan_mem_ref *mem_ref)
+{
+  hashval_t h = iterative_hash_expr (mem_ref->start, 0);
+  h = iterative_hash_hashval_t (h, mem_ref->access_size);
+  return h;
+}
+
+/* Compare two memory references.  */
+
+inline bool
+asan_mem_ref_hasher::equal (const asan_mem_ref *m1,
+			    const asan_mem_ref *m2)
+{
+  return (m1->access_size == m2->access_size
+	  && operand_equal_p (m1->start, m2->start, 0));
+}
+
+static hash_table <asan_mem_ref_hasher> asan_mem_ref_ht;
+
+/* Returns a reference to the hash table containing memory references.
+   This function ensures that the hash table is created.  Note that
+   this hash table is updated by the function
+   update_mem_ref_hash_table.  */
+
+static hash_table <asan_mem_ref_hasher> &
+get_mem_ref_hash_table ()
+{
+  if (!asan_mem_ref_ht.is_created ())
+    asan_mem_ref_ht.create (10);
+
+  return asan_mem_ref_ht;
+}
+
+/* Clear all entries from the memory references hash table.  */
+
+static void
+empty_mem_ref_hash_table ()
+{
+  if (asan_mem_ref_ht.is_created ())
+    asan_mem_ref_ht.empty ();
+}
+
+/* Free the memory references hash table.  */
+
+static void
+free_mem_ref_resources ()
+{
+  if (asan_mem_ref_ht.is_created ())
+    asan_mem_ref_ht.dispose ();
+
+  if (asan_mem_ref_alloc_pool)
+    {
+      free_alloc_pool (asan_mem_ref_alloc_pool);
+      asan_mem_ref_alloc_pool = NULL;
+    }
+}
+
+/* Return true iff the memory reference REF has been instrumented.
*/ + +static bool +has_mem_ref_been_instrumented (tree ref, char access_size) +{ + asan_mem_ref r; + asan_mem_ref_init (&r, ref, access_size); + + return (get_mem_ref_hash_table ().find (&r) != NULL); +} + +/* Return true iff the memory reference REF has been instrumented. */ + +static bool +has_mem_ref_been_instrumented (const asan_mem_ref *ref) +{ + return has_mem_ref_been_instrumented (ref->start, ref->access_size); +} + +/* Return true iff access to memory region starting at REF and of + length LEN has been instrumented. */ + +static bool +has_mem_ref_been_instrumented (const asan_mem_ref *ref, tree len) +{ + /* First let's see if the address of the beginning of REF has been + instrumented. */ + if (!has_mem_ref_been_instrumented (ref)) + return false; + + if (len != 0) + { + /* Let's see if the end of the region has been instrumented. */ + if (!has_mem_ref_been_instrumented (asan_mem_ref_get_end (ref, len), + ref->access_size)) + return false; + } + return true; +} + +/* Set REF to the memory reference present in a gimple assignment + ASSIGNMENT. Return true upon successful completion, false + otherwise. */ + +static bool +get_mem_ref_of_assignment (const gimple assignment, + asan_mem_ref *ref, + bool *ref_is_store) +{ + gcc_assert (gimple_assign_single_p (assignment)); + + if (gimple_store_p (assignment)) + { + ref->start = gimple_assign_lhs (assignment); + *ref_is_store = true; + } + else if (gimple_assign_load_p (assignment)) + { + ref->start = gimple_assign_rhs1 (assignment); + *ref_is_store = false; + } + else + return false; + + ref->access_size = int_size_in_bytes (TREE_TYPE (ref->start)); + return true; +} + +/* Return the memory references contained in a gimple statement + representing a builtin call that has to do with memory access. */ + +static bool +get_mem_refs_of_builtin_call (const gimple call, + asan_mem_ref *src0, + tree *src0_len, + bool *src0_is_store, + asan_mem_ref *src1, + tree *src1_len, + bool *src1_is_store, + asan_mem_ref *dst, + tree *dst_len, + bool *dst_is_store, + bool *dest_is_deref) +{ + gcc_checking_assert (gimple_call_builtin_p (call, BUILT_IN_NORMAL)); + + tree callee = gimple_call_fndecl (call); + tree source0 = NULL_TREE, source1 = NULL_TREE, + dest = NULL_TREE, len = NULL_TREE; + bool is_store = true, got_reference_p = false; + char access_size = 1; + + switch (DECL_FUNCTION_CODE (callee)) + { + /* (s, s, n) style memops. */ + case BUILT_IN_BCMP: + case BUILT_IN_MEMCMP: + source0 = gimple_call_arg (call, 0); + source1 = gimple_call_arg (call, 1); + len = gimple_call_arg (call, 2); + break; + + /* (src, dest, n) style memops. */ + case BUILT_IN_BCOPY: + source0 = gimple_call_arg (call, 0); + dest = gimple_call_arg (call, 1); + len = gimple_call_arg (call, 2); + break; + + /* (dest, src, n) style memops. */ + case BUILT_IN_MEMCPY: + case BUILT_IN_MEMCPY_CHK: + case BUILT_IN_MEMMOVE: + case BUILT_IN_MEMMOVE_CHK: + case BUILT_IN_MEMPCPY: + case BUILT_IN_MEMPCPY_CHK: + dest = gimple_call_arg (call, 0); + source0 = gimple_call_arg (call, 1); + len = gimple_call_arg (call, 2); + break; + + /* (dest, n) style memops. */ + case BUILT_IN_BZERO: + dest = gimple_call_arg (call, 0); + len = gimple_call_arg (call, 1); + break; + + /* (dest, x, n) style memops*/ + case BUILT_IN_MEMSET: + case BUILT_IN_MEMSET_CHK: + dest = gimple_call_arg (call, 0); + len = gimple_call_arg (call, 2); + break; + + case BUILT_IN_STRLEN: + source0 = gimple_call_arg (call, 0); + len = gimple_call_lhs (call); + break ; + + /* And now the __atomic* and __sync builtins. 
+ These are handled differently from the classical memory memory + access builtins above. */ + + case BUILT_IN_ATOMIC_LOAD_1: + case BUILT_IN_ATOMIC_LOAD_2: + case BUILT_IN_ATOMIC_LOAD_4: + case BUILT_IN_ATOMIC_LOAD_8: + case BUILT_IN_ATOMIC_LOAD_16: + is_store = false; + /* fall through. */ + + case BUILT_IN_SYNC_FETCH_AND_ADD_1: + case BUILT_IN_SYNC_FETCH_AND_ADD_2: + case BUILT_IN_SYNC_FETCH_AND_ADD_4: + case BUILT_IN_SYNC_FETCH_AND_ADD_8: + case BUILT_IN_SYNC_FETCH_AND_ADD_16: + + case BUILT_IN_SYNC_FETCH_AND_SUB_1: + case BUILT_IN_SYNC_FETCH_AND_SUB_2: + case BUILT_IN_SYNC_FETCH_AND_SUB_4: + case BUILT_IN_SYNC_FETCH_AND_SUB_8: + case BUILT_IN_SYNC_FETCH_AND_SUB_16: + + case BUILT_IN_SYNC_FETCH_AND_OR_1: + case BUILT_IN_SYNC_FETCH_AND_OR_2: + case BUILT_IN_SYNC_FETCH_AND_OR_4: + case BUILT_IN_SYNC_FETCH_AND_OR_8: + case BUILT_IN_SYNC_FETCH_AND_OR_16: + + case BUILT_IN_SYNC_FETCH_AND_AND_1: + case BUILT_IN_SYNC_FETCH_AND_AND_2: + case BUILT_IN_SYNC_FETCH_AND_AND_4: + case BUILT_IN_SYNC_FETCH_AND_AND_8: + case BUILT_IN_SYNC_FETCH_AND_AND_16: + + case BUILT_IN_SYNC_FETCH_AND_XOR_1: + case BUILT_IN_SYNC_FETCH_AND_XOR_2: + case BUILT_IN_SYNC_FETCH_AND_XOR_4: + case BUILT_IN_SYNC_FETCH_AND_XOR_8: + case BUILT_IN_SYNC_FETCH_AND_XOR_16: + + case BUILT_IN_SYNC_FETCH_AND_NAND_1: + case BUILT_IN_SYNC_FETCH_AND_NAND_2: + case BUILT_IN_SYNC_FETCH_AND_NAND_4: + case BUILT_IN_SYNC_FETCH_AND_NAND_8: + + case BUILT_IN_SYNC_ADD_AND_FETCH_1: + case BUILT_IN_SYNC_ADD_AND_FETCH_2: + case BUILT_IN_SYNC_ADD_AND_FETCH_4: + case BUILT_IN_SYNC_ADD_AND_FETCH_8: + case BUILT_IN_SYNC_ADD_AND_FETCH_16: + + case BUILT_IN_SYNC_SUB_AND_FETCH_1: + case BUILT_IN_SYNC_SUB_AND_FETCH_2: + case BUILT_IN_SYNC_SUB_AND_FETCH_4: + case BUILT_IN_SYNC_SUB_AND_FETCH_8: + case BUILT_IN_SYNC_SUB_AND_FETCH_16: + + case BUILT_IN_SYNC_OR_AND_FETCH_1: + case BUILT_IN_SYNC_OR_AND_FETCH_2: + case BUILT_IN_SYNC_OR_AND_FETCH_4: + case BUILT_IN_SYNC_OR_AND_FETCH_8: + case BUILT_IN_SYNC_OR_AND_FETCH_16: + + case BUILT_IN_SYNC_AND_AND_FETCH_1: + case BUILT_IN_SYNC_AND_AND_FETCH_2: + case BUILT_IN_SYNC_AND_AND_FETCH_4: + case BUILT_IN_SYNC_AND_AND_FETCH_8: + case BUILT_IN_SYNC_AND_AND_FETCH_16: + + case BUILT_IN_SYNC_XOR_AND_FETCH_1: + case BUILT_IN_SYNC_XOR_AND_FETCH_2: + case BUILT_IN_SYNC_XOR_AND_FETCH_4: + case BUILT_IN_SYNC_XOR_AND_FETCH_8: + case BUILT_IN_SYNC_XOR_AND_FETCH_16: + + case BUILT_IN_SYNC_NAND_AND_FETCH_1: + case BUILT_IN_SYNC_NAND_AND_FETCH_2: + case BUILT_IN_SYNC_NAND_AND_FETCH_4: + case BUILT_IN_SYNC_NAND_AND_FETCH_8: + + case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1: + case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_2: + case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4: + case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8: + case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_16: + + case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1: + case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2: + case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4: + case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8: + case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16: + + case BUILT_IN_SYNC_LOCK_TEST_AND_SET_1: + case BUILT_IN_SYNC_LOCK_TEST_AND_SET_2: + case BUILT_IN_SYNC_LOCK_TEST_AND_SET_4: + case BUILT_IN_SYNC_LOCK_TEST_AND_SET_8: + case BUILT_IN_SYNC_LOCK_TEST_AND_SET_16: + + case BUILT_IN_SYNC_LOCK_RELEASE_1: + case BUILT_IN_SYNC_LOCK_RELEASE_2: + case BUILT_IN_SYNC_LOCK_RELEASE_4: + case BUILT_IN_SYNC_LOCK_RELEASE_8: + case BUILT_IN_SYNC_LOCK_RELEASE_16: + + case BUILT_IN_ATOMIC_EXCHANGE_1: + case BUILT_IN_ATOMIC_EXCHANGE_2: + case BUILT_IN_ATOMIC_EXCHANGE_4: + case BUILT_IN_ATOMIC_EXCHANGE_8: + case 
BUILT_IN_ATOMIC_EXCHANGE_16: + + case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1: + case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_2: + case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4: + case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8: + case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16: + + case BUILT_IN_ATOMIC_STORE_1: + case BUILT_IN_ATOMIC_STORE_2: + case BUILT_IN_ATOMIC_STORE_4: + case BUILT_IN_ATOMIC_STORE_8: + case BUILT_IN_ATOMIC_STORE_16: + + case BUILT_IN_ATOMIC_ADD_FETCH_1: + case BUILT_IN_ATOMIC_ADD_FETCH_2: + case BUILT_IN_ATOMIC_ADD_FETCH_4: + case BUILT_IN_ATOMIC_ADD_FETCH_8: + case BUILT_IN_ATOMIC_ADD_FETCH_16: + + case BUILT_IN_ATOMIC_SUB_FETCH_1: + case BUILT_IN_ATOMIC_SUB_FETCH_2: + case BUILT_IN_ATOMIC_SUB_FETCH_4: + case BUILT_IN_ATOMIC_SUB_FETCH_8: + case BUILT_IN_ATOMIC_SUB_FETCH_16: + + case BUILT_IN_ATOMIC_AND_FETCH_1: + case BUILT_IN_ATOMIC_AND_FETCH_2: + case BUILT_IN_ATOMIC_AND_FETCH_4: + case BUILT_IN_ATOMIC_AND_FETCH_8: + case BUILT_IN_ATOMIC_AND_FETCH_16: + + case BUILT_IN_ATOMIC_NAND_FETCH_1: + case BUILT_IN_ATOMIC_NAND_FETCH_2: + case BUILT_IN_ATOMIC_NAND_FETCH_4: + case BUILT_IN_ATOMIC_NAND_FETCH_8: + case BUILT_IN_ATOMIC_NAND_FETCH_16: + + case BUILT_IN_ATOMIC_XOR_FETCH_1: + case BUILT_IN_ATOMIC_XOR_FETCH_2: + case BUILT_IN_ATOMIC_XOR_FETCH_4: + case BUILT_IN_ATOMIC_XOR_FETCH_8: + case BUILT_IN_ATOMIC_XOR_FETCH_16: + + case BUILT_IN_ATOMIC_OR_FETCH_1: + case BUILT_IN_ATOMIC_OR_FETCH_2: + case BUILT_IN_ATOMIC_OR_FETCH_4: + case BUILT_IN_ATOMIC_OR_FETCH_8: + case BUILT_IN_ATOMIC_OR_FETCH_16: + + case BUILT_IN_ATOMIC_FETCH_ADD_1: + case BUILT_IN_ATOMIC_FETCH_ADD_2: + case BUILT_IN_ATOMIC_FETCH_ADD_4: + case BUILT_IN_ATOMIC_FETCH_ADD_8: + case BUILT_IN_ATOMIC_FETCH_ADD_16: + + case BUILT_IN_ATOMIC_FETCH_SUB_1: + case BUILT_IN_ATOMIC_FETCH_SUB_2: + case BUILT_IN_ATOMIC_FETCH_SUB_4: + case BUILT_IN_ATOMIC_FETCH_SUB_8: + case BUILT_IN_ATOMIC_FETCH_SUB_16: + + case BUILT_IN_ATOMIC_FETCH_AND_1: + case BUILT_IN_ATOMIC_FETCH_AND_2: + case BUILT_IN_ATOMIC_FETCH_AND_4: + case BUILT_IN_ATOMIC_FETCH_AND_8: + case BUILT_IN_ATOMIC_FETCH_AND_16: + + case BUILT_IN_ATOMIC_FETCH_NAND_1: + case BUILT_IN_ATOMIC_FETCH_NAND_2: + case BUILT_IN_ATOMIC_FETCH_NAND_4: + case BUILT_IN_ATOMIC_FETCH_NAND_8: + case BUILT_IN_ATOMIC_FETCH_NAND_16: + + case BUILT_IN_ATOMIC_FETCH_XOR_1: + case BUILT_IN_ATOMIC_FETCH_XOR_2: + case BUILT_IN_ATOMIC_FETCH_XOR_4: + case BUILT_IN_ATOMIC_FETCH_XOR_8: + case BUILT_IN_ATOMIC_FETCH_XOR_16: + + case BUILT_IN_ATOMIC_FETCH_OR_1: + case BUILT_IN_ATOMIC_FETCH_OR_2: + case BUILT_IN_ATOMIC_FETCH_OR_4: + case BUILT_IN_ATOMIC_FETCH_OR_8: + case BUILT_IN_ATOMIC_FETCH_OR_16: + { + dest = gimple_call_arg (call, 0); + /* DEST represents the address of a memory location. + instrument_derefs wants the memory location, so lets + dereference the address DEST before handing it to + instrument_derefs. */ + if (TREE_CODE (dest) == ADDR_EXPR) + dest = TREE_OPERAND (dest, 0); + else if (TREE_CODE (dest) == SSA_NAME) + dest = build2 (MEM_REF, TREE_TYPE (TREE_TYPE (dest)), + dest, build_int_cst (TREE_TYPE (dest), 0)); + else + gcc_unreachable (); + + access_size = int_size_in_bytes (TREE_TYPE (dest)); + } + + default: + /* The other builtins memory access are not instrumented in this + function because they either don't have any length parameter, + or their length parameter is just a limit. 
   */
+      break;
+    }
+
+  if (len != NULL_TREE)
+    {
+      if (source0 != NULL_TREE)
+	{
+	  src0->start = source0;
+	  src0->access_size = access_size;
+	  *src0_len = len;
+	  *src0_is_store = false;
+	}
+
+      if (source1 != NULL_TREE)
+	{
+	  src1->start = source1;
+	  src1->access_size = access_size;
+	  *src1_len = len;
+	  *src1_is_store = false;
+	}
+
+      if (dest != NULL_TREE)
+	{
+	  dst->start = dest;
+	  dst->access_size = access_size;
+	  *dst_len = len;
+	  *dst_is_store = true;
+	}
+
+      got_reference_p = true;
+    }
+  else
+    {
+      if (dest)
+	{
+	  dst->start = dest;
+	  dst->access_size = access_size;
+	  *dst_len = NULL_TREE;
+	  *dst_is_store = is_store;
+	  *dest_is_deref = true;
+	  got_reference_p = true;
+	}
+    }
+
+  return got_reference_p;
+}
+
+/* Return true iff a given gimple statement has been instrumented.
+   Note that the statement is "defined" by the memory references it
+   contains.  */
+
+static bool
+has_stmt_been_instrumented_p (gimple stmt)
+{
+  if (gimple_assign_single_p (stmt))
+    {
+      bool r_is_store;
+      asan_mem_ref r;
+      asan_mem_ref_init (&r, NULL, 1);
+
+      if (get_mem_ref_of_assignment (stmt, &r, &r_is_store))
+	return has_mem_ref_been_instrumented (&r);
+    }
+  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+    {
+      asan_mem_ref src0, src1, dest;
+      asan_mem_ref_init (&src0, NULL, 1);
+      asan_mem_ref_init (&src1, NULL, 1);
+      asan_mem_ref_init (&dest, NULL, 1);
+
+      tree src0_len = NULL_TREE, src1_len = NULL_TREE, dest_len = NULL_TREE;
+      bool src0_is_store = false, src1_is_store = false,
+	dest_is_store = false, dest_is_deref = false;
+      if (get_mem_refs_of_builtin_call (stmt,
+					&src0, &src0_len, &src0_is_store,
+					&src1, &src1_len, &src1_is_store,
+					&dest, &dest_len, &dest_is_store,
+					&dest_is_deref))
+	{
+	  if (src0.start != NULL_TREE
+	      && !has_mem_ref_been_instrumented (&src0, src0_len))
+	    return false;
+
+	  if (src1.start != NULL_TREE
+	      && !has_mem_ref_been_instrumented (&src1, src1_len))
+	    return false;
+
+	  if (dest.start != NULL_TREE
+	      && !has_mem_ref_been_instrumented (&dest, dest_len))
+	    return false;
+
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Insert a memory reference into the hash table.  */
+
+static void
+update_mem_ref_hash_table (tree ref, char access_size)
+{
+  hash_table <asan_mem_ref_hasher> ht = get_mem_ref_hash_table ();
+
+  asan_mem_ref r;
+  asan_mem_ref_init (&r, ref, access_size);
+
+  asan_mem_ref **slot = ht.find_slot (&r, INSERT);
+  if (*slot == NULL)
+    *slot = asan_mem_ref_new (ref, access_size);
+}
+
 /* Initialize shadow_ptr_types array.
*/ static void @@ -835,7 +1451,7 @@ build_check_stmt (location_t location, tree base, gimple_stmt_iterator *iter, static void instrument_derefs (gimple_stmt_iterator *iter, tree t, - location_t location, bool is_store) + location_t location, bool is_store) { tree type, base; HOST_WIDE_INT size_in_bytes; @@ -878,8 +1494,14 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t, } base = build_fold_addr_expr (t); - build_check_stmt (location, base, iter, /*before_p=*/true, - is_store, size_in_bytes); + if (!has_mem_ref_been_instrumented (base, size_in_bytes)) + { + build_check_stmt (location, base, iter, /*before_p=*/true, + is_store, size_in_bytes); + update_mem_ref_hash_table (base, size_in_bytes); + update_mem_ref_hash_table (t, size_in_bytes); + } + } /* Instrument an access to a contiguous memory region that starts at @@ -903,6 +1525,12 @@ instrument_mem_region_access (tree base, tree len, gimple_stmt_iterator gsi = *iter; basic_block fallthrough_bb = NULL, then_bb = NULL; + + /* If the beginning of the memory region has already been + instrumented, do not instrument it. */ + if (has_mem_ref_been_instrumented (base, 1)) + goto after_first_instrumentation; + if (!is_gimple_constant (len)) { /* So, the length of the memory area to asan-protect is @@ -945,9 +1573,19 @@ instrument_mem_region_access (tree base, tree len, else *iter = gsi; + update_mem_ref_hash_table (base, 1); + + after_first_instrumentation: + /* We want to instrument the access at the end of the memory region, which is at (base + len - 1). */ + /* If the end of the memory region has already been instrumented, do + not instrument it. */ + tree end = asan_mem_ref_get_end (base, len); + if (has_mem_ref_been_instrumented (end, 1)) + return; + /* offset = len - 1; */ len = unshare_expr (len); tree offset; @@ -982,434 +1620,221 @@ instrument_mem_region_access (tree base, tree len, g = gimple_build_assign_with_ops (MINUS_EXPR, t, len, build_int_cst (size_type_node, 1)); gimple_set_location (g, location); - gimple_seq_add_stmt_without_update (&seq, g); - offset = gimple_assign_lhs (g); - } - - /* _1 = base; */ - base = unshare_expr (base); - gimple region_end = - gimple_build_assign_with_ops (TREE_CODE (base), - make_ssa_name (TREE_TYPE (base), NULL), - base, NULL); - gimple_set_location (region_end, location); - gimple_seq_add_stmt_without_update (&seq, region_end); - gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); - gsi_prev (&gsi); - - /* _2 = _1 + offset; */ - region_end = - gimple_build_assign_with_ops (POINTER_PLUS_EXPR, - make_ssa_name (TREE_TYPE (base), NULL), - gimple_assign_lhs (region_end), - offset); - gimple_set_location (region_end, location); - gsi_insert_after (&gsi, region_end, GSI_NEW_STMT); - - /* instrument access at _2; */ - build_check_stmt (location, gimple_assign_lhs (region_end), - &gsi, /*before_p=*/false, is_store, 1); -} - -/* Instrument the call (to the builtin strlen function) pointed to by - ITER. - - This function instruments the access to the first byte of the - argument, right before the call. After the call it instruments the - access to the last byte of the argument; it uses the result of the - call to deduce the offset of that last byte. - - Upon completion, iff the call has actullay been instrumented, this - function returns TRUE and *ITER points to the statement logically - following the built-in strlen function call *ITER was initially - pointing to. Otherwise, the function returns FALSE and *ITER - remains unchanged. 
*/ - -static bool -instrument_strlen_call (gimple_stmt_iterator *iter) -{ - gimple call = gsi_stmt (*iter); - gcc_assert (is_gimple_call (call)); - - tree callee = gimple_call_fndecl (call); - gcc_assert (is_builtin_fn (callee) - && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL - && DECL_FUNCTION_CODE (callee) == BUILT_IN_STRLEN); - - tree len = gimple_call_lhs (call); - if (len == NULL) - /* Some passes might clear the return value of the strlen call; - bail out in that case. Return FALSE as we are not advancing - *ITER. */ - return false; - gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (len))); - - location_t loc = gimple_location (call); - tree str_arg = gimple_call_arg (call, 0); - - /* Instrument the access to the first byte of str_arg. i.e: - - _1 = str_arg; instrument (_1); */ - gimple str_arg_ssa = - gimple_build_assign_with_ops (NOP_EXPR, - make_ssa_name (build_pointer_type - (char_type_node), NULL), - str_arg, NULL); - gimple_set_location (str_arg_ssa, loc); - gimple_stmt_iterator gsi = *iter; - gsi_insert_before (&gsi, str_arg_ssa, GSI_NEW_STMT); - build_check_stmt (loc, gimple_assign_lhs (str_arg_ssa), &gsi, - /*before_p=*/false, /*is_store=*/false, 1); - - /* If we initially had an instruction like: - - int n = strlen (str) - - we now want to instrument the access to str[n], after the - instruction above.*/ - - /* So let's build the access to str[n] that is, access through the - pointer_plus expr: (_1 + len). */ - gimple stmt = - gimple_build_assign_with_ops (POINTER_PLUS_EXPR, - make_ssa_name (TREE_TYPE (str_arg), - NULL), - gimple_assign_lhs (str_arg_ssa), - len); - gimple_set_location (stmt, loc); - gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); - - build_check_stmt (loc, gimple_assign_lhs (stmt), &gsi, - /*before_p=*/false, /*is_store=*/false, 1); - - /* Ensure that iter points to the statement logically following the - one it was initially pointing to. */ - *iter = gsi; - /* As *ITER has been advanced to point to the next statement, let's - return true to inform transform_statements that it shouldn't - advance *ITER anymore; otherwises it will skip that next - statement, which wouldn't be instrumented. */ - return true; -} - -/* Instrument the call to a built-in memory access function that is - pointed to by the iterator ITER. - - Upon completion, return TRUE iff *ITER has been advanced to the - statement following the one it was originally pointing to. */ - -static bool -instrument_builtin_call (gimple_stmt_iterator *iter) -{ - gimple call = gsi_stmt (*iter); - - gcc_checking_assert (is_gimple_builtin_call (call)); - - tree callee = gimple_call_fndecl (call); - location_t loc = gimple_location (call); - tree source0 = NULL_TREE, source1 = NULL_TREE, - dest = NULL_TREE, len = NULL_TREE; - bool is_store = true; - - switch (DECL_FUNCTION_CODE (callee)) - { - /* (s, s, n) style memops. */ - case BUILT_IN_BCMP: - case BUILT_IN_MEMCMP: - source0 = gimple_call_arg (call, 0); - source1 = gimple_call_arg (call, 1); - len = gimple_call_arg (call, 2); - break; - - /* (src, dest, n) style memops. */ - case BUILT_IN_BCOPY: - source0 = gimple_call_arg (call, 0); - dest = gimple_call_arg (call, 1); - len = gimple_call_arg (call, 2); - break; - - /* (dest, src, n) style memops. */ - case BUILT_IN_MEMCPY: - case BUILT_IN_MEMCPY_CHK: - case BUILT_IN_MEMMOVE: - case BUILT_IN_MEMMOVE_CHK: - case BUILT_IN_MEMPCPY: - case BUILT_IN_MEMPCPY_CHK: - dest = gimple_call_arg (call, 0); - source0 = gimple_call_arg (call, 1); - len = gimple_call_arg (call, 2); - break; - - /* (dest, n) style memops. 
*/ - case BUILT_IN_BZERO: - dest = gimple_call_arg (call, 0); - len = gimple_call_arg (call, 1); - break; - - /* (dest, x, n) style memops*/ - case BUILT_IN_MEMSET: - case BUILT_IN_MEMSET_CHK: - dest = gimple_call_arg (call, 0); - len = gimple_call_arg (call, 2); - break; - - case BUILT_IN_STRLEN: - return instrument_strlen_call (iter); - - /* And now the __atomic* and __sync builtins. - These are handled differently from the classical memory memory - access builtins above. */ - - case BUILT_IN_ATOMIC_LOAD_1: - case BUILT_IN_ATOMIC_LOAD_2: - case BUILT_IN_ATOMIC_LOAD_4: - case BUILT_IN_ATOMIC_LOAD_8: - case BUILT_IN_ATOMIC_LOAD_16: - is_store = false; - /* fall through. */ - - case BUILT_IN_SYNC_FETCH_AND_ADD_1: - case BUILT_IN_SYNC_FETCH_AND_ADD_2: - case BUILT_IN_SYNC_FETCH_AND_ADD_4: - case BUILT_IN_SYNC_FETCH_AND_ADD_8: - case BUILT_IN_SYNC_FETCH_AND_ADD_16: - - case BUILT_IN_SYNC_FETCH_AND_SUB_1: - case BUILT_IN_SYNC_FETCH_AND_SUB_2: - case BUILT_IN_SYNC_FETCH_AND_SUB_4: - case BUILT_IN_SYNC_FETCH_AND_SUB_8: - case BUILT_IN_SYNC_FETCH_AND_SUB_16: - - case BUILT_IN_SYNC_FETCH_AND_OR_1: - case BUILT_IN_SYNC_FETCH_AND_OR_2: - case BUILT_IN_SYNC_FETCH_AND_OR_4: - case BUILT_IN_SYNC_FETCH_AND_OR_8: - case BUILT_IN_SYNC_FETCH_AND_OR_16: - - case BUILT_IN_SYNC_FETCH_AND_AND_1: - case BUILT_IN_SYNC_FETCH_AND_AND_2: - case BUILT_IN_SYNC_FETCH_AND_AND_4: - case BUILT_IN_SYNC_FETCH_AND_AND_8: - case BUILT_IN_SYNC_FETCH_AND_AND_16: - - case BUILT_IN_SYNC_FETCH_AND_XOR_1: - case BUILT_IN_SYNC_FETCH_AND_XOR_2: - case BUILT_IN_SYNC_FETCH_AND_XOR_4: - case BUILT_IN_SYNC_FETCH_AND_XOR_8: - case BUILT_IN_SYNC_FETCH_AND_XOR_16: - - case BUILT_IN_SYNC_FETCH_AND_NAND_1: - case BUILT_IN_SYNC_FETCH_AND_NAND_2: - case BUILT_IN_SYNC_FETCH_AND_NAND_4: - case BUILT_IN_SYNC_FETCH_AND_NAND_8: - - case BUILT_IN_SYNC_ADD_AND_FETCH_1: - case BUILT_IN_SYNC_ADD_AND_FETCH_2: - case BUILT_IN_SYNC_ADD_AND_FETCH_4: - case BUILT_IN_SYNC_ADD_AND_FETCH_8: - case BUILT_IN_SYNC_ADD_AND_FETCH_16: - - case BUILT_IN_SYNC_SUB_AND_FETCH_1: - case BUILT_IN_SYNC_SUB_AND_FETCH_2: - case BUILT_IN_SYNC_SUB_AND_FETCH_4: - case BUILT_IN_SYNC_SUB_AND_FETCH_8: - case BUILT_IN_SYNC_SUB_AND_FETCH_16: - - case BUILT_IN_SYNC_OR_AND_FETCH_1: - case BUILT_IN_SYNC_OR_AND_FETCH_2: - case BUILT_IN_SYNC_OR_AND_FETCH_4: - case BUILT_IN_SYNC_OR_AND_FETCH_8: - case BUILT_IN_SYNC_OR_AND_FETCH_16: + gimple_seq_add_stmt_without_update (&seq, g); + offset = gimple_assign_lhs (g); + } - case BUILT_IN_SYNC_AND_AND_FETCH_1: - case BUILT_IN_SYNC_AND_AND_FETCH_2: - case BUILT_IN_SYNC_AND_AND_FETCH_4: - case BUILT_IN_SYNC_AND_AND_FETCH_8: - case BUILT_IN_SYNC_AND_AND_FETCH_16: + /* _1 = base; */ + base = unshare_expr (base); + gimple region_end = + gimple_build_assign_with_ops (TREE_CODE (base), + make_ssa_name (TREE_TYPE (base), NULL), + base, NULL); + gimple_set_location (region_end, location); + gimple_seq_add_stmt_without_update (&seq, region_end); + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + gsi_prev (&gsi); - case BUILT_IN_SYNC_XOR_AND_FETCH_1: - case BUILT_IN_SYNC_XOR_AND_FETCH_2: - case BUILT_IN_SYNC_XOR_AND_FETCH_4: - case BUILT_IN_SYNC_XOR_AND_FETCH_8: - case BUILT_IN_SYNC_XOR_AND_FETCH_16: + /* _2 = _1 + offset; */ + region_end = + gimple_build_assign_with_ops (POINTER_PLUS_EXPR, + make_ssa_name (TREE_TYPE (base), NULL), + gimple_assign_lhs (region_end), + offset); + gimple_set_location (region_end, location); + gsi_insert_after (&gsi, region_end, GSI_NEW_STMT); - case BUILT_IN_SYNC_NAND_AND_FETCH_1: - case BUILT_IN_SYNC_NAND_AND_FETCH_2: - 
case BUILT_IN_SYNC_NAND_AND_FETCH_4: - case BUILT_IN_SYNC_NAND_AND_FETCH_8: + /* instrument access at _2; */ + build_check_stmt (location, gimple_assign_lhs (region_end), + &gsi, /*before_p=*/false, is_store, 1); - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_1: - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_2: - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_4: - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_8: - case BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP_16: + update_mem_ref_hash_table (end, 1); +} - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_1: - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_2: - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_4: - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_8: - case BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP_16: +/* Instrument the call (to the builtin strlen function) pointed to by + ITER. - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_1: - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_2: - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_4: - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_8: - case BUILT_IN_SYNC_LOCK_TEST_AND_SET_16: + This function instruments the access to the first byte of the + argument, right before the call. After the call it instruments the + access to the last byte of the argument; it uses the result of the + call to deduce the offset of that last byte. - case BUILT_IN_SYNC_LOCK_RELEASE_1: - case BUILT_IN_SYNC_LOCK_RELEASE_2: - case BUILT_IN_SYNC_LOCK_RELEASE_4: - case BUILT_IN_SYNC_LOCK_RELEASE_8: - case BUILT_IN_SYNC_LOCK_RELEASE_16: + Upon completion, iff the call has actullay been instrumented, this + function returns TRUE and *ITER points to the statement logically + following the built-in strlen function call *ITER was initially + pointing to. Otherwise, the function returns FALSE and *ITER + remains unchanged. */ - case BUILT_IN_ATOMIC_EXCHANGE_1: - case BUILT_IN_ATOMIC_EXCHANGE_2: - case BUILT_IN_ATOMIC_EXCHANGE_4: - case BUILT_IN_ATOMIC_EXCHANGE_8: - case BUILT_IN_ATOMIC_EXCHANGE_16: +static bool +instrument_strlen_call (gimple_stmt_iterator *iter) +{ + gimple call = gsi_stmt (*iter); + gcc_assert (is_gimple_call (call)); - case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1: - case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_2: - case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_4: - case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_8: - case BUILT_IN_ATOMIC_COMPARE_EXCHANGE_16: + tree callee = gimple_call_fndecl (call); + gcc_assert (is_builtin_fn (callee) + && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL + && DECL_FUNCTION_CODE (callee) == BUILT_IN_STRLEN); - case BUILT_IN_ATOMIC_STORE_1: - case BUILT_IN_ATOMIC_STORE_2: - case BUILT_IN_ATOMIC_STORE_4: - case BUILT_IN_ATOMIC_STORE_8: - case BUILT_IN_ATOMIC_STORE_16: + tree len = gimple_call_lhs (call); + if (len == NULL) + /* Some passes might clear the return value of the strlen call; + bail out in that case. Return FALSE as we are not advancing + *ITER. */ + return false; + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (len))); - case BUILT_IN_ATOMIC_ADD_FETCH_1: - case BUILT_IN_ATOMIC_ADD_FETCH_2: - case BUILT_IN_ATOMIC_ADD_FETCH_4: - case BUILT_IN_ATOMIC_ADD_FETCH_8: - case BUILT_IN_ATOMIC_ADD_FETCH_16: + location_t loc = gimple_location (call); + tree str_arg = gimple_call_arg (call, 0); - case BUILT_IN_ATOMIC_SUB_FETCH_1: - case BUILT_IN_ATOMIC_SUB_FETCH_2: - case BUILT_IN_ATOMIC_SUB_FETCH_4: - case BUILT_IN_ATOMIC_SUB_FETCH_8: - case BUILT_IN_ATOMIC_SUB_FETCH_16: + /* Instrument the access to the first byte of str_arg. 
i.e: - case BUILT_IN_ATOMIC_AND_FETCH_1: - case BUILT_IN_ATOMIC_AND_FETCH_2: - case BUILT_IN_ATOMIC_AND_FETCH_4: - case BUILT_IN_ATOMIC_AND_FETCH_8: - case BUILT_IN_ATOMIC_AND_FETCH_16: + _1 = str_arg; instrument (_1); */ + gimple str_arg_ssa = + gimple_build_assign_with_ops (NOP_EXPR, + make_ssa_name (build_pointer_type + (char_type_node), NULL), + str_arg, NULL); + gimple_set_location (str_arg_ssa, loc); + gimple_stmt_iterator gsi = *iter; + gsi_insert_before (&gsi, str_arg_ssa, GSI_NEW_STMT); + build_check_stmt (loc, gimple_assign_lhs (str_arg_ssa), &gsi, + /*before_p=*/false, /*is_store=*/false, 1); - case BUILT_IN_ATOMIC_NAND_FETCH_1: - case BUILT_IN_ATOMIC_NAND_FETCH_2: - case BUILT_IN_ATOMIC_NAND_FETCH_4: - case BUILT_IN_ATOMIC_NAND_FETCH_8: - case BUILT_IN_ATOMIC_NAND_FETCH_16: + /* If we initially had an instruction like: - case BUILT_IN_ATOMIC_XOR_FETCH_1: - case BUILT_IN_ATOMIC_XOR_FETCH_2: - case BUILT_IN_ATOMIC_XOR_FETCH_4: - case BUILT_IN_ATOMIC_XOR_FETCH_8: - case BUILT_IN_ATOMIC_XOR_FETCH_16: + int n = strlen (str) - case BUILT_IN_ATOMIC_OR_FETCH_1: - case BUILT_IN_ATOMIC_OR_FETCH_2: - case BUILT_IN_ATOMIC_OR_FETCH_4: - case BUILT_IN_ATOMIC_OR_FETCH_8: - case BUILT_IN_ATOMIC_OR_FETCH_16: + we now want to instrument the access to str[n], after the + instruction above.*/ - case BUILT_IN_ATOMIC_FETCH_ADD_1: - case BUILT_IN_ATOMIC_FETCH_ADD_2: - case BUILT_IN_ATOMIC_FETCH_ADD_4: - case BUILT_IN_ATOMIC_FETCH_ADD_8: - case BUILT_IN_ATOMIC_FETCH_ADD_16: + /* So let's build the access to str[n] that is, access through the + pointer_plus expr: (_1 + len). */ + gimple stmt = + gimple_build_assign_with_ops (POINTER_PLUS_EXPR, + make_ssa_name (TREE_TYPE (str_arg), + NULL), + gimple_assign_lhs (str_arg_ssa), + len); + gimple_set_location (stmt, loc); + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); - case BUILT_IN_ATOMIC_FETCH_SUB_1: - case BUILT_IN_ATOMIC_FETCH_SUB_2: - case BUILT_IN_ATOMIC_FETCH_SUB_4: - case BUILT_IN_ATOMIC_FETCH_SUB_8: - case BUILT_IN_ATOMIC_FETCH_SUB_16: + build_check_stmt (loc, gimple_assign_lhs (stmt), &gsi, + /*before_p=*/false, /*is_store=*/false, 1); - case BUILT_IN_ATOMIC_FETCH_AND_1: - case BUILT_IN_ATOMIC_FETCH_AND_2: - case BUILT_IN_ATOMIC_FETCH_AND_4: - case BUILT_IN_ATOMIC_FETCH_AND_8: - case BUILT_IN_ATOMIC_FETCH_AND_16: + /* Ensure that iter points to the statement logically following the + one it was initially pointing to. */ + *iter = gsi; + /* As *ITER has been advanced to point to the next statement, let's + return true to inform transform_statements that it shouldn't + advance *ITER anymore; otherwises it will skip that next + statement, which wouldn't be instrumented. */ + return true; +} - case BUILT_IN_ATOMIC_FETCH_NAND_1: - case BUILT_IN_ATOMIC_FETCH_NAND_2: - case BUILT_IN_ATOMIC_FETCH_NAND_4: - case BUILT_IN_ATOMIC_FETCH_NAND_8: - case BUILT_IN_ATOMIC_FETCH_NAND_16: +/* Instrument the call to a built-in memory access function that is + pointed to by the iterator ITER. - case BUILT_IN_ATOMIC_FETCH_XOR_1: - case BUILT_IN_ATOMIC_FETCH_XOR_2: - case BUILT_IN_ATOMIC_FETCH_XOR_4: - case BUILT_IN_ATOMIC_FETCH_XOR_8: - case BUILT_IN_ATOMIC_FETCH_XOR_16: + Upon completion, return TRUE iff *ITER has been advanced to the + statement following the one it was originally pointing to. */ - case BUILT_IN_ATOMIC_FETCH_OR_1: - case BUILT_IN_ATOMIC_FETCH_OR_2: - case BUILT_IN_ATOMIC_FETCH_OR_4: - case BUILT_IN_ATOMIC_FETCH_OR_8: - case BUILT_IN_ATOMIC_FETCH_OR_16: - { - dest = gimple_call_arg (call, 0); - /* So DEST represents the address of a memory location. 
-     instrument_derefs wants the memory location, so lets
-     dereference the address DEST before handing it to
-     instrument_derefs.  */
-	if (TREE_CODE (dest) == ADDR_EXPR)
-	  dest = TREE_OPERAND (dest, 0);
-	else if (TREE_CODE (dest) == SSA_NAME)
-	  dest = build2 (MEM_REF, TREE_TYPE (TREE_TYPE (dest)),
-			 dest, build_int_cst (TREE_TYPE (dest), 0));
-	else
-	  gcc_unreachable ();
+static bool
+instrument_builtin_call (gimple_stmt_iterator *iter)
+{
+  bool iter_advanced_p = false;
+  gimple call = gsi_stmt (*iter);
 
-	instrument_derefs (iter, dest, loc, is_store);
-	return false;
-      }
+  gcc_checking_assert (gimple_call_builtin_p (call, BUILT_IN_NORMAL));
 
-    default:
-      /* The other builtins memory access are not instrumented in this
-	 function because they either don't have any length parameter,
-	 or their length parameter is just a limit.  */
-      break;
-    }
+  tree callee = gimple_call_fndecl (call);
+  location_t loc = gimple_location (call);
 
-  if (len != NULL_TREE)
+  if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STRLEN)
+    iter_advanced_p = instrument_strlen_call (iter);
+  else
     {
-      if (source0 != NULL_TREE)
-	instrument_mem_region_access (source0, len, iter,
-				      loc, /*is_store=*/false);
-      if (source1 != NULL_TREE)
-	instrument_mem_region_access (source1, len, iter,
-				      loc, /*is_store=*/false);
-      else if (dest != NULL_TREE)
-	instrument_mem_region_access (dest, len, iter,
-				      loc, /*is_store=*/true);
-
-      *iter = gsi_for_stmt (call);
-      return false;
+      asan_mem_ref src0, src1, dest;
+      asan_mem_ref_init (&src0, NULL, 1);
+      asan_mem_ref_init (&src1, NULL, 1);
+      asan_mem_ref_init (&dest, NULL, 1);
+
+      tree src0_len = NULL_TREE, src1_len = NULL_TREE, dest_len = NULL_TREE;
+      bool src0_is_store = false, src1_is_store = false,
+	dest_is_store = false, dest_is_deref = false;
+
+      if (get_mem_refs_of_builtin_call (call,
+					&src0, &src0_len, &src0_is_store,
+					&src1, &src1_len, &src1_is_store,
+					&dest, &dest_len, &dest_is_store,
+					&dest_is_deref))
+	{
+	  if (dest_is_deref)
+	    {
+	      instrument_derefs (iter, dest.start, loc, dest_is_store);
+	      gsi_next (iter);
+	      iter_advanced_p = true;
+	    }
+	  else if (src0_len || src1_len || dest_len)
+	    {
+	      if (src0.start)
+		instrument_mem_region_access (src0.start, src0_len,
+					      iter, loc, /*is_store=*/false);
+	      if (src1.start != NULL_TREE)
+		instrument_mem_region_access (src1.start, src1_len,
+					      iter, loc, /*is_store=*/false);
+	      if (dest.start != NULL_TREE)
+		instrument_mem_region_access (dest.start, dest_len,
+					      iter, loc, /*is_store=*/true);
+	      *iter = gsi_for_stmt (call);
+	      gsi_next (iter);
+	      iter_advanced_p = true;
+	    }
+	}
     }
-  return false;
+  return iter_advanced_p;
 }
 
 /* Instrument the assignment statement ITER if it is subject to
-   instrumentation.  */
+   instrumentation.  Return TRUE iff instrumentation actually
+   happened.  In that case, the iterator ITER is advanced to the next
+   logical expression following the one initially pointed to by ITER,
+   and the relevant memory reference whose access has been
+   instrumented is added to the memory references hash table.
*/ -static void -instrument_assignment (gimple_stmt_iterator *iter) +static bool +maybe_instrument_assignment (gimple_stmt_iterator *iter) { gimple s = gsi_stmt (*iter); gcc_assert (gimple_assign_single_p (s)); + tree ref_expr = NULL_TREE; + bool is_store, is_instrumented = false; + if (gimple_store_p (s)) - instrument_derefs (iter, gimple_assign_lhs (s), - gimple_location (s), true); + { + ref_expr = gimple_assign_lhs (s); + is_store = true; + instrument_derefs (iter, ref_expr, + gimple_location (s), + is_store); + is_instrumented = true; + } + if (gimple_assign_load_p (s)) - instrument_derefs (iter, gimple_assign_rhs1 (s), - gimple_location (s), false); + { + ref_expr = gimple_assign_rhs1 (s); + is_store = false; + instrument_derefs (iter, ref_expr, + gimple_location (s), + is_store); + is_instrumented = true; + } + + if (is_instrumented) + gsi_next (iter); + + return is_instrumented; } /* Instrument the function call pointed to by the iterator ITER, if it @@ -1424,10 +1849,11 @@ static bool maybe_instrument_call (gimple_stmt_iterator *iter) { gimple stmt = gsi_stmt (*iter); - bool is_builtin = is_gimple_builtin_call (stmt); - if (is_builtin - && instrument_builtin_call (iter)) + bool is_builtin = gimple_call_builtin_p (stmt, BUILT_IN_NORMAL); + + if (is_builtin && instrument_builtin_call (iter)) return true; + if (gimple_call_noreturn_p (stmt)) { if (is_builtin) @@ -1449,11 +1875,10 @@ maybe_instrument_call (gimple_stmt_iterator *iter) return false; } -/* asan: this looks too complex. Can this be done simpler? */ -/* Transform - 1) Memory references. - 2) BUILTIN_ALLOCA calls. -*/ +/* Walk each instruction of all basic block and instrument those that + represent memory references: loads, stores, or function calls. + In a given basic block, this function avoids instrumenting memory + references that have already been instrumented. */ static void transform_statements (void) @@ -1464,23 +1889,38 @@ transform_statements (void) FOR_EACH_BB (bb) { + empty_mem_ref_hash_table (); + if (bb->index >= saved_last_basic_block) continue; for (i = gsi_start_bb (bb); !gsi_end_p (i);) { gimple s = gsi_stmt (i); - if (gimple_assign_single_p (s)) - instrument_assignment (&i); - else if (is_gimple_call (s)) + if (has_stmt_been_instrumented_p (s)) + gsi_next (&i); + else if (gimple_assign_single_p (s) + && maybe_instrument_assignment (&i)) + /* Nothing to do as maybe_instrument_assignment advanced + the iterator I. */; + else if (is_gimple_call (s) && maybe_instrument_call (&i)) + /* Nothing to do as maybe_instrument_call + advanced the iterator I. */; + else { - if (maybe_instrument_call (&i)) - /* Avoid gsi_next (&i), because maybe_instrument_call - advanced the I iterator already. */ - continue; + /* No instrumentation happened. + + If the current instruction is a function call, let's + forget about the memory references that got + instrumented. Otherwise we might miss some + instrumentation opportunities. 
   */
+	      if (is_gimple_call (s))
+		empty_mem_ref_hash_table ();
+
+	      gsi_next (&i);
+	    }
-	  gsi_next (&i);
 	}
     }
+  free_mem_ref_resources ();
 }
 
 /* Build
diff --git a/gcc/testsuite/c-c++-common/asan/inc.c b/gcc/testsuite/c-c++-common/asan/inc.c
new file mode 100644
index 0000000..09ad176
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/inc.c
@@ -0,0 +1,20 @@
+/* { dg-options "-fdump-tree-asan0" }
+   { dg-do compile } */
+
+void
+foo (int *a)
+{
+  (*a)++;
+}
+
+int
+main ()
+{
+  int a = 0;
+  foo (&a);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report" 1 "asan0" } } */
+
+/* { dg-final { scan-tree-dump "__builtin___asan_report_load4" "asan0" } } */
diff --git a/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c
new file mode 100644
index 0000000..8c6cca4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-1.c
@@ -0,0 +1,67 @@
+/* This tests that when faced with two references to the same memory
+   location in the same basic block, the second reference should not
+   be instrumented by the Address Sanitizer.
+
+   { dg-options "-fdump-tree-asan0" }
+   { dg-do compile }
+ */
+
+static char tab[4] = {0};
+
+static int
+test0 ()
+{
+  /* __builtin___asan_report_store1 called 2 times for the two stores
+     below.  */
+  tab[0] = 1;
+  tab[1] = 2;
+
+  /* This load is not instrumented because the store to tab[1] just
+     above already instrumented the same 1-byte memory reference.  */
+  char t0 = tab[1];
+
+  /* This load should not be instrumented because it is to the same
+     memory location as above.  */
+  char t1 = tab[1];
+
+  return t0 + t1;
+}
+
+static int
+test1 ()
+{
+  /* __builtin___asan_report_store1 called 1 time here to instrument
+     the initialization.  */
+  char foo[4] = {1};
+
+  /* __builtin___asan_report_store1 called 2 times here to instrument
+     the store to the memory region of tab.  */
+  __builtin_memset (tab, 3, sizeof (tab));
+
+  /* There is no instrumentation for the two memset calls below.  */
+  __builtin_memset (tab, 4, sizeof (tab));
+  __builtin_memset (tab, 5, sizeof (tab));
+
+  /* There are 2 calls to __builtin___asan_report_store1 and 2 calls
+     to __builtin___asan_report_load1 to instrument the store to
+     (a subset of) the memory region of tab.  */
+  __builtin_memcpy (&tab[1], foo, sizeof (tab) - 1);
+
+  /* This should not generate a __builtin___asan_report_load1 because
+     the reference to tab[1] has already been instrumented above.  */
+  return tab[1];
+
+  /* So for these functions, there should be 7 calls to
+     __builtin___asan_report_store1.  */
+}
+
+int
+main ()
+{
+  return test0 () && test1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report_store1" 7 "asan0" } }
+
+   { dg-final { scan-tree-dump-times "__builtin___asan_report_load" 2 "asan0" } }
+ */
diff --git a/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-2.c b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-2.c
new file mode 100644
index 0000000..3d7e5df
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-2.c
@@ -0,0 +1,26 @@
+/* This tests that when faced with two references to the same memory
+   location in the same basic block, the second reference should not
+   be instrumented by the Address Sanitizer.  But in the case of
+   access to overlapping regions we must be precise.
+
+   { dg-options "-fdump-tree-asan0" }
+   { dg-do compile }
+ */
+
+int
+main ()
+{
+  char tab[5];
+
+  /* Here, we instrument the access at offset 0 and the access at
+     offset 4.
   */
+  __builtin_memset (tab, 1, sizeof (tab));
+  /* We instrumented the access at offset 0 above already, so only
+     the access at offset 3 is instrumented.  */
+  __builtin_memset (tab, 1, 3);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report_store1" 3 "asan0" } }
+
+   { dg-final { scan-tree-dump-times "__builtin___asan_report" 3 "asan0" } }
+ */
diff --git a/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-3.c b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-3.c
new file mode 100644
index 0000000..d5d0fa1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/no-redundant-instrumentation-3.c
@@ -0,0 +1,18 @@
+/* { dg-options "-fdump-tree-asan0" }
+   { dg-do compile } */
+
+char
+foo (__INT32_TYPE__ *p)
+{
+  /* This generates a __builtin___asan_report_load1.  */
+  __INT32_TYPE__ ret = *(char *) p;
+  /* This generates a __builtin___asan_report_store4.  */
+  *p = 26;
+  return ret;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report" 2 "asan0" } } */
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report_load1" 1 "asan0" } } */
+
+/* { dg-final { scan-tree-dump-times "__builtin___asan_report_store" 1 "asan0" } } */