From patchwork Thu Jun 2 19:12:38 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aldy Hernandez X-Patchwork-Id: 98462 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id B139FB6F80 for ; Fri, 3 Jun 2011 05:13:07 +1000 (EST) Received: (qmail 22188 invoked by alias); 2 Jun 2011 19:13:05 -0000 Received: (qmail 22173 invoked by uid 22791); 2 Jun 2011 19:13:03 -0000 X-SWARE-Spam-Status: No, hits=-6.5 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_HI, SPF_HELO_PASS, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 02 Jun 2011 19:12:45 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p52JCdj0008191 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 2 Jun 2011 15:12:39 -0400 Received: from houston.quesejoda.com (vpn-237-23.phx2.redhat.com [10.3.237.23]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p52JCcwF002877; Thu, 2 Jun 2011 15:12:39 -0400 Message-ID: <4DE7E0A6.9070400@redhat.com> Date: Thu, 02 Jun 2011 14:12:38 -0500 From: Aldy Hernandez User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Lightning/1.0b3pre Thunderbird/3.1.10 MIME-Version: 1.0 To: Andrew MacLeod CC: "Joseph S. Myers" , Richard Henderson , gcc-patches , Andrew MacLeod Subject: Re: __sync_swap* with acq/rel/full memory barrier semantics References: <4DDAE516.4010307@redhat.com> <4DE3F8ED.6020109@redhat.com> In-Reply-To: <4DE3F8ED.6020109@redhat.com> Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On 05/30/11 15:07, Andrew MacLeod wrote: > Aldy was just too excited about working on memory model I think :-) > > I've been looking at this, and I propose we go this way : > > http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen Still overly excited, but now with a more thorough plan :). I'm going to concentrate on the non controversial parts (the __sync builtins), while the details are ironed out. The attached patch implements the exchange operation, with a parameter/enum for the type of memory model to use. I have chosen to call the builtins __sync_mem_BLAH to keep them all consistent. I am including documentation and a test, so folks can get an idea of where I'm headed with this. Once I take everyone's input, we can implement the rest of the builtins, and take it from there. I see no prior art in providing some sort of enum for a builtin parameter. I can proceed down this path if advisable, but an easier path is to just declare the __SYNC_MEM_* enum as preprocessor macros as I do in this patch. Suggestions welcome. How does this (lightly tested patch) look? * doc/extend.texi (__sync_mem_exchange): Document. * cppbuiltin.c (define__GNUC__): Define __SYNC_MEM*. * c-family/c-common.c (BUILT_IN_MEM_EXCHANGE_N): Add case. * optabs.c (expand_sync_mem_exchange): New. * optabs.h (enum direct_optab_index): Add DOI_sync_mem* entries. (sync_mem_exchange_*_optab): Define. * genopinit.c: Add entries for sync_mem_exchange_*. * tree.h (enum memmodel): New. * builtins.c (get_memmodel): New. (expand_builtin_mem_exchange): New. (expand_builtin_synchronize): Remove static. (expand_builtin): Add cases for BUILT_IN_MEM_EXCHANGE_*. * sync-builtins.def: Add entries for BUILT_IN_MEM_EXCHANGE_*. * builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_I{1,2,4,8,16}_INT): New. * expr.h (expand_sync_mem_exchange): Declare. (expand_builtin_synchronize): Same. * config/i386/i386.md (UNSPECV_MEM_XCHG): New. (sync_mem_exchange_seq_cst): New pattern. Index: doc/extend.texi =================================================================== --- doc/extend.texi (revision 173831) +++ doc/extend.texi (working copy) @@ -6728,6 +6728,22 @@ This builtin is not a full barrier, but This means that all previous memory stores are globally visible, and all previous memory loads have been satisfied, but following memory reads are not prevented from being speculated to before the barrier. + +@item @var{type} __sync_mem_exchange (@var{type} *ptr, @var{type} value, int memmodel, ...) +@findex __sync_mem_exchange +This builtin implements an atomic exchange operation within the +constraints of a memory model. It writes @var{value} into +@code{*@var{ptr}}, and returns the previous contents of +@code{*@var{ptr}}. + +The valid memory model variants for this builtin are +__SYNC_MEM_RELAXED, __SYNC_MEM_SEQ_CST, __SYNC_MEM_ACQUIRE, +__SYNC_MEM_RELEASE, and __SYNC_MEM_ACQ_REL. If the variant is not +available for the given target, the compiler will fall back to the +more restrictive memory model, the sequentially consistent model (if +available). If the sequentially consistent model is not implemented +for the target, the compiler will implement the builtin with a compare +and swap loop. @end table @node Object Size Checking Index: cppbuiltin.c =================================================================== --- cppbuiltin.c (revision 173831) +++ cppbuiltin.c (working copy) @@ -66,6 +66,12 @@ define__GNUC__ (cpp_reader *pfile) cpp_define_formatted (pfile, "__GNUC_MINOR__=%d", minor); cpp_define_formatted (pfile, "__GNUC_PATCHLEVEL__=%d", patchlevel); cpp_define_formatted (pfile, "__VERSION__=\"%s\"", version_string); + cpp_define_formatted (pfile, "__SYNC_MEM_RELAXED=%d", MEMMODEL_RELAXED); + cpp_define_formatted (pfile, "__SYNC_MEM_SEQ_CST=%d", MEMMODEL_SEQ_CST); + cpp_define_formatted (pfile, "__SYNC_MEM_ACQUIRE=%d", MEMMODEL_ACQUIRE); + cpp_define_formatted (pfile, "__SYNC_MEM_RELEASE=%d", MEMMODEL_RELEASE); + cpp_define_formatted (pfile, "__SYNC_MEM_ACQ_REL=%d", MEMMODEL_ACQ_REL); + cpp_define_formatted (pfile, "__SYNC_MEM_CONSUME=%d", MEMMODEL_CONSUME); } Index: c-family/c-common.c =================================================================== --- c-family/c-common.c (revision 173831) +++ c-family/c-common.c (working copy) @@ -9035,6 +9035,7 @@ resolve_overloaded_builtin (location_t l case BUILT_IN_VAL_COMPARE_AND_SWAP_N: case BUILT_IN_LOCK_TEST_AND_SET_N: case BUILT_IN_LOCK_RELEASE_N: + case BUILT_IN_MEM_EXCHANGE_N: { int n = sync_resolve_size (function, params); tree new_function, first_param, result; Index: optabs.c =================================================================== --- optabs.c (revision 173831) +++ optabs.c (working copy) @@ -6988,6 +6988,85 @@ expand_sync_lock_test_and_set (rtx mem, return NULL_RTX; } + +/* This function expands a fine grained atomic exchange operation: + atomically store VAL in MEM and return the previous value in MEM. + + MEMMODEL is the memory model variant to use. + TARGET is an option place to stick the return value. */ + +rtx +expand_sync_mem_exchange (enum memmodel model, rtx mem, rtx val, rtx target) +{ + enum machine_mode mode = GET_MODE (mem); + enum insn_code icode; + direct_optab op; + + switch (model) + { + case MEMMODEL_RELAXED: + /* ?? Eventually we should either just emit the atomic + instruction without any barriers (and thus allow movements + and transformations), or emit a relaxed builtin. + + It is still not clear whether any transformations are + permissible on the atomics (for example, CSE might break + coherence), so we might need to emit a relaxed builtin. + + Until we figure this out, be conservative and fall + through. */ + case MEMMODEL_SEQ_CST: + op = sync_mem_exchange_seq_cst_optab; + break; + case MEMMODEL_ACQUIRE: + op = sync_mem_exchange_acq_optab; + break; + case MEMMODEL_RELEASE: + op = sync_mem_exchange_rel_optab; + break; + case MEMMODEL_ACQ_REL: + op = sync_mem_exchange_acq_rel_optab; + break; + default: + gcc_unreachable (); + } + /* If no variant is found, try the full barrier. */ + if (direct_optab_handler (op, mode) == CODE_FOR_nothing) + op = sync_mem_exchange_seq_cst_optab; + + /* If the target supports the swap directly, great. */ + icode = direct_optab_handler (op, mode); + if (icode != CODE_FOR_nothing) + { + struct expand_operand ops[3]; + + create_output_operand (&ops[0], target, mode); + create_fixed_operand (&ops[1], mem); + /* VAL may have been promoted to a wider mode. Shrink it if so. */ + create_convert_operand_to (&ops[2], val, mode, true); + if (maybe_expand_insn (icode, 3, ops)) + return ops[0].value; + } + + /* Otherwise, use a compare-and-swap loop with full barriers around + it. */ + if (direct_optab_handler (sync_compare_and_swap_optab, mode) + != CODE_FOR_nothing) + { + expand_builtin_synchronize (); + if (!target || !register_operand (target, mode)) + target = gen_reg_rtx (mode); + if (GET_MODE (val) != VOIDmode && GET_MODE (val) != mode) + val = convert_modes (mode, GET_MODE (val), val, 1); + if (expand_compare_and_swap_loop (mem, target, val, NULL_RTX)) + { + expand_builtin_synchronize (); + return target; + } + } + + return NULL_RTX; +} /* Return true if OPERAND is suitable for operand number OPNO of instruction ICODE. */ Index: optabs.h =================================================================== --- optabs.h (revision 173831) +++ optabs.h (working copy) @@ -675,6 +675,12 @@ enum direct_optab_index /* Atomic clear with release semantics. */ DOI_sync_lock_release, + /* Fine grained atomic exchange. */ + DOI_sync_mem_exchange_seq_cst, + DOI_sync_mem_exchange_acq, + DOI_sync_mem_exchange_rel, + DOI_sync_mem_exchange_acq_rel, + DOI_MAX }; @@ -722,6 +728,15 @@ typedef struct direct_optab_d *direct_op (&direct_optab_table[(int) DOI_sync_lock_test_and_set]) #define sync_lock_release_optab \ (&direct_optab_table[(int) DOI_sync_lock_release]) + +#define sync_mem_exchange_seq_cst_optab \ + (&direct_optab_table[(int) DOI_sync_mem_exchange_seq_cst]) +#define sync_mem_exchange_acq_optab \ + (&direct_optab_table[(int) DOI_sync_mem_exchange_acq]) +#define sync_mem_exchange_rel_optab \ + (&direct_optab_table[(int) DOI_sync_mem_exchange_rel]) +#define sync_mem_exchange_acq_rel_optab \ + (&direct_optab_table[(int) DOI_sync_mem_exchange_acq_rel]) /* Target-dependent globals. */ struct target_optabs { Index: genopinit.c =================================================================== --- genopinit.c (revision 173831) +++ genopinit.c (working copy) @@ -240,6 +240,10 @@ static const char * const optabs[] = "set_direct_optab_handler (sync_compare_and_swap_optab, $A, CODE_FOR_$(sync_compare_and_swap$I$a$))", "set_direct_optab_handler (sync_lock_test_and_set_optab, $A, CODE_FOR_$(sync_lock_test_and_set$I$a$))", "set_direct_optab_handler (sync_lock_release_optab, $A, CODE_FOR_$(sync_lock_release$I$a$))", + "set_direct_optab_handler (sync_mem_exchange_seq_cst_optab, $A, CODE_FOR_$(sync_mem_exchange_seq_cst$I$a$))", + "set_direct_optab_handler (sync_mem_exchange_acq_rel_optab, $A, CODE_FOR_$(sync_mem_exchange_acq_rel$I$a$))", + "set_direct_optab_handler (sync_mem_exchange_acq_optab, $A, CODE_FOR_$(sync_mem_exchange_acq$I$a$))", + "set_direct_optab_handler (sync_mem_exchange_rel_optab, $A, CODE_FOR_$(sync_mem_exchange_rel$I$a$))", "set_optab_handler (vec_set_optab, $A, CODE_FOR_$(vec_set$a$))", "set_optab_handler (vec_extract_optab, $A, CODE_FOR_$(vec_extract$a$))", "set_optab_handler (vec_extract_even_optab, $A, CODE_FOR_$(vec_extract_even$a$))", Index: tree.h =================================================================== --- tree.h (revision 173831) +++ tree.h (working copy) @@ -5840,4 +5840,16 @@ is_lang_specific (tree t) /* In gimple-low.c. */ extern bool block_may_fallthru (const_tree); +/* Memory model types for the __sync_mem* builtins. */ +enum memmodel +{ + MEMMODEL_RELAXED = 0, + MEMMODEL_SEQ_CST = 1, + MEMMODEL_ACQUIRE = 2, + MEMMODEL_RELEASE = 3, + MEMMODEL_ACQ_REL = 4, + MEMMODEL_CONSUME = 5, + MEMMODEL_LAST = 6 +}; + #endif /* GCC_TREE_H */ Index: builtins.c =================================================================== --- builtins.c (revision 173831) +++ builtins.c (working copy) @@ -5682,9 +5682,69 @@ expand_builtin_lock_test_and_set (enum m return expand_sync_lock_test_and_set (mem, val, target); } +/* Given an integer representing an ``enum memmodel'', verify its + correctness and return the memory model enum. */ + +static enum memmodel +get_memmodel (tree exp) +{ + rtx op; + + if (TREE_CODE (exp) != INTEGER_CST) + { + error ("third argument to builtin is an invalid memory model"); + return MEMMODEL_SEQ_CST; + } + op = expand_normal (exp); + if (INTVAL (op) < 0 || INTVAL (op) >= MEMMODEL_LAST) + { + error ("third argument to builtin is an invalid memory model"); + return MEMMODEL_SEQ_CST; + } + return (enum memmodel) INTVAL (op); +} + +/* Expand the __sync_mem_exchange intrinsic: + + TYPE __sync_mem_exchange (TYPE *to, TYPE from, enum memmodel) + + EXP is the CALL_EXPR. + TARGET is an optional place for us to store the results. */ + +static rtx +expand_builtin_mem_exchange (enum machine_mode mode, tree exp, rtx target) +{ + rtx val, mem; + enum machine_mode old_mode; + enum memmodel model; + + model = get_memmodel (CALL_EXPR_ARG (exp, 2)); + if (model != MEMMODEL_RELAXED + && model != MEMMODEL_SEQ_CST + && model != MEMMODEL_ACQ_REL + && model != MEMMODEL_RELEASE + && model != MEMMODEL_ACQUIRE) + { + error ("invalid memory model for %<__sync_mem_exchange%>"); + return NULL_RTX; + } + + /* Expand the operands. */ + mem = get_builtin_sync_mem (CALL_EXPR_ARG (exp, 0), mode); + val = expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, mode, EXPAND_NORMAL); + /* If VAL is promoted to a wider mode, convert it back to MODE. Take care + of CONST_INTs, where we know the old_mode only from the call argument. */ + old_mode = GET_MODE (val); + if (old_mode == VOIDmode) + old_mode = TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (exp, 1))); + val = convert_modes (mode, old_mode, val, 1); + + return expand_sync_mem_exchange (model, mem, val, target); +} + /* Expand the __sync_synchronize intrinsic. */ -static void +void expand_builtin_synchronize (void) { gimple x; @@ -6495,6 +6555,17 @@ expand_builtin (tree exp, rtx target, rt return target; break; + case BUILT_IN_MEM_EXCHANGE_1: + case BUILT_IN_MEM_EXCHANGE_2: + case BUILT_IN_MEM_EXCHANGE_4: + case BUILT_IN_MEM_EXCHANGE_8: + case BUILT_IN_MEM_EXCHANGE_16: + mode = get_builtin_sync_mode (fcode - BUILT_IN_MEM_EXCHANGE_1); + target = expand_builtin_mem_exchange (mode, exp, target); + if (target) + return target; + break; + case BUILT_IN_LOCK_TEST_AND_SET_1: case BUILT_IN_LOCK_TEST_AND_SET_2: case BUILT_IN_LOCK_TEST_AND_SET_4: Index: sync-builtins.def =================================================================== --- sync-builtins.def (revision 173831) +++ sync-builtins.def (working copy) @@ -250,3 +250,24 @@ DEF_SYNC_BUILTIN (BUILT_IN_LOCK_RELEASE_ DEF_SYNC_BUILTIN (BUILT_IN_SYNCHRONIZE, "__sync_synchronize", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) + +/* Fine grained __sync* builtins for the C++ memory model. */ + +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_N, + "__sync_mem_exchange", + BT_FN_VOID_VAR, ATTR_NOTHROW_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_1, + "__sync_mem_exchange_1", + BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_2, + "__sync_mem_exchange_2", + BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_4, + "__sync_mem_exchange_4", + BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_8, + "__sync_mem_exchange_8", + BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST) +DEF_SYNC_BUILTIN (BUILT_IN_MEM_EXCHANGE_16, + "__sync_mem_exchange_16", + BT_FN_I16_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST) Index: testsuite/gcc.dg/x86-sync-1.c =================================================================== --- testsuite/gcc.dg/x86-sync-1.c (revision 0) +++ testsuite/gcc.dg/x86-sync-1.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-dap" } */ + +int i; + +void foo() +{ + __sync_mem_exchange (&i, 555, __SYNC_MEM_SEQ_CST); +} + +/* { dg-final { scan-assembler "sync_mem_exchange_seq_cstsi" } } */ Index: builtin-types.def =================================================================== --- builtin-types.def (revision 173831) +++ builtin-types.def (working copy) @@ -383,6 +383,11 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PT BT_PTR, BT_UINT) DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR, BT_CONST_PTR, BT_INT, BT_SIZE) +DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_I2_VPTR_I2_INT, BT_I2, BT_VOLATILE_PTR, BT_I2, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_INT, BT_I4, BT_VOLATILE_PTR, BT_I4, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_INT, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_INT, BT_I16, BT_VOLATILE_PTR, BT_I16, BT_INT) DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR, BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR) Index: expr.h =================================================================== --- expr.h (revision 173831) +++ expr.h (working copy) @@ -217,6 +217,7 @@ rtx expand_bool_compare_and_swap (rtx, r rtx expand_sync_operation (rtx, rtx, enum rtx_code); rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx); rtx expand_sync_lock_test_and_set (rtx, rtx, rtx); +rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx); /* Functions from expmed.c: */ @@ -248,6 +249,7 @@ extern void expand_builtin_setjmp_receiv extern rtx expand_builtin_saveregs (void); extern void expand_builtin_trap (void); extern rtx builtin_strncpy_read_str (void *, HOST_WIDE_INT, enum machine_mode); +extern void expand_builtin_synchronize (void); /* Functions from expr.c: */ Index: config/i386/i386.md =================================================================== --- config/i386/i386.md (revision 173831) +++ config/i386/i386.md (working copy) @@ -250,6 +250,7 @@ UNSPECV_MWAIT UNSPECV_CMPXCHG UNSPECV_XCHG + UNSPECV_MEM_XCHG UNSPECV_LOCK UNSPECV_PROLOGUE_USE UNSPECV_CLD Index: config/i386/sync.md =================================================================== --- config/i386/sync.md (revision 173831) +++ config/i386/sync.md (working copy) @@ -232,6 +232,15 @@ return "lock{%;} add{}\t{%1, %0|%0, %1}"; }) +(define_insn "sync_mem_exchange_seq_cst" + [(set (match_operand:SWI 0 "register_operand" "=") + (unspec_volatile:SWI + [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_MEM_XCHG)) + (set (match_dup 1) + (match_operand:SWI 2 "register_operand" "0"))] + "" + "xchg{}\t{%1, %0|%0, %1}") + ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space. (define_insn "sync_lock_test_and_set" [(set (match_operand:SWI 0 "register_operand" "=")