From patchwork Fri Jun 24 14:22:08 2011
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 101805
Date: Fri, 24 Jun 2011 16:22:08 +0200
From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] __builtin_assume_aligned
Message-ID: <20110624142207.GW16443@tyan-ft48-01.lab.bos.redhat.com>

Hi!

This patch introduces a new extension that hints to the compiler that a
pointer is guaranteed to be aligned (or misaligned) in a known way.  It is
designed as a pass-through builtin which just returns its first argument,
so that it is more obvious where the alignment may be assumed.  Otherwise
it is similar to ICC's __assume_aligned, so for an lvalue first argument
ICC's __assume_aligned can be emulated using

#define __assume_aligned(lvalueptr, align) \
  lvalueptr = __builtin_assume_aligned (lvalueptr, align)

ICC doesn't allow side-effects in the arguments of this builtin; GCC does,
so one can e.g. write:

void
foo (std::vector<double> &vec)
{
  double *__restrict data
    = (double *) __builtin_assume_aligned (vec.data (), 16);
  ...
}

to hint to GCC that it can assume the vector's data () is 16-byte aligned
(which is true e.g. on x86_64-linux when using the standard malloc-based
allocator, which guarantees 2 * sizeof (void *) alignment).  The
vectorizer, for example, can use that hint to generate aligned
stores/loads instead of unaligned ones.

Maybe we should also have __builtin_likely_aligned, which would be
similar, but wouldn't guarantee such an alignment, only say it is very
likely.  If the vectorizer decided to version a loop, it could check the
alignment in the versioning condition, assume the likely alignment in the
fast vectorized version, and let the unlikely non-aligned case use the
slower scalar loop.  But that can be done separately.

The builtin can have either two or three arguments; the second is the
alignment and the third is the misalignment, i.e. it asserts that

  ((uintptr_t) ((char *) firstarg - misalign) & (align - 1)) == 0

I've been contemplating making the builtin overloaded, with the return
type always being the type of the first argument if it is a
pointer/reference type, like

  template <typename T> T __builtin_assume_aligned (T, size_t, ...);

in both C and C++, but I think it would be too difficult to make it work
that way, so the builtin is instead

  void *__builtin_assume_aligned (const void *, size_t, ...);

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-06-24  Jakub Jelinek

	* builtin-types.def (BT_FN_PTR_CONST_PTR_SIZE_VAR): New.
	* builtins.def (BUILT_IN_ASSUME_ALIGNED): New builtin.
	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
	find_func_clobbers): Handle BUILT_IN_ASSUME_ALIGNED.
	* tree-ssa-ccp.c (bit_value_assume_aligned): New function.
	(evaluate_stmt, execute_fold_all_builtins): Handle
	BUILT_IN_ASSUME_ALIGNED.
	* tree-ssa-dce.c (propagate_necessity): Likewise.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
	call_may_clobber_ref_p_1): Likewise.
	* builtins.c (is_simple_builtin, fold_builtin_varargs,
	expand_builtin): Likewise.
	(expand_builtin_assume_aligned, fold_builtin_assume_aligned): New
	functions.
	* doc/extend.texi (__builtin_assume_aligned): Document.

	* gcc.dg/builtin-assume-aligned-1.c: New test.
	* gcc.dg/builtin-assume-aligned-2.c: New test.
	* gcc.target/i386/builtin-assume-aligned-1.c: New test.
	Jakub

--- gcc/builtin-types.def.jj	2011-06-21 16:45:42.000000000 +0200
+++ gcc/builtin-types.def	2011-06-23 11:25:03.000000000 +0200
@@ -454,6 +454,8 @@ DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_CONST
 			 BT_INT, BT_CONST_STRING, BT_CONST_STRING)
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_INT_CONST_STRING_VAR,
 			 BT_INT, BT_INT, BT_CONST_STRING)
+DEF_FUNCTION_TYPE_VAR_2 (BT_FN_PTR_CONST_PTR_SIZE_VAR, BT_PTR,
+			 BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_STRING_SIZE_CONST_STRING_VAR,
 			 BT_INT, BT_STRING, BT_SIZE, BT_CONST_STRING)
--- gcc/builtins.def.jj	2011-06-21 16:46:01.000000000 +0200
+++ gcc/builtins.def	2011-06-23 11:25:03.000000000 +0200
@@ -1,7 +1,7 @@
 /* This file contains the definitions and documentation for the
    builtins used in the GNU compiler.
    Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
-   2010 Free Software Foundation, Inc.
+   2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -638,6 +638,7 @@ DEF_EXT_LIB_BUILTIN    (BUILT_IN_EXE
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_EXECVE, "execve", BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST)
 DEF_LIB_BUILTIN        (BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_EXPECT, "expect", BT_FN_LONG_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN        (BUILT_IN_ASSUME_ALIGNED, "assume_aligned", BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_EXTEND_POINTER, "extend_pointer", BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN        (BUILT_IN_EXTRACT_RETURN_ADDR, "extract_return_addr", BT_FN_PTR_PTR, ATTR_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN    (BUILT_IN_FFS, "ffs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
--- gcc/tree-ssa-structalias.c.jj	2011-06-23 10:13:58.000000000 +0200
+++ gcc/tree-ssa-structalias.c	2011-06-23 11:25:04.000000000 +0200
@@ -4002,6 +4002,7 @@ find_func_aliases_for_builtin_call (gimp
       case BUILT_IN_STPCPY_CHK:
       case BUILT_IN_STRCAT_CHK:
       case BUILT_IN_STRNCAT_CHK:
+      case BUILT_IN_ASSUME_ALIGNED:
 	{
 	  tree res = gimple_call_lhs (t);
 	  tree dest = gimple_call_arg (t, (DECL_FUNCTION_CODE (fndecl)
@@ -4726,6 +4727,7 @@ find_func_clobbers (gimple origt)
 	    return;
 	  }
 	/* The following functions neither read nor clobber memory.  */
+	case BUILT_IN_ASSUME_ALIGNED:
 	case BUILT_IN_FREE:
 	  return;
 	/* Trampolines are of no interest to us.  */
--- gcc/tree-ssa-ccp.c.jj	2011-06-23 10:13:58.000000000 +0200
+++ gcc/tree-ssa-ccp.c	2011-06-23 15:17:16.000000000 +0200
@@ -1476,6 +1476,64 @@ bit_value_binop (enum tree_code code, tr
   return val;
 }
 
+/* Return the propagation value when applying __builtin_assume_aligned to
+   its arguments.  */
+
+static prop_value_t
+bit_value_assume_aligned (gimple stmt)
+{
+  tree ptr = gimple_call_arg (stmt, 0), align, misalign = NULL_TREE;
+  tree type = TREE_TYPE (ptr);
+  unsigned HOST_WIDE_INT aligni, misaligni = 0;
+  prop_value_t ptrval = get_value_for_expr (ptr, true);
+  prop_value_t alignval;
+  double_int value, mask;
+  prop_value_t val;
+  if (ptrval.lattice_val == UNDEFINED)
+    return ptrval;
+  gcc_assert ((ptrval.lattice_val == CONSTANT
+	       && TREE_CODE (ptrval.value) == INTEGER_CST)
+	      || double_int_minus_one_p (ptrval.mask));
+  align = gimple_call_arg (stmt, 1);
+  if (!host_integerp (align, 1))
+    return ptrval;
+  aligni = tree_low_cst (align, 1);
+  if (aligni <= 1
+      || (aligni & (aligni - 1)) != 0)
+    return ptrval;
+  if (gimple_call_num_args (stmt) > 2)
+    {
+      misalign = gimple_call_arg (stmt, 2);
+      if (!host_integerp (misalign, 1))
+	return ptrval;
+      misaligni = tree_low_cst (misalign, 1);
+      if (misaligni >= aligni)
+	return ptrval;
+    }
+  align = build_int_cst_type (type, -aligni);
+  alignval = get_value_for_expr (align, true);
+  bit_value_binop_1 (BIT_AND_EXPR, type, &value, &mask,
+		     type, value_to_double_int (ptrval), ptrval.mask,
+		     type, value_to_double_int (alignval), alignval.mask);
+  if (!double_int_minus_one_p (mask))
+    {
+      val.lattice_val = CONSTANT;
+      val.mask = mask;
+      gcc_assert ((mask.low & (aligni - 1)) == 0);
+      gcc_assert ((value.low & (aligni - 1)) == 0);
+      value.low |= misaligni;
+      /* ??? Delay building trees here.  */
+      val.value = double_int_to_tree (type, value);
+    }
+  else
+    {
+      val.lattice_val = VARYING;
+      val.value = NULL_TREE;
+      val.mask = double_int_minus_one;
+    }
+  return val;
+}
+
 /* Evaluate statement STMT.
    Valid only for assignments, calls, conditionals, and switches. */
 
@@ -1647,6 +1705,10 @@ evaluate_stmt (gimple stmt)
 	    val = get_value_for_expr (gimple_call_arg (stmt, 0), true);
 	    break;
 
+	  case BUILT_IN_ASSUME_ALIGNED:
+	    val = bit_value_assume_aligned (stmt);
+	    break;
+
 	  default:;
 	  }
       }
@@ -2186,6 +2248,11 @@ execute_fold_all_builtins (void)
 	      result = integer_zero_node;
 	      break;
 
+	    case BUILT_IN_ASSUME_ALIGNED:
+	      /* Remove __builtin_assume_aligned.  */
+	      result = gimple_call_arg (stmt, 0);
+	      break;
+
 	    case BUILT_IN_STACK_RESTORE:
 	      result = optimize_stack_restore (i);
 	      if (result)
--- gcc/tree-ssa-dce.c.jj	2011-06-23 10:13:58.000000000 +0200
+++ gcc/tree-ssa-dce.c	2011-06-23 11:25:05.000000000 +0200
@@ -837,7 +837,8 @@ propagate_necessity (struct edge_list *e
 	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
 	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
 	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
-	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE))
+	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
+	      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
 	    continue;
 
 	  /* Calls implicitly load from memory, their arguments
--- gcc/tree-ssa-alias.c.jj	2011-06-23 10:13:58.000000000 +0200
+++ gcc/tree-ssa-alias.c	2011-06-23 11:25:05.000000000 +0200
@@ -1253,6 +1253,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
       case BUILT_IN_SINCOS:
       case BUILT_IN_SINCOSF:
       case BUILT_IN_SINCOSL:
+      case BUILT_IN_ASSUME_ALIGNED:
 	return false;
       /* __sync_* builtins and some OpenMP builtins act as threading
	 barriers.  */
@@ -1511,6 +1512,7 @@ call_may_clobber_ref_p_1 (gimple call, a
	return false;
       case BUILT_IN_STACK_SAVE:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ASSUME_ALIGNED:
	return false;
       /* Freeing memory kills the pointed-to memory.  More importantly
	 the call has to serve as a barrier for moving loads and stores
--- gcc/builtins.c.jj	2011-06-22 10:16:56.000000000 +0200
+++ gcc/builtins.c	2011-06-23 11:25:05.000000000 +0200
@@ -4604,6 +4604,23 @@ expand_builtin_expect (tree exp, rtx tar
   return target;
 }
 
+/* Expand a call to __builtin_assume_aligned.  We just return our first
+   argument as the builtin_assume_aligned semantic should've been already
+   executed by CCP.  */
+
+static rtx
+expand_builtin_assume_aligned (tree exp, rtx target)
+{
+  if (call_expr_nargs (exp) < 2)
+    return const0_rtx;
+  target = expand_expr (CALL_EXPR_ARG (exp, 0), target, VOIDmode,
+			EXPAND_NORMAL);
+  gcc_assert (!TREE_SIDE_EFFECTS (CALL_EXPR_ARG (exp, 1))
+	      && (call_expr_nargs (exp) < 3
+		  || !TREE_SIDE_EFFECTS (CALL_EXPR_ARG (exp, 2))));
+  return target;
+}
+
 void
 expand_builtin_trap (void)
 {
@@ -5823,6 +5840,8 @@ expand_builtin (tree exp, rtx target, rt
       return expand_builtin_va_copy (exp);
     case BUILT_IN_EXPECT:
       return expand_builtin_expect (exp, target);
+    case BUILT_IN_ASSUME_ALIGNED:
+      return expand_builtin_assume_aligned (exp, target);
     case BUILT_IN_PREFETCH:
       expand_builtin_prefetch (exp);
       return const0_rtx;
@@ -9352,6 +9371,31 @@ fold_builtin_fpclassify (location_t loc,
   return res;
 }
 
+/* Diagnose invalid uses of __builtin_assume_aligned.  */
+
+static tree
+fold_builtin_assume_aligned (location_t loc, tree fndecl, tree exp)
+{
+  int nargs = call_expr_nargs (exp);
+
+  if (nargs < 2)
+    return NULL_TREE;
+  if (nargs > 3)
+    {
+      error_at (loc, "%<__builtin_assume_aligned%> must have 2 or 3 arguments");
+      return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
+			       CALL_EXPR_ARG (exp, 0));
+    }
+  if (nargs == 3 && !validate_arg (CALL_EXPR_ARG (exp, 2), INTEGER_TYPE))
+    {
+      error_at (loc,
		"%<__builtin_assume_aligned%> last operand must have integer type");
+      return fold_convert_loc (loc, TREE_TYPE (TREE_TYPE (fndecl)),
+			       CALL_EXPR_ARG (exp, 0));
+    }
+  return NULL_TREE;
+}
+
 /* Fold a call to an unordered comparison function such as
    __builtin_isgreater().  FNDECL is the FUNCTION_DECL for the function
    being called and ARG0 and ARG1 are the arguments for the call.
@@ -10266,6 +10310,9 @@ fold_builtin_varargs (location_t loc, tr
       ret = fold_builtin_fpclassify (loc, exp);
       break;
 
+    case BUILT_IN_ASSUME_ALIGNED:
+      return fold_builtin_assume_aligned (loc, fndecl, exp);
+
     default:
       break;
     }
@@ -13461,6 +13508,7 @@ is_simple_builtin (tree decl)
       case BUILT_IN_OBJECT_SIZE:
       case BUILT_IN_UNREACHABLE:
       /* Simple register moves or loads from stack.  */
+      case BUILT_IN_ASSUME_ALIGNED:
       case BUILT_IN_RETURN_ADDRESS:
       case BUILT_IN_EXTRACT_RETURN_ADDR:
       case BUILT_IN_FROB_RETURN_ADDR:
--- gcc/doc/extend.texi.jj	2011-06-21 16:45:44.000000000 +0200
+++ gcc/doc/extend.texi	2011-06-24 12:36:34.000000000 +0200
@@ -7646,6 +7646,28 @@ int g (int c)
 @end deftypefn
 
+@deftypefn {Built-in Function} void *__builtin_assume_aligned (const void *@var{exp}, size_t @var{align}, ...)
+This function returns its first argument, and allows the compiler
+to assume that the returned pointer is at least @var{align} bytes
+aligned.  This built-in can have either two or three arguments,
+if it has three, the third argument should have integer type, and
+if it is non-zero means misalignment offset.  For example:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 16);
+@end smallexample
+
+means that the compiler can assume x, set to arg, is at least
+16 byte aligned, while:
+
+@smallexample
+void *x = __builtin_assume_aligned (arg, 32, 8);
+@end smallexample
+
+means that the compiler can assume for x, set to arg, that
+(char *) x - 8 is 32 byte aligned.
+@end deftypefn
+
 @deftypefn {Built-in Function} void __builtin___clear_cache (char *@var{begin}, char *@var{end})
 This function is used to flush the processor's instruction cache for
 the region of memory between @var{begin} inclusive and @var{end}
--- gcc/testsuite/gcc.dg/builtin-assume-aligned-1.c.jj	2011-06-24 12:56:21.000000000 +0200
+++ gcc/testsuite/gcc.dg/builtin-assume-aligned-1.c	2011-06-24 13:05:45.000000000 +0200
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+void
+test1 (double *out1, double *out2, double *out3, double *in1,
+       double *in2, int len)
+{
+  int i;
+  double *__restrict o1 = __builtin_assume_aligned (out1, 16);
+  double *__restrict o2 = __builtin_assume_aligned (out2, 16);
+  double *__restrict o3 = __builtin_assume_aligned (out3, 16);
+  double *__restrict i1 = __builtin_assume_aligned (in1, 16);
+  double *__restrict i2 = __builtin_assume_aligned (in2, 16);
+  for (i = 0; i < len; ++i)
+    {
+      o1[i] = i1[i] * i2[i];
+      o2[i] = i1[i] + i2[i];
+      o3[i] = i1[i] - i2[i];
+    }
+}
+
+void
+test2 (double *out1, double *out2, double *out3, double *in1,
+       double *in2, int len)
+{
+  int i, align = 32, misalign = 16;
+  out1 = __builtin_assume_aligned (out1, align, misalign);
+  out2 = __builtin_assume_aligned (out2, align, 16);
+  out3 = __builtin_assume_aligned (out3, 32, misalign);
+  in1 = __builtin_assume_aligned (in1, 32, 16);
+  in2 = __builtin_assume_aligned (in2, 32, 0);
+  for (i = 0; i < len; ++i)
+    {
+      out1[i] = in1[i] * in2[i];
+      out2[i] = in1[i] + in2[i];
+      out3[i] = in1[i] - in2[i];
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_assume_aligned" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/builtin-assume-aligned-2.c.jj	2011-06-24 13:00:45.000000000 +0200
+++ gcc/testsuite/gcc.dg/builtin-assume-aligned-2.c	2011-06-24 13:01:36.000000000 +0200
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+double *bar (void);
+
+void
+foo (double *ptr, int i)
+{
+  double *a = __builtin_assume_aligned (ptr, 16, 8, 7); /* { dg-error "must have 2 or 3 arguments" } */
+  double *b = __builtin_assume_aligned (bar (), 16);
+  double *c = __builtin_assume_aligned (bar (), 16, 8);
+  double *d = __builtin_assume_aligned (ptr, i, ptr); /* { dg-error "last operand must have integer type" } */
+  *a = 0.0;
+  *b = 0.0;
+  *c = 0.0;
+  *d = 0.0;
+}
--- gcc/testsuite/gcc.target/i386/builtin-assume-aligned-1.c.jj	2011-06-24 13:02:57.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/builtin-assume-aligned-1.c	2011-06-24 13:05:28.000000000 +0200
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2 -mno-avx" } */
+
+void
+test1 (double *out1, double *out2, double *out3, double *in1,
+       double *in2, int len)
+{
+  int i;
+  double *__restrict o1 = __builtin_assume_aligned (out1, 16);
+  double *__restrict o2 = __builtin_assume_aligned (out2, 16);
+  double *__restrict o3 = __builtin_assume_aligned (out3, 16);
+  double *__restrict i1 = __builtin_assume_aligned (in1, 16);
+  double *__restrict i2 = __builtin_assume_aligned (in2, 16);
+  for (i = 0; i < len; ++i)
+    {
+      o1[i] = i1[i] * i2[i];
+      o2[i] = i1[i] + i2[i];
+      o3[i] = i1[i] - i2[i];
+    }
+}
+
+void
+test2 (double *out1, double *out2, double *out3, double *in1,
+       double *in2, int len)
+{
+  int i, align = 32, misalign = 16;
+  out1 = __builtin_assume_aligned (out1, align, misalign);
+  out2 = __builtin_assume_aligned (out2, align, 16);
+  out3 = __builtin_assume_aligned (out3, 32, misalign);
+  in1 = __builtin_assume_aligned (in1, 32, 16);
+  in2 = __builtin_assume_aligned (in2, 32, 0);
+  for (i = 0; i < len; ++i)
+    {
+      out1[i] = in1[i] * in2[i];
+      out2[i] = in1[i] + in2[i];
+      out3[i] = in1[i] - in2[i];
+    }
+}
+
+/* { dg-final { scan-assembler-not "movhpd" } } */
+/* { dg-final { scan-assembler-not "movlpd" } } */