[{"id":1767162,"web_url":"http://patchwork.ozlabs.org/comment/1767162/","msgid":"<239ABD63-BB1B-418F-98E8-C92A195F613A@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-12T15:22:37","subject":"Re: [PATCH, rs6000] Folding of vector loads in GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"> On Sep 12, 2017, at 9:41 AM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:\n> \n> Hi\n> \n> [PATCH, rs6000] Folding of vector loads in GIMPLE\n> \n> Folding of vector loads in GIMPLE.\n> \n> - Add code to handle gimple folding for the vec_ld builtins.\n> - Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n> comments have been adjusted slightly so they continue to read OK for the\n> vec_st code that remains.\n> \n> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n> tests which have been posted separately. (a few minutes ago).\n> \n> Regtest successfully completed on power6 and newer. 
(p6,p7,p8le,p8be,p9).\n> \n> OK for trunk?\n> \n> Thanks,\n> -Will\n> \n> [gcc]\n> \n>        2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n> \n> \t* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n> \t  for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n> \t* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n> \t  Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n> \n> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n> index 897306c..73e14d9 100644\n> --- a/gcc/config/rs6000/rs6000-c.c\n> +++ b/gcc/config/rs6000/rs6000-c.c\n> @@ -6459,92 +6459,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n> \t\t     convert (TREE_TYPE (stmt), arg0));\n>       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>       return stmt;\n>     }\n> \n> -  /* Expand vec_ld into an expression that masks the address and\n> -     performs the load.  We need to expand this early to allow\n> +  /* Expand vec_st into an expression that masks the address and\n> +     performs the store.  We need to expand this early to allow\n>      the best aliasing, as by the time we get into RTL we no longer\n>      are able to honor __restrict__, for example.  We may want to\n>      consider this for all memory access built-ins.\n> \n>      When -maltivec=be is specified, or the wrong number of arguments\n>      is provided, simply punt to existing built-in processing.  */\n> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n> -      && nargs == 2)\n> -    {\n> -      tree arg0 = (*arglist)[0];\n> -      tree arg1 = (*arglist)[1];\n> -\n> -      /* Strip qualifiers like \"const\" from the pointer arg.  
*/\n> -      tree arg1_type = TREE_TYPE (arg1);\n> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n> -\tgoto bad;\n> -\n> -      tree inner_type = TREE_TYPE (arg1_type);\n> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n> -\t{\n> -\t  arg1_type = build_pointer_type (build_qualified_type (inner_type,\n> -\t\t\t\t\t\t\t\t0));\n> -\t  arg1 = fold_convert (arg1_type, arg1);\n> -\t}\n> -\n> -      /* Construct the masked address.  Let existing error handling take\n> -\t over if we don't have a constant offset.  */\n> -      arg0 = fold (arg0);\n> -\n> -      if (TREE_CODE (arg0) == INTEGER_CST)\n> -\t{\n> -\t  if (!ptrofftype_p (TREE_TYPE (arg0)))\n> -\t    arg0 = build1 (NOP_EXPR, sizetype, arg0);\n> -\n> -\t  tree arg1_type = TREE_TYPE (arg1);\n> -\t  if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n> -\t    {\n> -\t      arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n> -\t      tree const0 = build_int_cstu (sizetype, 0);\n> -\t      tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n> -\t      arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n> -\t    }\n> -\n> -\t  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n> -\t\t\t\t       arg1, arg0);\n> -\t  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n> -\t\t\t\t\t  build_int_cst (arg1_type, -16));\n> -\n> -\t  /* Find the built-in to get the return type so we can convert\n> -\t     the result properly (or fall back to default handling if the\n> -\t     arguments aren't compatible).  
*/\n> -\t  for (desc = altivec_overloaded_builtins;\n> -\t       desc->code && desc->code != fcode; desc++)\n> -\t    continue;\n> -\n> -\t  for (; desc->code == fcode; desc++)\n> -\t    if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n> -\t\t&& (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n> -\t\t\t\t\t\t    desc->op2)))\n> -\t      {\n> -\t\ttree ret_type = rs6000_builtin_type (desc->ret_type);\n> -\t\tif (TYPE_MODE (ret_type) == V2DImode)\n> -\t\t  /* Type-based aliasing analysis thinks vector long\n> -\t\t     and vector long long are different and will put them\n> -\t\t     in distinct alias classes.  Force our return type\n> -\t\t     to be a may-alias type to avoid this.  */\n> -\t\t  ret_type\n> -\t\t    = build_pointer_type_for_mode (ret_type, Pmode,\n> -\t\t\t\t\t\t   true/*can_alias_all*/);\n> -\t\telse\n> -\t\t  ret_type = build_pointer_type (ret_type);\n> -\t\taligned = build1 (NOP_EXPR, ret_type, aligned);\n> -\t\ttree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n> -\t\treturn ret_val;\n> -\t      }\n> -\t}\n> -    }\n> \n> -  /* Similarly for stvx.  */\n>   if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>       && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>       && nargs == 3)\n>     {\n>       tree arg0 = (*arglist)[0];\n> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n> index cf744d8..5b14789 100644\n> --- a/gcc/config/rs6000/rs6000.c\n> +++ b/gcc/config/rs6000/rs6000.c\n> @@ -16473,10 +16473,65 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n> \tres = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n> \tgsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n> \tupdate_call_from_tree (gsi, res);\n> \treturn true;\n>       }\n> +    /* Vector loads.  
*/\n> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n> +      {\n> +\t gimple *g;\n> +\t arg0 = gimple_call_arg (stmt, 0);  // offset\n> +\t arg1 = gimple_call_arg (stmt, 1);  // address\n> +\n> +\t /* Limit folding of loads to LE targets.  */\n> +\t if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n> +\t   return false;\n\nWhy?  This transformation shouldn't be endian-dependent.\n\nThanks,\nBill\n\n> +\n> +\t lhs = gimple_call_lhs (stmt);\n> +\t location_t loc = gimple_location (stmt);\n> +\n> +\t tree arg1_type = TREE_TYPE (arg1);\n> +\t tree lhs_type = TREE_TYPE (lhs);\n> +\n> +\t /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n> +\t    the tree using the value from arg0.  The resulting type will match\n> +\t    the type of arg1.  */\n> +\t tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n> +\t g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\t tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +\t g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n> +\t\t\t\t  temp_offset);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\n> +\t /* Mask off any lower bits from the address.  */\n> +\t tree alignment_mask = build_int_cst (arg1_type, -16);\n> +\t tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +\t g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n> +\t\t\t\t temp_addr, alignment_mask);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\n> +\t /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n> +\t    take an offset, but since we've already incorporated the offset\n> +\t    above, here we just pass in a zero.  
*/\n> +\t g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n> +\t\t\t\t\t\tbuild_int_cst (arg1_type, 0)));\n> +\t gimple_set_location (g, loc);\n> +\t gsi_replace (gsi, g, true);\n> +\n> +\t return true;\n> +\n> +      }\n> +\n>     default:\n> \tif (TARGET_DEBUG_BUILTIN)\n> \t   fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n> \t\t    fn_code, fn_name1, fn_name2);\n>       break;\n> \n>","headers":{"Return-Path":"<gcc-patches-return-461947-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-461947-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"nNAiTjJ9\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xs7pS1qJFz9s7f\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 13 Sep 2017 01:22:56 +1000 (AEST)","(qmail 85980 invoked by alias); 12 Sep 2017 15:22:46 -0000","(qmail 85341 invoked by uid 89); 12 Sep 2017 15:22:45 -0000","from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)\n\t(148.163.158.5) by sourceware.org\n\t(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;\n\tTue, 12 Sep 2017 15:22:43 +0000","from pps.filterd (m0098416.ppops.net [127.0.0.1])\tby\n\tmx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id\n\tv8CFJS29167185\tfor <gcc-patches@gcc.gnu.org>;\n\tTue, 12 Sep 2017 11:22:42 -0400","from e35.co.us.ibm.com (e35.co.us.ibm.com 
[32.97.110.153])"],"Subject":"Re: [PATCH, rs6000] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","In-Reply-To":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>","Date":"Tue, 12 Sep 2017 10:22:37 -0500","Cc":"GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tRichard Biener <richard.guenther@gmail.com>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>","To":"will_schmidt@vnet.ibm.com","X-IBM-SpamModules-Versions":"UTC=2017-09-12 
15:22:39","X-IBM-AV-DETECTION":"SAVI=unused REMOTE=unused XFE=unused","x-cbparentid":"17091215-0013-0000-0000-00004F75229C","Message-Id":"<239ABD63-BB1B-418F-98E8-C92A195F613A@linux.vnet.ibm.com>","X-Proofpoint-Virus-Version":"vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-12_07:, , signatures=0","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=1 malwarescore=0 phishscore=0\n\tadultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx\n\tscancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709120214","X-IsSubscribed":"yes"}},{"id":1767270,"web_url":"http://patchwork.ozlabs.org/comment/1767270/","msgid":"<1505238306.14827.162.camel@brimstone.rchland.ibm.com>","list_archive_url":null,"date":"2017-09-12T17:45:06","subject":"Re: [PATCH, rs6000] Folding of vector loads in GIMPLE","submitter":{"id":3241,"url":"http://patchwork.ozlabs.org/api/people/3241/","name":"will schmidt","email":"will_schmidt@vnet.ibm.com"},"content":"On Tue, 2017-09-12 at 10:22 -0500, Bill Schmidt wrote:\n> > On Sep 12, 2017, at 9:41 AM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:\n> > \n> > Hi\n> > \n> > [PATCH, rs6000] Folding of vector loads in GIMPLE\n> > \n> > Folding of vector loads in GIMPLE.\n> > \n> > - Add code to handle gimple folding for the vec_ld builtins.\n> > - Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n> > comments have been adjusted slightly so they continue to read OK for the\n> > vec_st code that remains.\n> > \n> > The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n> > tests which have been posted separately. (a few minutes ago).\n> > \n> > Regtest successfully completed on power6 and newer. 
(p6,p7,p8le,p8be,p9).\n> > \n> > OK for trunk?\n> > \n> > Thanks,\n> > -Will\n> > \n> > [gcc]\n> > \n> >        2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n> > \n> > \t* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n> > \t  for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n> > \t* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n> > \t  Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n> > \n> > [quoted patch trimmed; identical to the patch in the message above]\n> > \n> > +    /* Vector loads.  
*/\n> > +    case ALTIVEC_BUILTIN_LVX_V16QI:\n> > +    case ALTIVEC_BUILTIN_LVX_V8HI:\n> > +    case ALTIVEC_BUILTIN_LVX_V4SI:\n> > +    case ALTIVEC_BUILTIN_LVX_V4SF:\n> > +    case ALTIVEC_BUILTIN_LVX_V2DI:\n> > +    case ALTIVEC_BUILTIN_LVX_V2DF:\n> > +      {\n> > +\t gimple *g;\n> > +\t arg0 = gimple_call_arg (stmt, 0);  // offset\n> > +\t arg1 = gimple_call_arg (stmt, 1);  // address\n> > +\n> > +\t /* Limit folding of loads to LE targets.  */\n> > +\t if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n> > +\t   return false;\n> \n> Why?  This transformation shouldn't be endian-dependent.\n\nI was seeing errors in some of the existing tests specific to BE.\nFAIL: gcc.dg/vmx/ld-be-order.c   -Os  execution test\nFAIL: gcc.dg/vmx/ld-vsx-be-order.c   -O0  execution test\n\nI'll give this another attempt without that exclusion and verify.  I'll\nadmit it is possible the ld*be-order tests were failing for other\nreasons.\n\nThanks\n-Will\n\n> \n> Thanks,\n> Bill\n> \n> > +\n> > +\t lhs = gimple_call_lhs (stmt);\n> > +\t location_t loc = gimple_location (stmt);\n> > +\n> > +\t tree arg1_type = TREE_TYPE (arg1);\n> > +\t tree lhs_type = TREE_TYPE (lhs);\n> > +\n> > +\t /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n> > +\t    the tree using the value from arg0.  The resulting type will match\n> > +\t    the type of arg1.  */\n> > +\t tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n> > +\t g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n> > +\t gimple_set_location (g, loc);\n> > +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> > +\t tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> > +\t g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n> > +\t\t\t\t  temp_offset);\n> > +\t gimple_set_location (g, loc);\n> > +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> > +\n> > +\t /* Mask off any lower bits from the address.  
*/\n> > +\t tree alignment_mask = build_int_cst (arg1_type, -16);\n> > +\t tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> > +\t g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n> > +\t\t\t\t temp_addr, alignment_mask);\n> > +\t gimple_set_location (g, loc);\n> > +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> > +\n> > +\t /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n> > +\t    take an offset, but since we've already incorporated the offset\n> > +\t    above, here we just pass in a zero.  */\n> > +\t g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n> > +\t\t\t\t\t\tbuild_int_cst (arg1_type, 0)));\n> > +\t gimple_set_location (g, loc);\n> > +\t gsi_replace (gsi, g, true);\n> > +\n> > +\t return true;\n> > +\n> > +      }\n> > +\n> >     default:\n> > \tif (TARGET_DEBUG_BUILTIN)\n> > \t   fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n> > \t\t    fn_code, fn_name1, fn_name2);\n> >       break;\n> > \n> > \n>","headers":{"Return-Path":"<gcc-patches-return-461977-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-461977-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"LiXwc+ju\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xsBzG5Zzyz9s7M\n\tfor <incoming@patchwork.ozlabs.org>;\n\tWed, 13 Sep 2017 03:45:45 +1000 
(AEST)"],"Subject":"Re: [PATCH, rs6000] Folding of vector loads in GIMPLE","From":"Will Schmidt <will_schmidt@vnet.ibm.com>","Reply-To":"will_schmidt@vnet.ibm.com","To":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","Cc":"GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tRichard Biener <richard.guenther@gmail.com>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","In-Reply-To":"<239ABD63-BB1B-418F-98E8-C92A195F613A@linux.vnet.ibm.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\t\n\t<239ABD63-BB1B-418F-98E8-C92A195F613A@linux.vnet.ibm.com>","Date":"Tue, 12 Sep 2017 12:45:06 -0500","X-IBM-AV-DETECTION":"SAVI=unused REMOTE=unused 
XFE=unused","x-cbparentid":"17091217-0025-0000-0000-0000456710DB","Message-Id":"<1505238306.14827.162.camel@brimstone.rchland.ibm.com>","X-Proofpoint-Virus-Version":"vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-12_08:, , signatures=0","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=0 malwarescore=0 phishscore=0\n\tadultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx\n\tscancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709120248","X-IsSubscribed":"yes"}},{"id":1767336,"web_url":"http://patchwork.ozlabs.org/comment/1767336/","msgid":"<1505250505.14827.191.camel@brimstone.rchland.ibm.com>","list_archive_url":null,"date":"2017-09-12T21:08:25","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":3241,"url":"http://patchwork.ozlabs.org/api/people/3241/","name":"will schmidt","email":"will_schmidt@vnet.ibm.com"},"content":"Hi,\n\n[PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n    \nFolding of vector loads in GIMPLE.\n    \nAdd code to handle gimple folding for the vec_ld builtins.\nRemove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\ncomments have been adjusted slightly so they continue to read OK for the\nexisting vec_st code.\n    \nThe resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\ntests which have been posted separately.\n\nFor V2 of this patch, I've removed the chunk of code that prohibited the\ngimple fold from occurring in BE environments.   This had fixed an issue\nfor me earlier during my development of the code, and turns out this was\nnot necessary.  I've sniff-tested after removing that check and it looks\nOK.\n\n>+ /* Limit folding of loads to LE targets.  
*/ \n> +\t if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n> +\t   return false;\n\nI've restarted a regression test on this updated version.\n    \nOK for trunk (assuming successful regression test completion)  ?\n    \nThanks,\n-Will\n    \n[gcc]\n\n        2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n\n        * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n        for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n        Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n\ndiff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\nindex fbab0a2..bb8a77d 100644\n--- a/gcc/config/rs6000/rs6000-c.c\n+++ b/gcc/config/rs6000/rs6000-c.c\n@@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n \t\t     convert (TREE_TYPE (stmt), arg0));\n       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n       return stmt;\n     }\n \n-  /* Expand vec_ld into an expression that masks the address and\n-     performs the load.  We need to expand this early to allow\n+  /* Expand vec_st into an expression that masks the address and\n+     performs the store.  We need to expand this early to allow\n      the best aliasing, as by the time we get into RTL we no longer\n      are able to honor __restrict__, for example.  We may want to\n      consider this for all memory access built-ins.\n \n      When -maltivec=be is specified, or the wrong number of arguments\n      is provided, simply punt to existing built-in processing.  */\n-  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n-      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n-      && nargs == 2)\n-    {\n-      tree arg0 = (*arglist)[0];\n-      tree arg1 = (*arglist)[1];\n-\n-      /* Strip qualifiers like \"const\" from the pointer arg.  
*/\n-      tree arg1_type = TREE_TYPE (arg1);\n-      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n-\tgoto bad;\n-\n-      tree inner_type = TREE_TYPE (arg1_type);\n-      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n-\t{\n-\t  arg1_type = build_pointer_type (build_qualified_type (inner_type,\n-\t\t\t\t\t\t\t\t0));\n-\t  arg1 = fold_convert (arg1_type, arg1);\n-\t}\n-\n-      /* Construct the masked address.  Let existing error handling take\n-\t over if we don't have a constant offset.  */\n-      arg0 = fold (arg0);\n-\n-      if (TREE_CODE (arg0) == INTEGER_CST)\n-\t{\n-\t  if (!ptrofftype_p (TREE_TYPE (arg0)))\n-\t    arg0 = build1 (NOP_EXPR, sizetype, arg0);\n-\n-\t  tree arg1_type = TREE_TYPE (arg1);\n-\t  if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n-\t    {\n-\t      arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n-\t      tree const0 = build_int_cstu (sizetype, 0);\n-\t      tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n-\t      arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n-\t    }\n-\n-\t  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n-\t\t\t\t       arg1, arg0);\n-\t  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n-\t\t\t\t\t  build_int_cst (arg1_type, -16));\n-\n-\t  /* Find the built-in to get the return type so we can convert\n-\t     the result properly (or fall back to default handling if the\n-\t     arguments aren't compatible).  
*/\n-\t  for (desc = altivec_overloaded_builtins;\n-\t       desc->code && desc->code != fcode; desc++)\n-\t    continue;\n-\n-\t  for (; desc->code == fcode; desc++)\n-\t    if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n-\t\t&& (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n-\t\t\t\t\t\t    desc->op2)))\n-\t      {\n-\t\ttree ret_type = rs6000_builtin_type (desc->ret_type);\n-\t\tif (TYPE_MODE (ret_type) == V2DImode)\n-\t\t  /* Type-based aliasing analysis thinks vector long\n-\t\t     and vector long long are different and will put them\n-\t\t     in distinct alias classes.  Force our return type\n-\t\t     to be a may-alias type to avoid this.  */\n-\t\t  ret_type\n-\t\t    = build_pointer_type_for_mode (ret_type, Pmode,\n-\t\t\t\t\t\t   true/*can_alias_all*/);\n-\t\telse\n-\t\t  ret_type = build_pointer_type (ret_type);\n-\t\taligned = build1 (NOP_EXPR, ret_type, aligned);\n-\t\ttree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n-\t\treturn ret_val;\n-\t      }\n-\t}\n-    }\n \n-  /* Similarly for stvx.  */\n   if (fcode == ALTIVEC_BUILTIN_VEC_ST\n       && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n       && nargs == 3)\n     {\n       tree arg0 = (*arglist)[0];\ndiff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\nindex 1338371..1fb5f44 100644\n--- a/gcc/config/rs6000/rs6000.c\n+++ b/gcc/config/rs6000/rs6000.c\n@@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n \tres = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n \tgsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n \tupdate_call_from_tree (gsi, res);\n \treturn true;\n       }\n+    /* Vector loads.  
*/\n+    case ALTIVEC_BUILTIN_LVX_V16QI:\n+    case ALTIVEC_BUILTIN_LVX_V8HI:\n+    case ALTIVEC_BUILTIN_LVX_V4SI:\n+    case ALTIVEC_BUILTIN_LVX_V4SF:\n+    case ALTIVEC_BUILTIN_LVX_V2DI:\n+    case ALTIVEC_BUILTIN_LVX_V2DF:\n+      {\n+\t gimple *g;\n+\t arg0 = gimple_call_arg (stmt, 0);  // offset\n+\t arg1 = gimple_call_arg (stmt, 1);  // address\n+\n+\t lhs = gimple_call_lhs (stmt);\n+\t location_t loc = gimple_location (stmt);\n+\n+\t tree arg1_type = TREE_TYPE (arg1);\n+\t tree lhs_type = TREE_TYPE (lhs);\n+\n+\t /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n+\t    the tree using the value from arg0.  The resulting type will match\n+\t    the type of arg1.  */\n+\t tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n+\t g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n+\t gimple_set_location (g, loc);\n+\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n+\t tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n+\t g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n+\t\t\t\t  temp_offset);\n+\t gimple_set_location (g, loc);\n+\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n+\n+\t /* Mask off any lower bits from the address.  */\n+\t tree alignment_mask = build_int_cst (arg1_type, -16);\n+\t tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n+\t g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n+\t\t\t\t temp_addr, alignment_mask);\n+\t gimple_set_location (g, loc);\n+\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n+\n+\t /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n+\t    take an offset, but since we've already incorporated the offset\n+\t    above, here we just pass in a zero.  
*/\n+\t g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n+\t\t\t\t\t\tbuild_int_cst (arg1_type, 0)));\n+\t gimple_set_location (g, loc);\n+\t gsi_replace (gsi, g, true);\n+\n+\t return true;\n+\n+      }\n+\n     default:\n \tif (TARGET_DEBUG_BUILTIN)\n \t   fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n \t\t    fn_code, fn_name1, fn_name2);\n       break;","headers":{"Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Will Schmidt <will_schmidt@vnet.ibm.com>","Reply-To":"will_schmidt@vnet.ibm.com","To":"GCC Patches <gcc-patches@gcc.gnu.org>","Cc":"Segher Boessenkool <segher@kernel.crashing.org>,\n\tRichard Biener <richard.guenther@gmail.com>,\n\tBill Schmidt <wschmidt@linux.vnet.ibm.com>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","In-Reply-To":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>","Date":"Tue, 12 Sep 2017 16:08:25 -0500","Message-Id":"<1505250505.14827.191.camel@brimstone.rchland.ibm.com>"}},{"id":1767399,"web_url":"http://patchwork.ozlabs.org/comment/1767399/","msgid":"<2B5668CD-EADE-4F29-8B96-B61B535B6605@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-12T21:53:50","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"> On Sep 12, 2017, at 4:08 PM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:\n> \n> Hi,\n> \n> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n> \n> Folding of vector loads in GIMPLE.\n> \n> Add code to handle gimple folding for the vec_ld builtins.\n> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n> comments have been adjusted slightly so they continue to read OK for the\n> existing vec_st code.\n> \n> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n> tests which have been posted separately.\n> \n> For V2 of this patch, I've removed the chunk of code that prohibited the\n> gimple fold from occurring in BE environments.   This had fixed an issue\n> for me earlier during my development of the code, and turns out this was\n> not necessary.  I've sniff-tested after removing that check and it looks\n> OK.\n\nThanks!\n> \n>> + /* Limit folding of loads to LE targets.  
*/ \n>> +\t if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>> +\t   return false;\n> \n> I've restarted a regression test on this updated version.\n> \n> OK for trunk (assuming successful regression test completion)  ?\n\nLooks good to me otherwise, but Richard may have streamlining\nimprovements, so please wait for his review.  And of course Segher's.\n\nThanks,\nBill\n> \n> Thanks,\n> -Will\n> \n> [gcc]\n> \n>        2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n> \n>        * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>        for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>        Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n> \n> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n> index fbab0a2..bb8a77d 100644\n> --- a/gcc/config/rs6000/rs6000-c.c\n> +++ b/gcc/config/rs6000/rs6000-c.c\n> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n> \t\t     convert (TREE_TYPE (stmt), arg0));\n>       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>       return stmt;\n>     }\n> \n> -  /* Expand vec_ld into an expression that masks the address and\n> -     performs the load.  We need to expand this early to allow\n> +  /* Expand vec_st into an expression that masks the address and\n> +     performs the store.  We need to expand this early to allow\n>      the best aliasing, as by the time we get into RTL we no longer\n>      are able to honor __restrict__, for example.  We may want to\n>      consider this for all memory access built-ins.\n> \n>      When -maltivec=be is specified, or the wrong number of arguments\n>      is provided, simply punt to existing built-in processing.  
*/\n> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n> -      && nargs == 2)\n> -    {\n> -      tree arg0 = (*arglist)[0];\n> -      tree arg1 = (*arglist)[1];\n> -\n> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n> -      tree arg1_type = TREE_TYPE (arg1);\n> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n> -\tgoto bad;\n> -\n> -      tree inner_type = TREE_TYPE (arg1_type);\n> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n> -\t{\n> -\t  arg1_type = build_pointer_type (build_qualified_type (inner_type,\n> -\t\t\t\t\t\t\t\t0));\n> -\t  arg1 = fold_convert (arg1_type, arg1);\n> -\t}\n> -\n> -      /* Construct the masked address.  Let existing error handling take\n> -\t over if we don't have a constant offset.  */\n> -      arg0 = fold (arg0);\n> -\n> -      if (TREE_CODE (arg0) == INTEGER_CST)\n> -\t{\n> -\t  if (!ptrofftype_p (TREE_TYPE (arg0)))\n> -\t    arg0 = build1 (NOP_EXPR, sizetype, arg0);\n> -\n> -\t  tree arg1_type = TREE_TYPE (arg1);\n> -\t  if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n> -\t    {\n> -\t      arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n> -\t      tree const0 = build_int_cstu (sizetype, 0);\n> -\t      tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n> -\t      arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n> -\t    }\n> -\n> -\t  tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n> -\t\t\t\t       arg1, arg0);\n> -\t  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n> -\t\t\t\t\t  build_int_cst (arg1_type, -16));\n> -\n> -\t  /* Find the built-in to get the return type so we can convert\n> -\t     the result properly (or fall back to default handling if the\n> -\t     arguments aren't compatible).  
*/\n> -\t  for (desc = altivec_overloaded_builtins;\n> -\t       desc->code && desc->code != fcode; desc++)\n> -\t    continue;\n> -\n> -\t  for (; desc->code == fcode; desc++)\n> -\t    if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n> -\t\t&& (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n> -\t\t\t\t\t\t    desc->op2)))\n> -\t      {\n> -\t\ttree ret_type = rs6000_builtin_type (desc->ret_type);\n> -\t\tif (TYPE_MODE (ret_type) == V2DImode)\n> -\t\t  /* Type-based aliasing analysis thinks vector long\n> -\t\t     and vector long long are different and will put them\n> -\t\t     in distinct alias classes.  Force our return type\n> -\t\t     to be a may-alias type to avoid this.  */\n> -\t\t  ret_type\n> -\t\t    = build_pointer_type_for_mode (ret_type, Pmode,\n> -\t\t\t\t\t\t   true/*can_alias_all*/);\n> -\t\telse\n> -\t\t  ret_type = build_pointer_type (ret_type);\n> -\t\taligned = build1 (NOP_EXPR, ret_type, aligned);\n> -\t\ttree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n> -\t\treturn ret_val;\n> -\t      }\n> -\t}\n> -    }\n> \n> -  /* Similarly for stvx.  */\n>   if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>       && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>       && nargs == 3)\n>     {\n>       tree arg0 = (*arglist)[0];\n> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n> index 1338371..1fb5f44 100644\n> --- a/gcc/config/rs6000/rs6000.c\n> +++ b/gcc/config/rs6000/rs6000.c\n> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n> \tres = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n> \tgsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n> \tupdate_call_from_tree (gsi, res);\n> \treturn true;\n>       }\n> +    /* Vector loads.  
*/\n> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n> +      {\n> +\t gimple *g;\n> +\t arg0 = gimple_call_arg (stmt, 0);  // offset\n> +\t arg1 = gimple_call_arg (stmt, 1);  // address\n> +\n> +\t lhs = gimple_call_lhs (stmt);\n> +\t location_t loc = gimple_location (stmt);\n> +\n> +\t tree arg1_type = TREE_TYPE (arg1);\n> +\t tree lhs_type = TREE_TYPE (lhs);\n> +\n> +\t /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n> +\t    the tree using the value from arg0.  The resulting type will match\n> +\t    the type of arg1.  */\n> +\t tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n> +\t g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\t tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +\t g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n> +\t\t\t\t  temp_offset);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\n> +\t /* Mask off any lower bits from the address.  */\n> +\t tree alignment_mask = build_int_cst (arg1_type, -16);\n> +\t tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +\t g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n> +\t\t\t\t temp_addr, alignment_mask);\n> +\t gimple_set_location (g, loc);\n> +\t gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\n> +\t /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n> +\t    take an offset, but since we've already incorporated the offset\n> +\t    above, here we just pass in a zero.  
*/\n> +\t g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n> +\t\t\t\t\t\tbuild_int_cst (arg1_type, 0)));\n> +\t gimple_set_location (g, loc);\n> +\t gsi_replace (gsi, g, true);\n> +\n> +\t return true;\n> +\n> +      }\n> +\n>     default:\n> \tif (TARGET_DEBUG_BUILTIN)\n> \t   fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n> \t\t    fn_code, fn_name1, fn_name2);\n>       break;\n> \n>","headers":{"Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","In-Reply-To":"<1505250505.14827.191.camel@brimstone.rchland.ibm.com>","Date":"Tue, 12 Sep 2017 16:53:50 -0500","Cc":"GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tRichard Biener <richard.guenther@gmail.com>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>","To":"will_schmidt@vnet.ibm.com","Message-Id":"<2B5668CD-EADE-4F29-8B96-B61B535B6605@linux.vnet.ibm.com>"}},{"id":1767842,"web_url":"http://patchwork.ozlabs.org/comment/1767842/","msgid":"<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>","list_archive_url":null,"date":"2017-09-13T12:23:01","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":1765,"url":"http://patchwork.ozlabs.org/api/people/1765/","name":"Richard Biener","email":"richard.guenther@gmail.com"},"content":"On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n<will_schmidt@vnet.ibm.com> wrote:\n> Hi,\n>\n> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>\n> Folding of vector loads in GIMPLE.\n>\n> Add code to handle gimple folding for the vec_ld builtins.\n> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n> comments have been adjusted slightly so they continue to read OK for the\n> existing vec_st code.\n>\n> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n> tests which have been posted separately.\n>\n> For V2 of this patch, I've removed the chunk of code that prohibited the\n> gimple fold from occurring in BE environments.   This had fixed an issue\n> for me earlier during my development of the code, and turns out this was\n> not necessary.  
I've sniff-tested after removing that check and it looks\n> OK.\n>\n>>+ /* Limit folding of loads to LE targets.  */\n>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>> +        return false;\n>\n> I've restarted a regression test on this updated version.\n>\n> OK for trunk (assuming successful regression test completion)  ?\n>\n> Thanks,\n> -Will\n>\n> [gcc]\n>\n>         2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>\n>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>         for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>         Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>\n> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n> index fbab0a2..bb8a77d 100644\n> --- a/gcc/config/rs6000/rs6000-c.c\n> +++ b/gcc/config/rs6000/rs6000-c.c\n> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>                      convert (TREE_TYPE (stmt), arg0));\n>        stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>        return stmt;\n>      }\n>\n> -  /* Expand vec_ld into an expression that masks the address and\n> -     performs the load.  We need to expand this early to allow\n> +  /* Expand vec_st into an expression that masks the address and\n> +     performs the store.  We need to expand this early to allow\n>       the best aliasing, as by the time we get into RTL we no longer\n>       are able to honor __restrict__, for example.  We may want to\n>       consider this for all memory access built-ins.\n>\n>       When -maltivec=be is specified, or the wrong number of arguments\n>       is provided, simply punt to existing built-in processing.  
*/\n> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n> -      && nargs == 2)\n> -    {\n> -      tree arg0 = (*arglist)[0];\n> -      tree arg1 = (*arglist)[1];\n> -\n> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n> -      tree arg1_type = TREE_TYPE (arg1);\n> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n> -       goto bad;\n> -\n> -      tree inner_type = TREE_TYPE (arg1_type);\n> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n> -       {\n> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n> -                                                               0));\n> -         arg1 = fold_convert (arg1_type, arg1);\n> -       }\n> -\n> -      /* Construct the masked address.  Let existing error handling take\n> -        over if we don't have a constant offset.  */\n> -      arg0 = fold (arg0);\n> -\n> -      if (TREE_CODE (arg0) == INTEGER_CST)\n> -       {\n> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n> -\n> -         tree arg1_type = TREE_TYPE (arg1);\n> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n> -           {\n> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n> -             tree const0 = build_int_cstu (sizetype, 0);\n> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n> -           }\n> -\n> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n> -                                      arg1, arg0);\n> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n> -                                         build_int_cst (arg1_type, -16));\n> -\n> -         /* Find the built-in to get the return type so we can convert\n> -            the result properly (or fall back to default handling if the\n> -            
arguments aren't compatible).  */\n> -         for (desc = altivec_overloaded_builtins;\n> -              desc->code && desc->code != fcode; desc++)\n> -           continue;\n> -\n> -         for (; desc->code == fcode; desc++)\n> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n> -                                                   desc->op2)))\n> -             {\n> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n> -               if (TYPE_MODE (ret_type) == V2DImode)\n> -                 /* Type-based aliasing analysis thinks vector long\n> -                    and vector long long are different and will put them\n> -                    in distinct alias classes.  Force our return type\n> -                    to be a may-alias type to avoid this.  */\n> -                 ret_type\n> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n> -                                                  true/*can_alias_all*/);\n> -               else\n> -                 ret_type = build_pointer_type (ret_type);\n> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n> -               return ret_val;\n> -             }\n> -       }\n> -    }\n>\n> -  /* Similarly for stvx.  
*/\n>    if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>        && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>        && nargs == 3)\n>      {\n>        tree arg0 = (*arglist)[0];\n> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n> index 1338371..1fb5f44 100644\n> --- a/gcc/config/rs6000/rs6000.c\n> +++ b/gcc/config/rs6000/rs6000.c\n> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>         res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>         gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>         update_call_from_tree (gsi, res);\n>         return true;\n>        }\n> +    /* Vector loads.  */\n> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n> +      {\n> +        gimple *g;\n> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n> +        arg1 = gimple_call_arg (stmt, 1);  // address\n> +\n> +        lhs = gimple_call_lhs (stmt);\n> +        location_t loc = gimple_location (stmt);\n> +\n> +        tree arg1_type = TREE_TYPE (arg1);\n> +        tree lhs_type = TREE_TYPE (lhs);\n> +\n> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n> +           the tree using the value from arg0.  The resulting type will match\n> +           the type of arg1.  
*/\n> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n> +        gimple_set_location (g, loc);\n> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n> +                                 temp_offset);\n> +        gimple_set_location (g, loc);\n> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> +\n> +        /* Mask off any lower bits from the address.  */\n> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n> +                                temp_addr, alignment_mask);\n> +        gimple_set_location (g, loc);\n> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n\nYou could use\n\n  gimple_seq stmts = NULL;\n  tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n  tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\narg1_type, arg1, temp_offset);\n  tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\narg1_type, temp_addr, build_int_cst (arg1_type, -16));\n  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n\n> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n> +           take an offset, but since we've already incorporated the offset\n> +           above, here we just pass in a zero.  */\n> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n> +                                               build_int_cst (arg1_type, 0)));\n\nare you sure about arg1_type here?  I'm sure not.  For\n\n... 
foo (struct S *p)\n{\n  return __builtin_lvx_v2df (4, (double *)p);\n}\n\nyou'd end up with p as arg1 and thus struct S * as arg1_type and thus\nTBAA using 'struct S' to access the memory.\n\nI think if the builtins have any TBAA constraints you need to build those\nexplicitly; if not, you should use ptr_type_node aka no TBAA.\n\nRichard.\n\n> +        gimple_set_location (g, loc);\n> +        gsi_replace (gsi, g, true);\n> +\n> +        return true;\n> +\n> +      }\n> +\n>      default:\n>         if (TARGET_DEBUG_BUILTIN)\n>            fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>                     fn_code, fn_name1, fn_name2);\n>        break;\n>\n>","headers":{"MIME-Version":"1.0","In-Reply-To":"<1505250505.14827.191.camel@brimstone.rchland.ibm.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>","From":"Richard Biener <richard.guenther@gmail.com>","Date":"Wed, 13 Sep 2017 14:23:01 +0200","Message-ID":"<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>","Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","To":"will_schmidt@vnet.ibm.com","Cc":"GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tBill Schmidt <wschmidt@linux.vnet.ibm.com>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","Content-Type":"text/plain; charset=\"UTF-8\"","X-IsSubscribed":"yes"}},{"id":1767973,"web_url":"http://patchwork.ozlabs.org/comment/1767973/","msgid":"<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-13T15:40:30","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in 
GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n> \n> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n> <will_schmidt@vnet.ibm.com> wrote:\n>> Hi,\n>> \n>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>> \n>> Folding of vector loads in GIMPLE.\n>> \n>> Add code to handle gimple folding for the vec_ld builtins.\n>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n>> comments have been adjusted slightly so they continue to read OK for the\n>> existing vec_st code.\n>> \n>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>> tests which have been posted separately.\n>> \n>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>> gimple fold from occurring in BE environments.   This had fixed an issue\n>> for me earlier during my development of the code, and turns out this was\n>> not necessary.  I've sniff-tested after removing that check and it looks\n>> OK.\n>> \n>>> + /* Limit folding of loads to LE targets.  
*/\n>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>> +        return false;\n>> \n>> I've restarted a regression test on this updated version.\n>> \n>> OK for trunk (assuming successful regression test completion)  ?\n>> \n>> Thanks,\n>> -Will\n>> \n>> [gcc]\n>> \n>>        2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>> \n>>        * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>        for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>        Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>> \n>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>> index fbab0a2..bb8a77d 100644\n>> --- a/gcc/config/rs6000/rs6000-c.c\n>> +++ b/gcc/config/rs6000/rs6000-c.c\n>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>                     convert (TREE_TYPE (stmt), arg0));\n>>       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>       return stmt;\n>>     }\n>> \n>> -  /* Expand vec_ld into an expression that masks the address and\n>> -     performs the load.  We need to expand this early to allow\n>> +  /* Expand vec_st into an expression that masks the address and\n>> +     performs the store.  We need to expand this early to allow\n>>      the best aliasing, as by the time we get into RTL we no longer\n>>      are able to honor __restrict__, for example.  We may want to\n>>      consider this for all memory access built-ins.\n>> \n>>      When -maltivec=be is specified, or the wrong number of arguments\n>>      is provided, simply punt to existing built-in processing.  
*/\n>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>> -      && nargs == 2)\n>> -    {\n>> -      tree arg0 = (*arglist)[0];\n>> -      tree arg1 = (*arglist)[1];\n>> -\n>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>> -      tree arg1_type = TREE_TYPE (arg1);\n>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>> -       goto bad;\n>> -\n>> -      tree inner_type = TREE_TYPE (arg1_type);\n>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>> -       {\n>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>> -                                                               0));\n>> -         arg1 = fold_convert (arg1_type, arg1);\n>> -       }\n>> -\n>> -      /* Construct the masked address.  Let existing error handling take\n>> -        over if we don't have a constant offset.  */\n>> -      arg0 = fold (arg0);\n>> -\n>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>> -       {\n>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>> -\n>> -         tree arg1_type = TREE_TYPE (arg1);\n>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>> -           {\n>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>> -             tree const0 = build_int_cstu (sizetype, 0);\n>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>> -           }\n>> -\n>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>> -                                      arg1, arg0);\n>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>> -                                         build_int_cst (arg1_type, -16));\n>> -\n>> -         /* Find the built-in to get the return type so we can convert\n>> -            the result properly (or fall back to 
default handling if the\n>> -            arguments aren't compatible).  */\n>> -         for (desc = altivec_overloaded_builtins;\n>> -              desc->code && desc->code != fcode; desc++)\n>> -           continue;\n>> -\n>> -         for (; desc->code == fcode; desc++)\n>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>> -                                                   desc->op2)))\n>> -             {\n>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>> -                 /* Type-based aliasing analysis thinks vector long\n>> -                    and vector long long are different and will put them\n>> -                    in distinct alias classes.  Force our return type\n>> -                    to be a may-alias type to avoid this.  */\n>> -                 ret_type\n>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>> -                                                  true/*can_alias_all*/);\n>> -               else\n>> -                 ret_type = build_pointer_type (ret_type);\n>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>> -               return ret_val;\n>> -             }\n>> -       }\n>> -    }\n>> \n>> -  /* Similarly for stvx.  
*/\n>>   if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>       && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>       && nargs == 3)\n>>     {\n>>       tree arg0 = (*arglist)[0];\n>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>> index 1338371..1fb5f44 100644\n>> --- a/gcc/config/rs6000/rs6000.c\n>> +++ b/gcc/config/rs6000/rs6000.c\n>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>        res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>        update_call_from_tree (gsi, res);\n>>        return true;\n>>       }\n>> +    /* Vector loads.  */\n>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>> +      {\n>> +        gimple *g;\n>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>> +\n>> +        lhs = gimple_call_lhs (stmt);\n>> +        location_t loc = gimple_location (stmt);\n>> +\n>> +        tree arg1_type = TREE_TYPE (arg1);\n>> +        tree lhs_type = TREE_TYPE (lhs);\n>> +\n>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>> +           the tree using the value from arg0.  The resulting type will match\n>> +           the type of arg1.  
*/\n>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>> +        gimple_set_location (g, loc);\n>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>> +                                 temp_offset);\n>> +        gimple_set_location (g, loc);\n>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>> +\n>> +        /* Mask off any lower bits from the address.  */\n>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>> +                                temp_addr, alignment_mask);\n>> +        gimple_set_location (g, loc);\n>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n> \n> You could use\n> \n>  gimple_seq stmts = NULL;\n>  tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>  tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n> arg1_type, arg1, temp_offset);\n>  tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n> \n>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>> +           take an offset, but since we've already incorporated the offset\n>> +           above, here we just pass in a zero.  */\n>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>> +                                               build_int_cst (arg1_type, 0)));\n> \n> are you sure about arg1_type here?  I'm sure not.  For\n> \n> ... 
foo (struct S *p)\n> {\n>  return __builtin_lvx_v2df (4, (double *)p);\n> }\n> \n> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n> TBAA using 'struct S' to access the memory.\n\nHm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\nWill, you should probably test this example and see, but I'm pretty confident\nabout this (see below).\n\n> \n> I think if the builtins have any TBAA constraints you need to build those\n> explicitly; if not, you should use ptr_type_node aka no TBAA.\n\nThe type signatures are constrained during parsing, so we should only\nsee allowed pointer types on arg1 by the time we get to gimple folding.  I\nthink that using arg1_type should work, but I am probably missing\nsomething subtle, so please feel free to whack me on the temple until\nI get it. :-)\n\nBill\n> \n> Richard.\n> \n>> +        gimple_set_location (g, loc);\n>> +        gsi_replace (gsi, g, true);\n>> +\n>> +        return true;\n>> +\n>> +      }\n>> +\n>>     default:\n>>        if (TARGET_DEBUG_BUILTIN)\n>>           fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>                    fn_code, fn_name1, fn_name2);\n>>       break;","headers":{"Content-Type":"text/plain; charset=us-ascii","Mime-Version":"1.0 (Mac OS X Mail 10.3 \\(3273\\))","Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","In-Reply-To":"<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>","Date":"Wed, 13 Sep 2017 10:40:30 -0500","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","Content-Transfer-Encoding":"quoted-printable","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>\n\t<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>","To":"Richard Biener <richard.guenther@gmail.com>","X-IBM-AV-DETECTION":"SAVI=unused REMOTE=unused 
XFE=unused","x-cbparentid":"17091315-0005-0000-0000-0000841317A0","Message-Id":"<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>","X-Proofpoint-Virus-Version":"vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-13_04:, , signatures=0","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=0 malwarescore=0 phishscore=0\n\tadultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx\n\tscancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709130245","X-IsSubscribed":"yes"}},{"id":1768128,"web_url":"http://patchwork.ozlabs.org/comment/1768128/","msgid":"<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-13T20:14:09","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n> \n> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>> \n>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n>> <will_schmidt@vnet.ibm.com> wrote:\n>>> Hi,\n>>> \n>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>>> \n>>> Folding of vector loads in GIMPLE.\n>>> \n>>> Add code to handle gimple folding for the vec_ld builtins.\n>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n>>> comments have been adjusted slightly so they continue to read OK for the\n>>> existing vec_st code.\n>>> \n>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>>> tests which have been posted separately.\n>>> \n>>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>>> gimple fold from occurring in BE environments.   This had fixed an issue\n>>> for me earlier during my development of the code, and turns out this was\n>>> not necessary.  
I've sniff-tested after removing that check and it looks\n>>> OK.\n>>> \n>>>> + /* Limit folding of loads to LE targets.  */\n>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>>> +        return false;\n>>> \n>>> I've restarted a regression test on this updated version.\n>>> \n>>> OK for trunk (assuming successful regression test completion)  ?\n>>> \n>>> Thanks,\n>>> -Will\n>>> \n>>> [gcc]\n>>> \n>>>       2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>>> \n>>>       * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>>       for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>>       * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>>       Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>>> \n>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>>> index fbab0a2..bb8a77d 100644\n>>> --- a/gcc/config/rs6000/rs6000-c.c\n>>> +++ b/gcc/config/rs6000/rs6000-c.c\n>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>>                    convert (TREE_TYPE (stmt), arg0));\n>>>      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>>      return stmt;\n>>>    }\n>>> \n>>> -  /* Expand vec_ld into an expression that masks the address and\n>>> -     performs the load.  We need to expand this early to allow\n>>> +  /* Expand vec_st into an expression that masks the address and\n>>> +     performs the store.  We need to expand this early to allow\n>>>     the best aliasing, as by the time we get into RTL we no longer\n>>>     are able to honor __restrict__, for example.  We may want to\n>>>     consider this for all memory access built-ins.\n>>> \n>>>     When -maltivec=be is specified, or the wrong number of arguments\n>>>     is provided, simply punt to existing built-in processing.  
*/\n>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>> -      && nargs == 2)\n>>> -    {\n>>> -      tree arg0 = (*arglist)[0];\n>>> -      tree arg1 = (*arglist)[1];\n>>> -\n>>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>>> -      tree arg1_type = TREE_TYPE (arg1);\n>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>>> -       goto bad;\n>>> -\n>>> -      tree inner_type = TREE_TYPE (arg1_type);\n>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>>> -       {\n>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>>> -                                                               0));\n>>> -         arg1 = fold_convert (arg1_type, arg1);\n>>> -       }\n>>> -\n>>> -      /* Construct the masked address.  Let existing error handling take\n>>> -        over if we don't have a constant offset.  */\n>>> -      arg0 = fold (arg0);\n>>> -\n>>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>>> -       {\n>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>>> -\n>>> -         tree arg1_type = TREE_TYPE (arg1);\n>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>>> -           {\n>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>>> -             tree const0 = build_int_cstu (sizetype, 0);\n>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>>> -           }\n>>> -\n>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>>> -                                      arg1, arg0);\n>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>>> -                                         build_int_cst (arg1_type, -16));\n>>> -\n>>> -         /* Find the built-in to get the return type so we can convert\n>>> -    
        the result properly (or fall back to default handling if the\n>>> -            arguments aren't compatible).  */\n>>> -         for (desc = altivec_overloaded_builtins;\n>>> -              desc->code && desc->code != fcode; desc++)\n>>> -           continue;\n>>> -\n>>> -         for (; desc->code == fcode; desc++)\n>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>>> -                                                   desc->op2)))\n>>> -             {\n>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>>> -                 /* Type-based aliasing analysis thinks vector long\n>>> -                    and vector long long are different and will put them\n>>> -                    in distinct alias classes.  Force our return type\n>>> -                    to be a may-alias type to avoid this.  */\n>>> -                 ret_type\n>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>>> -                                                  true/*can_alias_all*/);\n>>> -               else\n>>> -                 ret_type = build_pointer_type (ret_type);\n>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>>> -               return ret_val;\n>>> -             }\n>>> -       }\n>>> -    }\n>>> \n>>> -  /* Similarly for stvx.  
*/\n>>>  if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>>      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>      && nargs == 3)\n>>>    {\n>>>      tree arg0 = (*arglist)[0];\n>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>>> index 1338371..1fb5f44 100644\n>>> --- a/gcc/config/rs6000/rs6000.c\n>>> +++ b/gcc/config/rs6000/rs6000.c\n>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>>       res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>>       gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>       update_call_from_tree (gsi, res);\n>>>       return true;\n>>>      }\n>>> +    /* Vector loads.  */\n>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>>> +      {\n>>> +        gimple *g;\n>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>>> +\n>>> +        lhs = gimple_call_lhs (stmt);\n>>> +        location_t loc = gimple_location (stmt);\n>>> +\n>>> +        tree arg1_type = TREE_TYPE (arg1);\n>>> +        tree lhs_type = TREE_TYPE (lhs);\n>>> +\n>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>>> +           the tree using the value from arg0.  The resulting type will match\n>>> +           the type of arg1.  
*/\n>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>>> +        gimple_set_location (g, loc);\n>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>>> +                                 temp_offset);\n>>> +        gimple_set_location (g, loc);\n>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>> +\n>>> +        /* Mask off any lower bits from the address.  */\n>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>>> +                                temp_addr, alignment_mask);\n>>> +        gimple_set_location (g, loc);\n>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>> \n>> You could use\n>> \n>> gimple_seq stmts = NULL;\n>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n>> arg1_type, arg1, temp_offset);\n>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>> \n>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>>> +           take an offset, but since we've already incorporated the offset\n>>> +           above, here we just pass in a zero.  */\n>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>>> +                                               build_int_cst (arg1_type, 0)));\n>> \n>> are you sure about arg1_type here?  I'm sure not.  For\n>> \n>> ... 
foo (struct S *p)\n>> {\n>> return __builtin_lvx_v2df (4, (double *)p);\n>> }\n>> \n>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n>> TBAA using 'struct S' to access the memory.\n> \n> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\n> Will, you should probably test this example and see, but I'm pretty confident\n> about this (see below).\n\nBut, as I should have suspected, you're right.  For some reason \ngimple_call_arg is returning p, stripped of the cast information where the\nuser asserted that p points to a double*.\n\nCan you explain to me why this should be so?  I assume that somebody\nhas decided to strip_nops the argument and lose the cast.\n\nUsing ptr_type_node loses all type information, so that would be a\nregression from what we do today.  In some cases we could reconstruct\nthat this was necessarily, say, a double*, but I don't know how we would\nrecover the signedness for an integer type.\n\nBill\n> \n>> \n>> I think if the builtins have any TBAA constraints you need to build those\n>> explicitely, if not, you should use ptr_type_node aka no TBAA.\n> \n> The type signatures are constrained during parsing, so we should only\n> see allowed pointer types on arg1 by the time we get to gimple folding.  I\n> think that using arg1_type should work, but I am probably missing\n> something subtle, so please feel free to whack me on the temple until\n> I get it. 
:-)\n> \n> Bill\n>> \n>> Richard.\n>> \n>>> +        gimple_set_location (g, loc);\n>>> +        gsi_replace (gsi, g, true);\n>>> +\n>>> +        return true;\n>>> +\n>>> +      }\n>>> +\n>>>    default:\n>>>       if (TARGET_DEBUG_BUILTIN)\n>>>          fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>>                   fn_code, fn_name1, fn_name2);\n>>>      break;","headers":{"Return-Path":"<gcc-patches-return-462081-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-462081-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"jPnssOVf\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xstDR4D4dz9rxj\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 14 Sep 2017 06:14:30 +1000 (AEST)","(qmail 27039 invoked by alias); 13 Sep 2017 20:14:22 -0000","(qmail 26971 invoked by uid 89); 13 Sep 2017 20:14:17 -0000","from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)\n\t(148.163.158.5) by sourceware.org\n\t(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;\n\tWed, 13 Sep 2017 20:14:15 +0000","from pps.filterd (m0098421.ppops.net [127.0.0.1])\tby\n\tmx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id\n\tv8DKDlU9004090\tfor <gcc-patches@gcc.gnu.org>;\n\tWed, 13 Sep 2017 16:14:13 -0400","from e14.ny.us.ibm.com (e14.ny.us.ibm.com [129.33.205.204])\tby\n\tmx0a-001b2d01.pphosted.com 
with ESMTP id\n\t2cy7wscpv7-1\t(version=TLSv1.2 cipher=AES256-SHA bits=256\n\tverify=NOT)\tfor <gcc-patches@gcc.gnu.org>;\n\tWed, 13 Sep 2017 16:14:13 -0400","from localhost\tby e14.ny.us.ibm.com with IBM ESMTP SMTP Gateway:\n\tAuthorized Use Only! Violators will be prosecuted\tfor\n\t<gcc-patches@gcc.gnu.org> from <wschmidt@linux.vnet.ibm.com>;\n\tWed, 13 Sep 2017 16:14:13 -0400","from b01cxnp22033.gho.pok.ibm.com (9.57.198.23)\tby\n\te14.ny.us.ibm.com (146.89.104.201) with IBM ESMTP SMTP\n\tGateway: Authorized Use Only! Violators will be prosecuted;\n\tWed, 13 Sep 2017 16:14:11 -0400","from b01ledav001.gho.pok.ibm.com (b01ledav001.gho.pok.ibm.com\n\t[9.57.199.106])\tby b01cxnp22033.gho.pok.ibm.com\n\t(8.14.9/8.14.9/NCO v10.0) with ESMTP id v8DKEAOe41222364;\n\tWed, 13 Sep 2017 20:14:10 GMT","from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1])\tby IMSVA\n\t(Postfix) with ESMTP id 264D72804C;\n\tWed, 13 Sep 2017 16:14:04 -0400 (EDT)","from bigmac.rchland.ibm.com (unknown [9.10.86.143])\tby\n\tb01ledav001.gho.pok.ibm.com (Postfix) with ESMTPS id\n\tCBA1E28046; Wed, 13 Sep 2017 16:14:03 -0400 (EDT)"],"DomainKey-Signature":"a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:content-type:mime-version:subject:from:in-reply-to:date:cc\n\t:content-transfer-encoding:references:to:message-id; q=dns; s=\n\tdefault; b=r0zdw5qAIEAqUINLTk1BhzqYXVEvDeiLPCYE7k1vKszDux3GT+yDC\n\tnJDvCr/YVcmj0temWja7rkZoc+3IqE+PM1qGNKpJT0nDzBhOGugIVhJyNtRyEF+N\n\tZVHkXp0WxqtVOfBy+Ht0o9OFZs2LLH9hocFQkPkQfSLBzFlIQMWcvY=","DKIM-Signature":"v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:content-type:mime-version:subject:from:in-reply-to:date:cc\n\t:content-transfer-encoding:references:to:message-id; s=default;\n\tbh=T4xl6+Y1HEq88c9ccAi8pYY9pek=; 
b=jPnssOVfZqB7zpiYmlJdr9+Lz/XF\n\tDuxfwo+Qf0D1/roUguoJuvGTurhepY6G4dg083/ziPiuk0b7a0L06jB3zOHwNW5a\n\tYEu+fOA+eLHb4GIUUys7RNVXpg5VgFSJQrjirqjQTrd1bbpuJ9ElYloyr+QgsjP1\n\tuJ5aR6PQB/sBgGA=","Mailing-List":"contact gcc-patches-help@gcc.gnu.org; run by ezmlm","Precedence":"bulk","List-Id":"<gcc-patches.gcc.gnu.org>","List-Unsubscribe":"<mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>","List-Archive":"<http://gcc.gnu.org/ml/gcc-patches/>","List-Post":"<mailto:gcc-patches@gcc.gnu.org>","List-Help":"<mailto:gcc-patches-help@gcc.gnu.org>","Sender":"gcc-patches-owner@gcc.gnu.org","X-Virus-Found":"No","X-Spam-SWARE-Status":"No, score=-24.7 required=5.0 tests=AWL, BAYES_00,\n\tGIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,\n\tKAM_LAZY_DOMAIN_SECURITY,\n\tRCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=","X-HELO":"mx0a-001b2d01.pphosted.com","Content-Type":"text/plain; charset=us-ascii","Mime-Version":"1.0 (Mac OS X Mail 10.3 \\(3273\\))","Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","In-Reply-To":"<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>","Date":"Wed, 13 Sep 2017 15:14:09 -0500","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","Content-Transfer-Encoding":"quoted-printable","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>\n\t<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>\n\t<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>","To":"Richard Biener <richard.guenther@gmail.com>","X-TM-AS-GCONF":"00","x-cbid":"17091320-0052-0000-0000-0000026090BF","X-IBM-SpamModules-Scores":"","X-IBM-SpamModules-Versions":"BY=3.00007723; HX=3.00000241; KW=3.00000007;\n\tPH=3.00000004; SC=3.00000227; SDB=6.00916568; UDB=6.00460278;\n\tIPR=6.00696799; 
BA=6.00005588; NDR=6.00000001; ZLA=6.00000005;\n\tZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;\n\tZU=6.00000002; MB=3.00017142; XFM=3.00000015;\n\tUTC=2017-09-13 20:14:12","X-IBM-AV-DETECTION":"SAVI=unused REMOTE=unused XFE=unused","x-cbparentid":"17091320-0053-0000-0000-000051FF50A3","Message-Id":"<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>","X-Proofpoint-Virus-Version":"vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-13_06:, , signatures=0","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=0 malwarescore=0 phishscore=0\n\tadultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx\n\tscancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709130310","X-IsSubscribed":"yes"}},{"id":1768493,"web_url":"http://patchwork.ozlabs.org/comment/1768493/","msgid":"<CAFiYyc05e7Pg-9CxGY=czF66AMEJviNXy2d9UY-ty5NKgP-N2w@mail.gmail.com>","list_archive_url":null,"date":"2017-09-14T10:15:34","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":1765,"url":"http://patchwork.ozlabs.org/api/people/1765/","name":"Richard Biener","email":"richard.guenther@gmail.com"},"content":"On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt\n<wschmidt@linux.vnet.ibm.com> wrote:\n> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n>>\n>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>>\n>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n>>> <will_schmidt@vnet.ibm.com> wrote:\n>>>> Hi,\n>>>>\n>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>>>>\n>>>> Folding of vector loads in GIMPLE.\n>>>>\n>>>> Add code to handle gimple folding for the vec_ld builtins.\n>>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
Surrounding\n>>>> comments have been adjusted slightly so they continue to read OK for the\n>>>> existing vec_st code.\n>>>>\n>>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>>>> tests which have been posted separately.\n>>>>\n>>>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>>>> gimple fold from occurring in BE environments.   This had fixed an issue\n>>>> for me earlier during my development of the code, and turns out this was\n>>>> not necessary.  I've sniff-tested after removing that check and it looks\n>>>> OK.\n>>>>\n>>>>> + /* Limit folding of loads to LE targets.  */\n>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>>>> +        return false;\n>>>>\n>>>> I've restarted a regression test on this updated version.\n>>>>\n>>>> OK for trunk (assuming successful regression test completion)  ?\n>>>>\n>>>> Thanks,\n>>>> -Will\n>>>>\n>>>> [gcc]\n>>>>\n>>>>       2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>>>>\n>>>>       * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>>>       for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>>>       * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>>>       Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>>>>\n>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>>>> index fbab0a2..bb8a77d 100644\n>>>> --- a/gcc/config/rs6000/rs6000-c.c\n>>>> +++ b/gcc/config/rs6000/rs6000-c.c\n>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>>>                    convert (TREE_TYPE (stmt), arg0));\n>>>>      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>>>      return stmt;\n>>>>    }\n>>>>\n>>>> -  /* Expand vec_ld into an expression that masks the address and\n>>>> -     performs the load.  
We need to expand this early to allow\n>>>> +  /* Expand vec_st into an expression that masks the address and\n>>>> +     performs the store.  We need to expand this early to allow\n>>>>     the best aliasing, as by the time we get into RTL we no longer\n>>>>     are able to honor __restrict__, for example.  We may want to\n>>>>     consider this for all memory access built-ins.\n>>>>\n>>>>     When -maltivec=be is specified, or the wrong number of arguments\n>>>>     is provided, simply punt to existing built-in processing.  */\n>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>> -      && nargs == 2)\n>>>> -    {\n>>>> -      tree arg0 = (*arglist)[0];\n>>>> -      tree arg1 = (*arglist)[1];\n>>>> -\n>>>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>>>> -      tree arg1_type = TREE_TYPE (arg1);\n>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>>>> -       goto bad;\n>>>> -\n>>>> -      tree inner_type = TREE_TYPE (arg1_type);\n>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>>>> -       {\n>>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>>>> -                                                               0));\n>>>> -         arg1 = fold_convert (arg1_type, arg1);\n>>>> -       }\n>>>> -\n>>>> -      /* Construct the masked address.  Let existing error handling take\n>>>> -        over if we don't have a constant offset.  
*/\n>>>> -      arg0 = fold (arg0);\n>>>> -\n>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>>>> -       {\n>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>>>> -\n>>>> -         tree arg1_type = TREE_TYPE (arg1);\n>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>>>> -           {\n>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>>>> -             tree const0 = build_int_cstu (sizetype, 0);\n>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>>>> -           }\n>>>> -\n>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>>>> -                                      arg1, arg0);\n>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>>>> -                                         build_int_cst (arg1_type, -16));\n>>>> -\n>>>> -         /* Find the built-in to get the return type so we can convert\n>>>> -            the result properly (or fall back to default handling if the\n>>>> -            arguments aren't compatible).  */\n>>>> -         for (desc = altivec_overloaded_builtins;\n>>>> -              desc->code && desc->code != fcode; desc++)\n>>>> -           continue;\n>>>> -\n>>>> -         for (; desc->code == fcode; desc++)\n>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>>>> -                                                   desc->op2)))\n>>>> -             {\n>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>>>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>>>> -                 /* Type-based aliasing analysis thinks vector long\n>>>> -                    and vector long long are different and will put them\n>>>> -                    in distinct alias classes.  
Force our return type\n>>>> -                    to be a may-alias type to avoid this.  */\n>>>> -                 ret_type\n>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>>>> -                                                  true/*can_alias_all*/);\n>>>> -               else\n>>>> -                 ret_type = build_pointer_type (ret_type);\n>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>>>> -               return ret_val;\n>>>> -             }\n>>>> -       }\n>>>> -    }\n>>>>\n>>>> -  /* Similarly for stvx.  */\n>>>>  if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>>>      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>      && nargs == 3)\n>>>>    {\n>>>>      tree arg0 = (*arglist)[0];\n>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>>>> index 1338371..1fb5f44 100644\n>>>> --- a/gcc/config/rs6000/rs6000.c\n>>>> +++ b/gcc/config/rs6000/rs6000.c\n>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>>>       res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>>>       gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>       update_call_from_tree (gsi, res);\n>>>>       return true;\n>>>>      }\n>>>> +    /* Vector loads.  
*/\n>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>>>> +      {\n>>>> +        gimple *g;\n>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>>>> +\n>>>> +        lhs = gimple_call_lhs (stmt);\n>>>> +        location_t loc = gimple_location (stmt);\n>>>> +\n>>>> +        tree arg1_type = TREE_TYPE (arg1);\n>>>> +        tree lhs_type = TREE_TYPE (lhs);\n>>>> +\n>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>>>> +           the tree using the value from arg0.  The resulting type will match\n>>>> +           the type of arg1.  */\n>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>>>> +        gimple_set_location (g, loc);\n>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>>>> +                                 temp_offset);\n>>>> +        gimple_set_location (g, loc);\n>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>> +\n>>>> +        /* Mask off any lower bits from the address.  
*/\n>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>>>> +                                temp_addr, alignment_mask);\n>>>> +        gimple_set_location (g, loc);\n>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>\n>>> You could use\n>>>\n>>> gimple_seq stmts = NULL;\n>>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n>>> arg1_type, arg1, temp_offset);\n>>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n>>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>\n>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>>>> +           take an offset, but since we've already incorporated the offset\n>>>> +           above, here we just pass in a zero.  */\n>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>>>> +                                               build_int_cst (arg1_type, 0)));\n>>>\n>>> are you sure about arg1_type here?  I'm sure not.  For\n>>>\n>>> ... foo (struct S *p)\n>>> {\n>>> return __builtin_lvx_v2df (4, (double *)p);\n>>> }\n>>>\n>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n>>> TBAA using 'struct S' to access the memory.\n>>\n>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\n>> Will, you should probably test this example and see, but I'm pretty confident\n>> about this (see below).\n>\n> But, as I should have suspected, you're right.  For some reason\n> gimple_call_arg is returning p, stripped of the cast information where the\n> user asserted that p points to a double*.\n>\n> Can you explain to me why this should be so?  
I assume that somebody\n> has decided to strip_nops the argument and lose the cast.\n\npointer types have no meaning in GIMPLE so we aggressively prune them.\n\n> Using ptr_type_node loses all type information, so that would be a\n> regression from what we do today.  In some cases we could reconstruct\n> that this was necessarily, say, a double*, but I don't know how we would\n> recover the signedness for an integer type.\n\nHow did we handle the expansion previously - ah - it was done earlier\nin the C FE.  So why are you moving it to GIMPLE?  The function is called\nresolve_overloaded_builtin - what kind of overloading do you resolve here?\nAs said argument types might not be preserved.\n\nRichard.\n\n> Bill\n>>\n>>>\n>>> I think if the builtins have any TBAA constraints you need to build those\n>>> explicitely, if not, you should use ptr_type_node aka no TBAA.\n>>\n>> The type signatures are constrained during parsing, so we should only\n>> see allowed pointer types on arg1 by the time we get to gimple folding.  I\n>> think that using arg1_type should work, but I am probably missing\n>> something subtle, so please feel free to whack me on the temple until\n>> I get it. 
:-)\n>>\n>> Bill\n>>>\n>>> Richard.\n>>>\n>>>> +        gimple_set_location (g, loc);\n>>>> +        gsi_replace (gsi, g, true);\n>>>> +\n>>>> +        return true;\n>>>> +\n>>>> +      }\n>>>> +\n>>>>    default:\n>>>>       if (TARGET_DEBUG_BUILTIN)\n>>>>          fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>>>                   fn_code, fn_name1, fn_name2);\n>>>>      break;\n>","headers":{"Return-Path":"<gcc-patches-return-462115-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-462115-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"dDj24gpq\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtDvD3LpDz9s0Z\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 14 Sep 2017 20:15:51 +1000 (AEST)","(qmail 64636 invoked by alias); 14 Sep 2017 10:15:40 -0000","(qmail 63676 invoked by uid 89); 14 Sep 2017 10:15:40 -0000","from mail-wm0-f51.google.com (HELO mail-wm0-f51.google.com)\n\t(74.125.82.51) by sourceware.org\n\t(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;\n\tThu, 14 Sep 2017 10:15:37 +0000","by mail-wm0-f51.google.com with SMTP id 189so4872946wmh.1 for\n\t<gcc-patches@gcc.gnu.org>; Thu, 14 Sep 2017 03:15:37 -0700 (PDT)","by 10.80.180.250 with HTTP; Thu, 14 Sep 2017 03:15:34 -0700 (PDT)"],"DomainKey-Signature":"a=rsa-sha1; c=nofws; d=gcc.gnu.org; 
h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:mime-version:in-reply-to:references:from:date:message-id\n\t:subject:to:cc:content-type; q=dns; s=default; b=qyvGYJgsVtYsOlb\n\te/R3M8pkgQ1v2wovuFh6/xtGVnc6GCztNV6YvafckB4Sk1mBdHAJcS529hBfkZvi\n\t2qDIsYZr9sb5ANPb3of60DaNb3fjxnyYc/sG0Q5SHFs4XtR2nizFaORpJRluHMkj\n\t514P9lkrLbXD87P7/HH/VKN2bOIY=","DKIM-Signature":"v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:mime-version:in-reply-to:references:from:date:message-id\n\t:subject:to:cc:content-type; s=default; bh=vB0h9IHSPlq7vwul0b7KZ\n\tut02MM=; b=dDj24gpqIR0OLb9Y8sge2s4ccP3Mab7L85yQa0KHEiZsrJeecbxsD\n\tb+RrbP4hNcML1veQEI+hSRs/1O+KwQUPnMzlPOgR1Ap10WMBAAgBISMtt5WiPj5c\n\twQgmcPMBPqQKq2TlXylbs43CulaZ9EhCNQaO+lCPY+L9f65hMQrNVI=","Mailing-List":"contact gcc-patches-help@gcc.gnu.org; run by ezmlm","Precedence":"bulk","List-Id":"<gcc-patches.gcc.gnu.org>","List-Unsubscribe":"<mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>","List-Archive":"<http://gcc.gnu.org/ml/gcc-patches/>","List-Post":"<mailto:gcc-patches@gcc.gnu.org>","List-Help":"<mailto:gcc-patches-help@gcc.gnu.org>","Sender":"gcc-patches-owner@gcc.gnu.org","X-Virus-Found":"No","X-Spam-SWARE-Status":"No, score=-24.5 required=5.0 tests=AWL, BAYES_00,\n\tFREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2,\n\tGIT_PATCH_3, RCVD_IN_DNSWL_NONE,\n\tSPF_PASS autolearn=ham version=3.3.2 spammy=","X-HELO":"mail-wm0-f51.google.com","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net;\n\ts=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=0j6WQZS1yF6a2PpJtbNwiqJbb0aaZxwcNIL+C0DoXws=;\n\tb=LHz1CTmONYMuUkd8Fd2aa3RJS7V77sHyGd5MR6bd+5V4e68mpsVD7GjyAR5Mocv7B6\n\tntyx1kImGiuaz6bllDnUY93VVMN5tJd3SwOZKUP9Mb5dzt5DSzEbDkn/g+3XoUF75ulw\n\tpuBqH1EoQ1ALLP19wkz0ATF0hOA+cxwCdPkHryqf5u6b4rCYqh6L8w09GgTd08k35ZK4\n\tMf3ByyMdpkUSuIIgXqv/DITJuWtCE3c5BWJ2G9hCed0YaXUctZy5P+yujlzB8j++DMNk\n\tVkgHS/BcLAPJQ7Ja0KZlFnmvO/FZmDUPbYKBuYaFN/T0kp639D9hDEKAwzBeuur8mrmk\n\tL7OQ==","X-Gm-Message-State":"AHPjjUiNKTJQZXiM6lBQREtf3mFDq/PxyENHa1C4n6ABoPdWpaNivozk\tEx1dNrEsjEMgc2xDjCj7dkrriOdlnA65Zs6Curw=","X-Google-Smtp-Source":"ADKCNb6VFdSQdSLcCOdDEsup+bv0Rf+orpG03BJ01h6B59kBiA3iHLPugHah5wZInmHoYbVl/dDwFFSo31FZLCHGLUI=","X-Received":"by 10.80.217.15 with SMTP id t15mr8887218edj.217.1505384135245;\n\tThu, 14 Sep 2017 03:15:35 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>\n\t<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>\n\t<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>\n\t<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>","From":"Richard Biener <richard.guenther@gmail.com>","Date":"Thu, 14 Sep 2017 12:15:34 +0200","Message-ID":"<CAFiYyc05e7Pg-9CxGY=czF66AMEJviNXy2d9UY-ty5NKgP-N2w@mail.gmail.com>","Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","To":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","Content-Type":"text/plain; 
charset=\"UTF-8\"","X-IsSubscribed":"yes"}},{"id":1768637,"web_url":"http://patchwork.ozlabs.org/comment/1768637/","msgid":"<73D3C195-E029-4050-9764-57C07845DBEB@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-14T14:38:13","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n> \n> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt\n> <wschmidt@linux.vnet.ibm.com> wrote:\n>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n>>> \n>>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>>> \n>>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n>>>> <will_schmidt@vnet.ibm.com> wrote:\n>>>>> Hi,\n>>>>> \n>>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>>>>> \n>>>>> Folding of vector loads in GIMPLE.\n>>>>> \n>>>>> Add code to handle gimple folding for the vec_ld builtins.\n>>>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n>>>>> comments have been adjusted slightly so they continue to read OK for the\n>>>>> existing vec_st code.\n>>>>> \n>>>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>>>>> tests which have been posted separately.\n>>>>> \n>>>>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>>>>> gimple fold from occurring in BE environments.   This had fixed an issue\n>>>>> for me earlier during my development of the code, and turns out this was\n>>>>> not necessary.  I've sniff-tested after removing that check and it looks\n>>>>> OK.\n>>>>> \n>>>>>> + /* Limit folding of loads to LE targets.  
*/\n>>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>>>>> +        return false;\n>>>>> \n>>>>> I've restarted a regression test on this updated version.\n>>>>> \n>>>>> OK for trunk (assuming successful regression test completion)  ?\n>>>>> \n>>>>> Thanks,\n>>>>> -Will\n>>>>> \n>>>>> [gcc]\n>>>>> \n>>>>>      2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>>>>> \n>>>>>      * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>>>>      for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>>>>      * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>>>>      Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>>>>> \n>>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>>>>> index fbab0a2..bb8a77d 100644\n>>>>> --- a/gcc/config/rs6000/rs6000-c.c\n>>>>> +++ b/gcc/config/rs6000/rs6000-c.c\n>>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>>>>                   convert (TREE_TYPE (stmt), arg0));\n>>>>>     stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>>>>     return stmt;\n>>>>>   }\n>>>>> \n>>>>> -  /* Expand vec_ld into an expression that masks the address and\n>>>>> -     performs the load.  We need to expand this early to allow\n>>>>> +  /* Expand vec_st into an expression that masks the address and\n>>>>> +     performs the store.  We need to expand this early to allow\n>>>>>    the best aliasing, as by the time we get into RTL we no longer\n>>>>>    are able to honor __restrict__, for example.  We may want to\n>>>>>    consider this for all memory access built-ins.\n>>>>> \n>>>>>    When -maltivec=be is specified, or the wrong number of arguments\n>>>>>    is provided, simply punt to existing built-in processing.  
*/\n>>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>> -      && nargs == 2)\n>>>>> -    {\n>>>>> -      tree arg0 = (*arglist)[0];\n>>>>> -      tree arg1 = (*arglist)[1];\n>>>>> -\n>>>>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>>>>> -      tree arg1_type = TREE_TYPE (arg1);\n>>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>>>>> -       goto bad;\n>>>>> -\n>>>>> -      tree inner_type = TREE_TYPE (arg1_type);\n>>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>>>>> -       {\n>>>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>>>>> -                                                               0));\n>>>>> -         arg1 = fold_convert (arg1_type, arg1);\n>>>>> -       }\n>>>>> -\n>>>>> -      /* Construct the masked address.  Let existing error handling take\n>>>>> -        over if we don't have a constant offset.  */\n>>>>> -      arg0 = fold (arg0);\n>>>>> -\n>>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>>>>> -       {\n>>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>>>>> -\n>>>>> -         tree arg1_type = TREE_TYPE (arg1);\n>>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>>>>> -           {\n>>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>>>>> -             tree const0 = build_int_cstu (sizetype, 0);\n>>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>>>>> -           }\n>>>>> -\n>>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>>>>> -                                      arg1, arg0);\n>>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>>>>> -                                         build_int_cst (arg1_type, -16));\n>>>>> 
-\n>>>>> -         /* Find the built-in to get the return type so we can convert\n>>>>> -            the result properly (or fall back to default handling if the\n>>>>> -            arguments aren't compatible).  */\n>>>>> -         for (desc = altivec_overloaded_builtins;\n>>>>> -              desc->code && desc->code != fcode; desc++)\n>>>>> -           continue;\n>>>>> -\n>>>>> -         for (; desc->code == fcode; desc++)\n>>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>>>>> -                                                   desc->op2)))\n>>>>> -             {\n>>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>>>>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>>>>> -                 /* Type-based aliasing analysis thinks vector long\n>>>>> -                    and vector long long are different and will put them\n>>>>> -                    in distinct alias classes.  Force our return type\n>>>>> -                    to be a may-alias type to avoid this.  */\n>>>>> -                 ret_type\n>>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>>>>> -                                                  true/*can_alias_all*/);\n>>>>> -               else\n>>>>> -                 ret_type = build_pointer_type (ret_type);\n>>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>>>>> -               return ret_val;\n>>>>> -             }\n>>>>> -       }\n>>>>> -    }\n>>>>> \n>>>>> -  /* Similarly for stvx.  
*/\n>>>>> if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>>>>     && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>>     && nargs == 3)\n>>>>>   {\n>>>>>     tree arg0 = (*arglist)[0];\n>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>>>>> index 1338371..1fb5f44 100644\n>>>>> --- a/gcc/config/rs6000/rs6000.c\n>>>>> +++ b/gcc/config/rs6000/rs6000.c\n>>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>>>>      res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>>>>      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>>      update_call_from_tree (gsi, res);\n>>>>>      return true;\n>>>>>     }\n>>>>> +    /* Vector loads.  */\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>>>>> +      {\n>>>>> +        gimple *g;\n>>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>>>>> +\n>>>>> +        lhs = gimple_call_lhs (stmt);\n>>>>> +        location_t loc = gimple_location (stmt);\n>>>>> +\n>>>>> +        tree arg1_type = TREE_TYPE (arg1);\n>>>>> +        tree lhs_type = TREE_TYPE (lhs);\n>>>>> +\n>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>>>>> +           the tree using the value from arg0.  The resulting type will match\n>>>>> +           the type of arg1.  
*/\n>>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>>>>> +        gimple_set_location (g, loc);\n>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>>>>> +                                 temp_offset);\n>>>>> +        gimple_set_location (g, loc);\n>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>> +\n>>>>> +        /* Mask off any lower bits from the address.  */\n>>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>>>>> +                                temp_addr, alignment_mask);\n>>>>> +        gimple_set_location (g, loc);\n>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>> \n>>>> You could use\n>>>> \n>>>> gimple_seq stmts = NULL;\n>>>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>>>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n>>>> arg1_type, arg1, temp_offset);\n>>>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n>>>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>>>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>> \n>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>>>>> +           take an offset, but since we've already incorporated the offset\n>>>>> +           above, here we just pass in a zero.  */\n>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>>>>> +                                               build_int_cst (arg1_type, 0)));\n>>>> \n>>>> are you sure about arg1_type here?  I'm sure not.  For\n>>>> \n>>>> ... 
foo (struct S *p)\n>>>> {\n>>>> return __builtin_lvx_v2df (4, (double *)p);\n>>>> }\n>>>> \n>>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n>>>> TBAA using 'struct S' to access the memory.\n>>> \n>>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\n>>> Will, you should probably test this example and see, but I'm pretty confident\n>>> about this (see below).\n>> \n>> But, as I should have suspected, you're right.  For some reason\n>> gimple_call_arg is returning p, stripped of the cast information where the\n>> user asserted that p points to a double*.\n>> \n>> Can you explain to me why this should be so?  I assume that somebody\n>> has decided to strip_nops the argument and lose the cast.\n> \n> pointer types have no meaning in GIMPLE so we aggressively prune them.\n> \n>> Using ptr_type_node loses all type information, so that would be a\n>> regression from what we do today.  In some cases we could reconstruct\n>> that this was necessarily, say, a double*, but I don't know how we would\n>> recover the signedness for an integer type.\n> \n> How did we handle the expansion previously - ah - it was done earlier\n> in the C FE.  So why are you moving it to GIMPLE?  The function is called\n> resolve_overloaded_builtin - what kind of overloading do you resolve here?\n> As said argument types might not be preserved.\n\nThe AltiVec builtins allow overloaded names based on the argument types,\nusing a special callout during parsing to convert the overloaded names to\ntype-specific names.  Historically these have then remained builtin calls\nuntil RTL expansion, which loses a lot of useful optimization.  Will has been\ngradually implementing gimple folding for these builtins so that we can\noptimize simple vector arithmetic and so on.  
The overloading is still dealt\nwith during parsing.\n\nAs an example:\n\n  double a[64];\n  vector double x = vec_ld (0, a);\n\nwill get translated into\n\n  vector double x = __builtin_altivec_lvx_v2df (0, a);\n\nand \n\n  unsigned char b[64];\n  vector unsigned char y = vec_ld (0, b);\n\nwill get translated into\n\n  vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);\n\nSo in resolving the overloading we still maintain the type info for arg1.\n\nEarlier I had dealt with the performance issue in a different way for the \nvec_ld and vec_st overloaded builtins, which created the rather grotty \ncode in rs6000-c.c to modify the parse trees instead.  My hope was that\nwe could simplify the code by having Will deal with them as gimple folds\ninstead.  But if in so doing we lose type information, that may not be the\nright call.\n\nHowever, since you say that gimple aggressively removes the casts \nfrom pointer types, perhaps the code that we see in early gimple from\nthe existing method might also be missing the type information?  Will,\nit would be worth looking at that code to see.  If it's no different then\nperhaps we still go ahead with the folding.\n\nAnother note for Will:  The existing code gives up when -maltivec=be has\nbeen specified, and you probably want to do that as well.  That may be\nwhy you initially turned off big endian -- it is easy to misread that code.\n-maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.\n\nThanks,\nBill\n> \n> Richard.\n> \n>> Bill\n>>> \n>>>> \n>>>> I think if the builtins have any TBAA constraints you need to build those\n>>>> explicitely, if not, you should use ptr_type_node aka no TBAA.\n>>> \n>>> The type signatures are constrained during parsing, so we should only\n>>> see allowed pointer types on arg1 by the time we get to gimple folding.  I\n>>> think that using arg1_type should work, but I am probably missing\n>>> something subtle, so please feel free to whack me on the temple until\n>>> I get it. 
:-)\n>>> \n>>> Bill\n>>>> \n>>>> Richard.\n>>>> \n>>>>> +        gimple_set_location (g, loc);\n>>>>> +        gsi_replace (gsi, g, true);\n>>>>> +\n>>>>> +        return true;\n>>>>> +\n>>>>> +      }\n>>>>> +\n>>>>>   default:\n>>>>>      if (TARGET_DEBUG_BUILTIN)\n>>>>>         fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>>>>                  fn_code, fn_name1, fn_name2);\n>>>>>     break;","headers":{"Return-Path":"<gcc-patches-return-462146-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-462146-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"tYIQMvqD\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtLkJ61wGz9sPk\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 00:38:32 +1000 (AEST)","(qmail 121478 invoked by alias); 14 Sep 2017 14:38:24 -0000","(qmail 121410 invoked by uid 89); 14 Sep 2017 14:38:23 -0000","from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)\n\t(148.163.156.1) by sourceware.org\n\t(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;\n\tThu, 14 Sep 2017 14:38:20 +0000","from pps.filterd (m0098393.ppops.net [127.0.0.1])\tby\n\tmx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id\n\tv8EEam30091862\tfor <gcc-patches@gcc.gnu.org>;\n\tThu, 14 Sep 2017 10:38:19 -0400","from e37.co.us.ibm.com (e37.co.us.ibm.com 
[32.97.110.158])"],"Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","In-Reply-To":"<CAFiYyc05e7Pg-9CxGY=czF66AMEJviNXy2d9UY-ty5NKgP-N2w@mail.gmail.com>","Date":"Thu, 14 Sep 2017 09:38:13 -0500","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>\n\t<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>\n\t<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>\n\t<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>\n\t<CAFiYyc05e7Pg-9CxGY=czF66AMEJviNXy2d9UY-ty5NKgP-N2w@mail.gmail.com>","To":"Richard Biener 
<richard.guenther@gmail.com>","Message-Id":"<73D3C195-E029-4050-9764-57C07845DBEB@linux.vnet.ibm.com>","X-IsSubscribed":"yes"}},{"id":1768834,"web_url":"http://patchwork.ozlabs.org/comment/1768834/","msgid":"<1505422271.26707.17.camel@brimstone.rchland.ibm.com>","list_archive_url":null,"date":"2017-09-14T20:51:11","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":3241,"url":"http://patchwork.ozlabs.org/api/people/3241/","name":"will schmidt","email":"will_schmidt@vnet.ibm.com"},"content":"On Thu, 2017-09-14 at 09:38 -0500, Bill Schmidt wrote:\n> On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n> > \n> > On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt\n> > <wschmidt@linux.vnet.ibm.com> wrote:\n> >> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n> >>> \n> >>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n> >>>> \n> >>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n> >>>> <will_schmidt@vnet.ibm.com> 
wrote:\n> >>>>> Hi,\n> >>>>> \n> >>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n> >>>>> \n> >>>>> [ ... full quote of the v2 patch and review comments snipped ... ]\n> > \n> > How did we handle the expansion previously - ah - it was done earlier\n> > in the C FE.  So why are you moving it to GIMPLE?  The function is called\n> > resolve_overloaded_builtin - what kind of overloading do you resolve here?\n> > As said argument types might not be preserved.\n> \n> The AltiVec builtins allow overloaded names based on the argument types,\n> using a special callout during parsing to convert the overloaded names to\n> type-specific names.  Historically these have then remained builtin calls\n> until RTL expansion, which loses a lot of useful optimization.  Will has been\n> gradually implementing gimple folding for these builtins so that we can\n> optimize simple vector arithmetic and so on.  
The overloading is still dealt\n> with during parsing.\n> \n> As an example:\n> \n>   double a[64];\n>   vector double x = vec_ld (0, a);\n> \n> will get translated into\n> \n>   vector double x = __builtin_altivec_lvx_v2df (0, a);\n> \n> and \n> \n>   unsigned char b[64];\n>   vector unsigned char y = vec_ld (0, b);\n> \n> will get translated into\n> \n>   vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);\n> \n> So in resolving the overloading we still maintain the type info for arg1.\n> \n> Earlier I had dealt with the performance issue in a different way for the \n> vec_ld and vec_st overloaded builtins, which created the rather grotty \n> code in rs6000-c.c to modify the parse trees instead.  My hope was that\n> we could simplify the code by having Will deal with them as gimple folds\n> instead.  But if in so doing we lose type information, that may not be the\n> right call.\n> \n> However, since you say that gimple aggressively removes the casts \n> from pointer types, perhaps the code that we see in early gimple from\n> the existing method might also be missing the type information?  Will,\n> it would be worth looking at that code to see.  If it's no different then\n> perhaps we still go ahead with the folding.\n\nThe rs6000-c.c version of the code did not fold unless arg0 was\nconstant; and if it was a constant, it appears the operation got turned\ndirectly into a * reference.  
So there isn't a good before/after compare\nthere.\n\nWhat I see:\n\t  return vec_ld (ll1, (vector double *)p);\n\nat gimple-time after the rs6000-c.c folding was a mostly un-folded \n\n  D.3207 = __builtin_altivec_lvx_v2dfD.1443 (ll1D.3192, pD.3193);\n\nwhile this, with a constant value for arg0:\n\treturn vec_ld (16, (vector double *)p);\nat gimple time after rs6000-c.c folding became a reference:\n  _1 = p + 16;\n  _2 = _1 & -16B;\n  D.3196 = *_2;\n\nwith the rs6000.c gimple folding code (the changes I've got locally),\nthe before/after when arg0 is constant reads the same.   When arg0 is a\nvariable:\n\t  return vec_ld (ll1, (vector double *)p);\nat dump-gimple time it then becomes:\n  D.3208 = (sizetype) ll1D.3192;\n  D.3209 = pD.3193 + D.3208;\n  D.3210 = D.3209 & -16B;\n  D.3207 = MEM[(struct S *)D.3210];\n\nAnd if I change the code such that arg1_type is instead ptr_type_node:\n  D.3207 = MEM[(voidD.44 *)D.3210];\n\nSo...  \n\n> Another note for Will:  The existing code gives up when -maltivec=be has\n> been specified, and you probably want to do that as well.  That may be\n> why you initially turned off big endian -- it is easy to misread that code.\n> -maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.\n\nYeah, I apparently inverted and confused the logic when I made that\nchange.  My current snippet reads as:\n\n         /* Do not fold for -maltivec=be on LE targets.  */\n         if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)\n           return false;\n\n\n> Thanks,\n> Bill\n> > \n> > Richard.\n> > \n> >> Bill\n> >>> \n> >>>> \n> >>>> I think if the builtins have any TBAA constraints you need to build those\n> >>>> explicitly, if not, you should use ptr_type_node aka no TBAA.\n> >>> \n> >>> The type signatures are constrained during parsing, so we should only\n> >>> see allowed pointer types on arg1 by the time we get to gimple folding.  
I\n> >>> think that using arg1_type should work, but I am probably missing\n> >>> something subtle, so please feel free to whack me on the temple until\n> >>> I get it. :-)\n> >>> \n> >>> Bill\n> >>>> \n> >>>> Richard.\n> >>>> \n> >>>>> +        gimple_set_location (g, loc);\n> >>>>> +        gsi_replace (gsi, g, true);\n> >>>>> +\n> >>>>> +        return true;\n> >>>>> +\n> >>>>> +      }\n> >>>>> +\n> >>>>>   default:\n> >>>>>      if (TARGET_DEBUG_BUILTIN)\n> >>>>>         fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n> >>>>>                  fn_code, fn_name1, fn_name2);\n> >>>>>     break;\n>","headers":{"Return-Path":"<gcc-patches-return-462178-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-462178-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"xUdqtImq\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3xtW0n0VQ0z9s82\n\tfor <incoming@patchwork.ozlabs.org>;\n\tFri, 15 Sep 2017 06:51:35 +1000 (AEST)","(qmail 39562 invoked by alias); 14 Sep 2017 20:51:21 -0000","(qmail 39503 invoked by uid 89); 14 Sep 2017 20:51:21 -0000","from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)\n\t(148.163.158.5) by sourceware.org\n\t(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;\n\tThu, 14 Sep 2017 20:51:18 +0000","from pps.filterd (m0098413.ppops.net 
[127.0.0.1])\tby\n\tmx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id\n\tv8EKnRM5116274\tfor <gcc-patches@gcc.gnu.org>;\n\tThu, 14 Sep 2017 16:51:16 -0400","from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.150])\tby\n\tmx0b-001b2d01.pphosted.com with ESMTP id\n\t2cyuy185qv-1\t(version=TLSv1.2 cipher=AES256-SHA bits=256\n\tverify=NOT)\tfor <gcc-patches@gcc.gnu.org>;\n\tThu, 14 Sep 2017 16:51:16 -0400","from localhost\tby e32.co.us.ibm.com with IBM ESMTP SMTP Gateway:\n\tAuthorized Use Only! Violators will be prosecuted\tfor\n\t<gcc-patches@gcc.gnu.org> from <will_schmidt@vnet.ibm.com>;\n\tThu, 14 Sep 2017 14:51:15 -0600","from b03cxnp07029.gho.boulder.ibm.com (9.17.130.16)\tby\n\te32.co.us.ibm.com (192.168.1.132) with IBM ESMTP SMTP\n\tGateway: Authorized Use Only! Violators will be prosecuted;\n\tThu, 14 Sep 2017 14:51:12 -0600","from b03ledav003.gho.boulder.ibm.com\n\t(b03ledav003.gho.boulder.ibm.com [9.17.130.234])\tby\n\tb03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0)\n\twith ESMTP id v8EKpCaK7799136; Thu, 14 Sep 2017 13:51:12 -0700","from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1])\tby\n\tIMSVA (Postfix) with ESMTP id 34BA16A043;\n\tThu, 14 Sep 2017 14:51:12 -0600 (MDT)","from [9.10.86.107] (unknown [9.10.86.107])\tby\n\tb03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP id\n\tC72B86A03C; Thu, 14 Sep 2017 14:51:11 -0600 (MDT)"],"DomainKey-Signature":"a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:subject:from:reply-to:to:cc:in-reply-to:references:content-type\n\t:date:mime-version:content-transfer-encoding:message-id; q=dns;\n\ts=default; b=KdZNaQrCQ6qL9SLBCmVjw1OyWn78HTI0T8acLYfaBQd41RuZDa\n\tVkaTU6BKWJKVWr8yKyW34i8777Gk+vhdxc7Fz5jOmB57LBkgJUyAVEakhGMdbc2p\n\t8CAIyf0V9dUE++Lr2B/ZGw8CXsTWpeKV9+zOxt1Zem9piBGhjsGbjbueQ=","DKIM-Signature":"v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; 
h=list-id\n\t:list-unsubscribe:list-archive:list-post:list-help:sender\n\t:subject:from:reply-to:to:cc:in-reply-to:references:content-type\n\t:date:mime-version:content-transfer-encoding:message-id; s=\n\tdefault; bh=YqtKw0QIkOaMgRuPfM+V+XtGk94=; b=xUdqtImqiuNmn+jz3Z5z\n\tOmf/T6etmk2qKTR73rKaaPtd3tMurKQlEWQ5zn2SJGaBAtjDW4SkO4LkZ8bma3cr\n\tcLEjCkb9Ei6cG5ItwIEebB0Sgfp2ke56w+HYotmyzj+Ukl3eg9W1MqPlTEMSIW2X\n\tojR6oPeEvatbFMSeCCuoPbA=","Mailing-List":"contact gcc-patches-help@gcc.gnu.org; run by ezmlm","Precedence":"bulk","List-Id":"<gcc-patches.gcc.gnu.org>","List-Unsubscribe":"<mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>","List-Archive":"<http://gcc.gnu.org/ml/gcc-patches/>","List-Post":"<mailto:gcc-patches@gcc.gnu.org>","List-Help":"<mailto:gcc-patches-help@gcc.gnu.org>","Sender":"gcc-patches-owner@gcc.gnu.org","X-Virus-Found":"No","X-Spam-SWARE-Status":"No, score=-26.6 required=5.0 tests=AWL, BAYES_00,\n\tGIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,\n\tRCVD_IN_DNSWL_LOW,\n\tSPF_PASS autolearn=ham version=3.3.2 spammy=","X-HELO":"mx0a-001b2d01.pphosted.com","Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Will Schmidt <will_schmidt@vnet.ibm.com>","Reply-To":"will_schmidt@vnet.ibm.com","To":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","Cc":"Richard Biener <richard.guenther@gmail.com>,\n\tGCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn 
<dje.gcc@gmail.com>","In-Reply-To":"<73D3C195-E029-4050-9764-57C07845DBEB@linux.vnet.ibm.com>","References":"<1505227262.14827.155.camel@brimstone.rchland.ibm.com>\t\n\t<1505250505.14827.191.camel@brimstone.rchland.ibm.com>\t\n\t<CAFiYyc18DrEDhKH1eeTFDq7r=JOE0aHVjZqMGNkf8+DPA5O23w@mail.gmail.com>\t\n\t<24A6439E-6268-44F8-92CE-EF0F22DA5773@linux.vnet.ibm.com>\t\n\t<B1261C39-9D9D-4D34-A65F-FC48BC88CEF9@linux.vnet.ibm.com>\t\n\t<CAFiYyc05e7Pg-9CxGY=czF66AMEJviNXy2d9UY-ty5NKgP-N2w@mail.gmail.com>\t\n\t<73D3C195-E029-4050-9764-57C07845DBEB@linux.vnet.ibm.com>","Content-Type":"text/plain; charset=\"UTF-8\"","Date":"Thu, 14 Sep 2017 15:51:11 -0500","Mime-Version":"1.0","Content-Transfer-Encoding":"7bit","X-TM-AS-GCONF":"00","x-cbid":"17091420-0004-0000-0000-000012EC97E9","X-IBM-SpamModules-Scores":"","X-IBM-SpamModules-Versions":"BY=3.00007735; HX=3.00000241; KW=3.00000007;\n\tPH=3.00000004; SC=3.00000227; SDB=6.00917060; UDB=6.00460573;\n\tIPR=6.00697287; BA=6.00005589; NDR=6.00000001; ZLA=6.00000005;\n\tZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;\n\tZU=6.00000002; MB=3.00017157; XFM=3.00000015;\n\tUTC=2017-09-14 20:51:14","X-IBM-AV-DETECTION":"SAVI=unused REMOTE=unused XFE=unused","x-cbparentid":"17091420-0005-0000-0000-00008417A96D","Message-Id":"<1505422271.26707.17.camel@brimstone.rchland.ibm.com>","X-Proofpoint-Virus-Version":"vendor=fsecure engine=2.50.10432:, ,\n\tdefinitions=2017-09-14_06:, , signatures=0","X-Proofpoint-Spam-Details":"rule=outbound_notspam policy=outbound score=0\n\tspamscore=0 suspectscore=0 malwarescore=0 phishscore=0\n\tadultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx\n\tscancount=1 engine=8.0.1-1707230000\n\tdefinitions=main-1709140311","X-IsSubscribed":"yes"}},{"id":1769065,"web_url":"http://patchwork.ozlabs.org/comment/1769065/","msgid":"<CAFiYyc3qisHPi62SN=PX8qa4gTzMuXTnUOSTd5dd4RhnZqA9Tg@mail.gmail.com>","list_archive_url":null,"date":"2017-09-15T09:13:43","subject":"Re: [PATCH, rs6000] [v2] Folding of vector 
loads in GIMPLE","submitter":{"id":1765,"url":"http://patchwork.ozlabs.org/api/people/1765/","name":"Richard Biener","email":"richard.guenther@gmail.com"},"content":"On Thu, Sep 14, 2017 at 4:38 PM, Bill Schmidt\n<wschmidt@linux.vnet.ibm.com> wrote:\n> On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>\n>> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt\n>> <wschmidt@linux.vnet.ibm.com> wrote:\n>>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n>>>>\n>>>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>>>>\n>>>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n>>>>> <will_schmidt@vnet.ibm.com> wrote:\n>>>>>> Hi,\n>>>>>>\n>>>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>>>>>>\n>>>>>> Folding of vector loads in GIMPLE.\n>>>>>>\n>>>>>> Add code to handle gimple folding for the vec_ld builtins.\n>>>>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. Surrounding\n>>>>>> comments have been adjusted slightly so they continue to read OK for the\n>>>>>> existing vec_st code.\n>>>>>>\n>>>>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>>>>>> tests which have been posted separately.\n>>>>>>\n>>>>>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>>>>>> gimple fold from occurring in BE environments.   This had fixed an issue\n>>>>>> for me earlier during my development of the code, and turns out this was\n>>>>>> not necessary.  I've sniff-tested after removing that check and it looks\n>>>>>> OK.\n>>>>>>\n>>>>>>> + /* Limit folding of loads to LE targets.  
*/\n>>>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>>>>>> +        return false;\n>>>>>>\n>>>>>> I've restarted a regression test on this updated version.\n>>>>>>\n>>>>>> OK for trunk (assuming successful regression test completion)  ?\n>>>>>>\n>>>>>> Thanks,\n>>>>>> -Will\n>>>>>>\n>>>>>> [gcc]\n>>>>>>\n>>>>>>      2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>>>>>>\n>>>>>>      * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>>>>>      for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>>>>>      * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>>>>>      Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>>>>>>\n>>>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>>>>>> index fbab0a2..bb8a77d 100644\n>>>>>> --- a/gcc/config/rs6000/rs6000-c.c\n>>>>>> +++ b/gcc/config/rs6000/rs6000-c.c\n>>>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>>>>>                   convert (TREE_TYPE (stmt), arg0));\n>>>>>>     stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>>>>>     return stmt;\n>>>>>>   }\n>>>>>>\n>>>>>> -  /* Expand vec_ld into an expression that masks the address and\n>>>>>> -     performs the load.  We need to expand this early to allow\n>>>>>> +  /* Expand vec_st into an expression that masks the address and\n>>>>>> +     performs the store.  We need to expand this early to allow\n>>>>>>    the best aliasing, as by the time we get into RTL we no longer\n>>>>>>    are able to honor __restrict__, for example.  We may want to\n>>>>>>    consider this for all memory access built-ins.\n>>>>>>\n>>>>>>    When -maltivec=be is specified, or the wrong number of arguments\n>>>>>>    is provided, simply punt to existing built-in processing.  
*/\n>>>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>>>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>>> -      && nargs == 2)\n>>>>>> -    {\n>>>>>> -      tree arg0 = (*arglist)[0];\n>>>>>> -      tree arg1 = (*arglist)[1];\n>>>>>> -\n>>>>>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>>>>>> -      tree arg1_type = TREE_TYPE (arg1);\n>>>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>>>>>> -       goto bad;\n>>>>>> -\n>>>>>> -      tree inner_type = TREE_TYPE (arg1_type);\n>>>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>>>>>> -       {\n>>>>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>>>>>> -                                                               0));\n>>>>>> -         arg1 = fold_convert (arg1_type, arg1);\n>>>>>> -       }\n>>>>>> -\n>>>>>> -      /* Construct the masked address.  Let existing error handling take\n>>>>>> -        over if we don't have a constant offset.  
*/\n>>>>>> -      arg0 = fold (arg0);\n>>>>>> -\n>>>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>>>>>> -       {\n>>>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>>>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>>>>>> -\n>>>>>> -         tree arg1_type = TREE_TYPE (arg1);\n>>>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>>>>>> -           {\n>>>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>>>>>> -             tree const0 = build_int_cstu (sizetype, 0);\n>>>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>>>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>>>>>> -           }\n>>>>>> -\n>>>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>>>>>> -                                      arg1, arg0);\n>>>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>>>>>> -                                         build_int_cst (arg1_type, -16));\n>>>>>> -\n>>>>>> -         /* Find the built-in to get the return type so we can convert\n>>>>>> -            the result properly (or fall back to default handling if the\n>>>>>> -            arguments aren't compatible).  
*/\n>>>>>> -         for (desc = altivec_overloaded_builtins;\n>>>>>> -              desc->code && desc->code != fcode; desc++)\n>>>>>> -           continue;\n>>>>>> -\n>>>>>> -         for (; desc->code == fcode; desc++)\n>>>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>>>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>>>>>> -                                                   desc->op2)))\n>>>>>> -             {\n>>>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>>>>>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>>>>>> -                 /* Type-based aliasing analysis thinks vector long\n>>>>>> -                    and vector long long are different and will put them\n>>>>>> -                    in distinct alias classes.  Force our return type\n>>>>>> -                    to be a may-alias type to avoid this.  */\n>>>>>> -                 ret_type\n>>>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>>>>>> -                                                  true/*can_alias_all*/);\n>>>>>> -               else\n>>>>>> -                 ret_type = build_pointer_type (ret_type);\n>>>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>>>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>>>>>> -               return ret_val;\n>>>>>> -             }\n>>>>>> -       }\n>>>>>> -    }\n>>>>>>\n>>>>>> -  /* Similarly for stvx.  
*/\n>>>>>> if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>>>>>     && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>>>     && nargs == 3)\n>>>>>>   {\n>>>>>>     tree arg0 = (*arglist)[0];\n>>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>>>>>> index 1338371..1fb5f44 100644\n>>>>>> --- a/gcc/config/rs6000/rs6000.c\n>>>>>> +++ b/gcc/config/rs6000/rs6000.c\n>>>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>>>>>      res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>>>>>      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>>>      update_call_from_tree (gsi, res);\n>>>>>>      return true;\n>>>>>>     }\n>>>>>> +    /* Vector loads.  */\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>>>>>> +      {\n>>>>>> +        gimple *g;\n>>>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>>>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>>>>>> +\n>>>>>> +        lhs = gimple_call_lhs (stmt);\n>>>>>> +        location_t loc = gimple_location (stmt);\n>>>>>> +\n>>>>>> +        tree arg1_type = TREE_TYPE (arg1);\n>>>>>> +        tree lhs_type = TREE_TYPE (lhs);\n>>>>>> +\n>>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>>>>>> +           the tree using the value from arg0.  The resulting type will match\n>>>>>> +           the type of arg1.  
*/\n>>>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>>>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>>>>>> +        gimple_set_location (g, loc);\n>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>>>>>> +                                 temp_offset);\n>>>>>> +        gimple_set_location (g, loc);\n>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>> +\n>>>>>> +        /* Mask off any lower bits from the address.  */\n>>>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>>>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>>>>>> +                                temp_addr, alignment_mask);\n>>>>>> +        gimple_set_location (g, loc);\n>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>\n>>>>> You could use\n>>>>>\n>>>>> gimple_seq stmts = NULL;\n>>>>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>>>>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n>>>>> arg1_type, arg1, temp_offset);\n>>>>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n>>>>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>>>>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>>\n>>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>>>>>> +           take an offset, but since we've already incorporated the offset\n>>>>>> +           above, here we just pass in a zero.  */\n>>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>>>>>> +                                               build_int_cst (arg1_type, 0)));\n>>>>>\n>>>>> are you sure about arg1_type here?  I'm sure not.  For\n>>>>>\n>>>>> ... 
foo (struct S *p)\n>>>>> {\n>>>>> return __builtin_lvx_v2df (4, (double *)p);\n>>>>> }\n>>>>>\n>>>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n>>>>> TBAA using 'struct S' to access the memory.\n>>>>\n>>>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\n>>>> Will, you should probably test this example and see, but I'm pretty confident\n>>>> about this (see below).\n>>>\n>>> But, as I should have suspected, you're right.  For some reason\n>>> gimple_call_arg is returning p, stripped of the cast information where the\n>>> user asserted that p points to a double*.\n>>>\n>>> Can you explain to me why this should be so?  I assume that somebody\n>>> has decided to strip_nops the argument and lose the cast.\n>>\n>> pointer types have no meaning in GIMPLE so we aggressively prune them.\n>>\n>>> Using ptr_type_node loses all type information, so that would be a\n>>> regression from what we do today.  In some cases we could reconstruct\n>>> that this was necessarily, say, a double*, but I don't know how we would\n>>> recover the signedness for an integer type.\n>>\n>> How did we handle the expansion previously - ah - it was done earlier\n>> in the C FE.  So why are you moving it to GIMPLE?  The function is called\n>> resolve_overloaded_builtin - what kind of overloading do you resolve here?\n>> As said argument types might not be preserved.\n>\n> The AltiVec builtins allow overloaded names based on the argument types,\n> using a special callout during parsing to convert the overloaded names to\n> type-specific names.  Historically these have then remained builtin calls\n> until RTL expansion, which loses a lot of useful optimization.  Will has been\n> gradually implementing gimple folding for these builtins so that we can\n> optimize simple vector arithmetic and so on.  
The overloading is still dealt\n> with during parsing.\n>\n> As an example:\n>\n>   double a[64];\n>   vector double x = vec_ld (0, a);\n>\n> will get translated into\n>\n>   vector double x = __builtin_altivec_lvx_v2df (0, a);\n>\n> and\n>\n>   unsigned char b[64];\n>   vector unsigned char y = vec_ld (0, b);\n>\n> will get translated into\n>\n>   vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);\n>\n> So in resolving the overloading we still maintain the type info for arg1.\n\nSo TBAA-wise the vec_ld is specced to use alias-set zero for this case\nas it loads from an unsigned char array?  Or is it alias-set zero because\nthe type of arg1 is unsigned char *?  What if the type of arg1 was\nstruct X *?\n\n> Earlier I had dealt with the performance issue in a different way for the\n> vec_ld and vec_st overloaded builtins, which created the rather grotty\n> code in rs6000-c.c to modify the parse trees instead.  My hope was that\n> we could simplify the code by having Will deal with them as gimple folds\n> instead.  But if in so doing we lose type information, that may not be the\n> right call.\n>\n> However, since you say that gimple aggressively removes the casts\n> from pointer types, perhaps the code that we see in early gimple from\n> the existing method might also be missing the type information?  Will,\n> it would be worth looking at that code to see.  If it's no different then\n> perhaps we still go ahead with the folding.\n\nAs I said you can't simply use the type of arg1 for the TBAA type.\nYou can conservatively use ptr_type_node (alias-set zero) or you\ncan use something that you derive from the builtin used (is a supposedly\nexisting _v4si variant always subject to int * TBAA?)\n\n> Another note for Will:  The existing code gives up when -maltivec=be has\n> been specified, and you probably want to do that as well.  
That may be\n> why you initially turned off big endian -- it is easy to misread that code.\n> -maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.\n>\n> Thanks,\n> Bill\n>>\n>> Richard.\n>>\n>>> Bill\n>>>>\n>>>>>\n>>>>> I think if the builtins have any TBAA constraints you need to build those\n>>>>> explicitely, if not, you should use ptr_type_node aka no TBAA.\n>>>>\n>>>> The type signatures are constrained during parsing, so we should only\n>>>> see allowed pointer types on arg1 by the time we get to gimple folding.  I\n>>>> think that using arg1_type should work, but I am probably missing\n>>>> something subtle, so please feel free to whack me on the temple until\n>>>> I get it. :-)\n>>>>\n>>>> Bill\n>>>>>\n>>>>> Richard.\n>>>>>\n>>>>>> +        gimple_set_location (g, loc);\n>>>>>> +        gsi_replace (gsi, g, true);\n>>>>>> +\n>>>>>> +        return true;\n>>>>>> +\n>>>>>> +      }\n>>>>>> +\n>>>>>>   default:\n>>>>>>      if (TARGET_DEBUG_BUILTIN)\n>>>>>>         fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>>>>>                  fn_code, fn_name1, fn_name2);\n>>>>>>     break;\n>","headers":{"Return-Path":"<gcc-patches-return-462207-incoming=patchwork.ozlabs.org@gcc.gnu.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","mailing list gcc-patches@gcc.gnu.org"],"Authentication-Results":["ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org\n\t(client-ip=209.132.180.131; helo=sourceware.org;\n\tenvelope-from=gcc-patches-return-462207-incoming=patchwork.ozlabs.org@gcc.gnu.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org\n\theader.b=\"vh7QDN3N\"; dkim-atps=neutral","sourceware.org; auth=none"],"Received":["from sourceware.org (server1.sourceware.org [209.132.180.131])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby 
ozlabs.org"],"From":"Richard Biener <richard.guenther@gmail.com>","Date":"Fri, 15 Sep 2017 11:13:43 +0200","Message-ID":"<CAFiYyc3qisHPi62SN=PX8qa4gTzMuXTnUOSTd5dd4RhnZqA9Tg@mail.gmail.com>","Subject":"Re: [PATCH, rs6000] [v2] Folding of 
vector loads in GIMPLE","To":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","Content-Type":"text/plain; charset=\"UTF-8\"","X-IsSubscribed":"yes"}},{"id":1769177,"web_url":"http://patchwork.ozlabs.org/comment/1769177/","msgid":"<0B9CF26C-8F9E-4C4E-8F42-50B99DA3D7B8@linux.vnet.ibm.com>","list_archive_url":null,"date":"2017-09-15T13:16:45","subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","submitter":{"id":6459,"url":"http://patchwork.ozlabs.org/api/people/6459/","name":"Bill Schmidt","email":"wschmidt@linux.vnet.ibm.com"},"content":"On Sep 15, 2017, at 4:13 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n> \n> On Thu, Sep 14, 2017 at 4:38 PM, Bill Schmidt\n> <wschmidt@linux.vnet.ibm.com> wrote:\n>> On Sep 14, 2017, at 5:15 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>> \n>>> On Wed, Sep 13, 2017 at 10:14 PM, Bill Schmidt\n>>> <wschmidt@linux.vnet.ibm.com> wrote:\n>>>> On Sep 13, 2017, at 10:40 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:\n>>>>> \n>>>>> On Sep 13, 2017, at 7:23 AM, Richard Biener <richard.guenther@gmail.com> wrote:\n>>>>>> \n>>>>>> On Tue, Sep 12, 2017 at 11:08 PM, Will Schmidt\n>>>>>> <will_schmidt@vnet.ibm.com> wrote:\n>>>>>>> Hi,\n>>>>>>> \n>>>>>>> [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE\n>>>>>>> \n>>>>>>> Folding of vector loads in GIMPLE.\n>>>>>>> \n>>>>>>> Add code to handle gimple folding for the vec_ld builtins.\n>>>>>>> Remove the now obsoleted folding code for vec_ld from rs6000-c.c. 
Surrounding\n>>>>>>> comments have been adjusted slightly so they continue to read OK for the\n>>>>>>> existing vec_st code.\n>>>>>>> \n>>>>>>> The resulting code is specifically verified by the powerpc/fold-vec-ld-*.c\n>>>>>>> tests which have been posted separately.\n>>>>>>> \n>>>>>>> For V2 of this patch, I've removed the chunk of code that prohibited the\n>>>>>>> gimple fold from occurring in BE environments.   This had fixed an issue\n>>>>>>> for me earlier during my development of the code, and turns out this was\n>>>>>>> not necessary.  I've sniff-tested after removing that check and it looks\n>>>>>>> OK.\n>>>>>>> \n>>>>>>>> + /* Limit folding of loads to LE targets.  */\n>>>>>>>> +      if (BYTES_BIG_ENDIAN || VECTOR_ELT_ORDER_BIG)\n>>>>>>>> +        return false;\n>>>>>>> \n>>>>>>> I've restarted a regression test on this updated version.\n>>>>>>> \n>>>>>>> OK for trunk (assuming successful regression test completion)  ?\n>>>>>>> \n>>>>>>> Thanks,\n>>>>>>> -Will\n>>>>>>> \n>>>>>>> [gcc]\n>>>>>>> \n>>>>>>>     2017-09-12  Will Schmidt  <will_schmidt@vnet.ibm.com>\n>>>>>>> \n>>>>>>>     * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling\n>>>>>>>     for early folding of vector loads (ALTIVEC_BUILTIN_LVX_*).\n>>>>>>>     * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):\n>>>>>>>     Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_LD.\n>>>>>>> \n>>>>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c\n>>>>>>> index fbab0a2..bb8a77d 100644\n>>>>>>> --- a/gcc/config/rs6000/rs6000-c.c\n>>>>>>> +++ b/gcc/config/rs6000/rs6000-c.c\n>>>>>>> @@ -6470,92 +6470,19 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,\n>>>>>>>                  convert (TREE_TYPE (stmt), arg0));\n>>>>>>>    stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);\n>>>>>>>    return stmt;\n>>>>>>>  }\n>>>>>>> \n>>>>>>> -  /* Expand vec_ld into an expression that masks the address and\n>>>>>>> -     performs the 
load.  We need to expand this early to allow\n>>>>>>> +  /* Expand vec_st into an expression that masks the address and\n>>>>>>> +     performs the store.  We need to expand this early to allow\n>>>>>>>   the best aliasing, as by the time we get into RTL we no longer\n>>>>>>>   are able to honor __restrict__, for example.  We may want to\n>>>>>>>   consider this for all memory access built-ins.\n>>>>>>> \n>>>>>>>   When -maltivec=be is specified, or the wrong number of arguments\n>>>>>>>   is provided, simply punt to existing built-in processing.  */\n>>>>>>> -  if (fcode == ALTIVEC_BUILTIN_VEC_LD\n>>>>>>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>>>> -      && nargs == 2)\n>>>>>>> -    {\n>>>>>>> -      tree arg0 = (*arglist)[0];\n>>>>>>> -      tree arg1 = (*arglist)[1];\n>>>>>>> -\n>>>>>>> -      /* Strip qualifiers like \"const\" from the pointer arg.  */\n>>>>>>> -      tree arg1_type = TREE_TYPE (arg1);\n>>>>>>> -      if (!POINTER_TYPE_P (arg1_type) && TREE_CODE (arg1_type) != ARRAY_TYPE)\n>>>>>>> -       goto bad;\n>>>>>>> -\n>>>>>>> -      tree inner_type = TREE_TYPE (arg1_type);\n>>>>>>> -      if (TYPE_QUALS (TREE_TYPE (arg1_type)) != 0)\n>>>>>>> -       {\n>>>>>>> -         arg1_type = build_pointer_type (build_qualified_type (inner_type,\n>>>>>>> -                                                               0));\n>>>>>>> -         arg1 = fold_convert (arg1_type, arg1);\n>>>>>>> -       }\n>>>>>>> -\n>>>>>>> -      /* Construct the masked address.  Let existing error handling take\n>>>>>>> -        over if we don't have a constant offset.  
*/\n>>>>>>> -      arg0 = fold (arg0);\n>>>>>>> -\n>>>>>>> -      if (TREE_CODE (arg0) == INTEGER_CST)\n>>>>>>> -       {\n>>>>>>> -         if (!ptrofftype_p (TREE_TYPE (arg0)))\n>>>>>>> -           arg0 = build1 (NOP_EXPR, sizetype, arg0);\n>>>>>>> -\n>>>>>>> -         tree arg1_type = TREE_TYPE (arg1);\n>>>>>>> -         if (TREE_CODE (arg1_type) == ARRAY_TYPE)\n>>>>>>> -           {\n>>>>>>> -             arg1_type = TYPE_POINTER_TO (TREE_TYPE (arg1_type));\n>>>>>>> -             tree const0 = build_int_cstu (sizetype, 0);\n>>>>>>> -             tree arg1_elt0 = build_array_ref (loc, arg1, const0);\n>>>>>>> -             arg1 = build1 (ADDR_EXPR, arg1_type, arg1_elt0);\n>>>>>>> -           }\n>>>>>>> -\n>>>>>>> -         tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg1_type,\n>>>>>>> -                                      arg1, arg0);\n>>>>>>> -         tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, arg1_type, addr,\n>>>>>>> -                                         build_int_cst (arg1_type, -16));\n>>>>>>> -\n>>>>>>> -         /* Find the built-in to get the return type so we can convert\n>>>>>>> -            the result properly (or fall back to default handling if the\n>>>>>>> -            arguments aren't compatible).  
*/\n>>>>>>> -         for (desc = altivec_overloaded_builtins;\n>>>>>>> -              desc->code && desc->code != fcode; desc++)\n>>>>>>> -           continue;\n>>>>>>> -\n>>>>>>> -         for (; desc->code == fcode; desc++)\n>>>>>>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)\n>>>>>>> -               && (rs6000_builtin_type_compatible (TREE_TYPE (arg1),\n>>>>>>> -                                                   desc->op2)))\n>>>>>>> -             {\n>>>>>>> -               tree ret_type = rs6000_builtin_type (desc->ret_type);\n>>>>>>> -               if (TYPE_MODE (ret_type) == V2DImode)\n>>>>>>> -                 /* Type-based aliasing analysis thinks vector long\n>>>>>>> -                    and vector long long are different and will put them\n>>>>>>> -                    in distinct alias classes.  Force our return type\n>>>>>>> -                    to be a may-alias type to avoid this.  */\n>>>>>>> -                 ret_type\n>>>>>>> -                   = build_pointer_type_for_mode (ret_type, Pmode,\n>>>>>>> -                                                  true/*can_alias_all*/);\n>>>>>>> -               else\n>>>>>>> -                 ret_type = build_pointer_type (ret_type);\n>>>>>>> -               aligned = build1 (NOP_EXPR, ret_type, aligned);\n>>>>>>> -               tree ret_val = build_indirect_ref (loc, aligned, RO_NULL);\n>>>>>>> -               return ret_val;\n>>>>>>> -             }\n>>>>>>> -       }\n>>>>>>> -    }\n>>>>>>> \n>>>>>>> -  /* Similarly for stvx.  
*/\n>>>>>>> if (fcode == ALTIVEC_BUILTIN_VEC_ST\n>>>>>>>    && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)\n>>>>>>>    && nargs == 3)\n>>>>>>>  {\n>>>>>>>    tree arg0 = (*arglist)[0];\n>>>>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c\n>>>>>>> index 1338371..1fb5f44 100644\n>>>>>>> --- a/gcc/config/rs6000/rs6000.c\n>>>>>>> +++ b/gcc/config/rs6000/rs6000.c\n>>>>>>> @@ -16547,10 +16547,61 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)\n>>>>>>>     res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);\n>>>>>>>     gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>>>>     update_call_from_tree (gsi, res);\n>>>>>>>     return true;\n>>>>>>>    }\n>>>>>>> +    /* Vector loads.  */\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V16QI:\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V8HI:\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SI:\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V4SF:\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DI:\n>>>>>>> +    case ALTIVEC_BUILTIN_LVX_V2DF:\n>>>>>>> +      {\n>>>>>>> +        gimple *g;\n>>>>>>> +        arg0 = gimple_call_arg (stmt, 0);  // offset\n>>>>>>> +        arg1 = gimple_call_arg (stmt, 1);  // address\n>>>>>>> +\n>>>>>>> +        lhs = gimple_call_lhs (stmt);\n>>>>>>> +        location_t loc = gimple_location (stmt);\n>>>>>>> +\n>>>>>>> +        tree arg1_type = TREE_TYPE (arg1);\n>>>>>>> +        tree lhs_type = TREE_TYPE (lhs);\n>>>>>>> +\n>>>>>>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create\n>>>>>>> +           the tree using the value from arg0.  The resulting type will match\n>>>>>>> +           the type of arg1.  
*/\n>>>>>>> +        tree temp_offset = create_tmp_reg_or_ssa_name (sizetype);\n>>>>>>> +        g = gimple_build_assign (temp_offset, NOP_EXPR, arg0);\n>>>>>>> +        gimple_set_location (g, loc);\n>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>>> +        tree temp_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>>>> +        g = gimple_build_assign (temp_addr, POINTER_PLUS_EXPR, arg1,\n>>>>>>> +                                 temp_offset);\n>>>>>>> +        gimple_set_location (g, loc);\n>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>>> +\n>>>>>>> +        /* Mask off any lower bits from the address.  */\n>>>>>>> +        tree alignment_mask = build_int_cst (arg1_type, -16);\n>>>>>>> +        tree aligned_addr = create_tmp_reg_or_ssa_name (arg1_type);\n>>>>>>> +        g = gimple_build_assign (aligned_addr, BIT_AND_EXPR,\n>>>>>>> +                                temp_addr, alignment_mask);\n>>>>>>> +        gimple_set_location (g, loc);\n>>>>>>> +        gsi_insert_before (gsi, g, GSI_SAME_STMT);\n>>>>>> \n>>>>>> You could use\n>>>>>> \n>>>>>> gimple_seq stmts = NULL;\n>>>>>> tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);\n>>>>>> tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,\n>>>>>> arg1_type, arg1, temp_offset);\n>>>>>> tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,\n>>>>>> arg1_type, temp_addr, build_int_cst (arg1_type, -16));\n>>>>>> gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);\n>>>>>> \n>>>>>>> +        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also\n>>>>>>> +           take an offset, but since we've already incorporated the offset\n>>>>>>> +           above, here we just pass in a zero.  */\n>>>>>>> +        g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,\n>>>>>>> +                                               build_int_cst (arg1_type, 0)));\n>>>>>> \n>>>>>> are you sure about arg1_type here?  I'm sure not.  
For\n>>>>>> \n>>>>>> ... foo (struct S *p)\n>>>>>> {\n>>>>>> return __builtin_lvx_v2df (4, (double *)p);\n>>>>>> }\n>>>>>> \n>>>>>> you'd end up with p as arg1 and thus struct S * as arg1_type and thus\n>>>>>> TBAA using 'struct S' to access the memory.\n>>>>> \n>>>>> Hm, is that so?  Wouldn't arg1_type be double* since arg1 is (double *)p?\n>>>>> Will, you should probably test this example and see, but I'm pretty confident\n>>>>> about this (see below).\n>>>> \n>>>> But, as I should have suspected, you're right.  For some reason\n>>>> gimple_call_arg is returning p, stripped of the cast information where the\n>>>> user asserted that p points to a double*.\n>>>> \n>>>> Can you explain to me why this should be so?  I assume that somebody\n>>>> has decided to strip_nops the argument and lose the cast.\n>>> \n>>> pointer types have no meaning in GIMPLE so we aggressively prune them.\n>>> \n>>>> Using ptr_type_node loses all type information, so that would be a\n>>>> regression from what we do today.  In some cases we could reconstruct\n>>>> that this was necessarily, say, a double*, but I don't know how we would\n>>>> recover the signedness for an integer type.\n>>> \n>>> How did we handle the expansion previously - ah - it was done earlier\n>>> in the C FE.  So why are you moving it to GIMPLE?  The function is called\n>>> resolve_overloaded_builtin - what kind of overloading do you resolve here?\n>>> As said argument types might not be preserved.\n>> \n>> The AltiVec builtins allow overloaded names based on the argument types,\n>> using a special callout during parsing to convert the overloaded names to\n>> type-specific names.  Historically these have then remained builtin calls\n>> until RTL expansion, which loses a lot of useful optimization.  Will has been\n>> gradually implementing gimple folding for these builtins so that we can\n>> optimize simple vector arithmetic and so on.  
The overloading is still dealt\n>> with during parsing.\n>> \n>> As an example:\n>> \n>>  double a[64];\n>>  vector double x = vec_ld (0, a);\n>> \n>> will get translated into\n>> \n>>  vector double x = __builtin_altivec_lvx_v2df (0, a);\n>> \n>> and\n>> \n>>  unsigned char b[64];\n>>  vector unsigned char y = vec_ld (0, b);\n>> \n>> will get translated into\n>> \n>>  vector unsigned char y = __builtin_altivec_lvx_v16qi (0, b);\n>> \n>> So in resolving the overloading we still maintain the type info for arg1.\n> \n> So TBAA-wise the vec_ld is specced to use alias-set zero for this case\n> as it loads from an unsigned char array?  Or is it alias-set zero because\n> the type of arg1 is unsigned char *?  What if the type of arg1 was\n> struct X *?\n> \n>> Earlier I had dealt with the performance issue in a different way for the\n>> vec_ld and vec_st overloaded builtins, which created the rather grotty\n>> code in rs6000-c.c to modify the parse trees instead.  My hope was that\n>> we could simplify the code by having Will deal with them as gimple folds\n>> instead.  But if in so doing we lose type information, that may not be the\n>> right call.\n>> \n>> However, since you say that gimple aggressively removes the casts\n>> from pointer types, perhaps the code that we see in early gimple from\n>> the existing method might also be missing the type information?  Will,\n>> it would be worth looking at that code to see.  If it's no different then\n>> perhaps we still go ahead with the folding.\n> \n> As I said you can't simply use the type of arg1 for the TBAA type.\n> You can conservatively use ptr_type_node (alias-set zero) or you\n> can use sth that you derive from the builtin used (is a supposedly\n> existing _v4si variant always subject to int * TBAA?)\n\nAfter thinking about this a while, I believe Will should use ptr_type_node\nhere.  I think anything we do to try to enforce some TBAA on these\npointer types will be fragile.  
Supposedly a _v2df variant should point\nonly to [an array of] double or to a vector double, and parsing enforces\nthat at least they've cast to that so they assert they know what they're\ndoing.  Beyond that we needn't be too fussed if it's actually a\nstruct X * or the like.  We already have issues with \"vector long\" and\n\"vector long long\" being different types in theory but aliased together\nfor 64-bit because they are the same in practice.\n\nAs long as we are still commoning identical loads (which didn't use\nto happen before the parsing-level expansion was done), I'll be happy.\nWe can always revisit this later if we feel like refined TBAA would solve a\nconcrete problem.\n\nBill\n> \n>> Another note for Will:  The existing code gives up when -maltivec=be has\n>> been specified, and you probably want to do that as well.  That may be\n>> why you initially turned off big endian -- it is easy to misread that code.\n>> -maltivec=be is VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN.\n>> \n>> Thanks,\n>> Bill\n>>> \n>>> Richard.\n>>> \n>>>> Bill\n>>>>> \n>>>>>> \n>>>>>> I think if the builtins have any TBAA constraints you need to build those\n>>>>>> explicitly, if not, you should use ptr_type_node aka no TBAA.\n>>>>> \n>>>>> The type signatures are constrained during parsing, so we should only\n>>>>> see allowed pointer types on arg1 by the time we get to gimple folding.  I\n>>>>> think that using arg1_type should work, but I am probably missing\n>>>>> something subtle, so please feel free to whack me on the temple until\n>>>>> I get it. 
:-)\n>>>>> \n>>>>> Bill\n>>>>>> \n>>>>>> Richard.\n>>>>>> \n>>>>>>> +        gimple_set_location (g, loc);\n>>>>>>> +        gsi_replace (gsi, g, true);\n>>>>>>> +\n>>>>>>> +        return true;\n>>>>>>> +\n>>>>>>> +      }\n>>>>>>> +\n>>>>>>>  default:\n>>>>>>>     if (TARGET_DEBUG_BUILTIN)\n>>>>>>>        fprintf (stderr, \"gimple builtin intrinsic not matched:%d %s %s\\n\",\n>>>>>>>                 fn_code, fn_name1, fn_name2);\n>>>>>>>    break;","headers":{"Subject":"Re: [PATCH, rs6000] [v2] Folding of vector loads in GIMPLE","From":"Bill Schmidt <wschmidt@linux.vnet.ibm.com>","Date":"Fri, 15 Sep 2017 08:16:45 -0500","To":"Richard Biener <richard.guenther@gmail.com>","Cc":"will_schmidt@vnet.ibm.com, GCC Patches <gcc-patches@gcc.gnu.org>,\n\tSegher Boessenkool <segher@kernel.crashing.org>,\n\tDavid Edelsohn <dje.gcc@gmail.com>","In-Reply-To":"<CAFiYyc3qisHPi62SN=PX8qa4gTzMuXTnUOSTd5dd4RhnZqA9Tg@mail.gmail.com>","Message-Id":"<0B9CF26C-8F9E-4C4E-8F42-50B99DA3D7B8@linux.vnet.ibm.com>","X-IsSubscribed":"yes"}}]