Fix PR49642 in 4.6, questions about 4.7

Message ID: 1326233702.13203.24.camel@gnopaine
State: New

Commit Message

Bill Schmidt Jan. 10, 2012, 10:15 p.m. UTC
On Tue, 2012-01-10 at 09:42 -0600, William J. Schmidt wrote:
> 
> On Tue, 2012-01-10 at 14:53 +0100, Richard Guenther wrote:
> > 
> > Btw, this will also disqualify any point below
> > 
> >  if (__builtin_constant_p (...))
> >    {
> >      ...
> >    }
> > 
> > because after the if join all BBs are dominated by the __builtin_constant_p
> > call.  What we want to disallow is splitting at a block that is dominated
> > by the true edge of the condition fed by the __builtin_constant_p result ...
> 
> True.  What we have is:
> 
>   D.1899_68 = __builtin_constant_p (D.1898_67);
>   if (D.1899_68 != 0)
>     goto <bb 3>;
>   else
>     goto <bb 133>;
> 
> So I suppose we have to walk the immediate uses of the LHS of the call,
> find all that are part of a condition, and mark the target block for
> nonzero (in this case <bb 3>) as a forbidden dominator.  I can tighten
> this up.

Here's a revised patch for 4.6, following the above.  The same patch
applies to 4.7, if desired, optionally with an additional variation on
the test case to add -fno-tree-fre to the compile step.
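
In essence, the new check_forbidden_calls walks the immediate uses of the
__builtin_constant_p result and, for each conditional fed by it, records the
block reached when the result is nonzero.  A trimmed sketch of that loop
(taken from the patch below, with the operand canonicalization and the
handling of ambiguous comparison codes omitted):

  imm_use_iterator use_iter;
  use_operand_p use_p;
  tree lhs = gimple_call_lhs (stmt);  /* result of __builtin_constant_p */

  FOR_EACH_IMM_USE_FAST (use_p, use_iter, lhs)
    {
      gimple use_stmt = USE_STMT (use_p);
      edge true_edge, false_edge;

      if (gimple_code (use_stmt) != GIMPLE_COND)
        continue;

      /* For "if (lhs != 0)" the nonzero case takes the true edge;
         for "if (lhs == 0)" it takes the false edge.  */
      extract_true_false_edges_from_block (gimple_bb (use_stmt),
                                           &true_edge, &false_edge);
      if (gimple_cond_code (use_stmt) == EQ_EXPR)
        bitmap_set_bit (forbidden_dominators, false_edge->dest->index);
      else
        bitmap_set_bit (forbidden_dominators, true_edge->dest->index);
    }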

Bootstrapped and regression tested on powerpc64-linux-gnu.  OK for
4.6/trunk?

Thanks,
Bill


gcc:

2012-01-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/49642
	* ipa-split.c (forbidden_dominators): New variable.
	(check_forbidden_calls): New function.
	(dominated_by_forbidden): Likewise.
	(consider_split): Check for forbidden dominators.
	(execute_split_functions): Initialize and free forbidden
	dominators info; call check_forbidden_calls.

gcc/testsuite:

2012-01-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/49642
	* gcc.dg/tree-ssa/pr49642.c: New test.

Comments

Richard Biener Jan. 11, 2012, 9:57 a.m. UTC | #1
On Tue, Jan 10, 2012 at 11:15 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2012-01-10 at 09:42 -0600, William J. Schmidt wrote:
>>
>> On Tue, 2012-01-10 at 14:53 +0100, Richard Guenther wrote:
>> >
>> > Btw, this will also disqualify any point below
>> >
>> >  if (__builtin_constant_p (...))
>> >    {
>> >      ...
>> >    }
>> >
>> > because after the if join all BBs are dominated by the __builtin_constant_p
>> > call.  What we want to disallow is splitting at a block that is dominated
>> > by the true edge of the condition fed by the __builtin_constant_p result ...
>>
>> True.  What we have is:
>>
>>   D.1899_68 = __builtin_constant_p (D.1898_67);
>>   if (D.1899_68 != 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 133>;
>>
>> So I suppose we have to walk the immediate uses of the LHS of the call,
>> find all that are part of a condition, and mark the target block for
>> nonzero (in this case <bb 3>) as a forbidden dominator.  I can tighten
>> this up.
>
> Here's a revised patch for 4.6, following the above.  The same patch
> applies to 4.7, if desired, optionally with an additional variation on
> the test case to add -fno-tree-fre to the compile step.
>
> Bootstrapped and regression tested on powerpc64-linux-gnu.  OK for
> 4.6/trunk?

Ok for trunk (and for 4.6 after a while without problems on trunk) with ...

> Thanks,
> Bill
>
>
> gcc:
>
> 2012-01-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>        PR tree-optimization/49642
>        * ipa-split.c (forbidden_dominators): New variable.
>        (check_forbidden_calls): New function.
>        (dominated_by_forbidden): Likewise.
>        (consider_split): Check for forbidden dominators.
>        (execute_split_functions): Initialize and free forbidden
>        dominators info; call check_forbidden_calls.
>
> gcc/testsuite:
>
> 2012-01-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>        PR tree-optimization/49642
>        * gcc.dg/tree-ssa/pr49642.c: New test.
>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr49642.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr49642.c     (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr49642.c     (revision 0)
> @@ -0,0 +1,49 @@
> +/* Verify that ipa-split is disabled following __builtin_constant_p.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +typedef unsigned int u32;
> +typedef unsigned long long u64;
> +
> +static inline __attribute__((always_inline)) __attribute__((const))
> +int __ilog2_u32(u32 n)
> +{
> + int bit;
> + asm ("cntlzw %0,%1" : "=r" (bit) : "r" (n));
> + return 31 - bit;
> +}
> +
> +
> +static inline __attribute__((always_inline)) __attribute__((const))
> +int __ilog2_u64(u64 n)
> +{
> + int bit;
> + asm ("cntlzd %0,%1" : "=r" (bit) : "r" (n));
> + return 63 - bit;
> +}
> +
> +
> +
> +static u64 ehca_map_vaddr(void *caddr);
> +
> +struct ehca_shca {
> +        u32 hca_cap_mr_pgsize;
> +};
> +
> +static u64 ehca_get_max_hwpage_size(struct ehca_shca *shca)
> +{
> + return 1UL << ( __builtin_constant_p(shca->hca_cap_mr_pgsize) ? ( (shca->hca_cap_mr_pgsize) < 1 ? ____ilog2_NaN() : (shca->hca_cap_mr_pgsize) & (1ULL << 63) ? 63 : (shca->hca_cap_mr_pgsize) & (1ULL << 62) ? 62 : (shca->hca_cap_mr_pgsize) & (1ULL << 61) ? 61 : (shca->hca_cap_mr_pgsize) & (1ULL << 60) ? 60 : (shca->hca_cap_mr_pgsize) & (1ULL << 59) ? 59 : (shca->hca_cap_mr_pgsize) & (1ULL << 58) ? 58 : (shca->hca_cap_mr_pgsize) & (1ULL << 57) ? 57 : (shca->hca_cap_mr_pgsize) & (1ULL << 56) ? 56 : (shca->hca_cap_mr_pgsize) & (1ULL << 55) ? 55 : (shca->hca_cap_mr_pgsize) & (1ULL << 54) ? 54 : (shca->hca_cap_mr_pgsize) & (1ULL << 53) ? 53 : (shca->hca_cap_mr_pgsize) & (1ULL << 52) ? 52 : (shca->hca_cap_mr_pgsize) & (1ULL << 51) ? 51 : (shca->hca_cap_mr_pgsize) & (1ULL << 50) ? 50 : (shca->hca_cap_mr_pgsize) & (1ULL << 49) ? 49 : (shca->hca_cap_mr_pgsize) & (1ULL << 48) ? 48 : (shca->hca_cap_mr_pgsize) & (1ULL << 47) ? 47 : (shca->hca_cap_mr_pgsize) & (1ULL << 46) ? 46 : (shca->hca_cap_mr_pgsize) & (1ULL << 45) ? 45 : (shca->hca_cap_mr_pgsize) & (1ULL << 44) ? 44 : (shca->hca_cap_mr_pgsize) & (1ULL << 43) ? 43 : (shca->hca_cap_mr_pgsize) & (1ULL << 42) ? 42 : (shca->hca_cap_mr_pgsize) & (1ULL << 41) ? 41 : (shca->hca_cap_mr_pgsize) & (1ULL << 40) ? 40 : (shca->hca_cap_mr_pgsize) & (1ULL << 39) ? 39 : (shca->hca_cap_mr_pgsize) & (1ULL << 38) ? 38 : (shca->hca_cap_mr_pgsize) & (1ULL << 37) ? 37 : (shca->hca_cap_mr_pgsize) & (1ULL << 36) ? 36 : (shca->hca_cap_mr_pgsize) & (1ULL << 35) ? 35 : (shca->hca_cap_mr_pgsize) & (1ULL << 34) ? 34 : (shca->hca_cap_mr_pgsize) & (1ULL << 33) ? 33 : (shca->hca_cap_mr_pgsize) & (1ULL << 32) ? 32 : (shca->hca_cap_mr_pgsize) & (1ULL << 31) ? 31 : (shca->hca_cap_mr_pgsize) & (1ULL << 30) ? 30 : (shca->hca_cap_mr_pgsize) & (1ULL << 29) ? 29 : (shca->hca_cap_mr_pgsize) & (1ULL << 28) ? 28 : (shca->hca_cap_mr_pgsize) & (1ULL << 27) ? 27 : (shca->hca_cap_mr_pgsize) & (1ULL << 26) ? 26 : (shca->hca_cap_mr_pgsize) & (1ULL << 25) ? 25 : (shca->hca_cap_mr_pgsize) & (1ULL << 24) ? 24 : (shca->hca_cap_mr_pgsize) & (1ULL << 23) ? 23 : (shca->hca_cap_mr_pgsize) & (1ULL << 22) ? 22 : (shca->hca_cap_mr_pgsize) & (1ULL << 21) ? 21 : (shca->hca_cap_mr_pgsize) & (1ULL << 20) ? 20 : (shca->hca_cap_mr_pgsize) & (1ULL << 19) ? 19 : (shca->hca_cap_mr_pgsize) & (1ULL << 18) ? 18 : (shca->hca_cap_mr_pgsize) & (1ULL << 17) ? 17 : (shca->hca_cap_mr_pgsize) & (1ULL << 16) ? 16 : (shca->hca_cap_mr_pgsize) & (1ULL << 15) ? 15 : (shca->hca_cap_mr_pgsize) & (1ULL << 14) ? 14 : (shca->hca_cap_mr_pgsize) & (1ULL << 13) ? 13 : (shca->hca_cap_mr_pgsize) & (1ULL << 12) ? 12 : (shca->hca_cap_mr_pgsize) & (1ULL << 11) ? 11 : (shca->hca_cap_mr_pgsize) & (1ULL << 10) ? 10 : (shca->hca_cap_mr_pgsize) & (1ULL << 9) ? 9 : (shca->hca_cap_mr_pgsize) & (1ULL << 8) ? 8 : (shca->hca_cap_mr_pgsize) & (1ULL << 7) ? 7 : (shca->hca_cap_mr_pgsize) & (1ULL << 6) ? 6 : (shca->hca_cap_mr_pgsize) & (1ULL << 5) ? 5 : (shca->hca_cap_mr_pgsize) & (1ULL << 4) ? 4 : (shca->hca_cap_mr_pgsize) & (1ULL << 3) ? 3 : (shca->hca_cap_mr_pgsize) & (1ULL << 2) ? 2 : (shca->hca_cap_mr_pgsize) & (1ULL << 1) ? 1 : (shca->hca_cap_mr_pgsize) & (1ULL << 0) ? 0 : ____ilog2_NaN() ) : (sizeof(shca->hca_cap_mr_pgsize) <= 4) ? __ilog2_u32(shca->hca_cap_mr_pgsize) : __ilog2_u64(shca->hca_cap_mr_pgsize) );
> +}
> +
> +int x(struct ehca_shca *shca) {
> +        return ehca_get_max_hwpage_size(shca);
> +}
> +
> +int y(struct ehca_shca *shca)
> +{
> +        return ehca_get_max_hwpage_size(shca);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "____ilog2_NaN" 0 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/ipa-split.c
> ===================================================================
> --- gcc/ipa-split.c     (revision 183033)
> +++ gcc/ipa-split.c     (working copy)
> @@ -130,6 +130,10 @@ struct split_point
>
>  struct split_point best_split_point;
>
> +/* Set of basic blocks that are not allowed to dominate a split point.  */
> +
> +bitmap forbidden_dominators;
> +

Make that static

>  static tree find_retval (basic_block return_bb);
>
>  /* Callback for walk_stmt_load_store_addr_ops.  If T is non-SSA automatic
> @@ -270,6 +274,104 @@ done:
>   return ok;
>  }
>
> +/* If STMT is a call, check the callee against a list of forbidden
> +   predicate functions.  If a match is found, look for uses of the
> +   call result in condition statements that compare against zero.
> +   For each such use, find the block targeted by the condition
> +   statement for the nonzero result, and set the bit for this block
> +   in the forbidden dominators bitmap.  The purpose of this is to avoid
> +   selecting a split point where we are likely to lose the chance
> +   to optimize away an unused function call.  */
> +
> +static void
> +check_forbidden_calls (gimple stmt)
> +{
> +  tree fndecl;
> +
> +  if (!is_gimple_call (stmt))
> +    return;
> +
> +  fndecl = gimple_call_fndecl (stmt);
> +
> +  if (fndecl
> +      && TREE_CODE (fndecl) == FUNCTION_DECL

Not needed.

> +      && DECL_BUILT_IN (fndecl)

DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL

> +      /* At the moment, __builtin_constant_p is the only forbidden
> +        predicate function call (see PR49642).  */
> +      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CONSTANT_P)
> +    {
> +      imm_use_iterator use_iter;
> +      use_operand_p use_p;
> +      tree lhs = gimple_call_lhs (stmt);
> +
> +      if (!lhs || TREE_CODE (lhs) != SSA_NAME)
> +       return;
> +
> +      FOR_EACH_IMM_USE_FAST (use_p, use_iter, lhs)
> +       {
> +         tree op0, op1;
> +         basic_block use_bb, forbidden_bb;
> +         enum tree_code code;
> +         edge true_edge, false_edge;
> +         gimple use_stmt = USE_STMT (use_p);
> +
> +         if (gimple_code (use_stmt) != GIMPLE_COND)
> +           continue;
> +
> +         op0 = gimple_cond_lhs (use_stmt);
> +         op1 = gimple_cond_rhs (use_stmt);
> +         code = gimple_cond_code (use_stmt);
> +
> +         use_bb = gimple_bb (use_stmt);
> +
> +         if (!use_bb)
> +           continue;

That check is not needed.

> +
> +         extract_true_false_edges_from_block (use_bb, &true_edge, &false_edge);
> +
> +         /* Swap operands if LHS doesn't come first.  We don't need
> +            to change the code or edges since all we care about is
> +            inequality with zero.  */
> +         if (!operand_equal_p (lhs, op0, 0))

As lhs is an SSA_NAME you can use a pointer equality check, lhs != op0.
Btw, we _should_ have operands in canonical form here - a constant will
be second, so given that you check op1 against zero, op0 should always
be the lhs.  So you should be able to remove this check completely.

Thanks,
Richard.

> +           {
> +             tree swapper = op0;
> +             op0 = op1;
> +             op1 = swapper;
> +           }
> +
> +         /* We're only interested in comparisons that distinguish
> +            unambiguously from zero.  */
> +         if (!integer_zerop (op1) || code == LE_EXPR || code == GE_EXPR)
> +           continue;
> +
> +         if (code == EQ_EXPR)
> +           forbidden_bb = false_edge->dest;
> +         else
> +           forbidden_bb = true_edge->dest;
> +
> +         bitmap_set_bit (forbidden_dominators, forbidden_bb->index);
> +       }
> +    }
> +}
> +
> +/* If BB is dominated by any block in the forbidden dominators set,
> +   return TRUE; else FALSE.  */
> +
> +static bool
> +dominated_by_forbidden (basic_block bb)
> +{
> +  unsigned dom_bb;
> +  bitmap_iterator bi;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (forbidden_dominators, 1, dom_bb, bi)
> +    {
> +      if (dominated_by_p (CDI_DOMINATORS, bb, BASIC_BLOCK (dom_bb)))
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
>  /* We found an split_point CURRENT.  NON_SSA_VARS is bitmap of all non ssa
>    variables used and RETURN_BB is return basic block.
>    See if we can split function here.  */
> @@ -411,6 +513,18 @@ consider_split (struct split_point *current, bitma
>                 "  Refused: split part has non-ssa uses\n");
>       return;
>     }
> +
> +  /* If the split point is dominated by a forbidden block, reject
> +     the split.  */
> +  if (!bitmap_empty_p (forbidden_dominators)
> +      && dominated_by_forbidden (current->entry_bb))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file,
> +                "  Refused: split point dominated by forbidden block\n");
> +      return;
> +    }
> +
>   /* See if retval used by return bb is computed by header or split part.
>      When it is computed by split part, we need to produce return statement
>      in the split part and add code to header to pass it around.
> @@ -1329,6 +1443,10 @@ execute_split_functions (void)
>       return 0;
>     }
>
> +  /* Initialize bitmap to track forbidden calls.  */
> +  forbidden_dominators = BITMAP_ALLOC (NULL);
> +  calculate_dominance_info (CDI_DOMINATORS);
> +
>   /* Compute local info about basic blocks and determine function size/time.  */
>   VEC_safe_grow_cleared (bb_info, heap, bb_info_vec, last_basic_block + 1);
>   memset (&best_split_point, 0, sizeof (best_split_point));
> @@ -1350,6 +1468,7 @@ execute_split_functions (void)
>          this_time = estimate_num_insns (stmt, &eni_time_weights) * freq;
>          size += this_size;
>          time += this_time;
> +         check_forbidden_calls (stmt);
>
>          if (dump_file && (dump_flags & TDF_DETAILS))
>            {
> @@ -1371,6 +1490,7 @@ execute_split_functions (void)
>       BITMAP_FREE (best_split_point.split_bbs);
>       todo = TODO_update_ssa | TODO_cleanup_cfg;
>     }
> +  BITMAP_FREE (forbidden_dominators);
>   VEC_free (bb_info, heap, bb_info_vec);
>   bb_info_vec = NULL;
>   return todo;
>
>
Jakub Jelinek Jan. 11, 2012, 12:11 p.m. UTC | #2
On Wed, Jan 11, 2012 at 10:57:28AM +0100, Richard Guenther wrote:
> > +  tree fndecl;
> > +
> > +  if (!is_gimple_call (stmt))
> > +    return;
> > +
> > +  fndecl = gimple_call_fndecl (stmt);
> > +
> > +  if (fndecl
> > +      && TREE_CODE (fndecl) == FUNCTION_DECL
> 
> Not needed.
> 
> > +      && DECL_BUILT_IN (fndecl)
> 
> DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
> 
> > +      /* At the moment, __builtin_constant_p is the only forbidden
> > +        predicate function call (see PR49642).  */
> > +      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CONSTANT_P)

Or better yet
  if (!gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
    return;
instead of all the above.

	Jakub
Bill Schmidt Jan. 11, 2012, 1:22 p.m. UTC | #3
Thanks for all the help!  I'll get this tested and committed to trunk
today.

On Wed, 2012-01-11 at 13:11 +0100, Jakub Jelinek wrote:
> On Wed, Jan 11, 2012 at 10:57:28AM +0100, Richard Guenther wrote:
> > > +  tree fndecl;
> > > +
> > > +  if (!is_gimple_call (stmt))
> > > +    return;
> > > +
> > > +  fndecl = gimple_call_fndecl (stmt);
> > > +
> > > +  if (fndecl
> > > +      && TREE_CODE (fndecl) == FUNCTION_DECL
> > 
> > Not needed.
> > 
> > > +      && DECL_BUILT_IN (fndecl)
> > 
> > DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
> > 
> > > +      /* At the moment, __builtin_constant_p is the only forbidden
> > > +        predicate function call (see PR49642).  */
> > > +      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CONSTANT_P)
> 
> Or better yet
>   if (!gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
>     return;
> instead of all the above.
> 
> 	Jakub
>
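
For reference, folding in the three review suggestions above (make the
forbidden_dominators bitmap static, test the callee with
gimple_call_builtin_p, and drop the redundant fndecl, use_bb, and
operand-swap checks), the detection routine reduces to roughly the
following.  This is only a sketch of the reviewed direction, not
necessarily the exact form that was committed:

/* Set of basic blocks that are not allowed to dominate a split point.  */

static bitmap forbidden_dominators;

static void
check_forbidden_calls (gimple stmt)
{
  imm_use_iterator use_iter;
  use_operand_p use_p;
  tree lhs;

  /* At the moment, __builtin_constant_p is the only forbidden
     predicate function call (see PR49642).  */
  if (!gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
    return;

  lhs = gimple_call_lhs (stmt);
  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
    return;

  FOR_EACH_IMM_USE_FAST (use_p, use_iter, lhs)
    {
      tree op1;
      enum tree_code code;
      edge true_edge, false_edge;
      basic_block forbidden_bb;
      gimple use_stmt = USE_STMT (use_p);

      if (gimple_code (use_stmt) != GIMPLE_COND)
        continue;

      /* In canonical form the constant operand comes second, so when
         op1 is zero the first operand is necessarily LHS.  */
      op1 = gimple_cond_rhs (use_stmt);
      code = gimple_cond_code (use_stmt);

      /* We're only interested in comparisons that distinguish
         unambiguously from zero.  */
      if (!integer_zerop (op1) || code == LE_EXPR || code == GE_EXPR)
        continue;

      extract_true_false_edges_from_block (gimple_bb (use_stmt),
                                           &true_edge, &false_edge);

      /* Mark the block reached when the result is nonzero.  */
      if (code == EQ_EXPR)
        forbidden_bb = false_edge->dest;
      else
        forbidden_bb = true_edge->dest;

      bitmap_set_bit (forbidden_dominators, forbidden_bb->index);
    }
}

consider_split then rejects any candidate whose entry block is dominated
by one of the recorded blocks (via dominated_by_forbidden), so the
unreachable ____ilog2_NaN call in the test case can still be optimized
away, which is what the scan-tree-dump check verifies.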

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/pr49642.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr49642.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr49642.c	(revision 0)
@@ -0,0 +1,49 @@ 
+/* Verify that ipa-split is disabled following __builtin_constant_p.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef unsigned int u32;
+typedef unsigned long long u64;
+
+static inline __attribute__((always_inline)) __attribute__((const))
+int __ilog2_u32(u32 n)
+{
+ int bit;
+ asm ("cntlzw %0,%1" : "=r" (bit) : "r" (n));
+ return 31 - bit;
+}
+
+
+static inline __attribute__((always_inline)) __attribute__((const))
+int __ilog2_u64(u64 n)
+{
+ int bit;
+ asm ("cntlzd %0,%1" : "=r" (bit) : "r" (n));
+ return 63 - bit;
+}
+
+
+
+static u64 ehca_map_vaddr(void *caddr);
+
+struct ehca_shca {
+        u32 hca_cap_mr_pgsize;
+};
+
+static u64 ehca_get_max_hwpage_size(struct ehca_shca *shca)
+{
+ return 1UL << ( __builtin_constant_p(shca->hca_cap_mr_pgsize) ? ( (shca->hca_cap_mr_pgsize) < 1 ? ____ilog2_NaN() : (shca->hca_cap_mr_pgsize) & (1ULL << 63) ? 63 : (shca->hca_cap_mr_pgsize) & (1ULL << 62) ? 62 : (shca->hca_cap_mr_pgsize) & (1ULL << 61) ? 61 : (shca->hca_cap_mr_pgsize) & (1ULL << 60) ? 60 : (shca->hca_cap_mr_pgsize) & (1ULL << 59) ? 59 : (shca->hca_cap_mr_pgsize) & (1ULL << 58) ? 58 : (shca->hca_cap_mr_pgsize) & (1ULL << 57) ? 57 : (shca->hca_cap_mr_pgsize) & (1ULL << 56) ? 56 : (shca->hca_cap_mr_pgsize) & (1ULL << 55) ? 55 : (shca->hca_cap_mr_pgsize) & (1ULL << 54) ? 54 : (shca->hca_cap_mr_pgsize) & (1ULL << 53) ? 53 : (shca->hca_cap_mr_pgsize) & (1ULL << 52) ? 52 : (shca->hca_cap_mr_pgsize) & (1ULL << 51) ? 51 : (shca->hca_cap_mr_pgsize) & (1ULL << 50) ? 50 : (shca->hca_cap_mr_pgsize) & (1ULL << 49) ? 49 : (shca->hca_cap_mr_pgsize) & (1ULL << 48) ? 48 : (shca->hca_cap_mr_pgsize) & (1ULL << 47) ? 47 : (shca->hca_cap_mr_pgsize) & (1ULL << 46) ? 46 : (shca->hca_cap_mr_pgsize) & (1ULL << 45) ? 45 : (shca->hca_cap_mr_pgsize) & (1ULL << 44) ? 44 : (shca->hca_cap_mr_pgsize) & (1ULL << 43) ? 43 : (shca->hca_cap_mr_pgsize) & (1ULL << 42) ? 42 : (shca->hca_cap_mr_pgsize) & (1ULL << 41) ? 41 : (shca->hca_cap_mr_pgsize) & (1ULL << 40) ? 40 : (shca->hca_cap_mr_pgsize) & (1ULL << 39) ? 39 : (shca->hca_cap_mr_pgsize) & (1ULL << 38) ? 38 : (shca->hca_cap_mr_pgsize) & (1ULL << 37) ? 37 : (shca->hca_cap_mr_pgsize) & (1ULL << 36) ? 36 : (shca->hca_cap_mr_pgsize) & (1ULL << 35) ? 35 : (shca->hca_cap_mr_pgsize) & (1ULL << 34) ? 34 : (shca->hca_cap_mr_pgsize) & (1ULL << 33) ? 33 : (shca->hca_cap_mr_pgsize) & (1ULL << 32) ? 32 : (shca->hca_cap_mr_pgsize) & (1ULL << 31) ? 31 : (shca->hca_cap_mr_pgsize) & (1ULL << 30) ? 30 : (shca->hca_cap_mr_pgsize) & (1ULL << 29) ? 29 : (shca->hca_cap_mr_pgsize) & (1ULL << 28) ? 28 : (shca->hca_cap_mr_pgsize) & (1ULL << 27) ? 27 : (shca->hca_cap_mr_pgsize) & (1ULL << 26) ? 26 : (shca->hca_cap_mr_pgsize) & (1ULL << 25) ? 25 : (shca->hca_cap_mr_pgsize) & (1ULL << 24) ? 24 : (shca->hca_cap_mr_pgsize) & (1ULL << 23) ? 23 : (shca->hca_cap_mr_pgsize) & (1ULL << 22) ? 22 : (shca->hca_cap_mr_pgsize) & (1ULL << 21) ? 21 : (shca->hca_cap_mr_pgsize) & (1ULL << 20) ? 20 : (shca->hca_cap_mr_pgsize) & (1ULL << 19) ? 19 : (shca->hca_cap_mr_pgsize) & (1ULL << 18) ? 18 : (shca->hca_cap_mr_pgsize) & (1ULL << 17) ? 17 : (shca->hca_cap_mr_pgsize) & (1ULL << 16) ? 16 : (shca->hca_cap_mr_pgsize) & (1ULL << 15) ? 15 : (shca->hca_cap_mr_pgsize) & (1ULL << 14) ? 14 : (shca->hca_cap_mr_pgsize) & (1ULL << 13) ? 13 : (shca->hca_cap_mr_pgsize) & (1ULL << 12) ? 12 : (shca->hca_cap_mr_pgsize) & (1ULL << 11) ? 11 : (shca->hca_cap_mr_pgsize) & (1ULL << 10) ? 10 : (shca->hca_cap_mr_pgsize) & (1ULL << 9) ? 9 : (shca->hca_cap_mr_pgsize) & (1ULL << 8) ? 8 : (shca->hca_cap_mr_pgsize) & (1ULL << 7) ? 7 : (shca->hca_cap_mr_pgsize) & (1ULL << 6) ? 6 : (shca->hca_cap_mr_pgsize) & (1ULL << 5) ? 5 : (shca->hca_cap_mr_pgsize) & (1ULL << 4) ? 4 : (shca->hca_cap_mr_pgsize) & (1ULL << 3) ? 3 : (shca->hca_cap_mr_pgsize) & (1ULL << 2) ? 2 : (shca->hca_cap_mr_pgsize) & (1ULL << 1) ? 1 : (shca->hca_cap_mr_pgsize) & (1ULL << 0) ? 0 : ____ilog2_NaN() ) : (sizeof(shca->hca_cap_mr_pgsize) <= 4) ? __ilog2_u32(shca->hca_cap_mr_pgsize) : __ilog2_u64(shca->hca_cap_mr_pgsize) );
+}
+
+int x(struct ehca_shca *shca) {
+        return ehca_get_max_hwpage_size(shca);
+}
+
+int y(struct ehca_shca *shca)
+{
+        return ehca_get_max_hwpage_size(shca);
+}
+
+/* { dg-final { scan-tree-dump-times "____ilog2_NaN" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/ipa-split.c
===================================================================
--- gcc/ipa-split.c	(revision 183033)
+++ gcc/ipa-split.c	(working copy)
@@ -130,6 +130,10 @@  struct split_point
 
 struct split_point best_split_point;
 
+/* Set of basic blocks that are not allowed to dominate a split point.  */
+
+bitmap forbidden_dominators;
+
 static tree find_retval (basic_block return_bb);
 
 /* Callback for walk_stmt_load_store_addr_ops.  If T is non-SSA automatic
@@ -270,6 +274,104 @@  done:
   return ok;
 }
 
+/* If STMT is a call, check the callee against a list of forbidden
+   predicate functions.  If a match is found, look for uses of the
+   call result in condition statements that compare against zero.
+   For each such use, find the block targeted by the condition
+   statement for the nonzero result, and set the bit for this block
+   in the forbidden dominators bitmap.  The purpose of this is to avoid
+   selecting a split point where we are likely to lose the chance
+   to optimize away an unused function call.  */
+
+static void
+check_forbidden_calls (gimple stmt)
+{
+  tree fndecl;
+
+  if (!is_gimple_call (stmt))
+    return;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  if (fndecl
+      && TREE_CODE (fndecl) == FUNCTION_DECL
+      && DECL_BUILT_IN (fndecl)
+      /* At the moment, __builtin_constant_p is the only forbidden
+	 predicate function call (see PR49642).  */
+      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CONSTANT_P)
+    {
+      imm_use_iterator use_iter;
+      use_operand_p use_p;
+      tree lhs = gimple_call_lhs (stmt);
+
+      if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+	return;
+
+      FOR_EACH_IMM_USE_FAST (use_p, use_iter, lhs)
+	{
+	  tree op0, op1;
+	  basic_block use_bb, forbidden_bb;
+	  enum tree_code code;
+	  edge true_edge, false_edge;
+	  gimple use_stmt = USE_STMT (use_p);
+
+	  if (gimple_code (use_stmt) != GIMPLE_COND)
+	    continue;
+
+	  op0 = gimple_cond_lhs (use_stmt);
+	  op1 = gimple_cond_rhs (use_stmt);
+	  code = gimple_cond_code (use_stmt);
+
+	  use_bb = gimple_bb (use_stmt);
+
+	  if (!use_bb)
+	    continue;
+
+	  extract_true_false_edges_from_block (use_bb, &true_edge, &false_edge);
+
+	  /* Swap operands if LHS doesn't come first.  We don't need
+	     to change the code or edges since all we care about is
+	     inequality with zero.  */
+	  if (!operand_equal_p (lhs, op0, 0))
+	    {
+	      tree swapper = op0;
+	      op0 = op1;
+	      op1 = swapper;
+	    }
+
+	  /* We're only interested in comparisons that distinguish
+	     unambiguously from zero.  */
+	  if (!integer_zerop (op1) || code == LE_EXPR || code == GE_EXPR)
+	    continue;
+
+	  if (code == EQ_EXPR)
+	    forbidden_bb = false_edge->dest;
+	  else
+	    forbidden_bb = true_edge->dest;
+
+	  bitmap_set_bit (forbidden_dominators, forbidden_bb->index);
+	}
+    }
+}
+
+/* If BB is dominated by any block in the forbidden dominators set,
+   return TRUE; else FALSE.  */
+
+static bool
+dominated_by_forbidden (basic_block bb)
+{
+  unsigned dom_bb;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (forbidden_dominators, 1, dom_bb, bi)
+    {
+      if (dominated_by_p (CDI_DOMINATORS, bb, BASIC_BLOCK (dom_bb)))
+	return true;
+    }
+
+  return false;
+}
+
 /* We found an split_point CURRENT.  NON_SSA_VARS is bitmap of all non ssa
    variables used and RETURN_BB is return basic block.
    See if we can split function here.  */
@@ -411,6 +513,18 @@  consider_split (struct split_point *current, bitma
 		 "  Refused: split part has non-ssa uses\n");
       return;
     }
+
+  /* If the split point is dominated by a forbidden block, reject
+     the split.  */
+  if (!bitmap_empty_p (forbidden_dominators)
+      && dominated_by_forbidden (current->entry_bb))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  Refused: split point dominated by forbidden block\n");
+      return;
+    }
+
   /* See if retval used by return bb is computed by header or split part.
      When it is computed by split part, we need to produce return statement
      in the split part and add code to header to pass it around.
@@ -1329,6 +1443,10 @@  execute_split_functions (void)
       return 0;
     }
 
+  /* Initialize bitmap to track forbidden calls.  */
+  forbidden_dominators = BITMAP_ALLOC (NULL);
+  calculate_dominance_info (CDI_DOMINATORS);
+
   /* Compute local info about basic blocks and determine function size/time.  */
   VEC_safe_grow_cleared (bb_info, heap, bb_info_vec, last_basic_block + 1);
   memset (&best_split_point, 0, sizeof (best_split_point));
@@ -1350,6 +1468,7 @@  execute_split_functions (void)
 	  this_time = estimate_num_insns (stmt, &eni_time_weights) * freq;
 	  size += this_size;
 	  time += this_time;
+	  check_forbidden_calls (stmt);
 
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
@@ -1371,6 +1490,7 @@  execute_split_functions (void)
       BITMAP_FREE (best_split_point.split_bbs);
       todo = TODO_update_ssa | TODO_cleanup_cfg;
     }
+  BITMAP_FREE (forbidden_dominators);
   VEC_free (bb_info, heap, bb_info_vec);
   bb_info_vec = NULL;
   return todo;