make canonicalize_condition keep its promise
diff mbox series

Message ID 1510760419.6005.3.camel@linux.vnet.ibm.com
State New
Headers show
Series
  • make canonicalize_condition keep its promise
Related show

Commit Message

Aaron Sawdey Nov. 15, 2017, 3:40 p.m. UTC
So, the story of this very small patch starts with me adding patterns
for ppc instructions bdz[tf] and bdnz[tf] such as this:

  [(set (pc)
	(if_then_else
	  (and
	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
		 (const_int 1))
	     (match_operator 3 "branch_comparison_operator"
		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
		       (const_int 0)]))
		      (label_ref (match_operand 0))
		      (pc)))
   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
	(plus:P (match_dup 1)
		(const_int -1)))
   (clobber (match_scratch:P 5 "=X,X,&r,r"))
   (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
   (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]

However when this gets to the loop_doloop pass, we get an assert fail
in iv_number_of_iterations():

  gcc_assert (COMPARISON_P (condition));

This is happening because this branch insn tests two things ANDed
together so the and is at the top of the expression, not a comparison.

This condition is extracted from the insn by get_condition() which is
pretty straightforward, and which calls canonicalize_condition() before
returning it. Now, one could put a test for a jump condition that is
not a conditional test in here but the comment for
canonicalize_condition() says:

   (1) The code will always be a comparison operation (EQ, NE, GT, etc.).

So, this patch adds a test at the end that just returns 0 if the return
rtx is not a comparison. As it happens, doloop conversion is not needed
here because I'm already generating rtl for a branch-decrement counter
based loop.

If there is a better way to go about this please let me know and I'll
revise/retest.

Bootstrap and regtest pass on ppc64le and x86_64. Ok for trunk?

Thanks,
    Aaron


2017-11-15  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>

	* rtlanal.c (canonicalize_condition): Return 0 if final rtx
	does not have a conditional at the top.

Comments

Peter Bergner Nov. 15, 2017, 5:41 p.m. UTC | #1
On 11/15/17 9:40 AM, Aaron Sawdey wrote:
> Index: gcc/rtlanal.c
> ===================================================================
> --- gcc/rtlanal.c       (revision 254553)
> +++ gcc/rtlanal.c       (working copy)
> @@ -5623,7 +5623,11 @@
>    if (CC0_P (op0))
>      return 0;
>  
> -  return gen_rtx_fmt_ee (code, VOIDmode, op0, op1);
> +  /* We promised to return a comparison.  */
> +  rtx ret = gen_rtx_fmt_ee (code, VOIDmode, op0, op1);
> +  if (COMPARISON_P (ret))
> +    return ret;
> +  return 0;

I have no input on whether this approach is correct or not, but...
I know the return above this returns 0 as do other locations in
the file, but new code should return NULL_RTX.

Peter
Jeff Law Nov. 19, 2017, 11:44 p.m. UTC | #2
On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> So, the story of this very small patch starts with me adding patterns
> for ppc instructions bdz[tf] and bdnz[tf] such as this:
> 
>   [(set (pc)
> 	(if_then_else
> 	  (and
> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> 		 (const_int 1))
> 	     (match_operator 3 "branch_comparison_operator"
> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> 		       (const_int 0)]))
> 		      (label_ref (match_operand 0))
> 		      (pc)))
>    (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
> 	(plus:P (match_dup 1)
> 		(const_int -1)))
>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> 
> However when this gets to the loop_doloop pass, we get an assert fail
> in iv_number_of_iterations():
> 
>   gcc_assert (COMPARISON_P (condition));
> 
> This is happening because this branch insn tests two things ANDed
> together so the and is at the top of the expression, not a comparison.
Is this something you've created for an existing loop?  Presumably an
existing loop that previously was a simple loop?

> 
> This condition is extracted from the insn by get_condition() which is
> pretty straightforward, and which calls canonicalize_condition() before
> returning it. Now, one could put a test for a jump condition that is
> not a conditional test in here but the comment for
> canonicalize_condition() says:
> 
>    (1) The code will always be a comparison operation (EQ, NE, GT, etc.).
> 
> So, this patch adds a test at the end that just returns 0 if the return
> rtx is not a comparison. As it happens, doloop conversion is not needed
> here because I'm already generating rtl for a branch-decrement counter
> based loop.
I'm not at all familiar with this code, but I would probably guess that
COND is supposed to appear within INSN.  Canonicalize comparison is
supposed to modify the COND expression that appears within INSN.  THe
overall structure of INSN doesn't much matter as long as COND refers to
a comparison code with two operands.

My gut tells me the problem is upstream or downstream in the call chain
or that you've taken a loop that was considered a simple loop and
modified the exit test in a way that other code is not prepared to handle.
jeff
Aaron Sawdey Nov. 20, 2017, 1:41 p.m. UTC | #3
On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > So, the story of this very small patch starts with me adding
> > patterns
> > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > 
> >   [(set (pc)
> > 	(if_then_else
> > 	  (and
> > 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> > 		 (const_int 1))
> > 	     (match_operator 3 "branch_comparison_operator"
> > 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> > 		       (const_int 0)]))
> > 		      (label_ref (match_operand 0))
> > 		      (pc)))
> >    (set (match_operand:P 2 "nonimmediate_operand"
> > "=1,*r,m,*d*wi*c*l")
> > 	(plus:P (match_dup 1)
> > 		(const_int -1)))
> >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > 
> > However when this gets to the loop_doloop pass, we get an assert
> > fail
> > in iv_number_of_iterations():
> > 
> >   gcc_assert (COMPARISON_P (condition));
> > 
> > This is happening because this branch insn tests two things ANDed
> > together so the and is at the top of the expression, not a
> > comparison.
> 
> Is this something you've created for an existing loop?  Presumably an
> existing loop that previously was a simple loop?

The rtl to use this instruction is generated by new code I'm working on
to do a builtin expansion of memcmp using a loop. I call
gen_bdnztf_di() to generate rtl for the insn. It would be nice to be
able to generate this instruction from doloop conversion but that is
beyond the scope of what I'm working on presently.

> > 
> > This condition is extracted from the insn by get_condition() which
> > is
> > pretty straightforward, and which calls canonicalize_condition()
> > before
> > returning it. Now, one could put a test for a jump condition that
> > is
> > not a conditional test in here but the comment for
> > canonicalize_condition() says:
> > 
> >    (1) The code will always be a comparison operation (EQ, NE, GT,
> > etc.).
> > 
> > So, this patch adds a test at the end that just returns 0 if the
> > return
> > rtx is not a comparison. As it happens, doloop conversion is not
> > needed
> > here because I'm already generating rtl for a branch-decrement
> > counter
> > based loop.
> 
> I'm not at all familiar with this code, but I would probably guess
> that
> COND is supposed to appear within INSN.  Canonicalize comparison is
> supposed to modify the COND expression that appears within INSN.  THe
> overall structure of INSN doesn't much matter as long as COND refers
> to
> a comparison code with two operands.
> 
> My gut tells me the problem is upstream or downstream in the call
> chain
> or that you've taken a loop that was considered a simple loop and
> modified the exit test in a way that other code is not prepared to
> handle.

Yes, get_condition() is the other place that this could be fixed. It
expects rtl of the form:

(set (pc) (if_then_else (cond ...

and here it's seeing something of the form:

(set (pc) (if_then_else (and (cond ... ) (cond ...)

It seems legitimate to add a check for this in get_condition() as well:

  cond = XEXP (SET_SRC (set), 0);
  if (!COMPARISON_P (cond))
    return NULL_RTX;

There are a couple of uses of canonicalize_condition() in ifcvt.c and
it looks like the code there may expect the return to be a conditional.
This suggests it might be a good idea to do both checks.

Thanks,
   Aaron
Jeff Law Nov. 21, 2017, 5:06 p.m. UTC | #4
On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
>> On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
>>> So, the story of this very small patch starts with me adding
>>> patterns
>>> for ppc instructions bdz[tf] and bdnz[tf] such as this:
>>>
>>>   [(set (pc)
>>> 	(if_then_else
>>> 	  (and
>>> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
>>> 		 (const_int 1))
>>> 	     (match_operator 3 "branch_comparison_operator"
>>> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
>>> 		       (const_int 0)]))
>>> 		      (label_ref (match_operand 0))
>>> 		      (pc)))
>>>    (set (match_operand:P 2 "nonimmediate_operand"
>>> "=1,*r,m,*d*wi*c*l")
>>> 	(plus:P (match_dup 1)
>>> 		(const_int -1)))
>>>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>>>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>>>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
>>>
>>> However when this gets to the loop_doloop pass, we get an assert
>>> fail
>>> in iv_number_of_iterations():
>>>
>>>   gcc_assert (COMPARISON_P (condition));
>>>
>>> This is happening because this branch insn tests two things ANDed
>>> together so the and is at the top of the expression, not a
>>> comparison.
>>
>> Is this something you've created for an existing loop?  Presumably an
>> existing loop that previously was a simple loop?
> 
> The rtl to use this instruction is generated by new code I'm working on
> to do a builtin expansion of memcmp using a loop. I call
> gen_bdnztf_di() to generate rtl for the insn. It would be nice to be
> able to generate this instruction from doloop conversion but that is
> beyond the scope of what I'm working on presently.
Understood.

So what I think (and I'm hoping you can confirm one way or the other) is
that by generating this instruction you're turing a loop which
previously was considered a simple loop by the IV code and turning it
into something the IV bits no longer think is a simple loop.

I think that's problematical as when the loop is thought to be a simple
loop, it has to have a small number of forms for its loop back/exit loop
tests and whether or not a loop is a simple loop is cached in the loop
structure.

I think we need to dig into that first.  If my suspicion is correct then
this patch is really just papering over that deeper problem.  So I think
you need to dig a big deeper into why you're getting into the code in
question (canonicalize_condition) and whether or not the call chain
makes any sense given the changes you've made to the loop.


Jeff
Aaron Sawdey Nov. 21, 2017, 5:45 p.m. UTC | #5
On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > So, the story of this very small patch starts with me adding
> > > > patterns
> > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > 
> > > >   [(set (pc)
> > > > 	(if_then_else
> > > > 	  (and
> > > > 	     (ne (match_operand:P 1 "register_operand"
> > > > "c,*b,*b,*b")
> > > > 		 (const_int 1))
> > > > 	     (match_operator 3 "branch_comparison_operator"
> > > > 		      [(match_operand 4 "cc_reg_operand"
> > > > "y,y,y,y")
> > > > 		       (const_int 0)]))
> > > > 		      (label_ref (match_operand 0))
> > > > 		      (pc)))
> > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > "=1,*r,m,*d*wi*c*l")
> > > > 	(plus:P (match_dup 1)
> > > > 		(const_int -1)))
> > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > 
> > > > However when this gets to the loop_doloop pass, we get an
> > > > assert
> > > > fail
> > > > in iv_number_of_iterations():
> > > > 
> > > >   gcc_assert (COMPARISON_P (condition));
> > > > 
> > > > This is happening because this branch insn tests two things
> > > > ANDed
> > > > together so the and is at the top of the expression, not a
> > > > comparison.
> > > 
> > > Is this something you've created for an existing
> > > loop?  Presumably an
> > > existing loop that previously was a simple loop?
> > 
> > The rtl to use this instruction is generated by new code I'm
> > working on
> > to do a builtin expansion of memcmp using a loop. I call
> > gen_bdnztf_di() to generate rtl for the insn. It would be nice to
> > be
> > able to generate this instruction from doloop conversion but that
> > is
> > beyond the scope of what I'm working on presently.
> 
> Understood.
> 
> So what I think (and I'm hoping you can confirm one way or the other)
> is
> that by generating this instruction you're turing a loop which
> previously was considered a simple loop by the IV code and turning it
> into something the IV bits no longer think is a simple loop.
> 
> I think that's problematical as when the loop is thought to be a
> simple
> loop, it has to have a small number of forms for its loop back/exit
> loop
> tests and whether or not a loop is a simple loop is cached in the
> loop
> structure.
> 
> I think we need to dig into that first.  If my suspicion is correct
> then
> this patch is really just papering over that deeper problem.  So I
> think
> you need to dig a big deeper into why you're getting into the code in
> question (canonicalize_condition) and whether or not the call chain
> makes any sense given the changes you've made to the loop.
> 

Jeff,
  There is no existing loop structure. This starts with a memcmp() call
and then goes down through the builtin expansion mechanism, which is
ultimately expanding the pattern cmpmemsi which is where my code is
generating a loop that finishes with bdnzt. The code that's ultimately
generated looks like this:

        srdi 9,10,4
        li 6,0
        mtctr 9
        li 4,8
.L9:
        ldbrx 8,11,6
        ldbrx 9,7,6
        ldbrx 5,11,4
        ldbrx 3,7,4
        addi 6,6,16
        addi 4,4,16
        subfc. 9,9,8
        bne 0,.L4
        subfc. 9,3,5
        bdnzt 2,.L9

So it is a loop with a branch out, and then the branch decrement nz
true back to the top. The iv's here (regs 4 and 6) were generated by my
expansion code. 

I really think the ultimate problem here is that both
canonicalize_condition and get_condition promise in their documenting
comments that they will return something that has a cond at the root of
the rtx, or 0 if they don't understand what they're given. In this case
they do not understand the rtx of bdnzt and are returning rtx rooted
with an and, not a cond. This may seem like papering over the problem,
but I think it is legitimate for these functions to return 0 when the
branch insn in question does not have a simple cond at the heart of it.
And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
Ultimately, yes something better ought to be done here.

Thanks,
   Aaron
Aaron Sawdey Nov. 30, 2017, 1:24 a.m. UTC | #6
On Tue, 2017-11-21 at 11:45 -0600, Aaron Sawdey wrote:
> On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> > On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > So, the story of this very small patch starts with me adding
> > > > > patterns
> > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > 
> > > > >   [(set (pc)
> > > > > 	(if_then_else
> > > > > 	  (and
> > > > > 	     (ne (match_operand:P 1 "register_operand"
> > > > > "c,*b,*b,*b")
> > > > > 		 (const_int 1))
> > > > > 	     (match_operator 3 "branch_comparison_operator"
> > > > > 		      [(match_operand 4 "cc_reg_operand"
> > > > > "y,y,y,y")
> > > > > 		       (const_int 0)]))
> > > > > 		      (label_ref (match_operand 0))
> > > > > 		      (pc)))
> > > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > > "=1,*r,m,*d*wi*c*l")
> > > > > 	(plus:P (match_dup 1)
> > > > > 		(const_int -1)))
> > > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > > 
> > > > > However when this gets to the loop_doloop pass, we get an
> > > > > assert
> > > > > fail
> > > > > in iv_number_of_iterations():
> > > > > 
> > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > 
> > > > > This is happening because this branch insn tests two things
> > > > > ANDed
> > > > > together so the and is at the top of the expression, not a
> > > > > comparison.
> > > > 
> > > > Is this something you've created for an existing
> > > > loop?  Presumably an
> > > > existing loop that previously was a simple loop?
> > > 
> > > The rtl to use this instruction is generated by new code I'm
> > > working on
> > > to do a builtin expansion of memcmp using a loop. I call
> > > gen_bdnztf_di() to generate rtl for the insn. It would be nice to
> > > be
> > > able to generate this instruction from doloop conversion but that
> > > is
> > > beyond the scope of what I'm working on presently.
> > 
> > Understood.
> > 
> > So what I think (and I'm hoping you can confirm one way or the
> > other)
> > is
> > that by generating this instruction you're turing a loop which
> > previously was considered a simple loop by the IV code and turning
> > it
> > into something the IV bits no longer think is a simple loop.
> > 
> > I think that's problematical as when the loop is thought to be a
> > simple
> > loop, it has to have a small number of forms for its loop back/exit
> > loop
> > tests and whether or not a loop is a simple loop is cached in the
> > loop
> > structure.
> > 
> > I think we need to dig into that first.  If my suspicion is correct
> > then
> > this patch is really just papering over that deeper problem.  So I
> > think
> > you need to dig a big deeper into why you're getting into the code
> > in
> > question (canonicalize_condition) and whether or not the call chain
> > makes any sense given the changes you've made to the loop.
> > 
> 
> Jeff,
>   There is no existing loop structure. This starts with a memcmp()
> call
> and then goes down through the builtin expansion mechanism, which is
> ultimately expanding the pattern cmpmemsi which is where my code is
> generating a loop that finishes with bdnzt. The code that's
> ultimately
> generated looks like this:
> 
>         srdi 9,10,4
>         li 6,0
>         mtctr 9
>         li 4,8
> .L9:
>         ldbrx 8,11,6
>         ldbrx 9,7,6
>         ldbrx 5,11,4
>         ldbrx 3,7,4
>         addi 6,6,16
>         addi 4,4,16
>         subfc. 9,9,8
>         bne 0,.L4
>         subfc. 9,3,5
>         bdnzt 2,.L9
> 
> So it is a loop with a branch out, and then the branch decrement nz
> true back to the top. The iv's here (regs 4 and 6) were generated by
> my
> expansion code. 
> 
> I really think the ultimate problem here is that both
> canonicalize_condition and get_condition promise in their documenting
> comments that they will return something that has a cond at the root
> of
> the rtx, or 0 if they don't understand what they're given. In this
> case
> they do not understand the rtx of bdnzt and are returning rtx rooted
> with an and, not a cond. This may seem like papering over the
> problem,
> but I think it is legitimate for these functions to return 0 when the
> branch insn in question does not have a simple cond at the heart of
> it.
> And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> Ultimately, yes something better ought to be done here.
> 

Jeff,
  Just wondering if you would have time to take another look at this
now that Thanksgiving is past. I'm hoping to get this resolved so I can
get this new memcmp expansion code into gcc8.

Thanks,
    Aaron
Jeff Law Dec. 1, 2017, 1:19 a.m. UTC | #7
On 11/29/2017 06:24 PM, Aaron Sawdey wrote:
> On Tue, 2017-11-21 at 11:45 -0600, Aaron Sawdey wrote:
>> On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
>>> On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
>>>> On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
>>>>> On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
>>>>>> So, the story of this very small patch starts with me adding
>>>>>> patterns
>>>>>> for ppc instructions bdz[tf] and bdnz[tf] such as this:
>>>>>>
>>>>>>   [(set (pc)
>>>>>> 	(if_then_else
>>>>>> 	  (and
>>>>>> 	     (ne (match_operand:P 1 "register_operand"
>>>>>> "c,*b,*b,*b")
>>>>>> 		 (const_int 1))
>>>>>> 	     (match_operator 3 "branch_comparison_operator"
>>>>>> 		      [(match_operand 4 "cc_reg_operand"
>>>>>> "y,y,y,y")
>>>>>> 		       (const_int 0)]))
>>>>>> 		      (label_ref (match_operand 0))
>>>>>> 		      (pc)))
>>>>>>    (set (match_operand:P 2 "nonimmediate_operand"
>>>>>> "=1,*r,m,*d*wi*c*l")
>>>>>> 	(plus:P (match_dup 1)
>>>>>> 		(const_int -1)))
>>>>>>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>>>>>>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>>>>>>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
>>>>>>
>>>>>> However when this gets to the loop_doloop pass, we get an
>>>>>> assert
>>>>>> fail
>>>>>> in iv_number_of_iterations():
>>>>>>
>>>>>>   gcc_assert (COMPARISON_P (condition));
>>>>>>
>>>>>> This is happening because this branch insn tests two things
>>>>>> ANDed
>>>>>> together so the and is at the top of the expression, not a
>>>>>> comparison.
>>>>>
>>>>> Is this something you've created for an existing
>>>>> loop?  Presumably an
>>>>> existing loop that previously was a simple loop?
>>>>
>>>> The rtl to use this instruction is generated by new code I'm
>>>> working on
>>>> to do a builtin expansion of memcmp using a loop. I call
>>>> gen_bdnztf_di() to generate rtl for the insn. It would be nice to
>>>> be
>>>> able to generate this instruction from doloop conversion but that
>>>> is
>>>> beyond the scope of what I'm working on presently.
>>>
>>> Understood.
>>>
>>> So what I think (and I'm hoping you can confirm one way or the
>>> other)
>>> is
>>> that by generating this instruction you're turing a loop which
>>> previously was considered a simple loop by the IV code and turning
>>> it
>>> into something the IV bits no longer think is a simple loop.
>>>
>>> I think that's problematical as when the loop is thought to be a
>>> simple
>>> loop, it has to have a small number of forms for its loop back/exit
>>> loop
>>> tests and whether or not a loop is a simple loop is cached in the
>>> loop
>>> structure.
>>>
>>> I think we need to dig into that first.  If my suspicion is correct
>>> then
>>> this patch is really just papering over that deeper problem.  So I
>>> think
>>> you need to dig a big deeper into why you're getting into the code
>>> in
>>> question (canonicalize_condition) and whether or not the call chain
>>> makes any sense given the changes you've made to the loop.
>>>
>>
>> Jeff,
>>   There is no existing loop structure. This starts with a memcmp()
>> call
>> and then goes down through the builtin expansion mechanism, which is
>> ultimately expanding the pattern cmpmemsi which is where my code is
>> generating a loop that finishes with bdnzt. The code that's
>> ultimately
>> generated looks like this:
>>
>>         srdi 9,10,4
>>         li 6,0
>>         mtctr 9
>>         li 4,8
>> .L9:
>>         ldbrx 8,11,6
>>         ldbrx 9,7,6
>>         ldbrx 5,11,4
>>         ldbrx 3,7,4
>>         addi 6,6,16
>>         addi 4,4,16
>>         subfc. 9,9,8
>>         bne 0,.L4
>>         subfc. 9,3,5
>>         bdnzt 2,.L9
>>
>> So it is a loop with a branch out, and then the branch decrement nz
>> true back to the top. The iv's here (regs 4 and 6) were generated by
>> my
>> expansion code. 
>>
>> I really think the ultimate problem here is that both
>> canonicalize_condition and get_condition promise in their documenting
>> comments that they will return something that has a cond at the root
>> of
>> the rtx, or 0 if they don't understand what they're given. In this
>> case
>> they do not understand the rtx of bdnzt and are returning rtx rooted
>> with an and, not a cond. This may seem like papering over the
>> problem,
>> but I think it is legitimate for these functions to return 0 when the
>> branch insn in question does not have a simple cond at the heart of
>> it.
>> And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
>> Ultimately, yes something better ought to be done here.
>>
> 
> Jeff,
>   Just wondering if you would have time to take another look at this
> now that Thanksgiving is past. I'm hoping to get this resolved so I can
> get this new memcmp expansion code into gcc8.
Definitely in the queue to revisit -- last 3 days have had limited time
to deal with GCC stuff

jeff
Jeff Law Dec. 14, 2017, 8:43 p.m. UTC | #8
On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
>> On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
>>> On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
>>>> On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
>>>>> So, the story of this very small patch starts with me adding
>>>>> patterns
>>>>> for ppc instructions bdz[tf] and bdnz[tf] such as this:
>>>>>
>>>>>   [(set (pc)
>>>>> 	(if_then_else
>>>>> 	  (and
>>>>> 	     (ne (match_operand:P 1 "register_operand"
>>>>> "c,*b,*b,*b")
>>>>> 		 (const_int 1))
>>>>> 	     (match_operator 3 "branch_comparison_operator"
>>>>> 		      [(match_operand 4 "cc_reg_operand"
>>>>> "y,y,y,y")
>>>>> 		       (const_int 0)]))
>>>>> 		      (label_ref (match_operand 0))
>>>>> 		      (pc)))
>>>>>    (set (match_operand:P 2 "nonimmediate_operand"
>>>>> "=1,*r,m,*d*wi*c*l")
>>>>> 	(plus:P (match_dup 1)
>>>>> 		(const_int -1)))
>>>>>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>>>>>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>>>>>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
>>>>>
>>>>> However when this gets to the loop_doloop pass, we get an
>>>>> assert
>>>>> fail
>>>>> in iv_number_of_iterations():
>>>>>
>>>>>   gcc_assert (COMPARISON_P (condition));
>>>>>
>>>>> This is happening because this branch insn tests two things
>>>>> ANDed
>>>>> together so the and is at the top of the expression, not a
>>>>> comparison.
>>>>
>>>> Is this something you've created for an existing
>>>> loop?  Presumably an
>>>> existing loop that previously was a simple loop?
>>>
>>> The rtl to use this instruction is generated by new code I'm
>>> working on
>>> to do a builtin expansion of memcmp using a loop. I call
>>> gen_bdnztf_di() to generate rtl for the insn. It would be nice to
>>> be
>>> able to generate this instruction from doloop conversion but that
>>> is
>>> beyond the scope of what I'm working on presently.
>>
>> Understood.
>>
>> So what I think (and I'm hoping you can confirm one way or the other)
>> is
>> that by generating this instruction you're turing a loop which
>> previously was considered a simple loop by the IV code and turning it
>> into something the IV bits no longer think is a simple loop.
>>
>> I think that's problematical as when the loop is thought to be a
>> simple
>> loop, it has to have a small number of forms for its loop back/exit
>> loop
>> tests and whether or not a loop is a simple loop is cached in the
>> loop
>> structure.
>>
>> I think we need to dig into that first.  If my suspicion is correct
>> then
>> this patch is really just papering over that deeper problem.  So I
>> think
>> you need to dig a big deeper into why you're getting into the code in
>> question (canonicalize_condition) and whether or not the call chain
>> makes any sense given the changes you've made to the loop.
>>
> 
> Jeff,
>   There is no existing loop structure. This starts with a memcmp() call
> and then goes down through the builtin expansion mechanism, which is
> ultimately expanding the pattern cmpmemsi which is where my code is
> generating a loop that finishes with bdnzt. The code that's ultimately
> generated looks like this:
Understood.  But what I still struggle with is how you're getting into
check_simple_exit to begin with and whether or not that should be happening.


The only way to get into check_simple_exit is via find_simple_exit which
is only called from get_simple_loop_desc.

And if you're calling get_simple_loop_desc, then there is some kind of
loop structure in place AFAICT that contains this insn which is rather
surprising.

> 
> I really think the ultimate problem here is that both
> canonicalize_condition and get_condition promise in their documenting
> comments that they will return something that has a cond at the root of
> the rtx, or 0 if they don't understand what they're given. In this case
> they do not understand the rtx of bdnzt and are returning rtx rooted
> with an and, not a cond. This may seem like papering over the problem,
> but I think it is legitimate for these functions to return 0 when the
> branch insn in question does not have a simple cond at the heart of it.
> And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> Ultimately, yes something better ought to be done here.



Your pattern has the form:

  [(set (pc)
	(if_then_else
	  (and
	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
		 (const_int 1))
	     (match_operator 3 "branch_comparison_operator"
		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
		       (const_int 0)]))
		      (label_ref (match_operand 0))
		      (pc)))
   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
	(plus:P (match_dup 1)
		(const_int -1)))
   (clobber (match_scratch:P 5 "=X,X,&r,r"))
   (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
   (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]



That's a form that get_condition knows how to parse.  It's going to pull
out the condition which looks like this:


	  (and
	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
		 (const_int 1))
	     (match_operator 3 "branch_comparison_operator"
		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
		       (const_int 0)]))

ANd pass that down to canonicalize_condition.  That doesn't look like
something canonicalize_condition should handle and thus it ought to be
returning NULL_RTX.

However, I'm still concerned about how we got to a point where this is
happening.  So while we can fix canonicalize_condition to reject this
form (and you can argue we should and I'd generally agree with you), it
could well be papering over a problem earlier.


Jeff
Aaron Sawdey Dec. 14, 2017, 10:22 p.m. UTC | #9
On Thu, 2017-12-14 at 13:43 -0700, Jeff Law wrote:
> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> > On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> > > On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > > > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > > So, the story of this very small patch starts with me
> > > > > > adding
> > > > > > patterns
> > > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > > 
> > > > > >   [(set (pc)
> > > > > > 	(if_then_else
> > > > > > 	  (and
> > > > > > 	     (ne (match_operand:P 1 "register_operand"
> > > > > > "c,*b,*b,*b")
> > > > > > 		 (const_int 1))
> > > > > > 	     (match_operator 3 "branch_comparison_operator"
> > > > > > 		      [(match_operand 4 "cc_reg_operand"
> > > > > > "y,y,y,y")
> > > > > > 		       (const_int 0)]))
> > > > > > 		      (label_ref (match_operand 0))
> > > > > > 		      (pc)))
> > > > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > > > "=1,*r,m,*d*wi*c*l")
> > > > > > 	(plus:P (match_dup 1)
> > > > > > 		(const_int -1)))
> > > > > >    (clobber (match_scratch:P 5 "=X,X,&r,r"))
> > > > > >    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
> > > > > >    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> > > > > > 
> > > > > > However when this gets to the loop_doloop pass, we get an
> > > > > > assert
> > > > > > fail
> > > > > > in iv_number_of_iterations():
> > > > > > 
> > > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > > 
> > > > > > This is happening because this branch insn tests two things
> > > > > > ANDed
> > > > > > together so the and is at the top of the expression, not a
> > > > > > comparison.
> > > > > 
> > > > > Is this something you've created for an existing
> > > > > loop?  Presumably an
> > > > > existing loop that previously was a simple loop?
> > > > 
> > > > The rtl to use this instruction is generated by new code I'm
> > > > working on
> > > > to do a builtin expansion of memcmp using a loop. I call
> > > > gen_bdnztf_di() to generate rtl for the insn. It would be nice
> > > > to
> > > > be
> > > > able to generate this instruction from doloop conversion but
> > > > that
> > > > is
> > > > beyond the scope of what I'm working on presently.
> > > 
> > > Understood.
> > > 
> > > So what I think (and I'm hoping you can confirm one way or the
> > > other)
> > > is
> > > that by generating this instruction you're turing a loop which
> > > previously was considered a simple loop by the IV code and
> > > turning it
> > > into something the IV bits no longer think is a simple loop.
> > > 
> > > I think that's problematical as when the loop is thought to be a
> > > simple
> > > loop, it has to have a small number of forms for its loop
> > > back/exit
> > > loop
> > > tests and whether or not a loop is a simple loop is cached in the
> > > loop
> > > structure.
> > > 
> > > I think we need to dig into that first.  If my suspicion is
> > > correct
> > > then
> > > this patch is really just papering over that deeper problem.  So
> > > I
> > > think
> > > you need to dig a big deeper into why you're getting into the
> > > code in
> > > question (canonicalize_condition) and whether or not the call
> > > chain
> > > makes any sense given the changes you've made to the loop.
> > > 
> > 
> > Jeff,
> >   There is no existing loop structure. This starts with a memcmp()
> > call
> > and then goes down through the builtin expansion mechanism, which
> > is
> > ultimately expanding the pattern cmpmemsi which is where my code is
> > generating a loop that finishes with bdnzt. The code that's
> > ultimately
> > generated looks like this:
> 
> Understood.  But what I still struggle with is how you're getting
> into
> check_simple_exit to begin with and whether or not that should be
> happening.
> 
> 
> The only way to get into check_simple_exit is via find_simple_exit
> which
> is only called from get_simple_loop_desc.
> 
> And if you're calling get_simple_loop_desc, then there is some kind
> of
> loop structure in place AFAICT that contains this insn which is
> rather
> surprising.
> 
> > 
> > I really think the ultimate problem here is that both
> > canonicalize_condition and get_condition promise in their
> > documenting
> > comments that they will return something that has a cond at the
> > root of
> > the rtx, or 0 if they don't understand what they're given. In this
> > case
> > they do not understand the rtx of bdnzt and are returning rtx
> > rooted
> > with an and, not a cond. This may seem like papering over the
> > problem,
> > but I think it is legitimate for these functions to return 0 when
> > the
> > branch insn in question does not have a simple cond at the heart of
> > it.
> > And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> > Ultimately, yes something better ought to be done here.
> 
> 
> 
> Your pattern has the form:
> 
>   [(set (pc)
> 	(if_then_else
> 	  (and
> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> 		 (const_int 1))
> 	     (match_operator 3 "branch_comparison_operator"
> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> 		       (const_int 0)]))
> 		      (label_ref (match_operand 0))
> 		      (pc)))
>    (set (match_operand:P 2 "nonimmediate_operand"
> "=1,*r,m,*d*wi*c*l")
> 	(plus:P (match_dup 1)
> 		(const_int -1)))
>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> 
> 
> 
> That's a form that get_condition knows how to parse.  It's going to
> pull
> out the condition which looks like this:
> 
> 
> 	  (and
> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> 		 (const_int 1))
> 	     (match_operator 3 "branch_comparison_operator"
> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> 		       (const_int 0)]))
> 
> ANd pass that down to canonicalize_condition.  That doesn't look like
> something canonicalize_condition should handle and thus it ought to
> be
> returning NULL_RTX.
> 
> However, I'm still concerned about how we got to a point where this
> is
> happening.  So while we can fix canonicalize_condition to reject this
> form (and you can argue we should and I'd generally agree with you),
> it
> could well be papering over a problem earlier.
> 

Jeff,
  How we end up there is that I do builtin expansion on a memcmp() call
and generate the RTL form for code such as this (35 byte memcmp,
ppc64le power8):

             li 8,2
             mtctr 8
             li 6,0
             li 5,8
     .L13:
             ldbrx 7,3,6
             ldbrx 9,10,6
             ldbrx 0,3,5
             ldbrx 4,10,5
             addi 6,6,16
             addi 5,5,16
             subfc. 9,9,7
             bne 0,.L10
             subfc. 9,4,0
             bdnzt 2,.L13
             bne 0,.L10
             add 3,3,6
             add 10,10,6
             addi 9,3,-5
             ldbrx 7,0,9
             addi 9,10,-5
             ldbrx 9,0,9
             subfc 9,9,7
     .L10:
             popcntd 9,9
             subfe 10,10,10
             or 9,9,10

This happens at expand. Then later down the line, the doloop pass
(doloop_optimize_loops) calls doloop_optimize on each loop. That in
turn calls get_simple_loop_desc().

All I did here was to replace the memcmp() call with a bunch of insns
and a couple labels adding a loop to the code. If I just used another
bne and bdnz I think everything would go through fine. But my
performance testing shows that there is a definite advantage to using
bdnzt here.

To summarize:
 * Input code is a call to memcmp()
 * builtin expansion happens
 * I generate insns forming a loop ending with bdnzt.
 * This loop is processed by doloop_optimize_loops.
 * canonicalize_condition() doesn't understand the RTL for bdnzt.

The beauty of doing stuff like this for builtin expansion is that the
expanded RTL code gets exposed to all of the back end optimizations.
CSE, constant propagation, and other things can do weird and wonderful
things to the code that was once a memcmp(). So if I generate a loop in
the expansion, of course that will be processed by doloop. I don't see
why we would want it any other way.

The function expand_compare_loop() that does this expansion and
generates the loop is in this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00623.html

Thanks,
    Aaron
Segher Boessenkool Dec. 15, 2017, 9:16 a.m. UTC | #10
On Thu, Dec 14, 2017 at 01:43:35PM -0700, Jeff Law wrote:
> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
> >   There is no existing loop structure. This starts with a memcmp() call
> > and then goes down through the builtin expansion mechanism, which is
> > ultimately expanding the pattern cmpmemsi which is where my code is
> > generating a loop that finishes with bdnzt. The code that's ultimately
> > generated looks like this:
> Understood.  But what I still struggle with is how you're getting into
> check_simple_exit to begin with and whether or not that should be happening.
> 
> The only way to get into check_simple_exit is via find_simple_exit which
> is only called from get_simple_loop_desc.
> 
> And if you're calling get_simple_loop_desc, then there is some kind of
> loop structure in place AFAICT that contains this insn which is rather
> surprising.

Why?  It *is* a loop!

Or are you wondering why loop-iv.c is involved?  get_simple_loop_desc
in there is also called from much later in RTL passes.

> > I really think the ultimate problem here is that both
> > canonicalize_condition and get_condition promise in their documenting
> > comments that they will return something that has a cond at the root of
> > the rtx, or 0 if they don't understand what they're given. In this case
> > they do not understand the rtx of bdnzt and are returning rtx rooted
> > with an and, not a cond. This may seem like papering over the problem,
> > but I think it is legitimate for these functions to return 0 when the
> > branch insn in question does not have a simple cond at the heart of it.
> > And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> > Ultimately, yes something better ought to be done here.
> 
> 
> 
> Your pattern has the form:
> 
>   [(set (pc)
> 	(if_then_else
> 	  (and
> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> 		 (const_int 1))
> 	     (match_operator 3 "branch_comparison_operator"
> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> 		       (const_int 0)]))
> 		      (label_ref (match_operand 0))
> 		      (pc)))
>    (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
> 	(plus:P (match_dup 1)
> 		(const_int -1)))
>    (clobber (match_scratch:P 5 "=X,X,&r,r"))
>    (clobber (match_scratch:CC 6 "=X,&y,&y,&y"))
>    (clobber (match_scratch:CCEQ 7 "=X,&y,&y,&y"))]
> 
> 
> 
> That's a form that get_condition knows how to parse.  It's going to pull
> out the condition which looks like this:
> 
> 
> 	  (and
> 	     (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
> 		 (const_int 1))
> 	     (match_operator 3 "branch_comparison_operator"
> 		      [(match_operand 4 "cc_reg_operand" "y,y,y,y")
> 		       (const_int 0)]))
> 
> ANd pass that down to canonicalize_condition.  That doesn't look like
> something canonicalize_condition should handle and thus it ought to be
> returning NULL_RTX.

Yes exactly.

> However, I'm still concerned about how we got to a point where this is
> happening.  So while we can fix canonicalize_condition to reject this
> form (and you can argue we should and I'd generally agree with you), it
> could well be papering over a problem earlier.

canonicalize_condition does not do what its documentation says it does.
Fixing that is not papering over a problem.  Of course there could be a
problem elsewhere, sure.  But *this* problem is blocking Aaron's other
patches right now (which are approved and ready to go in).


Segher
Jeff Law Dec. 19, 2017, 11:40 p.m. UTC | #11
On 12/15/2017 02:16 AM, Segher Boessenkool wrote:
> On Thu, Dec 14, 2017 at 01:43:35PM -0700, Jeff Law wrote:
>> On 11/21/2017 10:45 AM, Aaron Sawdey wrote:
>>>   There is no existing loop structure. This starts with a memcmp() call
>>> and then goes down through the builtin expansion mechanism, which is
>>> ultimately expanding the pattern cmpmemsi which is where my code is
>>> generating a loop that finishes with bdnzt. The code that's ultimately
>>> generated looks like this:
>> Understood.  But what I still struggle with is how you're getting into
>> check_simple_exit to begin with and whether or not that should be happening.
>>
>> The only way to get into check_simple_exit is via find_simple_exit which
>> is only called from get_simple_loop_desc.
>>
>> And if you're calling get_simple_loop_desc, then there is some kind of
>> loop structure in place AFAICT that contains this insn which is rather
>> surprising.
> 
> Why?  It *is* a loop!
Right.  But how was the loop ever considered simple given that kind of
test?  At one time I managed to convince myself that to trigger the
calls Aaron is having trouble with we must have already analyzed the
loop as "simple" and I'm just having a hard time seeing how that could
be the case given the from of that insn.

>> However, I'm still concerned about how we got to a point where this is
>> happening.  So while we can fix canonicalize_condition to reject this
>> form (and you can argue we should and I'd generally agree with you), it
>> could well be papering over a problem earlier.
> 
> canonicalize_condition does not do what its documentation says it does.
> Fixing that is not papering over a problem.  Of course there could be a
> problem elsewhere, sure.  But *this* problem is blocking Aaron's other
> patches right now (which are approved and ready to go in).
Given that I don't think we should be getting into
canonicalize_condition at all, "fixing it" *is* papering over the problem.

So I think the way forward is to show that the way we're getting into
canonicalize_condition is reasonable/valid.  Once that happens we can go
forward with Aaron's patch.

jeff
Jeff Law Dec. 19, 2017, 11:56 p.m. UTC | #12
On 12/14/2017 03:22 PM, Aaron Sawdey wrote:
>>
>> Understood.  But what I still struggle with is how you're getting
>> into
>> check_simple_exit to begin with and whether or not that should be
>> happening.
>>
>>
>> The only way to get into check_simple_exit is via find_simple_exit
>> which
>> is only called from get_simple_loop_desc.
>>
>> And if you're calling get_simple_loop_desc, then there is some kind
>> of
>> loop structure in place AFAICT that contains this insn which is
>> rather
>> surprising.
So here's what I was looking for...

loop-doloop.c will iterate over all the recorded loops, regardless of
whether or not they are "simple" loops to see if they can be converted
into a do-looop.

For each loop considered we call get_simple_loop_desc which looks like this:



  struct niter_desc *desc = simple_loop_desc (loop);

  if (desc)
    return desc;

  /* At least desc->infinite is not always initialized by
     find_simple_loop_exit.  */
  desc = ggc_cleared_alloc<niter_desc> ();
  iv_analysis_loop_init (loop);
  find_simple_exit (loop, desc);
  loop->simple_loop_desc = desc;
  return desc;

If we already have a descriptor, return it.  Else allocate a new one,
call the IV analysis code, then find_simple_loop_exit.

I must have mis-read this code multiple times.  Clearly we can get to
find_simple_exit which in turn calls check_simple_exit.  The call to
get_condition is guarded by a !any_condjump_p early return.  But
any_condjump_p should return true for your new insn.

So the path to get_condition is reasonable.  That's really all I needed
to have verified one way or the other.

With that in mind your patch is fine.

I will note that I find it highly confusing that we attach a simple loop
descriptor to a loop that is not a simple loop.  But clearly you didn't
introduce that oddball behavior.


jeff
Segher Boessenkool Dec. 20, 2017, 12:13 a.m. UTC | #13
Hi Jeff,

On Tue, Dec 19, 2017 at 04:40:23PM -0700, Jeff Law wrote:
> On 12/15/2017 02:16 AM, Segher Boessenkool wrote:
> >> The only way to get into check_simple_exit is via find_simple_exit which
> >> is only called from get_simple_loop_desc.
> >>
> >> And if you're calling get_simple_loop_desc, then there is some kind of
> >> loop structure in place AFAICT that contains this insn which is rather
> >> surprising.
> > 
> > Why?  It *is* a loop!
> Right.  But how was the loop ever considered simple given that kind of
> test?  At one time I managed to convince myself that to trigger the
> calls Aaron is having trouble with we must have already analyzed the
> loop as "simple" and I'm just having a hard time seeing how that could
> be the case given the from of that insn.

loop-iv.c:check_simple_exit has these two relevant things:

1)
  /* It must end in a simple conditional jump.  */
  if (!any_condjump_p (BB_END (exit_bb)))
    return;

any_condjump_p just sees if it is assigning a suitable if_then_else to pc.
It is.  The comment on any_condjump_p says:

/* Return true when insn is a conditional jump.  This function works for
   instructions containing PC sets in PARALLELs.  The instruction may have
   various other effects so before removing the jump you must verify
   onlyjump_p.

   Note that unlike condjump_p it returns false for unconditional jumps.  */

bdnzt and friends are a parallel of a conditional jump, with as condition
checking whether ctr (the "counter register") will reach 0 (or not), and
whether some other condition is true (or not); and as the other arm of the
parallel it decrements ctr.

2)
  /* Test whether the condition is suitable.  */
  if (!(condition = get_condition (BB_END (ein->src), &at, false, false)))
    return;

... and that gets into trouble already: it gets the condition just fine,
but then it calls canonicalize_condition on that, which cannot handle it.
Which is what Aaron's patch fixes.

> >> However, I'm still concerned about how we got to a point where this is
> >> happening.  So while we can fix canonicalize_condition to reject this
> >> form (and you can argue we should and I'd generally agree with you), it
> >> could well be papering over a problem earlier.
> > 
> > canonicalize_condition does not do what its documentation says it does.
> > Fixing that is not papering over a problem.  Of course there could be a
> > problem elsewhere, sure.  But *this* problem is blocking Aaron's other
> > patches right now (which are approved and ready to go in).
> Given that I don't think we should be getting into
> canonicalize_condition at all, "fixing it" *is* papering over the problem.
> 
> So I think the way forward is to show that the way we're getting into
> canonicalize_condition is reasonable/valid.  Once that happens we can go
> forward with Aaron's patch.

Let me just paste all of get_condition then (it is very short):

rtx
get_condition (rtx_insn *jump, rtx_insn **earliest, int allow_cc_mode,
               int valid_at_insn_p)
{
  rtx cond;
  int reverse;
  rtx set;

  /* If this is not a standard conditional jump, we can't parse it.  */
  if (!JUMP_P (jump)
      || ! any_condjump_p (jump))
    return 0;
  set = pc_set (jump);

  cond = XEXP (SET_SRC (set), 0);

  /* If this branches to JUMP_LABEL when the condition is false, reverse
     the condition.  */
  reverse
    = GET_CODE (XEXP (SET_SRC (set), 2)) == LABEL_REF
      && label_ref_label (XEXP (SET_SRC (set), 2)) == JUMP_LABEL (jump);

  return canonicalize_condition (jump, cond, reverse, earliest, NULL_RTX,
                                 allow_cc_mode, valid_at_insn_p);
}

All of this is totally reasonable and expected; "cond" is an AND of two
things, which canonicalize_condition cannot handle, so it should return
NULL.


Segher
Segher Boessenkool Dec. 20, 2017, 12:18 a.m. UTC | #14
On Tue, Dec 19, 2017 at 04:56:16PM -0700, Jeff Law wrote:
> With that in mind your patch is fine.

Thanks for the reviewing :-)

> I will note that I find it highly confusing that we attach a simple loop
> descriptor to a loop that is not a simple loop.  But clearly you didn't
> introduce that oddball behavior.

It is a simple loop, just with a slightly more complex exit condition.
What is "simple"?  Heh.


Segher
Aaron Sawdey Jan. 2, 2018, 4:47 p.m. UTC | #15
On Tue, 2017-12-19 at 16:56 -0700, Jeff Law wrote:
> So the path to get_condition is reasonable.  That's really all I
> needed
> to have verified one way or the other.
> 
> With that in mind your patch is fine.
> 
> I will note that I find it highly confusing that we attach a simple
> loop
> descriptor to a loop that is not a simple loop.  But clearly you
> didn't
> introduce that oddball behavior.

Jeff,
  Thanks for sticking with this and reviewing, I have re-checked that
regstrap still passes and committed as 256079.

  Aaron

Patch
diff mbox series

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c       (revision 254553)
+++ gcc/rtlanal.c       (working copy)
@@ -5623,7 +5623,11 @@ 
   if (CC0_P (op0))
     return 0;
 
-  return gen_rtx_fmt_ee (code, VOIDmode, op0, op1);
+  /* We promised to return a comparison.  */
+  rtx ret = gen_rtx_fmt_ee (code, VOIDmode, op0, op1);
+  if (COMPARISON_P (ret))
+    return ret;
+  return 0;
 }
 
 /* Given a jump insn JUMP, return the condition that will cause it to branch