diff mbox

[combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition

Message ID 566EB527.6070305@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Dec. 14, 2015, 12:25 p.m. UTC
Hi all,

For this PR I want to teach combine to deal with unrecognisable patterns that contain a sub-expression like
(x + x) by transforming it into (x << 1) and trying to match the result. This is because some instruction
sets like arm and aarch64 can combine shifts with other arithmetic operations or have shifts in their RTL representation
of more complex operations (like the aarch64 UBFIZ instruction which can be expressed as a zero_extend+ashift pattern).

Due to a change in rtx costs for -mcpu=cortex-a53 in GCC 5 we no longer expand an expression like x * 2 as x << 1
but rather as x + x, which hurts combination opportunities dues to this deficiency.

This patch addresses the issue in the recog_for_combine function in combine.c in a similar way to the change_zero_ext
trick. That is, if it recog_for_combine fails to match a pattern it replaces all instances of x + x in the
rtx with x << 1 and tries again.

This way I've been able to get combine to more aggressively generate the arithmetic+shift forms of instructions for
-mcpu=cortex-a53 on aarch64 as well as instructions like ubfiz and sbfiz that contain shift-by-immediate sub-expressions.

This patch shouldn't affect rtxes that already match, so it should have no fallout on other cases.

Bootstrapped and tested on arm, aarch64, x86_64.

For the testcase:
int
foo (int x)
{
   return (x * 2) & 65535;
}

int
bar (int x, int y)
{
   return (x * 2) | y;
}

with -O2 -mcpu=cortex-a53 for aarch64 we now generate:
foo:
         ubfiz   w0, w0, 1, 15
         ret

bar:
         orr     w0, w1, w0, lsl 1
         ret

whereas without this patch we generate:
foo:
         add     w0, w0, w0
         uxth    w0, w0
         ret

bar:
         add     w0, w0, w0
         orr     w0, w0, w1
         ret


PR 68651 is a code quality regression for GCC 5 and GCC 6 that was introduced due to updated rtx costs
for -mcpu=cortex-a53 that affected expansion.  The costs changes were correct (to the extent that rtx
costs have any meaning) and I think this is a deficiency in combine that should be fixed.

I wouldn't propose to backport this to GCC 5.

P.S. For the ubfiz testcase above to combine successfully it needs an aarch64 rtx costs issue to be resolved
that I proposed a fix for in https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00526.html.
Otherwise the backend wrongly rejects it on the grounds of wrong costs.

Is this ok for trunk once the costs issue at https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00526.html
gets resolved?

Thanks,
Kyrill

2015-12-14  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68651
     * combine.c (change_shift_by_one): New function.
     (change_rtx_with_func): Likewise.
     (recog_for_combine): Use the above to transform reg + reg
     sub-expressions into reg << 1 within non-recognisable patterns
     and try to match the result.

2015-12-14  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68651
     * gcc.target/aarch64/pr68651_1.c: New test.

Comments

Bernd Schmidt Dec. 15, 2015, 1:22 p.m. UTC | #1
On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:
> For this PR I want to teach combine to deal with unrecognisable patterns
> that contain a sub-expression like
> (x + x) by transforming it into (x << 1) and trying to match the result.
> This is because some instruction
> sets like arm and aarch64 can combine shifts with other arithmetic
> operations or have shifts in their RTL representation
> of more complex operations (like the aarch64 UBFIZ instruction which can
> be expressed as a zero_extend+ashift pattern).
>
> Due to a change in rtx costs for -mcpu=cortex-a53 in GCC 5 we no longer
> expand an expression like x * 2 as x << 1
> but rather as x + x, which hurts combination opportunities dues to this
> deficiency.
>
> This patch addresses the issue in the recog_for_combine function in
> combine.c in a similar way to the change_zero_ext
> trick. That is, if it recog_for_combine fails to match a pattern it
> replaces all instances of x + x in the
> rtx with x << 1 and tries again.
>
> This way I've been able to get combine to more aggressively generate the
> arithmetic+shift forms of instructions for
> -mcpu=cortex-a53 on aarch64 as well as instructions like ubfiz and sbfiz
> that contain shift-by-immediate sub-expressions.
>
> This patch shouldn't affect rtxes that already match, so it should have
> no fallout on other cases.

I'm somewhat undecided on this. If we keep adding cases to this 
mechanism, the run time costs will eventually add up (we'll iterate over 
the pattern over and over again if it doesn't match, which is the normal 
case in combine), and we're still not testing combinations of these 
replacements.

I wonder if it would be possible to have genrecog write a special 
recognizer that can identify cases where a pattern would match if it was 
changed. Something along the lines of

recog_for_combine (..., vec<..> *replacements)
{
....
   /* Trying to recognize a shift.  */
   if (GET_CODE (x) == PLUS && rtx_equal_p (XEXP (x, 0), XEXP (x, 1)))
     replacements->safe_push (...)
}

Seems like it would be more efficient and more flexible.


Bernd
Bernd Schmidt Dec. 15, 2015, 2:22 p.m. UTC | #2
On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:
> PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
> introduced due to updated rtx costs
> for -mcpu=cortex-a53 that affected expansion.  The costs changes were
> correct (to the extent that rtx
> costs have any meaning) and I think this is a deficiency in combine that
> should be fixed.

Thinking a bit more about this, I'm actually not sure that this isn't a 
backend problem. IMO the costs could and maybe very well should be 
represented such that a left shift by 1 and an add have the same cost, 
and the insn pattern for the shift should emit the add if it is cheaper. 
If there are multiple ways of expressing an operation, then how it is 
represented in RTL is essentially irrelevant to the question of how much 
it costs.


Bernd
Richard Earnshaw Dec. 15, 2015, 2:33 p.m. UTC | #3
On 15/12/15 14:22, Bernd Schmidt wrote:
> On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:
>> PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
>> introduced due to updated rtx costs
>> for -mcpu=cortex-a53 that affected expansion.  The costs changes were
>> correct (to the extent that rtx
>> costs have any meaning) and I think this is a deficiency in combine that
>> should be fixed.
> 
> Thinking a bit more about this, I'm actually not sure that this isn't a
> backend problem. IMO the costs could and maybe very well should be
> represented such that a left shift by 1 and an add have the same cost,
> and the insn pattern for the shift should emit the add if it is cheaper.
> If there are multiple ways of expressing an operation, then how it is
> represented in RTL is essentially irrelevant to the question of how much
> it costs.
> 
> 
> Bernd

That might be OK if we didn't have to have standard canonicalization,
but I think handling all these special cases would make it incredibly
complex and fragile to work out the accurate costs of all these patterns.

It's also possible that this would explicitly break some other
optimization passes (such as the way in which multiplies are synthesised
with shift/add operations).

R.
Bernd Schmidt Dec. 15, 2015, 2:37 p.m. UTC | #4
On 12/15/2015 03:33 PM, Richard Earnshaw wrote:
>
> It's also possible that this would explicitly break some other
> optimization passes (such as the way in which multiplies are synthesised
> with shift/add operations).

How so? IMO such a change would make cost calculations more accurate and 
should help rather than break anything.


Bernd
Kyrill Tkachov Dec. 15, 2015, 4:21 p.m. UTC | #5
Hi Bernd,

On 15/12/15 14:22, Bernd Schmidt wrote:
> On 12/14/2015 01:25 PM, Kyrill Tkachov wrote:
>> PR 68651 is a code quality regression for GCC 5 and GCC 6 that was
>> introduced due to updated rtx costs
>> for -mcpu=cortex-a53 that affected expansion.  The costs changes were
>> correct (to the extent that rtx
>> costs have any meaning) and I think this is a deficiency in combine that
>> should be fixed.
>
> Thinking a bit more about this, I'm actually not sure that this isn't a backend problem. IMO the costs could and maybe very well should be represented such that a left shift by 1 and an add have the same cost, and the insn pattern for the 
> shift should emit the add if it is cheaper. If there are multiple ways of expressing an operation, then how it is represented in RTL is essentially irrelevant to the question of how much it costs.
>

Then for the shift pattern in the MD file we'd have to dynamically select the scheduling type depending on whether or not
the shift amount is 1 and the costs line up?
Sounds like going out of our way to work around the fact that either combine or recog does not understand equivalent forms
of instructions.

I'd be more inclined to follow your suggestion in your first reply (teaching genrecog about equivalent patterns)
However, I think that would require a bit more involved surgery in recog/combine whereas this approach
reuses the existing change_zero_ext approach which I feel is less intrusive at this stage.

The price we pay when trying these substitutions is an iteration over the rtx with FOR_EACH_SUBRTX_PTR.
recog gets called only if that iteration actually performed a substitution of x + x into x << 1.
Is that too high a price to pay? (I'm not familiar with the performance characteristics of
the FOR_EACH_SUBRTX machinery)

Thanks,
Kyrill

>
> Bernd
>
Bernd Schmidt Dec. 16, 2015, 12:18 p.m. UTC | #6
On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
> Then for the shift pattern in the MD file we'd have to dynamically
> select the scheduling type depending on whether or not the shift
> amount is 1 and the costs line up?

Yes. This isn't unusual, take a look at i386.md where you have a lot of 
switches on attr type to decide which string to print.

> The price we pay when trying these substitutions is an iteration over
>  the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if that
> iteration actually performed a substitution of x + x into x << 1. Is
> that too high a price to pay? (I'm not familiar with the performance
>  characteristics of the FOR_EACH_SUBRTX machinery)

It depends on how many of these transforms we are going to try; it also 
feels very hackish, trying to work around the core design of the 
combiner. IMO it would be better for machine descriptions to work with 
the pass rather than against it.

If you can somehow arrange for the (plus x x) to be turned into a shift 
while substituting that might be yet another approach to try.


Bernd
Kyrill Tkachov Dec. 16, 2015, 5 p.m. UTC | #7
On 16/12/15 12:18, Bernd Schmidt wrote:
> On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
>> Then for the shift pattern in the MD file we'd have to dynamically
>> select the scheduling type depending on whether or not the shift
>> amount is 1 and the costs line up?
>
> Yes. This isn't unusual, take a look at i386.md where you have a lot of switches on attr type to decide which string to print.
>

I'm just worried that if we take this idea to its logical conclusion, we have to add a new canonicalisation rule:
"all (plus x x) expressions shall be expressed as (ashift x 1)".
Such a rule seems too specific to me and all targets would have to special-case it in their MD patterns and costs
if they ever wanted to treat an add and a shift differently.
In this particular case we'd have
to conditionalise the scheduling string selection on a particular CPU tuning and the shift amount, which will make
the pattern much harder to read.
To implement this properly we'd also have to

>> The price we pay when trying these substitutions is an iteration over
>>  the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if that
>> iteration actually performed a substitution of x + x into x << 1. Is
>> that too high a price to pay? (I'm not familiar with the performance
>>  characteristics of the FOR_EACH_SUBRTX machinery)
>
> It depends on how many of these transforms we are going to try; it also feels very hackish, trying to work around the core design of the combiner. IMO it would be better for machine descriptions to work with the pass rather than against it.
>

Perhaps I'm lacking the historical context, but what is the core design of the combiner?
Why should the backend have to jump through these hoops if it already communicates to the midend
(through correct rtx costs) that a shift is more expensive than a plus?
I'd be more inclined to agree that this is perhaps a limitation in recog rather than combine,
but still not a backend problem.

> If you can somehow arrange for the (plus x x) to be turned into a shift while substituting that might be yet another approach to try.
>

I did investigate where else we could make this transformation.
For the zero_extend+shift case (the ubfiz instruction from the testcase in my original submission)
we could fix this by modifying make_extraction to convert its argument to a shift from (plus x x)
as, in that context, shifts are undoubtedly more likely to simplify with the various extraction
operations that it's trying to perform.

That leaves the other case (orr + shift), where converting to a shift isn't a
simplification in any way, but the backend happens to have an instruction that matches the
combined orr+shift form. There we want to perform the transformation purely to aid
recognition, not out of any simplification considerations. That's what I'm trying to figure out
how to do now.

However, we want to avoid doing it unconditionally because if we have just a simple set
of y = x + x we want to leave it as a plus rather than a shift because it's cheaper
on that target.

Thanks,
Kyrill
>
> Bernd
Jeff Law Dec. 16, 2015, 5:28 p.m. UTC | #8
On 12/16/2015 10:00 AM, Kyrill Tkachov wrote:
>
> On 16/12/15 12:18, Bernd Schmidt wrote:
>> On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
>>> Then for the shift pattern in the MD file we'd have to
>>> dynamically select the scheduling type depending on whether or
>>> not the shift amount is 1 and the costs line up?
>>
>> Yes. This isn't unusual, take a look at i386.md where you have a
>> lot of switches on attr type to decide which string to print.
>>
>
> I'm just worried that if we take this idea to its logical conclusion,
> we have to add a new canonicalisation rule: "all (plus x x)
> expressions shall be expressed as (ashift x 1)". Such a rule seems
> too specific to me and all targets would have to special-case it in
> their MD patterns and costs if they ever wanted to treat an add and a
> shift differently. In this particular case we'd have to
> conditionalise the scheduling string selection on a particular CPU
> tuning and the shift amount, which will make the pattern much harder
> to read. To implement this properly we'd also have to
That's not terribly unusual.  And we've done those kind of 
canonicalization rules before -- most recently to deal with issues in 
combine we settled on canonicalization rules for ashift vs mult.  While 
there was fallout, it's manageable.


>
>>> The price we pay when trying these substitutions is an iteration
>>> over the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if
>>> that iteration actually performed a substitution of x + x into x
>>> << 1. Is that too high a price to pay? (I'm not familiar with the
>>> performance characteristics of the FOR_EACH_SUBRTX machinery)
>>
>> It depends on how many of these transforms we are going to try; it
>>  also feels very hackish, trying to work around the core design of
>> the combiner. IMO it would be better for machine descriptions to
>> work with the pass rather than against it.
>>
>
> Perhaps I'm lacking the historical context, but what is the core
> design of the combiner? Why should the backend have to jump through
> these hoops if it already communicates to the midend (through correct
> rtx costs) that a shift is more expensive than a plus? I'd be more
> inclined to agree that this is perhaps a limitation in recog rather
> than combine, but still not a backend problem.
The historical design of combine is pretty simple.

Use data dependence to substitute the definition of an operand in a use
of the operand. Essentially create bigger blobs of RTL. Canonicalize and
simplify that larger blob of RTL, then try to match it with a pattern in
the backend.

Note that costing didn't enter the picture. The assumption was that if
the combination succeeds, then it's profitable (fewer insns).  We 
haven't generally encouraged trying to match multiple forms of 
equivalent expressions, instead we declare a canonical form and make 
sure combine uses it.


>
>> If you can somehow arrange for the (plus x x) to be turned into a
>> shift while substituting that might be yet another approach to
>> try.
>>
>
> I did investigate where else we could make this transformation. For
> the zero_extend+shift case (the ubfiz instruction from the testcase
> in my original submission) we could fix this by modifying
> make_extraction to convert its argument to a shift from (plus x x)
> as, in that context, shifts are undoubtedly more likely to simplify
> with the various extraction operations that it's trying to perform.
Note that canonicalizing (plus x x) to (ashift x 1) is consistent with 
the canonicalization we do for (mult x C) to (ashift x log2 (C)) where C 
is an exact power of two.

When we made that change consistently (there were cases where we instead 
preferred MULT in the past), we had to fix some backends, but the 
fallout wasn't terrible.

I would think from a representational standpoint canonicalizing (plus x 
x) to (ashift x 1) would be generally a good thing.


jeff
Kyrill Tkachov Dec. 17, 2015, 9:46 a.m. UTC | #9
On 16/12/15 17:28, Jeff Law wrote:
> On 12/16/2015 10:00 AM, Kyrill Tkachov wrote:
>>
>> On 16/12/15 12:18, Bernd Schmidt wrote:
>>> On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:
>>>> Then for the shift pattern in the MD file we'd have to
>>>> dynamically select the scheduling type depending on whether or
>>>> not the shift amount is 1 and the costs line up?
>>>
>>> Yes. This isn't unusual, take a look at i386.md where you have a
>>> lot of switches on attr type to decide which string to print.
>>>
>>
>> I'm just worried that if we take this idea to its logical conclusion,
>> we have to add a new canonicalisation rule: "all (plus x x)
>> expressions shall be expressed as (ashift x 1)". Such a rule seems
>> too specific to me and all targets would have to special-case it in
>> their MD patterns and costs if they ever wanted to treat an add and a
>> shift differently. In this particular case we'd have to
>> conditionalise the scheduling string selection on a particular CPU
>> tuning and the shift amount, which will make the pattern much harder
>> to read. To implement this properly we'd also have to
> That's not terribly unusual.  And we've done those kind of canonicalization rules before -- most recently to deal with issues in combine we settled on canonicalization rules for ashift vs mult.  While there was fallout, it's manageable.
>
>
>>
>>>> The price we pay when trying these substitutions is an iteration
>>>> over the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if
>>>> that iteration actually performed a substitution of x + x into x
>>>> << 1. Is that too high a price to pay? (I'm not familiar with the
>>>> performance characteristics of the FOR_EACH_SUBRTX machinery)
>>>
>>> It depends on how many of these transforms we are going to try; it
>>>  also feels very hackish, trying to work around the core design of
>>> the combiner. IMO it would be better for machine descriptions to
>>> work with the pass rather than against it.
>>>
>>
>> Perhaps I'm lacking the historical context, but what is the core
>> design of the combiner? Why should the backend have to jump through
>> these hoops if it already communicates to the midend (through correct
>> rtx costs) that a shift is more expensive than a plus? I'd be more
>> inclined to agree that this is perhaps a limitation in recog rather
>> than combine, but still not a backend problem.
> The historical design of combine is pretty simple.
>
> Use data dependence to substitute the definition of an operand in a use
> of the operand. Essentially create bigger blobs of RTL. Canonicalize and
> simplify that larger blob of RTL, then try to match it with a pattern in
> the backend.
>
> Note that costing didn't enter the picture. The assumption was that if
> the combination succeeds, then it's profitable (fewer insns).  We haven't generally encouraged trying to match multiple forms of equivalent expressions, instead we declare a canonical form and make sure combine uses it.
>
>

Thanks for the explanation.

>>
>>> If you can somehow arrange for the (plus x x) to be turned into a
>>> shift while substituting that might be yet another approach to
>>> try.
>>>
>>
>> I did investigate where else we could make this transformation. For
>> the zero_extend+shift case (the ubfiz instruction from the testcase
>> in my original submission) we could fix this by modifying
>> make_extraction to convert its argument to a shift from (plus x x)
>> as, in that context, shifts are undoubtedly more likely to simplify
>> with the various extraction operations that it's trying to perform.
> Note that canonicalizing (plus x x) to (ashift x 1) is consistent with the canonicalization we do for (mult x C) to (ashift x log2 (C)) where C is an exact power of two.
>
> When we made that change consistently (there were cases where we instead preferred MULT in the past), we had to fix some backends, but the fallout wasn't terrible.
>
> I would think from a representational standpoint canonicalizing (plus x x) to (ashift x 1) would be generally a good thing.
>

Ok. Gathering from the above it's combines' job to canonicalise, so implementing this approach would be simply a matter
of adding the transformation to combine_simplify_rtx. At least that does the trick for my testcases and the backend
can decide whether to emit the add instruction for the shift-by-one case.

If there's consensus on this approach I'll propose a patch for that.

Thanks for all your inputs,
Kyrill


>
> jeff
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index c008d2a786ebeaa7560acbd60c7c2e8cdacdc9aa..9465e5927145e768f1a5fc43ce7c3621033d8aef 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11063,6 +11063,60 @@  change_zero_ext (rtx *src)
   return changed;
 }
 
+/* Replace in SRC all sub-rtxes of the form x + x
+   with x << 1 to help recognition on targets with combined
+   shift operations.  Return true iff such any such change was made.  */
+
+static bool
+change_shift_by_one (rtx *src)
+{
+  bool changed = false;
+
+  subrtx_ptr_iterator::array_type array;
+  FOR_EACH_SUBRTX_PTR (iter, array, src, NONCONST)
+    {
+      rtx x = **iter;
+      machine_mode mode = GET_MODE (x);
+
+      if (GET_CODE (x) == PLUS && GET_MODE_CLASS (mode) == MODE_INT
+	  && !side_effects_p (XEXP (x, 0))
+	  && rtx_equal_p (XEXP (x, 0), XEXP (x, 1)))
+	x = simplify_gen_binary (ASHIFT, mode, XEXP (x, 0), const1_rtx);
+      else
+	continue;
+
+      SUBST (**iter, x);
+      changed = true;
+    }
+
+  return changed;
+}
+
+/* Modify the sources of all sets in PAT using the function FUNC that takes
+   a pointer to the rtx to modify and returns true iff it made any
+   modifications.  Used by recog_for_combine to twiddle non-matching patterns
+   into something recognisable.  */
+
+static bool
+change_rtx_with_func (rtx pat, bool (*func) (rtx *))
+{
+  bool changed = false;
+
+  if (GET_CODE (pat) == SET)
+    changed = func (&SET_SRC (pat));
+  else if (GET_CODE (pat) == PARALLEL)
+    {
+      int i;
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	{
+	  rtx set = XVECEXP (pat, 0, i);
+	  if (GET_CODE (set) == SET)
+	    changed |= func (&SET_SRC (set));
+	}
+    }
+  return changed;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
    We try to match the rtx that the pointer points to.
    If that fails, we may try to modify or replace the pattern,
@@ -11073,6 +11127,9 @@  change_zero_ext (rtx *src)
    to the equivalent AND and perhaps LSHIFTRT patterns, and try with that
    (and undo if that fails).
 
+   If that still doesn't match we change expressions of the form
+   (PLUS reg1 reg1) into (ASHIFT reg1 (const_int 1)).  Undo if that fails.
+
    PNOTES is a pointer to a location where any REG_UNUSED notes added for
    the CLOBBERs are placed.
 
@@ -11090,19 +11147,7 @@  recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
   void *marker = get_undo_marker ();
   bool changed = false;
 
-  if (GET_CODE (pat) == SET)
-    changed = change_zero_ext (&SET_SRC (pat));
-  else if (GET_CODE (pat) == PARALLEL)
-    {
-      int i;
-      for (i = 0; i < XVECLEN (pat, 0); i++)
-	{
-	  rtx set = XVECEXP (pat, 0, i);
-	  if (GET_CODE (set) == SET)
-	    changed |= change_zero_ext (&SET_SRC (set));
-	}
-    }
-
+  changed = change_rtx_with_func (pat, change_zero_ext);
   if (changed)
     {
       insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
@@ -11111,6 +11156,20 @@  recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
 	undo_to_marker (marker);
     }
 
+  if (insn_code_number < 0)
+    {
+      marker = get_undo_marker ();
+      changed = change_rtx_with_func (pat, change_shift_by_one);
+
+      if (changed)
+	{
+	  insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes);
+
+	  if (insn_code_number < 0)
+	    undo_to_marker (marker);
+	}
+
+    }
   return insn_code_number;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr68651_1.c b/gcc/testsuite/gcc.target/aarch64/pr68651_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ef9456f538776e7db01ecf5473425aed9efd9de2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr68651_1.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a53" } */
+
+int
+foo (int x)
+{
+  return (x * 2) & 65535;
+}
+/* { dg-final { scan-assembler "ubfiz\tw\[0-9\]*, w\[0-9\]*.*\n" } } */
+
+int
+bar (int x, int y)
+{
+  return (x * 2) | y;
+}
+/* { dg-final { scan-assembler "orr\tw\[0-9\]*, w\[0-9\]*, w\[0-9\]*, lsl 1.*\n" } } */