[arm] Implement Armv8.1-M low overhead loops

Message ID gkrmu9pxwf6.fsf@arm.com
State New

Commit Message

Andrea Corallo Feb. 11, 2020, 10:14 a.m. UTC
Hi all,

This patch enables the low overhead loops (LOL) feature of the Armv8.1-M
Mainline LOB (low overhead branch) extension by using the loop-doloop pass,
which is shared with the Swing Modulo Scheduler.

Bootstrapped on arm-none-linux-gnueabihf; does not introduce testsuite regressions.

Andrea

gcc/ChangeLog:

2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>

        * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
        (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
        * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
        * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
        Add new patterns.
        * config/arm/unspecs.md: Add new unspec.

gcc/testsuite/ChangeLog:

2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>

        * gcc.target/arm/lob.h: New header.
        * gcc.target/arm/lob1.c: New testcase.
        * gcc.target/arm/lob2.c: Likewise.
        * gcc.target/arm/lob3.c: Likewise.
        * gcc.target/arm/lob4.c: Likewise.
        * gcc.target/arm/lob5.c: Likewise.
        * gcc.target/arm/lob6.c: Likewise.

Comments

Richard Earnshaw (lists) Feb. 11, 2020, 11 a.m. UTC | #1
On 11/02/2020 10:14, Andrea Corallo wrote:
> Hi all,
> 
> This patch enables the Armv8.1-M Mainline LOB (low overhead branch) extension
> low overhead loops (LOL) feature by using the loop-doloop pass
> that it shares with the Swing Modulo Scheduler.
> 
> bootstrapped arm-none-linux-gnueabihf, does not introduce testsuite regressions.
> 
> Andrea
> 
> gcc/ChangeLog:
> 
> 2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
> 2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
> 2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>
> 
>          * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
>          (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
>          * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
>          * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
>          Add new patterns.
>          * config/arm/unspecs.md: Add new unspec.
> 

A date should only appear before the first author in a multi-author 
patch, other authors should then be indented to align with the name of 
that first author.
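
For illustration, the gcc/ChangeLog header laid out that way would look roughly like the sketch below. The dates are the unfilled placeholders from the submission, and the exact indentation (spaces here) is an assumption.

```
2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
            Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
            Iain Apreotesei  <iain.apreotesei@arm.com>

        * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
        (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
```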

> gcc/testsuite/ChangeLog:
> 
> 2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
> 2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
> 2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>
> 
>          * gcc.target/arm/lob.h: New header.
>          * gcc.target/arm/lob1.c: New testcase.
>          * gcc.target/arm/lob2.c: Likewise.
>          * gcc.target/arm/lob3.c: Likewise.
>          * gcc.target/arm/lob4.c: Likewise.
>          * gcc.target/arm/lob5.c: Likewise.
>          * gcc.target/arm/lob6.c: Likewise.
> 

+(define_insn "*doloop_end"
+  [(parallel [(set (pc)
+                   (if_then_else
+                       (ne (reg:SI LR_REGNUM) (const_int 1))
+                     (label_ref (match_operand 0 "" ""))
+                     (pc)))
+              (set (reg:SI LR_REGNUM)
+                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "le\tlr, %l0")

Is it deliberate that this pattern name has a '*' prefix?  doloop_end is 
a named expansion pattern according to md.texi.

Also, hard-coded register names should be prefixed with '%|' (so "%|lr", 
not just "lr"), just in case the assembler dialect requires something 
(ELF doesn't but others have).  Also for dls_insn.

For the tests, your 'require-effective-target' checks look insufficient to 
prevent problems when testing in a multilib environment.  You'll need (at 
least) checks that a) -marm has not been passed and b) that the 
architecture, or a specific CPU, isn't being passed on the command line.
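
One possible shape for such a test header is sketched below. `dg-skip-if` is an existing DejaGnu directive in the GCC testsuite, but the list of options to skip on is an assumption made for illustration, not something taken from the posted patch.

```c
/* { dg-do compile } */
/* Assumption: the option list in the dg-skip-if below is illustrative only.  */
/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" "-march=*" "-mcpu=*" } } */
/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
```

The skip directive sits before `dg-options` here, so the intent is that it only looks at options supplied by the test run and not at the `-march` the test itself adds.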

R.
Andrea Corallo Feb. 11, 2020, 1:40 p.m. UTC | #2
Hi Richard,

"Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:

>> gcc/ChangeLog:
>> 2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
>> 2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
>> 2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>
>>          * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
>>          (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
>>          * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
>>          * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
>>          Add new patterns.
>>          * config/arm/unspecs.md: Add new unspec.
>>
>
> A date should only appear before the first author in a multi-author
> patch, other authors should then be indented to align with the name of
> that first author.

Ack

> +(define_insn "*doloop_end"
> +  [(parallel [(set (pc)
> +                   (if_then_else
> +                       (ne (reg:SI LR_REGNUM) (const_int 1))
> +                     (label_ref (match_operand 0 "" ""))
> +                     (pc)))
> +              (set (reg:SI LR_REGNUM)
> +                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
> +  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
> +  "le\tlr, %l0")
>
> Is it deliberate that this pattern name has a '*' prefix?  doloop_end
> is a named expansion pattern according to md.texi.

Yes, this should be expanded already by the define_expand we have in
thumb2.md.  Perhaps I'll call it 'doloop_end_internal' and add a
comment.

> Also, hard-coded register names should be prefixed with '%|' (so
> "%|lr", not just "lr"), just in case the assembler dialect requires
> something (ELF doesn't but others have).  Also for dls_insn.

Ack

> For the tests, your 'require-effective-target' tests look insufficient
> to prevent problems when testing a multilib environment, you'll need
> (at least) checks that a) passing -marm has not happened and b) that
> the architecture, or a specific CPU isn't being passed on the command
> line.

Ack

Thanks for reviewing, I'll update the patch.

Andrea
Roman Zhuykov Feb. 12, 2020, 9:23 a.m. UTC | #3
Hello!

11.02.2020 16:40, Andrea Corallo wrote:
> Hi Richard,
>
> "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> writes:
>
>>> gcc/ChangeLog:
>>> 2020-??-??  Andrea Corallo  <andrea.corallo@arm.com>
>>> 2020-??-??  Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
>>> 2020-??-??  Iain Apreotesei  <iain.apreotesei@arm.com>
>>>          * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
>>>          (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
>>>          * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
>>>          * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
>>>          Add new patterns.
>>>          * config/arm/unspecs.md: Add new unspec.
>>>
>> A date should only appear before the first author in a multi-author
>> patch, other authors should then be indented to align with the name of
>> that first author.
> Ack
This patch is stage1 material, right?
>
>> +(define_insn "*doloop_end"
>> +  [(parallel [(set (pc)
>> +                   (if_then_else
>> +                       (ne (reg:SI LR_REGNUM) (const_int 1))
>> +                     (label_ref (match_operand 0 "" ""))
>> +                     (pc)))
>> +              (set (reg:SI LR_REGNUM)
>> +                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
>> +  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
>> +  "le\tlr, %l0")
I'm not an expert in .md files, but having that "!flag_modulo_sched"
condition seems wrong to me.  What was the issue on the SMS side that led
you to add it?
Currently, there is a fake doloop_end pattern on ARM.  It is generated
only when flag_modulo_sched is set, and it actually expands to more than
one instruction.  This old approach has its pros and cons.  When we
HAVE_LOB, the target allows us to use a real doloop_end instruction, so the
fake one is not needed at all.  In this case the compiler should use the
real instruction regardless of whether SMS is on or off.
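
To make the comparison concrete, here is a small C model (a sketch, not GCC code) of the two loop-closing forms. `le_step` follows the `*doloop_end` RTL quoted above: the branch is taken while LR != 1, with LR decremented in parallel. `fake_step` models the multi-instruction form as a decrement that sets the flags followed by a branch on non-zero; that reading of the existing expander is an assumption. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the *doloop_end RTL: the branch is taken when LR != 1 and
   LR is decremented in parallel (both use the old value of LR).  */
static int
le_step (uint32_t *lr)
{
  int taken = (*lr != 1);
  *lr -= 1;
  return taken;
}

/* Model of the "fake" two-instruction form (assumed): subtract one and
   set the flags, then branch back if the result is non-zero.  */
static int
fake_step (uint32_t *counter)
{
  *counter -= 1;
  return *counter != 0;
}

/* Run a do-while style loop closed by STEP and return how many times
   the body ran.  COUNT must be non-zero in this model; a zero count
   would wrap around.  */
static unsigned
trip_count (int (*step) (uint32_t *), uint32_t count)
{
  uint32_t reg = count;
  unsigned trips = 0;
  do
    trips++;
  while (step (&reg));
  return trips;
}

/* Both forms should run the body the same number of times.  */
static void
check_equivalent (uint32_t count)
{
  assert (trip_count (le_step, count) == trip_count (fake_step, count));
}
```

Calling `check_equivalent` with any non-zero count, for example the `N` of 10000 used by the tests in this patch, exercises both models; the only difference between them is how many instructions close the loop.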

I hope that in stage1, after upgrading the modulo scheduler, we will restart
the old discussion about removing the fake doloop_end pattern for ARM:
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01812.html
https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00195.html
AArch64 has also had such a fake pattern since 2014; its removal will
probably be considered as well.

Roman
>> Is it deliberate that this pattern name has a '*' prefix?  doloop_end
>> is a named expansion pattern according to md.texi.
> Yes, this should be expanded already by the define_expand we have in
> thumb2.md.  Perhaps I'll call it 'doloop_end_internal' and add a
> comment.
>
>> Also, hard-coded register names should be prefixed with '%|' (so
>> "%|lr", not just "lr"), just in case the assembler dialect requires
>> something (ELF doesn't but others have).  Also for dls_insn.
> Ack
>
>> For the tests, your 'require-effective-target' tests look insufficient
>> to prevent problems when testing a multilib environment, you'll need
>> (at least) checks that a) passing -marm has not happened and b) that
>> the architecture, or a specific CPU isn't being passed on the command
>> line.
> Ack
>
> Thanks for reviewing, I'll update the patch.
>
> Andrea
Andrea Corallo Feb. 13, 2020, 5:54 p.m. UTC | #4
Hi Roman,

Roman Zhuykov <zhroma@ispras.ru> writes:

> This patch is stage1 material, right?

Yes

>>
>>> +(define_insn "*doloop_end"
>>> +  [(parallel [(set (pc)
>>> +                   (if_then_else
>>> +                       (ne (reg:SI LR_REGNUM) (const_int 1))
>>> +                     (label_ref (match_operand 0 "" ""))
>>> +                     (pc)))
>>> +              (set (reg:SI LR_REGNUM)
>>> +                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
>>> +  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
>>> +  "le\tlr, %l0")
> I'm not an expert in .md files, but having that "!flag_modulo_sched"
> condition seems wrong to me.  What was the issue on SMS side to add
> that?

With this patch, 'doloop_begin', the first insn of the low overhead loop,
is expanded by 'doloop_modify' in loop-doloop.c.  The same does not
happen with SMS.  My understanding is that, for this to work in that
case too, the machine-dependent reorg pass should add it later.  Am I
correct on this?

Thanks
  Andrea

Patch

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c..1269f40bd77 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -586,6 +586,9 @@  extern int arm_arch_bf16;
 
 /* Target machine storage Layout.  */
 
+/* Nonzero if this chip provides the Armv8.1-M Mainline
+   LOB (Low Overhead Branch) extension instructions.  */
+#define TARGET_HAVE_LOB (arm_arch8_1m_main)
 
 /* Define this macro if it is advisable to hold scalars in registers
    in a wider mode than that declared by the program.  In such cases,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e562..d0b50d544e3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -833,6 +833,9 @@  static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
 
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP arm_invalid_within_doloop
+
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
@@ -32937,6 +32940,39 @@  arm_ge_bits_access (void)
   return true;
 }
 
+/* Return NULL if INSN is valid within a low-overhead loop.
+   Otherwise return a string explaining why doloop cannot be applied.  */
+
+static const char *
+arm_invalid_within_doloop (const rtx_insn *insn)
+{
+  if (!TARGET_HAVE_LOB)
+    return default_invalid_within_doloop (insn);
+
+  if (CALL_P (insn))
+    return "Function call in the loop.";
+
+  if (tablejump_p (insn, NULL, NULL) || computed_jump_p (insn))
+    return "Computed branch in the loop.";
+
+  if (INSN_P (insn)
+      && GET_CODE (PATTERN (insn)) == PARALLEL)
+    {
+      rtx parallel = PATTERN (insn);
+      rtx clobber;
+      int j;
+      for (j = XVECLEN (parallel, 0) - 1; j >= 0; j--)
+	{
+	  clobber = XVECEXP (parallel, 0, j);
+	  if (GET_CODE (clobber) == CLOBBER
+	      && GET_CODE (XEXP (clobber, 0)) == REG
+	      && REGNO (XEXP (clobber, 0)) == LR_REGNUM)
+	    return "LR is used inside loop.";
+	}
+    }
+  return NULL;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index b0d3bd1cf1c..44b1a264dba 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1555,8 +1555,11 @@ 
       using a certain 'count' register and (2) the loop count can be
       adjusted by modifying this register prior to the loop.
       ??? The possible introduction of a new block to initialize the
-      new IV can potentially affect branch optimizations.  */
-   if (optimize > 0 && flag_modulo_sched)
+      new IV can potentially affect branch optimizations.
+
+      Also used to implement the low overhead loops feature, which is part of
+      the Armv8.1-M Mainline Low Overhead Branch (LOB) extension.  */
+   if (optimize > 0 && (flag_modulo_sched || TARGET_HAVE_LOB))
    {
      rtx s0;
      rtx bcomp;
@@ -1569,6 +1572,11 @@ 
        FAIL;
 
      s0 = operands [0];
+
+     /* Low overhead loop instructions require the first operand to be LR.  */
+     if (TARGET_HAVE_LOB)
+       s0 = gen_rtx_REG (SImode, LR_REGNUM);
+
      if (TARGET_THUMB2)
        insn = emit_insn (gen_thumb2_addsi3_compare0 (s0, s0, GEN_INT (-1)));
      else
@@ -1650,3 +1658,29 @@ 
   "TARGET_HAVE_MVE"
   "lsrl%?\\t%Q0, %R0, %1"
   [(set_attr "predicable" "yes")])
+
+(define_insn "*doloop_end"
+  [(parallel [(set (pc)
+                   (if_then_else
+                       (ne (reg:SI LR_REGNUM) (const_int 1))
+                     (label_ref (match_operand 0 "" ""))
+                     (pc)))
+              (set (reg:SI LR_REGNUM)
+                   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "le\tlr, %l0")
+
+(define_expand "doloop_begin"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  {
+    emit_insn (gen_dls_insn (operands[0], operands[0]));
+    DONE;
+  })
+
+(define_insn "dls_insn"
+  [(set (match_operand:SI 0 "" "")
+        (unspec:SI [(match_operand:SI 1 "s_register_operand" "r")] UNSPEC_DLS))]
+  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
+  "dls\tlr, %1")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 8f4a705f43e..df5ecb73192 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -154,6 +154,7 @@ 
   UNSPEC_SMUADX		; Represent the SMUADX operation.
   UNSPEC_SSAT16		; Represent the SSAT16 operation.
   UNSPEC_USAT16		; Represent the USAT16 operation.
+  UNSPEC_DLS		; Used for DLS (Do Loop Start), Armv8.1-M Mainline instruction
 ])
 
 
diff --git a/gcc/testsuite/gcc.target/arm/lob.h b/gcc/testsuite/gcc.target/arm/lob.h
new file mode 100644
index 00000000000..feaae7cc899
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob.h
@@ -0,0 +1,15 @@ 
+#include <string.h>
+
+/* Common code for lob tests.  */
+
+#define NO_LOB asm volatile ("@ clobber lr" : : : "lr" )
+
+#define N 10000
+
+static void
+reset_data (int *a, int *b, int *c)
+{
+  memset (a, -1, N * sizeof (*a));
+  memset (b, -1, N * sizeof (*b));
+  memset (c, -1, N * sizeof (*c));
+}
diff --git a/gcc/testsuite/gcc.target/arm/lob1.c b/gcc/testsuite/gcc.target/arm/lob1.c
new file mode 100644
index 00000000000..8ffaaa29878
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob1.c
@@ -0,0 +1,82 @@ 
+/* Check that GCC generates Armv8.1-M low overhead loop instructions
+   for some simple loops.  */
+/* { dg-do run } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include <stdlib.h>
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int
+foo (int a, int b)
+{
+  return a + b;
+}
+
+void __attribute__((noinline))
+loop1 (int *a, int *b, int *c)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = a[i] + b[i];
+    }
+}
+
+void __attribute__((noinline))
+loop2 (int *a, int *b, int *c)
+{
+  int i = 0;
+  while (i < N)
+    {
+      a[i] = i - 2;
+      b[i] = i * 5;
+      c[i] = a[i] + b[i];
+      i++;
+    }
+}
+
+void __attribute__((noinline))
+loop3 (int *a, int *b, int *c)
+{
+  int i = 0;
+  do
+    {
+      a[i] = i - 4;
+      b[i] = i * 3;
+      c[i] = a[i] + b[i];
+      i++;
+    } while (i < N);
+}
+
+void
+check (int *a, int *b, int *c)
+{
+  for (int i = 0; i < N; i++)
+    {
+      NO_LOB;
+      if (c[i] != a[i] + b[i])
+	abort ();
+    }
+}
+
+int main (void)
+{
+  reset_data (a, b, c);
+  loop1 (a, b ,c);
+  check (a, b ,c);
+  reset_data (a, b, c);
+  loop2 (a, b ,c);
+  check (a, b ,c);
+  reset_data (a, b, c);
+  loop3 (a, b ,c);
+  check (a, b ,c);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {dls\s\S*,\s\S*} 3 } } */
+/* { dg-final { scan-assembler-times {le\slr,\s\S*} 3 } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob2.c b/gcc/testsuite/gcc.target/arm/lob2.c
new file mode 100644
index 00000000000..046d92fcad1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob2.c
@@ -0,0 +1,30 @@ 
+/* Check that GCC does not generate Armv8.1-M low overhead loop instructions
+   if a non-inlineable function call takes place inside the loop.  */
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include <stdlib.h>
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int __attribute__ ((noinline))
+foo (int a, int b)
+{
+  return a + b;
+}
+
+int main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = foo (a[i], b[i]);
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob3.c b/gcc/testsuite/gcc.target/arm/lob3.c
new file mode 100644
index 00000000000..77f89ad9c70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob3.c
@@ -0,0 +1,26 @@ 
+/* Check that GCC does not generate Armv8.1-M low overhead loop instructions
+   when VFP emulation library calls happen inside the loop.  */
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-require-effective-target arm_softfloat } */
+#include <stdlib.h>
+#include "lob.h"
+
+double a[N];
+double b[N];
+double c[N];
+
+int
+main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = a[i] + b[i];
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob4.c b/gcc/testsuite/gcc.target/arm/lob4.c
new file mode 100644
index 00000000000..88be61f3c76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob4.c
@@ -0,0 +1,32 @@ 
+/* Check that GCC does not generate Armv8.1-M low overhead loop instructions
+   if LR is modified within the loop.  */
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps -mfloat-abi=soft" } */
+/* { dg-require-effective-target arm_softfloat } */
+#include <stdlib.h>
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+static __attribute__ ((always_inline)) inline int
+foo (int a, int b)
+{
+  NO_LOB;
+  return a + b;
+}
+
+int main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+      c[i] = foo(a[i], b[i]);
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob5.c b/gcc/testsuite/gcc.target/arm/lob5.c
new file mode 100644
index 00000000000..cd91c3252d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob5.c
@@ -0,0 +1,33 @@ 
+/* Check that GCC does not generate Armv8.1-M low overhead loop
+   instructions.  The innermost loop has no fixed number of iterations
+   and is therefore not optimizable.  Outer loops are not optimized.  */
+/* { dg-do compile } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include <stdlib.h>
+#include "lob.h"
+
+int a[N];
+int b[N];
+int c[N];
+
+int main (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      a[i] = i;
+      b[i] = i * 2;
+
+      int k = b[i];
+      while (k != 0)
+	{
+	  if (k % 2 == 0)
+	    c[i - 1] = k % 2;
+	  k /= 2;
+	}
+      c[i] = a[i] - b[i];
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-not {dls\s\S*,\s\S*} } } */
+/* { dg-final { scan-assembler-not {le\slr,\s\S*} } } */
diff --git a/gcc/testsuite/gcc.target/arm/lob6.c b/gcc/testsuite/gcc.target/arm/lob6.c
new file mode 100644
index 00000000000..4bcedc8bd60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/lob6.c
@@ -0,0 +1,94 @@ 
+/* Check that GCC generates Armv8.1-M low overhead loop instructions
+   with some less trivial loops and the result is correct.  */
+/* { dg-do run } */
+/* { dg-options "-march=armv8.1-m.main -O3 --save-temps" } */
+#include <stdlib.h>
+#include "lob.h"
+
+#define TEST_CODE1				\
+  {						\
+    for (int i = 0; i < N; i++)			\
+      {						\
+	a[i] = i;				\
+	b[i] = i * 2;				\
+						\
+	for (int k = 0; k < N; k++)		\
+	  {					\
+	    MAYBE_LOB;				\
+	    c[k] = k / 2;			\
+	  }					\
+	c[i] = a[i] - b[i];			\
+      }						\
+  }
+
+#define TEST_CODE2				\
+  {						\
+    for (int i = 0; i < N / 2; i++)		\
+      {						\
+	MAYBE_LOB;				\
+	if (c[i] % 2 == 0)			\
+	  break;				\
+	a[i]++;					\
+	b[i]++;					\
+      }						\
+  }
+
+int a1[N];
+int b1[N];
+int c1[N];
+
+int a2[N];
+int b2[N];
+int c2[N];
+
+#define MAYBE_LOB
+void __attribute__((noinline))
+loop1 (int *a, int *b, int *c)
+  TEST_CODE1;
+
+void __attribute__((noinline))
+loop2 (int *a, int *b, int *c)
+  TEST_CODE2;
+
+#undef MAYBE_LOB
+#define MAYBE_LOB NO_LOB
+
+void
+ref1 (int *a, int *b, int *c)
+  TEST_CODE1;
+
+void
+ref2 (int *a, int *b, int *c)
+  TEST_CODE2;
+
+void
+check (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      NO_LOB;
+      if (a1[i] != a2[i]
+	  || b1[i] != b2[i]
+	  || c1[i] != c2[i])
+	abort ();
+    }
+}
+
+int main (void)
+{
+  reset_data (a1, b1, c1);
+  reset_data (a2, b2, c2);
+  loop1 (a1, b1, c1);
+  ref1 (a2, b2, c2);
+  check ();
+
+  reset_data (a1, b1, c1);
+  reset_data (a2, b2, c2);
+  loop2 (a1, b1, c1);
+  ref2 (a2, b2, c2);
+  check ();
+
+  return 0;
+}
+/* { dg-final { scan-assembler-times {dls\s\S*,\s\S*} 2 } } */
+/* { dg-final { scan-assembler-times {le\slr,\s\S*} 2 } } */