diff mbox series

Revert "[nvptx, libgomp] Update pr85381-{2, 4}.c test-cases" [PR89713, PR94392] (was: [PATCH][RFC] c/94392 - only enable -ffinite-loops for C++)

Message ID 877dyxnfa0.fsf@euler.schwinge.homeip.net
State New
Headers show
Series Revert "[nvptx, libgomp] Update pr85381-{2, 4}.c test-cases" [PR89713, PR94392] (was: [PATCH][RFC] c/94392 - only enable -ffinite-loops for C++) | expand

Commit Message

Thomas Schwinge April 3, 2020, 8:29 a.m. UTC
Hi!

On 2020-04-02T11:12:48+0200, Richard Biener <rguenther@suse.de> wrote:
> On Wed, 1 Apr 2020, Jason Merrill wrote:
>
>> On 4/1/20 9:36 AM, Richard Biener wrote:
>> > This does away with enabling -ffinite-loops at -O2+ for all languages
>> > and instead enables it selectively for C++ only.

> I'm retesting the following [...]

..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
"c/94392 - only enable -ffinite-loops for C++".

I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b
'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713,
PR94392]'.  As can be observed in two nvptx offloading test cases
regressing, 'apparently now again "empty oacc loops are" no longer
"removed before expand"' (quoting myself from the commit log, adapting
Tom's commit log snippet from the reverted commit).

It's not obvious to me how the "finite loop" property discussed/changed
in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 -
only enable -ffinite-loops for C++" relates to the previously observed
optimization of removing "empty oacc loops [...] before expand" (after
PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR
tree-optimization/89713 - Assume loop with an exit is finite"), but
examining that in detail is for another day.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

Comments

Richard Biener April 3, 2020, 9:36 a.m. UTC | #1
On Fri, 3 Apr 2020, Thomas Schwinge wrote:

> Hi!
> 
> On 2020-04-02T11:12:48+0200, Richard Biener <rguenther@suse.de> wrote:
> > On Wed, 1 Apr 2020, Jason Merrill wrote:
> >
> >> On 4/1/20 9:36 AM, Richard Biener wrote:
> >> > This does away with enabling -ffinite-loops at -O2+ for all languages
> >> > and instead enables it selectively for C++ only.
> 
> > I'm retesting the following [...]
> 
> ..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
> "c/94392 - only enable -ffinite-loops for C++".
> 
> I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b
> 'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713,
> PR94392]'.  As can be observed in two nvptx offloading test cases
> regressing, 'apparently now again "empty oacc loops are" no longer
> "removed before expand"' (quoting myself from the commit log, adapting
> Tom's commit log snippet from the reverted commit).
> 
> It's not obvious to me how the "finite loop" property discussed/changed
> in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 -
> only enable -ffinite-loops for C++" relates to the previously observed
> optimization of removing "empty oacc loops [...] before expand" (after
> PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR
> tree-optimization/89713 - Assume loop with an exit is finite"), but
> examining that in detail is for another day.

For C we no longer have -ffinite-loops in effect but for C++ we still
do.  But since the testcase is c/c++ common I'd have expected it
now fails "split" ... so an explicit -fno-finite-loops or
-ffinite-loops with an explanation would be easier.

Note there's now also the opportunity to set the loop flag for
OpenACC/OpenMP annotated loops if any of that guarantees finiteness.
(for GCC11 only please)

Richard.
Li, Pan2 via Gcc-patches April 3, 2020, 10:34 a.m. UTC | #2
On Fri, Apr 03, 2020 at 11:36:29AM +0200, Richard Biener wrote:
> Note there's now also the opportunity to set the loop flag for
> OpenACC/OpenMP annotated loops if any of that guarantees finiteness.
> (for GCC11 only please)

Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of
iterations must be computable before the loop and must fit into the type in
which that count is computed without overflows.

	Jakub
Thomas Schwinge Oct. 30, 2020, 2:09 p.m. UTC | #3
Hi!

Frederik stumbled over a related thing.

On 2020-04-03T12:36:29+0200, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 3 Apr 2020, Thomas Schwinge wrote:
>> On 2020-04-02T11:12:48+0200, Richard Biener <rguenther@suse.de> wrote:
>> > On Wed, 1 Apr 2020, Jason Merrill wrote:
>> >
>> >> On 4/1/20 9:36 AM, Richard Biener wrote:
>> >> > This does away with enabling -ffinite-loops at -O2+ for all languages
>> >> > and instead enables it selectively for C++ only.
>>
>> > I'm retesting the following [...]
>>
>> ..., which got pushed in commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
>> "c/94392 - only enable -ffinite-loops for C++".
>>
>> I pushed the attached in commit 4f6a0888de52a2e523a6fd4235fe7f8193819c3b
>> 'Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713,
>> PR94392]'.  As can be observed in two nvptx offloading test cases
>> regressing, 'apparently now again "empty oacc loops are" no longer
>> "removed before expand"' (quoting myself from the commit log, adapting
>> Tom's commit log snippet from the reverted commit).
>>
>> It's not obvious to me how the "finite loop" property discussed/changed
>> in Richard's commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae "c/94392 -
>> only enable -ffinite-loops for C++" relates to the previously observed
>> optimization of removing "empty oacc loops [...] before expand" (after
>> PR89713 commit c29c92c789d93848cc1c929838771bfc68cb272c "PR
>> tree-optimization/89713 - Assume loop with an exit is finite"), but
>> examining that in detail is for another day.
>
> For C we no longer have -ffinite-loops in effect but for C++ we still
> do.  But since the testcase is c/c++ common I'd have expected it
> now fails "split" ... so an explicit -fno-finite-loops or
> -ffinite-loops with an explanation would be easier.

(Thanks, and ACK; still have to look into that.)

> Note there's now also the opportunity to set the loop flag for
> OpenACC/OpenMP annotated loops if any of that guarantees finiteness.
> (for GCC11 only please)

On 2020-04-03T13:34:18+0200, Jakub Jelinek <jakub@redhat.com> wrote:
> Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of
> iterations must be computable before the loop and must fit into the type in
> which that count is computed without overflows.

Specifically, is that computable at run-time or compile-time?

Similar for OpenACC.  For example, OpenACC 3.0, 2.9. "Loop Construct",
"Restrictions": "A loop associated with a 'loop' construct that does not
have a 'seq' clause must be written such that the loop iteration count is
computable when entering the loop construct".

(This can only viewed by members of the OpenACC GitHub organization, but
I wanted to share the pointer anyway, and can relay discussion as
necessary.)  For the next version of OpenACC (soon!), this is being
further clarified as per the current discussion in
<https://github.com/OpenACC/openacc-spec/pull/320> "Proposed changes for
range-based for loops and != operator", which should be relevant here:

|   - A loop associated with a 'loop' construct that does not have a 'seq'
|     clause must be written to meet all of the following conditions:
|
|       - The loop variable must be of integer, C/C++ pointer, or C++
|         random-access iterator type.
|
|       - The loop variable must monotonically increase or decrease in the
|         direction of its termination condition.
|
|       - The loop iteration count must be computable in constant time when
|         entering the 'loop' construct.
|
|     For a C++ range-based 'for' loop, the loop variable identified by the
|     above conditions is the internal iterator, such as a pointer, that
|     the compiler generates to iterate the range.  It is not the variable
|     declared by the 'for' loop.

(Notice: "computable in constant time" (which means: not computable only
by actually iterating the whole loop structure), and "computable [...]
when entering" (which, if I got that right, means: at run-time, not
necessarily already compile-time?).


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Jakub Jelinek Oct. 30, 2020, 2:16 p.m. UTC | #4
On Fri, Oct 30, 2020 at 03:09:31PM +0100, Thomas Schwinge wrote:
> On 2020-04-03T13:34:18+0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > Dunno about OpenACC, but OpenMP loops guarantee finiteness, as the number of
> > iterations must be computable before the loop and must fit into the type in
> > which that count is computed without overflows.
> 
> Specifically, is that computable at run-time or compile-time?

At run-time.  OpenMP certainly doesn't disallow loops with non-constant
steps or lower/upper bound expressions (ok, there is one exception, for
!= loop condition the step must be a constant expression, the step has to
be just 1 or -1).

	Jakub
diff mbox series

Patch

From 4f6a0888de52a2e523a6fd4235fe7f8193819c3b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 3 Apr 2020 10:07:16 +0200
Subject: [PATCH] Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases"
 [PR89713, PR94392]

In response to PR94392 commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
"c/94392 - only enable -ffinite-loops for C++", this reverts PR89713
commit 00908992f2a78f213d227aea8dbab014a1361df0, as apparently now again
"empty oacc loops are" no longer "removed before expand".

	libgomp/
	PR tree-optimization/89713
	PR c/94392
	* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect
	'bar.sync'.
	* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
---
 libgomp/ChangeLog                             |  8 ++++++++
 .../libgomp.oacc-c-c++-common/pr85381-2.c     | 20 ++++++++++++++++++-
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |  5 ++++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 6c437930b02f..3716f559aa1c 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,11 @@ 
+2020-04-03  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR tree-optimization/89713
+	PR c/94392
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect
+	'bar.sync'.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
+
 2020-03-31  Tobias Burnus  <tobias@codesourcery.com>
 
 	* target.c (GOMP_target_enter_exit_data): Handle PSET/MAP_POINTER.
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
index 2cb5b95949de..6570c64afff5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
@@ -15,4 +15,22 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bar.sync" 0 } } */
+/* Todo: Boths bar.syncs can be removed.
+   Atm we generate this dead code inbetween forked and joining:
+
+                     mov.u32 %r28, %ntid.y;
+                     mov.u32 %r29, %tid.y;
+                     add.u32 %r30, %r29, %r29;
+                     setp.gt.s32     %r31, %r30, 19;
+             @%r31   bra     $L2;
+                     add.u32 %r25, %r28, %r28;
+                     mov.u32 %r24, %r30;
+     $L3:
+                     add.u32 %r24, %r24, %r25;
+                     setp.le.s32     %r33, %r24, 19;
+             @%r33   bra     $L3;
+     $L2:
+
+   so the loop is not recognized as empty loop (which we detect by seeing if
+   joining immediately follows forked).  */
+/* { dg-final { scan-assembler-times "bar.sync" 2 } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
index e8a433ffc0a5..d955d79718df 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
@@ -21,4 +21,7 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bar.sync" 0 } } */
+/* Atm, %ntid.y is broadcast from one loop to the next, so there are 2 bar.syncs
+   for that (the other two are there for the same reason as in pr85381-2.c).
+   Todo: Recompute %ntid.y instead of broadcasting it. */
+/* { dg-final { scan-assembler-times "bar.sync" 4 } } */
-- 
2.25.1