diff mbox series

[4/5] lto: Set has_unroll flag when streaming in for LTO [PR116140]

Message ID ZrXhQwtHv0kiOCKr@arm.com
State New
Headers show
Series Address std::find regression with RTL unrolling [PR116140] | expand

Commit Message

Alex Coplan Aug. 9, 2024, 9:28 a.m. UTC
When #pragma GCC unroll is processed in
tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
field (which is currently streamed out and back in during LTO) but also
the cfun->has_unroll flag.

cfun->has_unroll, however, is not currently streamed during LTO, so this
patch attempts to recover it by setting it on any function containing a
loop with loop->unroll > 1.

Prior to this patch, loops marked with #pragma GCC unroll that would be
unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
unrolled under LTO.

As per the comment in the PR, a more conservative fix might explicitly
stream out cfun->has_unroll and stream it back in again, but this patch
it simpler and I can't currently see a reason against inferring the
value of the flag like this (comments welcome).

gcc/ChangeLog:

	PR libstdc++/116140
	* lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
	contains a loop with requested unrolling.

gcc/testsuite/ChangeLog:

	PR libstdc++/116140
	* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
---
 gcc/lto-streamer-in.cc                        |  2 ++
 .../g++.dg/ext/pragma-unroll-lambda-lto.C     | 32 +++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C

Comments

Richard Biener Aug. 9, 2024, 11:12 a.m. UTC | #1
> Am 09.08.2024 um 11:30 schrieb Alex Coplan <alex.coplan@arm.com>:
> 
> When #pragma GCC unroll is processed in
> tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
> field (which is currently streamed out and back in during LTO) but also
> the cfun->has_unroll flag.
> 
> cfun->has_unroll, however, is not currently streamed during LTO, so this
> patch attempts to recover it by setting it on any function containing a
> loop with loop->unroll > 1.
> 
> Prior to this patch, loops marked with #pragma GCC unroll that would be
> unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
> unrolled under LTO.
> 
> As per the comment in the PR, a more conservative fix might explicitly
> stream out cfun->has_unroll and stream it back in again, but this patch
> it simpler and I can't currently see a reason against inferring the
> value of the flag like this (comments welcome).

If the flag is redundant please eliminate it entirely.  Otherwise please stream it.

> gcc/ChangeLog:
> 
>    PR libstdc++/116140
>    * lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
>    contains a loop with requested unrolling.
> 
> gcc/testsuite/ChangeLog:
> 
>    PR libstdc++/116140
>    * g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
> ---
> gcc/lto-streamer-in.cc                        |  2 ++
> .../g++.dg/ext/pragma-unroll-lambda-lto.C     | 32 +++++++++++++++++++
> 2 files changed, 34 insertions(+)
> create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> 
> <0004-lto-Set-has_unroll-flag-when-streaming-in-for-LTO-PR.patch>
Alex Coplan Aug. 9, 2024, 11:40 a.m. UTC | #2
On 09/08/2024 13:12, Richard Biener wrote:
> 
> 
> > Am 09.08.2024 um 11:30 schrieb Alex Coplan <alex.coplan@arm.com>:
> > 
> > When #pragma GCC unroll is processed in
> > tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
> > field (which is currently streamed out and back in during LTO) but also
> > the cfun->has_unroll flag.
> > 
> > cfun->has_unroll, however, is not currently streamed during LTO, so this
> > patch attempts to recover it by setting it on any function containing a
> > loop with loop->unroll > 1.
> > 
> > Prior to this patch, loops marked with #pragma GCC unroll that would be
> > unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
> > unrolled under LTO.
> > 
> > As per the comment in the PR, a more conservative fix might explicitly
> > stream out cfun->has_unroll and stream it back in again, but this patch
> > it simpler and I can't currently see a reason against inferring the
> > value of the flag like this (comments welcome).
> 
> If the flag is redundant please eliminate it entirely.  Otherwise please stream it.

AFAICT, the flag effectively provides a way of cacheing the answer to
the question "are there any loops in this function for which unrolling
has been requested"?

Eliminating it entirely would mean replacing any uses with loops over
all loops in the function, which seems suboptimal.

The current users are (both in loop-init.cc):
 - pass_rtl_unroll_loops::gate, and
 - pass_loop2::gate

Would you prefer introducing loops over all loops in the function at
those use sites?  Or is it OK to keep the flag?

If the flag is purely a cache of the above test (I think it is), then
ISTM streaming it has no benefit since we necessarily have to iterate
over all loops when streaming in.

Thanks,
Alex

> 
> > gcc/ChangeLog:
> > 
> >    PR libstdc++/116140
> >    * lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
> >    contains a loop with requested unrolling.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >    PR libstdc++/116140
> >    * g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
> > ---
> > gcc/lto-streamer-in.cc                        |  2 ++
> > .../g++.dg/ext/pragma-unroll-lambda-lto.C     | 32 +++++++++++++++++++
> > 2 files changed, 34 insertions(+)
> > create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > 
> > <0004-lto-Set-has_unroll-flag-when-streaming-in-for-LTO-PR.patch>
Richard Biener Aug. 19, 2024, 11:50 a.m. UTC | #3
On Fri, 9 Aug 2024, Alex Coplan wrote:

> On 09/08/2024 13:12, Richard Biener wrote:
> > 
> > 
> > > Am 09.08.2024 um 11:30 schrieb Alex Coplan <alex.coplan@arm.com>:
> > > 
> > > When #pragma GCC unroll is processed in
> > > tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
> > > field (which is currently streamed out and back in during LTO) but also
> > > the cfun->has_unroll flag.
> > > 
> > > cfun->has_unroll, however, is not currently streamed during LTO, so this
> > > patch attempts to recover it by setting it on any function containing a
> > > loop with loop->unroll > 1.
> > > 
> > > Prior to this patch, loops marked with #pragma GCC unroll that would be
> > > unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
> > > unrolled under LTO.
> > > 
> > > As per the comment in the PR, a more conservative fix might explicitly
> > > stream out cfun->has_unroll and stream it back in again, but this patch
> > > it simpler and I can't currently see a reason against inferring the
> > > value of the flag like this (comments welcome).
> > 
> > If the flag is redundant please eliminate it entirely.  Otherwise please stream it.
> 
> AFAICT, the flag effectively provides a way of cacheing the answer to
> the question "are there any loops in this function for which unrolling
> has been requested"?
> 
> Eliminating it entirely would mean replacing any uses with loops over
> all loops in the function, which seems suboptimal.
> 
> The current users are (both in loop-init.cc):
>  - pass_rtl_unroll_loops::gate, and
>  - pass_loop2::gate
> 
> Would you prefer introducing loops over all loops in the function at
> those use sites?  Or is it OK to keep the flag?
> 
> If the flag is purely a cache of the above test (I think it is), then
> ISTM streaming it has no benefit since we necessarily have to iterate
> over all loops when streaming in.

But it's just a bit and streaming is straight forward while "recomputing"
even if trivial is extra code that requires maintainance and 
understanding.

Please simply stream the bit.

Thanks,
Richard.

> Thanks,
> Alex
> 
> > 
> > > gcc/ChangeLog:
> > > 
> > >    PR libstdc++/116140
> > >    * lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
> > >    contains a loop with requested unrolling.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >    PR libstdc++/116140
> > >    * g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
> > > ---
> > > gcc/lto-streamer-in.cc                        |  2 ++
> > > .../g++.dg/ext/pragma-unroll-lambda-lto.C     | 32 +++++++++++++++++++
> > > 2 files changed, 34 insertions(+)
> > > create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > > 
> > > <0004-lto-Set-has_unroll-flag-when-streaming-in-for-LTO-PR.patch>
>
diff mbox series

Patch

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..93877065d86 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1136,6 +1136,8 @@  input_cfg (class lto_input_block *ib, class data_in *data_in,
       /* Read OMP SIMD related info.  */
       loop->safelen = streamer_read_hwi (ib);
       loop->unroll = streamer_read_hwi (ib);
+      if (loop->unroll > 1)
+	fn->has_unroll = true;
       loop->owned_clique = streamer_read_hwi (ib);
       loop->dont_vectorize = streamer_read_hwi (ib);
       loop->force_vectorize = streamer_read_hwi (ib);
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
new file mode 100644
index 00000000000..144c4c32692
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -0,0 +1,32 @@ 
+// { dg-do link { target c++11 } }
+// { dg-options "-O2 -flto -fdump-rtl-loop2_unroll" }
+
+#include <cstdlib>
+
+template<typename Iter, typename Pred>
+inline Iter
+my_find(Iter first, Iter last, Pred pred)
+{
+#pragma GCC unroll 4
+    while (first != last && !pred(*first))
+        ++first;
+    return first;
+}
+
+__attribute__((noipa))
+short *use_find(short *p)
+{
+    auto pred = [](short x) { return x == 42; };
+    return my_find(p, p + 1024, pred);
+}
+
+int main(void)
+{
+  short a[1024];
+  for (int i = 0; i < 1024; i++)
+    a[i] = rand ();
+
+  return use_find (a) - a;
+}
+
+// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 "loop2_unroll" } }