tree-optimization/102139 - fix SLP DR base alignment

Message ID	r8po9665-r9r7-3r5-n086-297r819155nr@fhfr.qr
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 258BF3858408 Date: Tue, 31 Aug 2021 11:25:40 +0200 (CEST) To: gcc-patches@gcc.gnu.org Subject: [PATCH] tree-optimization/102139 - fix SLP DR base alignment Message-ID: <r8po9665-r9r7-3r5-n086-297r819155nr@fhfr.qr> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Precedence: list From: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Richard Biener <rguenther@suse.de> Cc: richard.sandiford@arm.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	tree-optimization/102139 - fix SLP DR base alignment \| expand tree-optimization/102139 - fix SLP DR base alignment

Message ID

r8po9665-r9r7-3r5-n086-297r819155nr@fhfr.qr

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 258BF3858408
Date: Tue, 31 Aug 2021 11:25:40 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/102139 - fix SLP DR base alignment
Message-ID: <r8po9665-r9r7-3r5-n086-297r819155nr@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Precedence: list
From: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Reply-To: Richard Biener <rguenther@suse.de>
Cc: richard.sandiford@arm.com
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Sender: "Gcc-patches"
 <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>

Series

tree-optimization/102139 - fix SLP DR base alignment | expand

Commit Message

Richard Biener Aug. 31, 2021, 9:25 a.m. UTC

When doing whole-function SLP we have to make sure the recorded
base alignments we compute as the maximum alignment seen for a
base anywhere in the function is actually valid at the point
we want to make use of it.

To make this work we now record the stmt the alignment was derived
from in addition to the DRs innermost behavior and we use a
dominance check to verify the recorded info is valid when doing
BB vectorization.

Note this leaves a small(?) hole for the case where we have sth
like

    unaligned DR
    call (); // does not return
    aligned DR

since we'll derive an aligned access for the earlier DR but the
later DR is never actually reached since the call does not
return.  To plug this hole one option (for the easy backporting)
would be to simply not use the base-alignment recording at all.
Alternatively we'd have to store the dataref grouping 'id' somewhere
in the DR itself and use that to handle this particular case.

For optimal handling we'd need the ability to record different
base alignments based on context, we could hash on the BB and
at query time walk immediate dominators to find the "best"
base alignment.  But I'd rather leave such improvement for trunk.

Any opinions?  The issue looks quite serious and IMHO warrants a
timely fix, even if partial - I'm not sure how often the same-BB
case would trigger.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.

2021-08-31  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/102139
	* tree-vectorizer.h (vec_base_alignments): Adjust hash-map
	type to record a std::pair of the stmt and the innermost
	loop behavior.
	* tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
	(vect_compute_data_ref_alignment): Verify the recorded
	base alignment can be used.
---
 gcc/tree-vect-data-refs.c | 19 ++++++++++++-------
 gcc/tree-vectorizer.h     |  7 ++++---
 2 files changed, 16 insertions(+), 10 deletions(-)

Comments

Richard Biener Aug. 31, 2021, 11:43 a.m. UTC | #1

On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When doing whole-function SLP we have to make sure the recorded
> base alignments we compute as the maximum alignment seen for a
> base anywhere in the function is actually valid at the point
> we want to make use of it.
>
> To make this work we now record the stmt the alignment was derived
> from in addition to the DRs innermost behavior and we use a
> dominance check to verify the recorded info is valid when doing
> BB vectorization.
>
> Note this leaves a small(?) hole for the case where we have sth
> like
>
>     unaligned DR
>     call (); // does not return
>     aligned DR
>
> since we'll derive an aligned access for the earlier DR but the
> later DR is never actually reached since the call does not
> return.  To plug this hole one option (for the easy backporting)
> would be to simply not use the base-alignment recording at all.
> Alternatively we'd have to store the dataref grouping 'id' somewhere
> in the DR itself and use that to handle this particular case.

It turns out this isn't too difficult so the following is a patch adjusted
to cover that case together with a testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2021-08-31  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/102139
        * tree-vectorizer.h (vec_base_alignments): Adjust hash-map
        type to record a std::pair of the stmt-info and the innermost
        loop behavior.
        (dr_vec_info::group): New member.
        * tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
        (vect_compute_data_ref_alignment): Verify the recorded
        base alignment can be used.
        (data_ref_pair): Remove.
        (dr_group_sort_cmp): Adjust.
        (vect_analyze_data_ref_accesses): Store the group-ID in the
        dr_vec_info and operate on a vector of dr_vec_infos.

        * gcc.dg/torture/pr102139.c: New testcase.

Richard Sandiford Sept. 1, 2021, 9:59 a.m. UTC | #2

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> When doing whole-function SLP we have to make sure the recorded
>> base alignments we compute as the maximum alignment seen for a
>> base anywhere in the function is actually valid at the point
>> we want to make use of it.

Ah, yeah, the danger of optimisations that silently rely on the
then-current restrictions :-(

>> To make this work we now record the stmt the alignment was derived
>> from in addition to the DRs innermost behavior and we use a
>> dominance check to verify the recorded info is valid when doing
>> BB vectorization.
>>
>> Note this leaves a small(?) hole for the case where we have sth
>> like
>>
>>     unaligned DR
>>     call (); // does not return
>>     aligned DR
>>
>> since we'll derive an aligned access for the earlier DR but the
>> later DR is never actually reached since the call does not
>> return.  To plug this hole one option (for the easy backporting)
>> would be to simply not use the base-alignment recording at all.
>> Alternatively we'd have to store the dataref grouping 'id' somewhere
>> in the DR itself and use that to handle this particular case.
>
> It turns out this isn't too difficult so the following is a patch adjusted
> to cover that case together with a testcase.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK?

TBH I know nothing about this group id scheme, so I'm not really
in a position to comment.  But it LGTM from the (few) bits I do understand.

I guess we're leaving the same easter egg for loop optimisation if
we ever support early exits, but I'm not sure what to do about that.

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 2021-08-31  Richard Biener  <rguenther@suse.de>
>
>         PR tree-optimization/102139
>         * tree-vectorizer.h (vec_base_alignments): Adjust hash-map
>         type to record a std::pair of the stmt-info and the innermost
>         loop behavior.
>         (dr_vec_info::group): New member.
>         * tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
>         (vect_compute_data_ref_alignment): Verify the recorded
>         base alignment can be used.
>         (data_ref_pair): Remove.
>         (dr_group_sort_cmp): Adjust.
>         (vect_analyze_data_ref_accesses): Store the group-ID in the
>         dr_vec_info and operate on a vector of dr_vec_infos.
>
>         * gcc.dg/torture/pr102139.c: New testcase.

Richard Biener Sept. 1, 2021, 10:07 a.m. UTC | #3

On Wed, 1 Sep 2021, Richard Sandiford wrote:

> Richard Biener <richard.guenther@gmail.com> writes:
> > On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> When doing whole-function SLP we have to make sure the recorded
> >> base alignments we compute as the maximum alignment seen for a
> >> base anywhere in the function is actually valid at the point
> >> we want to make use of it.
> 
> Ah, yeah, the danger of optimisations that silently rely on the
> then-current restrictions :-(

Yeah.

> >> To make this work we now record the stmt the alignment was derived
> >> from in addition to the DRs innermost behavior and we use a
> >> dominance check to verify the recorded info is valid when doing
> >> BB vectorization.
> >>
> >> Note this leaves a small(?) hole for the case where we have sth
> >> like
> >>
> >>     unaligned DR
> >>     call (); // does not return
> >>     aligned DR
> >>
> >> since we'll derive an aligned access for the earlier DR but the
> >> later DR is never actually reached since the call does not
> >> return.  To plug this hole one option (for the easy backporting)
> >> would be to simply not use the base-alignment recording at all.
> >> Alternatively we'd have to store the dataref grouping 'id' somewhere
> >> in the DR itself and use that to handle this particular case.
> >
> > It turns out this isn't too difficult so the following is a patch adjusted
> > to cover that case together with a testcase.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > OK?
> 
> TBH I know nothing about this group id scheme, so I'm not really
> in a position to comment.  But it LGTM from the (few) bits I do understand.
> 
> I guess we're leaving the same easter egg for loop optimisation if
> we ever support early exits, but I'm not sure what to do about that.

We're currently not recording alignment from masked DRs
(aka DR_IS_CONDITIONAL_IN_STMT), I suppose we'd need to mark
all stmts after early exits in this way then since in the end
they will have to be masked on the early exit.

So we _might_ be fine there ...

Pushed.

Thanks,
Richard.

> Thanks,
> Richard
> 
> >
> > Thanks,
> > Richard.
> >
> > 2021-08-31  Richard Biener  <rguenther@suse.de>
> >
> >         PR tree-optimization/102139
> >         * tree-vectorizer.h (vec_base_alignments): Adjust hash-map
> >         type to record a std::pair of the stmt-info and the innermost
> >         loop behavior.
> >         (dr_vec_info::group): New member.
> >         * tree-vect-data-refs.c (vect_record_base_alignment): Adjust.
> >         (vect_compute_data_ref_alignment): Verify the recorded
> >         base alignment can be used.
> >         (data_ref_pair): Remove.
> >         (dr_group_sort_cmp): Adjust.
> >         (vect_analyze_data_ref_accesses): Store the group-ID in the
> >         dr_vec_info and operate on a vector of dr_vec_infos.
> >
> >         * gcc.dg/torture/pr102139.c: New testcase.
>

Richard Sandiford Sept. 1, 2021, 10:26 a.m. UTC | #4

Richard Biener <rguenther@suse.de> writes:
> On Wed, 1 Sep 2021, Richard Sandiford wrote:
>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Tue, Aug 31, 2021 at 11:26 AM Richard Biener via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> >>
>> >> When doing whole-function SLP we have to make sure the recorded
>> >> base alignments we compute as the maximum alignment seen for a
>> >> base anywhere in the function is actually valid at the point
>> >> we want to make use of it.
>> 
>> Ah, yeah, the danger of optimisations that silently rely on the
>> then-current restrictions :-(
>
> Yeah.
>
>> >> To make this work we now record the stmt the alignment was derived
>> >> from in addition to the DRs innermost behavior and we use a
>> >> dominance check to verify the recorded info is valid when doing
>> >> BB vectorization.
>> >>
>> >> Note this leaves a small(?) hole for the case where we have sth
>> >> like
>> >>
>> >>     unaligned DR
>> >>     call (); // does not return
>> >>     aligned DR
>> >>
>> >> since we'll derive an aligned access for the earlier DR but the
>> >> later DR is never actually reached since the call does not
>> >> return.  To plug this hole one option (for the easy backporting)
>> >> would be to simply not use the base-alignment recording at all.
>> >> Alternatively we'd have to store the dataref grouping 'id' somewhere
>> >> in the DR itself and use that to handle this particular case.
>> >
>> > It turns out this isn't too difficult so the following is a patch adjusted
>> > to cover that case together with a testcase.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> >
>> > OK?
>> 
>> TBH I know nothing about this group id scheme, so I'm not really
>> in a position to comment.  But it LGTM from the (few) bits I do understand.
>> 
>> I guess we're leaving the same easter egg for loop optimisation if
>> we ever support early exits, but I'm not sure what to do about that.
>
> We're currently not recording alignment from masked DRs
> (aka DR_IS_CONDITIONAL_IN_STMT), I suppose we'd need to mark
> all stmts after early exits in this way then since in the end
> they will have to be masked on the early exit.
>
> So we _might_ be fine there ...

Yeah, for a pure SVE-like implementation that's probably true.  But we
also have the option of vectorising an early exit by branching if the
condition is true for *any* element, then handling the remaining
elements with an epilogue.

It would be quite a different approach from what we're doing now,
and wouldn't necessarily require up-front if-conversion.  But the
point is that it's theoretically possible, just as whole-function
SLP was theoretically possible (but seemingly some way off) when
this code was written :-)

Thanks,
Richard

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 37f46d1aaa3..e2549811961 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -895,11 +895,11 @@  vect_record_base_alignment (vec_info *vinfo, stmt_vec_info stmt_info,
 			    innermost_loop_behavior *drb)
 {
   bool existed;
-  innermost_loop_behavior *&entry
+  std::pair<gimple *, innermost_loop_behavior *> &entry
     = vinfo->base_alignments.get_or_insert (drb->base_address, &existed);
-  if (!existed || entry->base_alignment < drb->base_alignment)
+  if (!existed || entry.second->base_alignment < drb->base_alignment)
     {
-      entry = drb;
+      entry = std::make_pair (stmt_info->stmt, drb);
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location,
 			 "recording new base alignment for %T\n"
@@ -1060,11 +1060,16 @@  vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info)
 
   /* Calculate the maximum of the pooled base address alignment and the
      alignment that we can compute for DR itself.  */
-  innermost_loop_behavior **entry = base_alignments->get (drb->base_address);
-  if (entry && base_alignment < (*entry)->base_alignment)
+  std::pair<gimple *, innermost_loop_behavior *> *entry
+    = base_alignments->get (drb->base_address);
+  if (entry
+      && base_alignment < (*entry).second->base_alignment
+      && (loop_vinfo
+	  || dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt_info->stmt),
+			     gimple_bb (entry->first))))
     {
-      base_alignment = (*entry)->base_alignment;
-      base_misalignment = (*entry)->base_misalignment;
+      base_alignment = entry->second->base_alignment;
+      base_misalignment = entry->second->base_misalignment;
     }
 
   if (drb->offset_alignment < vect_align_c
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 72e018e8eac..8db642c7dc3 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -106,10 +106,11 @@  struct stmt_info_for_cost {
 
 typedef vec<stmt_info_for_cost> stmt_vector_for_cost;
 
-/* Maps base addresses to an innermost_loop_behavior that gives the maximum
-   known alignment for that base.  */
+/* Maps base addresses to an innermost_loop_behavior and the stmt it was
+   derived from that gives the maximum known alignment for that base.  */
 typedef hash_map<tree_operand_hash,
-		 innermost_loop_behavior *> vec_base_alignments;
+		 std::pair<gimple *, innermost_loop_behavior *> >
+	  vec_base_alignments;
 
 /************************************************************************
   SLP

tree-optimization/102139 - fix SLP DR base alignment

Commit Message

Comments

Patch