Fix PR50969 - Patchwork

Message ID

1328275454.2799.13.camel@gnopaine

State

New

Headers

Message-ID: <1328275454.2799.13.camel@gnopaine>
Subject: [PATCH] Fix PR50969
From: "William J. Schmidt" <wschmidt@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: bergner@vnet.ibm.com
Date: Fri, 03 Feb 2012 07:24:14 -0600
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org

Commit Message

Bill Schmidt Feb. 3, 2012, 1:24 p.m. UTC

This fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50969 by slightly
raising the cost of vector permutes on powerpc64 VSX targets (and
ensuring those costs are correctly used).  This reverses the performance
loss for 168.wupwise, and gives a slight boost to 433.milc as well.

In the long run, we will want to model VSX permutes differently, since
the real issue is that only one floating-point pipe can hold a permute
at a time.  Thus the present patch can be overly conservative when
permutes are rare compared with other vector instructions.

Bootstrapped and regtested on powerpc64-linux-gnu with no failures.  OK
for trunk?

Thanks,
Bill


2012-02-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/50969
	* tree-vect-stmts.c (vect_model_store_cost): Correct statement cost to
	use vec_perm rather than vector_stmt.
	(vect_model_load_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Revise
	cost of vec_perm for TARGET_VSX.

Comments

Richard Biener Feb. 3, 2012, 1:41 p.m. UTC | #1

On Fri, Feb 3, 2012 at 2:24 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> This fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50969 by slightly
> raising the cost of vector permutes on powerpc64 VSX targets (and
> ensuring those costs are correctly used).  This reverses the performance
> loss for 168.wupwise, and gives a slight boost to 433.milc as well.
>
> In the long run, we will want to model VSX permutes differently, since
> the real issue is that only one floating-point pipe can hold a permute
> at a time.  Thus the present patch can be overly conservative when
> permutes are rare compared with other vector instructions.
>
> Bootstrapped and regtested on powerpc64-linux-gnu with no failures.  OK
> for trunk?

Note this makes permutes artificially cheap for AMD K8, K10 and
Bulldozer.  Can you change config/i386/i386.c:ix86_builtin_vectorization_cost
to return ix86_cost->vec_stmt_cost instead of one for vec_perm?
The cost is otherwise only queried by SLP vectorization it seems.

Otherwise this looks ok.  Please give other maintainers a chance to
chime in (other cost hooks might need similar adjustments).

Thanks,
Richard.
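
Richard's request can be illustrated with a reduced model. The sketch below is not code from config/i386/i386.c: `struct target_costs`, `perm_cost_constant`, and `perm_cost_from_table` are hypothetical names, and only the `vec_stmt_cost` field echoes the identifier mentioned in the message. It contrasts a hook that answers 1 for every permute with one that reads the cost from the tuning table in effect, which is presumably what keeps permutes from looking too cheap on tunings with a higher vector statement cost.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for a per-tuning cost table.  Only the
   vec_stmt_cost field name comes from the thread; the rest is made up
   for this sketch.  */
struct target_costs
{
  int vec_stmt_cost;
};

/* Models the behaviour being questioned: every permute costs 1,
   regardless of the tuning in effect.  */
static int
perm_cost_constant (void)
{
  return 1;
}

/* Models the suggested behaviour: a permute is costed like any other
   vector statement for the tuning in effect.  */
static int
perm_cost_from_table (const struct target_costs *costs)
{
  assert (costs != 0);
  return costs->vec_stmt_cost;
}
```

With a made-up tuning whose `vec_stmt_cost` is 3, the first function still answers 1 while the second answers 3, so strided accesses stop being undercosted relative to other vector statements.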


Bill Schmidt Feb. 3, 2012, 4:32 p.m. UTC | #2

On Fri, 2012-02-03 at 14:41 +0100, Richard Guenther wrote:
> On Fri, Feb 3, 2012 at 2:24 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > This fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50969 by slightly
> > raising the cost of vector permutes on powerpc64 VSX targets (and
> > ensuring those costs are correctly used).  This reverses the performance
> > loss for 168.wupwise, and gives a slight boost to 433.milc as well.
> >
> > In the long run, we will want to model VSX permutes differently, since
> > the real issue is that only one floating-point pipe can hold a permute
> > at a time.  Thus the present patch can be overly conservative when
> > permutes are rare compared with other vector instructions.
> >
> > Bootstrapped and regtested on powerpc64-linux-gnu with no failures.  OK
> > for trunk?
> 
> Note this makes permutes artificially cheap for AMD K8, K10 and
> Bulldozer.  Can you change config/i386/i386.c:ix86_builtin_vectorization_cost
> to return ix86_cost->vec_stmt_cost instead of one for vec_perm?
> The cost is otherwise only queried by SLP vectorization it seems.

Sure, will do.

> 
> Otherwise this looks ok.  Please give other maintainers a chance to
> chime in (other cost hooks might need similar adjustments).

I'll give this until at least late Monday before committing.  Thanks for
the quick response!

Bill
 

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 183871)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -882,7 +882,7 @@  vect_model_store_cost (stmt_vec_info stmt_info, in
     {
       /* Uses a high and low interleave operation for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
-        * vect_get_stmt_cost (vector_stmt);
+        * vect_get_stmt_cost (vec_perm);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
@@ -988,7 +988,7 @@  vect_model_load_cost (stmt_vec_info stmt_info, int
     {
       /* Uses an even and odd extract operations for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
-	* vect_get_stmt_cost (vector_stmt);
+	* vect_get_stmt_cost (vec_perm);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 183871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3540,9 +3540,13 @@  rs6000_builtin_vectorization_cost (enum vect_cost_
       case vec_to_scalar:
       case scalar_to_vec:
       case cond_branch_not_taken:
-      case vec_perm:
         return 1;
 
+      case vec_perm:
+	if (!TARGET_VSX)
+	  return 1;
+	return 2;
+
       case cond_branch_taken:
         return 3;
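
To make the effect of the two tree-vect-stmts.c hunks concrete, here is a minimal standalone sketch of the strided-access cost arithmetic. It is not GCC code: `log2_exact` and `strided_permute_cost` are hypothetical helpers that imitate the `ncopies * exact_log2 (group_size) * group_size * cost` expression in the diff, and the per-permute costs 1 and 2 are the values the rs6000 hunk returns for vec_perm without and with TARGET_VSX.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, and -1 otherwise.  */
static int
log2_exact (unsigned int x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical model of the inside-loop permute cost of a strided
   group: ncopies * log2 (group_size) * group_size * perm_cost.
   The group size is assumed to be a power of two.  */
static int
strided_permute_cost (int ncopies, unsigned int group_size, int perm_cost)
{
  int lg = log2_exact (group_size);

  assert (lg >= 0);
  return ncopies * lg * (int) group_size * perm_cost;
}
```

For one copy of a group of four interleaved accesses, `strided_permute_cost (1, 4, 1)` evaluates to 8 and `strided_permute_cost (1, 4, 2)` to 16, so under this model the permute portion of a strided load or store is costed twice as high on VSX after the patch.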