From patchwork Wed May 17 21:41:44 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Koenig <tkoenig@netcologne.de>
X-Patchwork-Id: 763775
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
To: "fortran@gcc.gnu.org" <fortran@gcc.gnu.org>,
	gcc-patches <gcc-patches@gcc.gnu.org>
From: Thomas Koenig <tkoenig@netcologne.de>
Subject: [patch, fortran] Handle MATMUL(TRANSPOSE(A),B) in inline matmul
Message-ID: <619ade24-e4ce-82b0-9214-533feb74941c@netcologne.de>
Date: Wed, 17 May 2017 23:41:44 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:52.0) Gecko/20100101 Thunderbird/52.1.0
MIME-Version: 1.0

Hello world,

after receiving no negative feedback on my RFC patch, I have decided
to submit it.

The attached patch handles MATMUL(TRANSPOSE(A),B) when inlining MATMUL.
The inlined code is a bit faster than the library version.
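
For readers who do not want to decode the front-end calls, here is a
rough sketch in C of the loop nest that the new A2TB2 case builds.  It is
not code from the patch: the function name matmul_at_b, its parameters
and the flat column-major storage are assumptions made for illustration
only.  The loop order and the index pairs c(i,j), a(l,i), b(l,j) follow
the do_1/do_2/do_3 setup in the patch; the update statement is assumed to
be the usual c = c + a*b accumulation.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: computes
   C = TRANSPOSE(A) * B for column-major arrays.
   A is k x m, B is k x n, C is m x n.  */
static void
matmul_at_b (size_t m, size_t n, size_t k,
             const double *a, const double *b, double *c)
{
  /* Zero the result first (the inlined code emits assign_zero
     before the loops).  */
  for (size_t i = 0; i < m * n; i++)
    c[i] = 0.0;

  for (size_t i = 0; i < m; i++)        /* 0 .. SIZE(A,2)-1 */
    for (size_t j = 0; j < n; j++)      /* 0 .. SIZE(B,2)-1 */
      for (size_t l = 0; l < k; l++)    /* 0 .. SIZE(A,1)-1 */
        /* c(i,j) = c(i,j) + a(l,i) * b(l,j) */
        c[i + j * m] += a[l + i * k] * b[l + j * k];
}
```

In this sketch the innermost loop runs over the first index of both A
and B, so both are read with unit stride.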

Regression-tested.  OK for trunk?

Regards

	Thomas

2017-05-17  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/66094
         * frontend-passes.c (matrix_case):  Add A2TB2.
         (inline_limit_check):  Handle MATMUL(TRANSPOSE(A),B).
         (matmul_lhs_realloc):  Likewise.
         (inline_matmul_assign):  Likewise.

2017-05-17  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/66094
         * gfortran.dg/inline_matmul_16.f90:  New test.

Index: frontend-passes.c
===================================================================
--- frontend-passes.c	(revision 247809)
+++ frontend-passes.c	(working copy)
@@ -112,7 +112,7 @@ static int var_num = 1;
 
 /* What sort of matrix we are dealing with when inlining MATMUL.  */
 
-enum matrix_case { none=0, A2B2, A2B1, A1B2, A2B2T };
+enum matrix_case { none=0, A2B2, A2B1, A1B2, A2B2T, A2TB2 };
 
 /* Keep track of the number of expressions we have inserted so far
    using create_var.  */
@@ -2252,7 +2252,7 @@ inline_limit_check (gfc_expr *a, gfc_expr *b, enum
   gfc_typespec ts;
   gfc_expr *cond;
 
-  gcc_assert (m_case == A2B2 || m_case == A2B2T);
+  gcc_assert (m_case == A2B2 || m_case == A2B2T || m_case == A2TB2);
 
   /* Calculation is done in real to avoid integer overflow.  */
 
@@ -2425,6 +2425,20 @@ matmul_lhs_realloc (gfc_expr *c, gfc_expr *a, gfc_
       cond = build_logical_expr (INTRINSIC_OR, ne1, ne2);
       break;
 
+    case A2TB2:
+
+      ar->start[0] = get_array_inq_function (GFC_ISYM_SIZE, a, 2);
+      ar->start[1] = get_array_inq_function (GFC_ISYM_SIZE, b, 2);
+
+      ne1 = build_logical_expr (INTRINSIC_NE,
+				get_array_inq_function (GFC_ISYM_SIZE, c, 1),
+				get_array_inq_function (GFC_ISYM_SIZE, a, 2));
+      ne2 = build_logical_expr (INTRINSIC_NE,
+				get_array_inq_function (GFC_ISYM_SIZE, c, 2),
+				get_array_inq_function (GFC_ISYM_SIZE, b, 2));
+      cond = build_logical_expr (INTRINSIC_OR, ne1, ne2);
+      break;
+
     case A2B1:
       ar->start[0] = get_array_inq_function (GFC_ISYM_SIZE, a, 1);
       cond = build_logical_expr (INTRINSIC_NE,
@@ -3009,7 +3023,7 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
 
   a = expr2->value.function.actual;
   matrix_a = check_conjg_transpose_variable (a->expr, &conjg_a, &transpose_a);
-  if (transpose_a || matrix_a == NULL)
+  if (matrix_a == NULL)
     return 0;
 
   b = a->next;
@@ -3026,27 +3040,36 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
       || gfc_check_dependency (expr1, matrix_b, true))
     return 0;
 
+  m_case = none;
   if (matrix_a->rank == 2)
     {
-      if (matrix_b->rank == 1)
-	m_case = A2B1;
+      if (transpose_a)
+	{
+	  if (matrix_b->rank == 2 && !transpose_b)
+	    m_case = A2TB2;
+	}
       else
 	{
-	  if (transpose_b)
-	    m_case = A2B2T;
-	  else
-	    m_case = A2B2;
+	  if (matrix_b->rank == 1)
+	    m_case = A2B1;
+	  else /* matrix_b->rank == 2 */
+	    {
+	      if (transpose_b)
+		m_case = A2B2T;
+	      else
+		m_case = A2B2;
+	    }
 	}
     }
-  else
+  else /* matrix_a->rank == 1 */
     {
-      /* Vector * Transpose(B) not handled yet.  */
-      if (transpose_b)
-	m_case = none;
-      else
-	m_case = A1B2;
+      if (matrix_b->rank == 2)
+	{
+	  if (!transpose_b)
+	    m_case = A1B2;
+	}
     }
-
+    
   if (m_case == none)
     return 0;
 
@@ -3250,6 +3273,37 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
 	  next_code_point = &test->next;
 
 	}
+
+      if (m_case == A2TB2)
+	{
+	  c1 = get_array_inq_function (GFC_ISYM_SIZE, expr1, 1);
+	  a2 = get_array_inq_function (GFC_ISYM_SIZE, matrix_a, 2);
+
+	  test = runtime_error_ne (c1, a2, "Incorrect extent in return array in "
+				   "MATMUL intrinsic for dimension 1: "
+				   "is %ld, should be %ld");
+
+	  *next_code_point = test;
+	  next_code_point = &test->next;
+
+	  c2 = get_array_inq_function (GFC_ISYM_SIZE, expr1, 2);
+	  b2 = get_array_inq_function (GFC_ISYM_SIZE, matrix_b, 2);
+	  test = runtime_error_ne (c2, b2, "Incorrect extent in return array in "
+				   "MATMUL intrinsic for dimension 2: "
+				   "is %ld, should be %ld");
+	  *next_code_point = test;
+	  next_code_point = &test->next;
+
+	  a1 = get_array_inq_function (GFC_ISYM_SIZE, matrix_a, 1);
+	  b1 = get_array_inq_function (GFC_ISYM_SIZE, matrix_b, 1);
+
+	  test = runtime_error_ne (b1, a1, "Incorrect extent in argument B in "
+				   "MATMUL intrinsic for dimension 1: "
+				   "is %ld, should be %ld");
+	  *next_code_point = test;
+	  next_code_point = &test->next;
+
+	}
     }
 
   *next_code_point = assign_zero;
@@ -3331,6 +3385,39 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
 
       break;
 
+    case A2TB2:
+      inline_limit_check (matrix_a, matrix_b, m_case);
+
+      u1 = get_size_m1 (matrix_a, 2);
+      u2 = get_size_m1 (matrix_b, 2);
+      u3 = get_size_m1 (matrix_a, 1);
+
+      do_1 = create_do_loop (gfc_copy_expr (zero), u1, NULL, &co->loc, ns);
+      do_2 = create_do_loop (gfc_copy_expr (zero), u2, NULL, &co->loc, ns);
+      do_3 = create_do_loop (gfc_copy_expr (zero), u3, NULL, &co->loc, ns);
+
+      do_1->block->next = do_2;
+      do_2->block->next = do_3;
+      do_3->block->next = assign_matmul;
+
+      var_1 = do_1->ext.iterator->var;
+      var_2 = do_2->ext.iterator->var;
+      var_3 = do_3->ext.iterator->var;
+
+      list[0] = var_1;
+      list[1] = var_2;
+      cscalar = scalarized_expr (co->expr1, list, 2);
+
+      list[0] = var_3;
+      list[1] = var_1;
+      ascalar = scalarized_expr (matrix_a, list, 2);
+
+      list[0] = var_3;
+      list[1] = var_2;
+      bscalar = scalarized_expr (matrix_b, list, 2);
+
+      break;
+
     case A2B1:
       u1 = get_size_m1 (matrix_b, 1);
       u2 = get_size_m1 (matrix_a, 1);