From patchwork Sat Jul 1 13:48:08 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Koenig X-Patchwork-Id: 783072 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3x0FB03Z0Tz9sRY for ; Sat, 1 Jul 2017 23:49:10 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="hZfBLRJP"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=msYmPXTqOr8O0O9bQjn/OzDUHMasxxRGAQACtGPQr1lN3JZUJb V2wLSd2ZpzRJxp7uxXKyp1FJ81OGEko75uDjBPdnkwp+FYujHUaGcag7Cfs90LcH otY80kQCGT9EX3RLkuP8pM1MRT2ADmZSIU1ixahZhvEZjEzY1oXTP8k9Y= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; s= default; bh=ZxP7eUIdE4KeHo1nuwplTiM3awA=; b=hZfBLRJPSNUAMxhEqfOp rb7Oykg5mkXZc7YrZHGdlQs/atW7Ap61SchqZt1RnZTSB0km/hUVckuTtsZB3ock byTZ5B31Y7uYpcFN5K6CUt6BVmwCwgvn+259ZKakDVnV/qbE0cECGdqG8P+f09NL z6GHnKhmc2b85l48ED95NqM= Received: (qmail 15855 invoked by alias); 1 Jul 2017 13:48:42 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 15687 invoked by uid 89); 1 Jul 2017 13:48:22 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 spammy=H*Ad:U*tkoenig, tkoenig@gcc.gnu.org, tkoeniggccgnuorg, U*tkoenig X-Spam-User: qpsmtpd, 2 recipients X-HELO: cc-smtpout1.netcologne.de Received: from cc-smtpout1.netcologne.de (HELO cc-smtpout1.netcologne.de) (89.1.8.211) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 01 Jul 2017 13:48:19 +0000 Received: from cc-smtpin2.netcologne.de (cc-smtpin2.netcologne.de [89.1.8.202]) by cc-smtpout1.netcologne.de (Postfix) with ESMTP id 695801315E; Sat, 1 Jul 2017 15:48:10 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by cc-smtpin2.netcologne.de (Postfix) with ESMTP id 5A98C11D9C; Sat, 1 Jul 2017 15:48:10 +0200 (CEST) Received: from [78.35.134.246] (helo=cc-smtpin2.netcologne.de) by localhost with ESMTP (eXpurgate 4.1.9) (envelope-from ) id 5957a81a-022c-7f0000012729-7f000001d46f-1 for ; Sat, 01 Jul 2017 15:48:10 +0200 Received: from [192.168.178.20] (xdsl-78-35-134-246.netcologne.de [78.35.134.246]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by cc-smtpin2.netcologne.de (Postfix) with ESMTPSA; Sat, 1 Jul 2017 15:48:08 +0200 (CEST) To: "fortran@gcc.gnu.org" , gcc-patches From: Thomas Koenig Subject: [patch, fortran] Implement blocked eoshift for eoshift0 Message-ID: <7f484ee3-4cbf-4373-8f5f-d30197e1e7c6@netcologne.de> Date: Sat, 1 Jul 2017 15:48:08 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 MIME-Version: 1.0 Hello world, the attached patch implements the blocked algorithm for constant shift for dim > 1 for eoshift0 (which handles the case of constant shift and constant fill value). Speedup, as for cshift, is large. Moving a 500*500*500 array by -3 with eo_bench.f90 (also attached): $ gfortran -O3 eo_bench.f90 && ./a.out dim = 1 t = 0.451796889 dim = 2 t = 0.183514118 dim = 3 t = 0.184015989 $ gfortran-7 -static-libgfortran -O3 eo_bench.f90 && ./a.out dim = 1 t = 0.955736041 dim = 2 t = 1.42228103 dim = 3 t = 3.00043702 Regression-tested. OK for trunk? Regards Thomas 2017-07-01 Thomas Koenig * intrinsics/eoshift0.c: For contiguous arrays, use block algorithm. Use memcpy where possible. 2017-07-01 Thomas Koenig * gfortran/eoshift_3.f90: New test. Index: intrinsics/eoshift0.c =================================================================== --- intrinsics/eoshift0.c (Revision 249632) +++ intrinsics/eoshift0.c (Arbeitskopie) @@ -53,7 +53,8 @@ index_type len; index_type n; index_type arraysize; - + bool do_blocked; + /* The compiler cannot figure out that these are set, initialize them to avoid warnings. */ len = 0; @@ -102,39 +103,94 @@ count[0] = 0; sstride[0] = -1; rstride[0] = -1; + + if (which > 0) + { + /* Test if both ret and array are contiguous. */ + size_t r_ex, a_ex; + r_ex = 1; + a_ex = 1; + do_blocked = true; + dim = GFC_DESCRIPTOR_RANK (array); + for (n = 0; n < dim; n ++) + { + index_type rs, as; + rs = GFC_DESCRIPTOR_STRIDE (ret, n); + if (rs != r_ex) + { + do_blocked = false; + break; + } + as = GFC_DESCRIPTOR_STRIDE (array, n); + if (as != a_ex) + { + do_blocked = false; + break; + } + r_ex *= GFC_DESCRIPTOR_EXTENT (ret, n); + a_ex *= GFC_DESCRIPTOR_EXTENT (array, n); + } + } + else + do_blocked = false; + n = 0; - for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++) + + if (do_blocked) { - if (dim == which) - { - roffset = GFC_DESCRIPTOR_STRIDE_BYTES(ret,dim); - if (roffset == 0) - roffset = size; - soffset = GFC_DESCRIPTOR_STRIDE_BYTES(array,dim); - if (soffset == 0) - soffset = size; - len = GFC_DESCRIPTOR_EXTENT(array,dim); - } - else - { - count[n] = 0; - extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim); - rstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(ret,dim); - sstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(array,dim); - n++; - } + /* For contiguous arrays, use the relationship that + + dimension(n1,n2,n3) :: a, b + b = eoshift(a,sh,3) + + can be dealt with as if + + dimension(n1*n2*n3) :: an, bn + bn = eoshift(a,sh*n1*n2,1) + + so a block move can be used for dim>1. */ + len = GFC_DESCRIPTOR_STRIDE(array, which) + * GFC_DESCRIPTOR_EXTENT(array, which); + shift *= GFC_DESCRIPTOR_STRIDE(array, which); + roffset = size; + soffset = size; + for (dim = which + 1; dim < GFC_DESCRIPTOR_RANK (array); dim++) + { + count[n] = 0; + extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim); + rstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(ret,dim); + sstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(array,dim); + n++; + } + count[n] = 0; + dim = GFC_DESCRIPTOR_RANK (array) - which; } - if (sstride[0] == 0) - sstride[0] = size; - if (rstride[0] == 0) - rstride[0] = size; + else + { + for (dim = 0; dim < GFC_DESCRIPTOR_RANK (array); dim++) + { + if (dim == which) + { + roffset = GFC_DESCRIPTOR_STRIDE_BYTES(ret,dim); + if (roffset == 0) + roffset = size; + soffset = GFC_DESCRIPTOR_STRIDE_BYTES(array,dim); + if (soffset == 0) + soffset = size; + len = GFC_DESCRIPTOR_EXTENT(array,dim); + } + else + { + count[n] = 0; + extent[n] = GFC_DESCRIPTOR_EXTENT(array,dim); + rstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(ret,dim); + sstride[n] = GFC_DESCRIPTOR_STRIDE_BYTES(array,dim); + n++; + } + } + dim = GFC_DESCRIPTOR_RANK (array); + } - dim = GFC_DESCRIPTOR_RANK (array); - rstride0 = rstride[0]; - sstride0 = sstride[0]; - rptr = ret->base_addr; - sptr = array->base_addr; - if ((shift >= 0 ? shift : -shift) > len) { shift = len; @@ -148,6 +204,11 @@ len = len + shift; } + rstride0 = rstride[0]; + sstride0 = sstride[0]; + rptr = ret->base_addr; + sptr = array->base_addr; + while (rptr) { /* Do the shift for this dimension. */ @@ -161,12 +222,23 @@ src = sptr; dest = &rptr[-shift * roffset]; } - for (n = 0; n < len; n++) - { - memcpy (dest, src, size); - dest += roffset; - src += soffset; - } + /* If the elements are contiguous, perform a single block move. */ + + if (soffset == size && roffset == size) + { + size_t chunk = size * len; + memcpy (dest, src, chunk); + dest += chunk; + } + else + { + for (n = 0; n < len; n++) + { + memcpy (dest, src, size); + dest += roffset; + src += soffset; + } + } if (shift >= 0) { n = shift;