From patchwork Tue Oct 29 12:04:57 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Greenhalgh X-Patchwork-Id: 286816 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 175642C033B for ; Tue, 29 Oct 2013 23:05:17 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=Qyty60yrGq/29P8fN3dk7g2jRvVDG5F1cqi303BzXFCWZ1ovlM /o23LLM2oT08Alu3GXmUXNt/nEDBg7ZA3Jkom7mloWiUCv0z3eb55M0iPoAe1F8N obG9b+ei4B4brYn9ruRRnrK8Oped1eCgvyfoMcXJVh64WEfoV3Jlu5JZw= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; s= default; bh=HonY8LS/Grq+P2RpCcOXWXDvkwA=; b=ZTVpYRsIMFRh+CSqjlMu 1VtP5H5CQHlKEGPX5Z31vawNOD9uYmt38MyBGixazBqCBpzUz+EF4PlXgJgrBwP3 2/TRygxUn/CeJVs8QQiivdOOJsCyhh0KsB9Lrz83BiwpcyWnNnRCTZNuXUtRUw2C 18gr0v88+M3NlteDHTf97Mg= Received: (qmail 2321 invoked by alias); 29 Oct 2013 12:05:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 2298 invoked by uid 89); 29 Oct 2013 12:05:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 Oct 2013 12:05:09 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 29 Oct 2013 12:05:06 +0000 Received: from e106375-lin.cambridge.arm.com ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 29 Oct 2013 12:05:04 +0000 From: James Greenhalgh To: gcc-patches@gcc.gnu.org Cc: marcus.shawcroft@arm.com Subject: [AArch64] Fix size of memory store for the vst_lane intrinsics Date: Tue, 29 Oct 2013 12:04:57 +0000 Message-Id: <1383048297-16706-1-git-send-email-james.greenhalgh@arm.com> MIME-Version: 1.0 X-MC-Unique: 113102912050607101 X-IsSubscribed: yes Hi, The vst_lane_ intrinsics should write (sizeof (lane_type) * n) bytes to memory. In their current form, their asm constraints suggest a write size of (sizeof (vector_type) * n). This is anywhere from 1 to 16 times too much data, can cause huge headaches with dead store elimination. This patch better models how much data we will be writing, which in turn lets us eliminate the memory clobber. Together, we avoid the problems with dead store elimination. Tested with aarch64.exp and checked the C++ neon mangling test which often breaks when you do these ugly casts. OK? Thanks, James --- gcc/ 2013-10-29 James Greenhalgh * config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Better model data size. (__ST3_LANE_FUNC): Likewise. (__ST4_LANE_FUNC): Likewise. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 787ff15..7a63ea1 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -14704,16 +14704,19 @@ __LD4_LANE_FUNC (uint64x2x4_t, uint64_t, 2d, d, u64, q) #define __ST2_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[2]; } __ST2_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst2 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ + __ST2_LANE_STRUCTURE_##intype *__p = \ + (__ST2_LANE_STRUCTURE_##intype *)ptr; \ __asm__ ("ld1 {v16." #regsuffix ", v17." #regsuffix "}, %1\n\t" \ "st2 {v16." #lnsuffix ", v17." #lnsuffix "}[%2], %0\n\t" \ - : "=Q"(*(intype *) ptr) \ + : "=Q"(*__p) \ : "Q"(b), "i"(c) \ - : "memory", "v16", "v17"); \ + : "v16", "v17"); \ } __ST2_LANE_FUNC (int8x8x2_t, int8_t, 8b, b, s8,) @@ -14743,16 +14746,19 @@ __ST2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q) #define __ST3_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[3]; } __ST3_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst3 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst3 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ + __ST3_LANE_STRUCTURE_##intype *__p = \ + (__ST3_LANE_STRUCTURE_##intype *)ptr; \ __asm__ ("ld1 {v16." #regsuffix " - v18." #regsuffix "}, %1\n\t" \ "st3 {v16." #lnsuffix " - v18." #lnsuffix "}[%2], %0\n\t" \ - : "=Q"(*(intype *) ptr) \ + : "=Q"(*__p) \ : "Q"(b), "i"(c) \ - : "memory", "v16", "v17", "v18"); \ + : "v16", "v17", "v18"); \ } __ST3_LANE_FUNC (int8x8x3_t, int8_t, 8b, b, s8,) @@ -14782,16 +14788,19 @@ __ST3_LANE_FUNC (uint64x2x3_t, uint64_t, 2d, d, u64, q) #define __ST4_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[4]; } __ST4_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst4 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst4 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ + __ST4_LANE_STRUCTURE_##intype *__p = \ + (__ST4_LANE_STRUCTURE_##intype *)ptr; \ __asm__ ("ld1 {v16." #regsuffix " - v19." #regsuffix "}, %1\n\t" \ "st4 {v16." #lnsuffix " - v19." #lnsuffix "}[%2], %0\n\t" \ - : "=Q"(*(intype *) ptr) \ + : "=Q"(*__p) \ : "Q"(b), "i"(c) \ - : "memory", "v16", "v17", "v18", "v19"); \ + : "v16", "v17", "v18", "v19"); \ } __ST4_LANE_FUNC (int8x8x4_t, int8_t, 8b, b, s8,)