From patchwork Tue Feb 9 16:40:20 2021
X-Patchwork-Submitter: Joel Hutton
X-Patchwork-Id: 1438481
To: GCC Patches
Subject: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns
Date: Tue, 9 Feb 2021 16:40:20 +0000
From: Joel Hutton
Cc: Richard Sandiford, Richard Biener

Hi Richards,

This patch adds support for the V8QI->V8HI case of the widening vect patterns, as discussed, to target PR98772.

Bootstrapped and regression tested on aarch64.

[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

In the case where 8 out of every 16 elements are widened using a widening pattern and the next 8 are skipped, the patterns are not recognized. This is because they are normally used in a pair, such as VEC_WIDEN_MINUS_HI/LO, to achieve a v16qi->v16hi conversion, for example. This patch adds support for V8QI->V8HI patterns.

gcc/ChangeLog:

	PR tree-optimization/98772
	* optabs-tree.c (supportable_convert_operation): Add case for V8QI->V8HI.
	* tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New overload
	to generate promotion stmts for V8QI->V8HI.
	(vectorizable_conversion): Add case for V8QI->V8HI.

gcc/testsuite/ChangeLog:

	PR tree-optimization/98772
	* gcc.target/aarch64/pr98772.c: New test.
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..b91ce3af6f0d4b3a62110bdb38f68ecc53765cad 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -308,6 +308,40 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
     return false;
 
+  /* The case where a widening operation is not making use of the full width
+     of the input vector, but is using the full width of the output vector.
+     Return the non-widened code, which will be used after the inputs are
+     converted to the wide type.  */
+  if ((code == WIDEN_MINUS_EXPR
+       || code == WIDEN_PLUS_EXPR
+       || code == WIDEN_MULT_EXPR
+       || code == WIDEN_LSHIFT_EXPR)
+      && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
+                   TYPE_VECTOR_SUBPARTS (vectype_out)))
+    {
+      switch (code)
+        {
+        case WIDEN_LSHIFT_EXPR:
+          *code1 = LSHIFT_EXPR;
+          return true;
+          break;
+        case WIDEN_MINUS_EXPR:
+          *code1 = MINUS_EXPR;
+          return true;
+          break;
+        case WIDEN_PLUS_EXPR:
+          *code1 = PLUS_EXPR;
+          return true;
+          break;
+        case WIDEN_MULT_EXPR:
+          *code1 = MULT_EXPR;
+          return true;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
        && can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr98772.c b/gcc/testsuite/gcc.target/aarch64/pr98772.c
new file mode 100644
index 0000000000000000000000000000000000000000..35568a9f9d60c44aa01a6afc5f7e6a0935009aaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr98772.c
@@ -0,0 +1,155 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#define DSIZE 16
+#define PIXSIZE 64
+
+extern void
+wplus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] + pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+extern void __attribute__((optimize (0)))
+wplus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] + pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+
+extern void
+wminus (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] - pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+extern void __attribute__((optimize (0)))
+wminus_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] - pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+
+extern void
+wmult (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] * pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+extern void __attribute__((optimize (0)))
+wmult_no_opt (uint16_t *d, uint8_t *restrict pix1, uint8_t *restrict pix2 )
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] * pix2[x];
+      pix1 += 16;
+      pix2 += 16;
+    }
+}
+
+extern void
+wlshift (uint16_t *d, uint8_t *restrict pix1)
+
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] << 8;
+      pix1 += 16;
+    }
+}
+extern void __attribute__((optimize (0)))
+wlshift_no_opt (uint16_t *d, uint8_t *restrict pix1)
+
+{
+  for( int y = 0; y < 4; y++ )
+    {
+      for( int x = 0; x < 4; x++ )
+        d[x + y*4] = pix1[x] << 8;
+      pix1 += 16;
+    }
+}
+
+void __attribute__((optimize (0)))
+init_arrays(uint16_t *d_a, uint16_t *d_b, uint8_t *pix1, uint8_t *pix2)
+{
+  for(int i = 0; i < DSIZE; i++)
+  {
+    d_a[i] = (1074 * i)%17;
+    d_b[i] = (1074 * i)%17;
+  }
+  for(int i = 0; i < PIXSIZE; i++)
+  {
+    pix1[i] = (1024 * i)%17;
+    pix2[i] = (1024 * i)%17;
+  }
+}
+
+/* Don't optimize main so we don't get confused over where the vector
+   instructions are generated.  */
+__attribute__((optimize (0)))
+int main()
+{
+  uint16_t d_a[DSIZE];
+  uint16_t d_b[DSIZE];
+  uint8_t pix1[PIXSIZE];
+  uint8_t pix2[PIXSIZE];
+
+  init_arrays (d_a, d_b, pix1, pix2);
+  wplus(d_a, pix1, pix2);
+  wplus_no_opt(d_b, pix1, pix2);
+  if (memcmp(d_a,d_b, DSIZE * sizeof (uint16_t)) != 0)
+    return 1;
+
+  init_arrays (d_a, d_b, pix1, pix2);
+  wminus(d_a, pix1, pix2);
+  wminus_no_opt(d_b, pix1, pix2);
+  if (memcmp(d_a,d_b, DSIZE * sizeof (uint16_t)) != 0)
+    return 2;
+
+  init_arrays (d_a, d_b, pix1, pix2);
+  wmult(d_a, pix1, pix2);
+  wmult_no_opt(d_b, pix1, pix2);
+  if (memcmp(d_a,d_b, DSIZE * sizeof (uint16_t)) != 0)
+    return 3;
+
+  init_arrays (d_a, d_b, pix1, pix2);
+  wlshift(d_a, pix1);
+  wlshift_no_opt(d_b, pix1);
+  if (memcmp(d_a,d_b, DSIZE * sizeof (uint16_t)) != 0)
+    return 4;
+
+}
+
+/* { dg-final { scan-assembler-times "uaddl\\tv" 2 } } */
+/* { dg-final { scan-assembler-times "usubl\\tv" 2 } } */
+/* { dg-final { scan-assembler-times "umull\\tv" 2 } } */
+/* { dg-final { scan-assembler-times "shl\\tv" 2 } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f180ced312443ba1e698932d5e8362208690b3fc..b34b00f67ea67943dee7023ab9bfd19c1be5ccbe 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4545,6 +4545,72 @@ vect_create_vectorized_promotion_stmts (vec_info *vinfo,
   *vec_oprnds0 = vec_tmp;
 }
 
+/* Create vectorized promotion stmts for widening stmts using only half the
+   potential vector size for input.  */
+static void
+vect_create_vectorized_promotion_stmts (vec_info *vinfo,
+                                        vec<tree> *vec_oprnds0,
+                                        vec<tree> *vec_oprnds1,
+                                        stmt_vec_info stmt_info, tree vec_dest,
+                                        gimple_stmt_iterator *gsi,
+                                        enum tree_code code1,
+                                        int op_type)
+{
+  int i;
+  tree vop0, vop1, new_tmp;
+  gimple *new_stmt1;
+  gimple *new_stmt2;
+  gimple *new_stmt3;
+  vec<tree> vec_tmp = vNULL;
+
+  vec_tmp.create (vec_oprnds0->length () * 2);
+  FOR_EACH_VEC_ELT (*vec_oprnds0, i, vop0)
+    {
+      tree new_tmp1, new_tmp2, new_tmp3, out_type;
+
+      gcc_assert (op_type == binary_op);
+      vop1 = (*vec_oprnds1)[i];
+
+      /* Widen the first vector input.  */
+      out_type = TREE_TYPE (vec_dest);
+      new_tmp1 = make_ssa_name (out_type);
+      new_stmt1 = gimple_build_assign (new_tmp1, NOP_EXPR, vop0);
+      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt1, gsi);
+      if (VECTOR_TYPE_P (TREE_TYPE (vop1)))
+        {
+          /* Widen the second vector input.  */
+          new_tmp2 = make_ssa_name (out_type);
+          new_stmt2 = gimple_build_assign (new_tmp2, NOP_EXPR, vop1);
+          vect_finish_stmt_generation (vinfo, stmt_info, new_stmt2, gsi);
+          /* Perform the operation with both vector inputs widened.  */
+          new_stmt3 = gimple_build_assign (vec_dest, code1, new_tmp1, new_tmp2);
+        }
+      else
+        {
+          /* Perform the operation with only the vector input widened.  */
+          new_stmt3 = gimple_build_assign (vec_dest, code1, new_tmp1, vop1);
+        }
+
+      new_tmp3 = make_ssa_name (vec_dest, new_stmt3);
+      gimple_assign_set_lhs (new_stmt3, new_tmp3);
+      vect_finish_stmt_generation (vinfo, stmt_info, new_stmt3, gsi);
+      if (is_gimple_call (new_stmt3))
+        {
+          new_tmp = gimple_call_lhs (new_stmt3);
+        }
+      else
+        {
+          new_tmp = gimple_assign_lhs (new_stmt3);
+        }
+
+      /* Store the results for the next step.  */
+      vec_tmp.quick_push (new_tmp);
+    }
+
+  vec_oprnds0->release ();
+  *vec_oprnds0 = vec_tmp;
+}
+
 /* Check if STMT_INFO performs a conversion operation that can be vectorized.
    If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
@@ -4697,7 +4763,13 @@ vectorizable_conversion (vec_info *vinfo,
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
   if (known_eq (nunits_out, nunits_in))
-    modifier = NONE;
+    if (code == WIDEN_MINUS_EXPR
+        || code == WIDEN_PLUS_EXPR
+        || code == WIDEN_LSHIFT_EXPR
+        || code == WIDEN_MULT_EXPR)
+      modifier = WIDEN;
+    else
+      modifier = NONE;
   else if (multiple_p (nunits_out, nunits_in))
     modifier = NARROW;
   else
@@ -4743,9 +4815,21 @@ vectorizable_conversion (vec_info *vinfo,
         return false;
 
       case WIDEN:
-        if (supportable_widening_operation (vinfo, code, stmt_info, vectype_out,
-                                            vectype_in, &code1, &code2,
-                                            &multi_step_cvt, &interm_types))
+        if (known_eq (nunits_out, nunits_in)
+            && (code == WIDEN_MINUS_EXPR
+                || code == WIDEN_LSHIFT_EXPR
+                || code == WIDEN_PLUS_EXPR
+                || code == WIDEN_MULT_EXPR)
+            && supportable_convert_operation (code, vectype_out, vectype_in,
+                                              &code1))
+          {
+            gcc_assert (!(multi_step_cvt && op_type == binary_op));
+            break;
+          }
+        else if (supportable_widening_operation (vinfo, code, stmt_info,
+                                                 vectype_out, vectype_in, &code1,
+                                                 &code2, &multi_step_cvt,
+                                                 &interm_types))
           {
             /* Binary widening operation can only be supported directly by the
                architecture.  */
@@ -4981,10 +5065,20 @@ vectorizable_conversion (vec_info *vinfo,
                   c1 = codecvt1;
                   c2 = codecvt2;
                 }
-              vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
-                                                      &vec_oprnds1, stmt_info,
-                                                      this_dest, gsi,
-                                                      c1, c2, op_type);
+              if ((code == WIDEN_MINUS_EXPR
+                   || code == WIDEN_PLUS_EXPR
+                   || code == WIDEN_LSHIFT_EXPR
+                   || code == WIDEN_MULT_EXPR)
+                  && known_eq (nunits_in, nunits_out))
+                vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
+                                                        &vec_oprnds1, stmt_info,
+                                                        this_dest, gsi,
+                                                        c1, op_type);
+              else
+                vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
+                                                        &vec_oprnds1, stmt_info,
+                                                        this_dest, gsi,
+                                                        c1, c2, op_type);
             }
 
           FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)