From patchwork Tue Jan 12 14:51:22 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Kumar, Venkataramanan"
X-Patchwork-Id: 566585
From: "Kumar, Venkataramanan"
To: "gcc-patches@gcc.gnu.org"
CC: "Richard Beiner (richard.guenther@gmail.com)", "Uros Bizjak (ubizjak@gmail.com)"
Subject: [RFC] non-unit stride loads for size power of 2.
Date: Tue, 12 Jan 2016 14:51:22 +0000

Hi,

In the code below, it looks like we always call "vect_permute_load_chain"
for non-unit stride loads whose size is a power of 2: the
"exact_log2 (size) != -1" test short-circuits the condition before
vect_shift_permute_load_chain is ever tried.

(---snip---)
      /* If reassociation width for vector type is 2 or greater target machine can
         execute 2 or more vector instructions in parallel.  Otherwise try to
         get chain for loads group using vect_shift_permute_load_chain.  */
      mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
      if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
          || exact_log2 (size) != -1
          || !vect_shift_permute_load_chain (dr_chain, size, stmt,
                                             gsi, &result_chain))
        vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);

static bool
vect_shift_permute_load_chain (vec<tree> dr_chain,
                               unsigned int length,
                               gimple *stmt,
                               gimple_stmt_iterator *gsi,
                               vec<tree> *result_chain)
{
......
......
  if (exact_log2 (length) != -1
      && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)   ⇐ This is not used.
    {
      unsigned int j, log_length = exact_log2 (length);
      for (i = 0; i < nelt / 2; ++i)
        sel[i] = i * 2;
      for (i = 0; i < nelt / 2; ++i)
        sel[nelt / 2 + i] = i * 2 + 1;
(---snip------)

Is there any reason for this?

I have not done any benchmarking, but I tried simple test cases for -mavx
targets with sizes 2 and 4 and VF > 4 (short/char types). Using
vect_shift_permute_load_chain looks better there.

Should we change it to something like this?

Regards,
Venkat.

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
      get chain for loads group using vect_shift_permute_load_chain.  */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-      || exact_log2 (size) != -1
-      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-                                         gsi, &result_chain))
+      || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+                                          gsi, &result_chain)
+          && exact_log2 (size) != -1))
     vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();