From patchwork Wed Jun 15 09:22:20 2016
X-Patchwork-Submitter: Bin Cheng
X-Patchwork-Id: 635778
From: Bin Cheng
To: "gcc-patches@gcc.gnu.org"
CC: nd
Subject: [Patch AArch64 2/2] Add missing vcond by rewriting it with vcond_mask/vec_cmp patterns.
Date: Wed, 15 Jun 2016 09:22:20 +0000

Hi,
This is the second patch.  It rewrites the vcond patterns using the
vcond_mask/vec_cmp patterns introduced in the first patch, and it also
implements vcond patterns that were missing from the current AArch64
backend.  After this patch, I have a simple follow-up change enabling
the "vect_cond_mixed" testing requirement on AArch64, which will enable
various tests.

Bootstrapped and tested on AArch64 along with the first patch.  Is it OK?

Thanks,
bin

2016-06-07  Alan Lawrence
	    Renlin Li
	    Bin Cheng

	* config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
	* config/aarch64/aarch64-simd.md (v2di3): Call gen_vcondv2div2di
	instead of gen_aarch64_vcond_internalv2div2di.
	(aarch64_vcond_internal): Delete pattern.
	(aarch64_vcond_internal): Ditto.
	(vcond): Re-implement using vec_cmp and vcond_mask.
	(vcondu): Ditto.
	(vcond): Delete.
	(vcond): New pattern.
	(vcondu): New pattern.
	(aarch64_cmtst): Revise comment to use aarch64_vcond instead
	of aarch64_vcond_internal.
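For context only (this example is not part of the patch, and the exact
patterns it ends up exercising are my assumption): the "vect_cond_mixed"
requirement mentioned above covers loops where the comparison and the
selected values have different element types, for example a float
comparison selecting between ints:

/* Hypothetical testcase, not from the patch: a float comparison
   selecting int values, i.e. a mixed-type vector condition.  */
void
foo (int *restrict r, float *restrict a, int *restrict x,
     int *restrict y, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] > 0.0f ? x[i] : y[i];
}
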
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 6ea35bf..e080b71 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1053,7 +1053,7 @@ } cmp_fmt = gen_rtx_fmt_ee (cmp_operator, V2DImode, operands[1], operands[2]); - emit_insn (gen_aarch64_vcond_internalv2div2di (operands[0], operands[1], + emit_insn (gen_vcondv2div2di (operands[0], operands[1], operands[2], cmp_fmt, operands[1], operands[2])); DONE; }) @@ -2202,314 +2202,6 @@ DONE; }) -(define_expand "aarch64_vcond_internal" - [(set (match_operand:VSDQ_I_DI 0 "register_operand") - (if_then_else:VSDQ_I_DI - (match_operator 3 "comparison_operator" - [(match_operand:VSDQ_I_DI 4 "register_operand") - (match_operand:VSDQ_I_DI 5 "nonmemory_operand")]) - (match_operand:VSDQ_I_DI 1 "nonmemory_operand") - (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))] - "TARGET_SIMD" -{ - rtx op1 = operands[1]; - rtx op2 = operands[2]; - rtx mask = gen_reg_rtx (mode); - enum rtx_code code = GET_CODE (operands[3]); - - /* Switching OP1 and OP2 is necessary for NE (to output a cmeq insn), - and desirable for other comparisons if it results in FOO ? -1 : 0 - (this allows direct use of the comparison result without a bsl). */ - if (code == NE - || (code != EQ - && op1 == CONST0_RTX (mode) - && op2 == CONSTM1_RTX (mode))) - { - op1 = operands[2]; - op2 = operands[1]; - switch (code) - { - case LE: code = GT; break; - case LT: code = GE; break; - case GE: code = LT; break; - case GT: code = LE; break; - /* No case EQ. */ - case NE: code = EQ; break; - case LTU: code = GEU; break; - case LEU: code = GTU; break; - case GTU: code = LEU; break; - case GEU: code = LTU; break; - default: gcc_unreachable (); - } - } - - /* Make sure we can handle the last operand. */ - switch (code) - { - case NE: - /* Normalized to EQ above. */ - gcc_unreachable (); - - case LE: - case LT: - case GE: - case GT: - case EQ: - /* These instructions have a form taking an immediate zero. */ - if (operands[5] == CONST0_RTX (mode)) - break; - /* Fall through, as may need to load into register. */ - default: - if (!REG_P (operands[5])) - operands[5] = force_reg (mode, operands[5]); - break; - } - - switch (code) - { - case LT: - emit_insn (gen_aarch64_cmlt (mask, operands[4], operands[5])); - break; - - case GE: - emit_insn (gen_aarch64_cmge (mask, operands[4], operands[5])); - break; - - case LE: - emit_insn (gen_aarch64_cmle (mask, operands[4], operands[5])); - break; - - case GT: - emit_insn (gen_aarch64_cmgt (mask, operands[4], operands[5])); - break; - - case LTU: - emit_insn (gen_aarch64_cmgtu (mask, operands[5], operands[4])); - break; - - case GEU: - emit_insn (gen_aarch64_cmgeu (mask, operands[4], operands[5])); - break; - - case LEU: - emit_insn (gen_aarch64_cmgeu (mask, operands[5], operands[4])); - break; - - case GTU: - emit_insn (gen_aarch64_cmgtu (mask, operands[4], operands[5])); - break; - - /* NE has been normalized to EQ above. */ - case EQ: - emit_insn (gen_aarch64_cmeq (mask, operands[4], operands[5])); - break; - - default: - gcc_unreachable (); - } - - /* If we have (a = (b CMP c) ? -1 : 0); - Then we can simply move the generated mask. 
*/ - - if (op1 == CONSTM1_RTX (mode) - && op2 == CONST0_RTX (mode)) - emit_move_insn (operands[0], mask); - else - { - if (!REG_P (op1)) - op1 = force_reg (mode, op1); - if (!REG_P (op2)) - op2 = force_reg (mode, op2); - emit_insn (gen_aarch64_simd_bsl (operands[0], mask, - op1, op2)); - } - - DONE; -}) - -(define_expand "aarch64_vcond_internal" - [(set (match_operand:VDQF_COND 0 "register_operand") - (if_then_else:VDQF - (match_operator 3 "comparison_operator" - [(match_operand:VDQF 4 "register_operand") - (match_operand:VDQF 5 "nonmemory_operand")]) - (match_operand:VDQF_COND 1 "nonmemory_operand") - (match_operand:VDQF_COND 2 "nonmemory_operand")))] - "TARGET_SIMD" -{ - int inverse = 0; - int use_zero_form = 0; - int swap_bsl_operands = 0; - rtx op1 = operands[1]; - rtx op2 = operands[2]; - rtx mask = gen_reg_rtx (mode); - rtx tmp = gen_reg_rtx (mode); - - rtx (*base_comparison) (rtx, rtx, rtx); - rtx (*complimentary_comparison) (rtx, rtx, rtx); - - switch (GET_CODE (operands[3])) - { - case GE: - case GT: - case LE: - case LT: - case EQ: - if (operands[5] == CONST0_RTX (mode)) - { - use_zero_form = 1; - break; - } - /* Fall through. */ - default: - if (!REG_P (operands[5])) - operands[5] = force_reg (mode, operands[5]); - } - - switch (GET_CODE (operands[3])) - { - case LT: - case UNLT: - inverse = 1; - /* Fall through. */ - case GE: - case UNGE: - case ORDERED: - case UNORDERED: - base_comparison = gen_aarch64_cmge; - complimentary_comparison = gen_aarch64_cmgt; - break; - case LE: - case UNLE: - inverse = 1; - /* Fall through. */ - case GT: - case UNGT: - base_comparison = gen_aarch64_cmgt; - complimentary_comparison = gen_aarch64_cmge; - break; - case EQ: - case NE: - case UNEQ: - base_comparison = gen_aarch64_cmeq; - complimentary_comparison = gen_aarch64_cmeq; - break; - default: - gcc_unreachable (); - } - - switch (GET_CODE (operands[3])) - { - case LT: - case LE: - case GT: - case GE: - case EQ: - /* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ. - As a LT b <=> b GE a && a LE b <=> b GT a. Our transformations are: - a GE b -> a GE b - a GT b -> a GT b - a LE b -> b GE a - a LT b -> b GT a - a EQ b -> a EQ b - Note that there also exist direct comparison against 0 forms, - so catch those as a special case. */ - if (use_zero_form) - { - inverse = 0; - switch (GET_CODE (operands[3])) - { - case LT: - base_comparison = gen_aarch64_cmlt; - break; - case LE: - base_comparison = gen_aarch64_cmle; - break; - default: - /* Do nothing, other zero form cases already have the correct - base_comparison. */ - break; - } - } - - if (!inverse) - emit_insn (base_comparison (mask, operands[4], operands[5])); - else - emit_insn (complimentary_comparison (mask, operands[5], operands[4])); - break; - case UNLT: - case UNLE: - case UNGT: - case UNGE: - case NE: - /* FCM returns false for lanes which are unordered, so if we use - the inverse of the comparison we actually want to emit, then - swap the operands to BSL, we will end up with the correct result. - Note that a NE NaN and NaN NE b are true for all a, b. - - Our transformations are: - a GE b -> !(b GT a) - a GT b -> !(b GE a) - a LE b -> !(a GT b) - a LT b -> !(a GE b) - a NE b -> !(a EQ b) */ - - if (inverse) - emit_insn (base_comparison (mask, operands[4], operands[5])); - else - emit_insn (complimentary_comparison (mask, operands[5], operands[4])); - - swap_bsl_operands = 1; - break; - case UNEQ: - /* We check (a > b || b > a). 
combining these comparisons give us - true iff !(a != b && a ORDERED b), swapping the operands to BSL - will then give us (a == b || a UNORDERED b) as intended. */ - - emit_insn (gen_aarch64_cmgt (mask, operands[4], operands[5])); - emit_insn (gen_aarch64_cmgt (tmp, operands[5], operands[4])); - emit_insn (gen_ior3 (mask, mask, tmp)); - swap_bsl_operands = 1; - break; - case UNORDERED: - /* Operands are ORDERED iff (a > b || b >= a). - Swapping the operands to BSL will give the UNORDERED case. */ - swap_bsl_operands = 1; - /* Fall through. */ - case ORDERED: - emit_insn (gen_aarch64_cmgt (tmp, operands[4], operands[5])); - emit_insn (gen_aarch64_cmge (mask, operands[5], operands[4])); - emit_insn (gen_ior3 (mask, mask, tmp)); - break; - default: - gcc_unreachable (); - } - - if (swap_bsl_operands) - { - op1 = operands[2]; - op2 = operands[1]; - } - - /* If we have (a = (b CMP c) ? -1 : 0); - Then we can simply move the generated mask. */ - - if (op1 == CONSTM1_RTX (mode) - && op2 == CONST0_RTX (mode)) - emit_move_insn (operands[0], mask); - else - { - if (!REG_P (op1)) - op1 = force_reg (mode, op1); - if (!REG_P (op2)) - op2 = force_reg (mode, op2); - emit_insn (gen_aarch64_simd_bsl (operands[0], mask, - op1, op2)); - } - - DONE; -}) - (define_expand "vcond" [(set (match_operand:VALLDI 0 "register_operand") (if_then_else:VALLDI @@ -2520,26 +2212,50 @@ (match_operand:VALLDI 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal (operands[0], operands[1], - operands[2], operands[3], + rtx mask = gen_reg_rtx (mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp_internal (mask, operands[3], operands[4], operands[5])); + /* See comments of vec_cmp_internal, the opposite + result masks are computed for below operators, we need to invert + the mask here. In this case we can save an inverting instruction + by simply swapping the two operands to bsl. */ + if (code == NE || code == UNEQ || code == UNLT || code == UNLE + || code == UNGT || code == UNGE || code == UNORDERED) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_ (operands[0], operands[1], + operands[2], mask)); DONE; }) -(define_expand "vcond" - [(set (match_operand: 0 "register_operand") - (if_then_else: +(define_expand "vcond" + [(set (match_operand: 0 "register_operand") + (if_then_else: (match_operator 3 "comparison_operator" - [(match_operand:VDQF 4 "register_operand") - (match_operand:VDQF 5 "nonmemory_operand")]) - (match_operand: 1 "nonmemory_operand") - (match_operand: 2 "nonmemory_operand")))] + [(match_operand:VDQF_COND 4 "register_operand") + (match_operand:VDQF_COND 5 "nonmemory_operand")]) + (match_operand: 1 "nonmemory_operand") + (match_operand: 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal ( + rtx mask = gen_reg_rtx (mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp_internal (mask, operands[3], + operands[4], operands[5])); + /* See comments of vec_cmp_internal, the opposite + result masks are computed for below operators, we need to invert + the mask here. In this case we can save an inverting instruction + by simply swapping the two operands to bsl. 
*/ + if (code == NE || code == UNEQ || code == UNLT || code == UNLE + || code == UNGT || code == UNGE || code == UNORDERED) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_ ( operands[0], operands[1], - operands[2], operands[3], - operands[4], operands[5])); + operands[2], mask)); DONE; }) @@ -2553,9 +2269,48 @@ (match_operand:VSDQ_I_DI 2 "nonmemory_operand")))] "TARGET_SIMD" { - emit_insn (gen_aarch64_vcond_internal (operands[0], operands[1], - operands[2], operands[3], + rtx mask = gen_reg_rtx (mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp_internal (mask, operands[3], operands[4], operands[5])); + /* See comments of vec_cmp_internal, the opposite result + mask is computed for NE operator, we need to invert the mask here. + In this case we can save an inverting instruction by simply swapping + the two operands to bsl. */ + if (code == NE) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_ (operands[0], operands[1], + operands[2], mask)); + DONE; +}) + +(define_expand "vcondu" + [(set (match_operand:VDQF 0 "register_operand") + (if_then_else:VDQF + (match_operator 3 "comparison_operator" + [(match_operand: 4 "register_operand") + (match_operand: 5 "nonmemory_operand")]) + (match_operand:VDQF 1 "nonmemory_operand") + (match_operand:VDQF 2 "nonmemory_operand")))] + "TARGET_SIMD" +{ + rtx mask = gen_reg_rtx (mode); + enum rtx_code code = GET_CODE (operands[3]); + + emit_insn (gen_vec_cmp_internal ( + mask, operands[3], + operands[4], operands[5])); + /* See comments of vec_cmp_internal, the opposite result + mask is computed for NE operator, we need to invert the mask here. + In this case we can save an inverting instruction by simply swapping + the two operands to bsl. */ + if (code == NE) + std::swap (operands[1], operands[2]); + + emit_insn (gen_vcond_mask_ (operands[0], operands[1], + operands[2], mask)); DONE; }) @@ -4156,7 +3911,7 @@ ;; cmtst ;; Although neg (ne (and x y) 0) is the natural way of expressing a cmtst, -;; we don't have any insns using ne, and aarch64_vcond_internal outputs +;; we don't have any insns using ne, and aarch64_vcond outputs ;; not (neg (eq (and x y) 0)) ;; which is rewritten by simplify_rtx as ;; plus (eq (and x y) 0) -1. diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 43b22d8..f53fe9d 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -610,6 +610,16 @@ (V2DF "v2di") (DF "di") (SF "si")]) +;; Mode for vector conditional operations where the comparison has +;; different type from the lhs. +(define_mode_attr V_cmp_mixed [(V2SI "V2SF") (V4SI "V4SF") + (V2DI "V2DF") (V2SF "V2SI") + (V4SF "V4SI") (V2DF "V2DI")]) + +(define_mode_attr v_cmp_mixed [(V2SI "v2sf") (V4SI "v4sf") + (V2DI "v2df") (V2SF "v2si") + (V4SF "v4si") (V2DF "v2di")]) + ;; Lower case element modes (as used in shift immediate patterns). (define_mode_attr ve_mode [(V8QI "qi") (V16QI "qi") (V4HI "hi") (V8HI "hi")
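
For readers unfamiliar with the trick described in the new expanders'
comments, here is a minimal sketch of the idea, illustrative only and
written with ACLE NEON intrinsics rather than the RTL the expanders
actually emit: for a != b on floats, the expander computes the a == b
mask and simply swaps the two data operands of BSL, which gives the same
selection as inverting the mask but saves a NOT instruction.

#include <arm_neon.h>

/* Illustrative sketch only: select x where a != b, y elsewhere,
   without inverting the comparison mask.  */
int32x4_t
select_ne (float32x4_t a, float32x4_t b, int32x4_t x, int32x4_t y)
{
  uint32x4_t eq = vceqq_f32 (a, b);  /* lanes are all-ones where a == b.  */
  return vbslq_s32 (eq, y, x);       /* operands swapped: eq lanes pick y,
					the rest pick x, i.e. a != b -> x.  */
}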