From patchwork Fri Feb 17 10:14:57 2023
From: Jakub Jelinek
Date: Fri, 17 Feb 2023 11:14:57 +0100
To: Richard Sandiford, Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]
Hi!

The following testcase is miscompiled on aarch64.  The problem is that
aarch64 with TARGET_SIMD is a !SHIFT_COUNT_TRUNCATED target with
targetm.shift_truncation_mask (DImode) == 0 which has
HAVE_conditional_move true.

If a doubleword shift (in this case TImode) doesn't have its own expander
(as is the case of e.g. x86) and is handled in generic code, there are 3
possible expansions.  One is when the shift count is constant; the code
then computes superword_op1 as op1 - BITS_PER_WORD and chooses at compile
time which of expand_superword_shift or expand_subword_shift to call,
which ensures that whichever is used actually has its shift count
(superword_op1 or op1) in the [0, BITS_PER_WORD - 1] range.

If !HAVE_conditional_move or that expansion fails, the function handles
non-constant op1 similarly (there are some special cases for
shift_mask >= BITS_PER_WORD - 1, but let's talk here just about
shift_mask < BITS_PER_WORD - 1), except that the selection is done at
runtime, with branches around the two sequences.  While superword_op1 can
be in [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 in
[0, 2 * BITS_PER_WORD - 1], the runtime selection ensures that the
instructions actually executed have their corresponding shift count again
in the right range of [0, BITS_PER_WORD - 1].

The problematic case is the HAVE_conditional_move one, which emits both
sequences into the actually executed code, so necessarily one of them has
an out-of-range shift count, and then two conditional moves pick up the
right result.
Now, in the testcase, because -Og doesn't perform VRP/EVRP, the shift
count isn't optimized to a constant during the GIMPLE passes, but is
determined to be constant during/after expansion into RTL.  The shift
count is actually const0_rtx later, so superword_op1 is (const_int -64)
and we end up with a miscompilation during combine because of that.

I'm afraid that on targetm.shift_truncation_mask (DImode) == 0 targets we
have to mask the shift counts when the doubleword shift count is in range
but one of subword_op1/superword_op1 is not, which is what the following
patch does and what fixes the wrong-code.  Now, targets like x86 or
aarch64, while they are !SHIFT_COUNT_TRUNCATED, actually have patterns to
catch shifts with masked counts, so the ANDs can be optimized out.  On
the other side, when we know the result will be masked this way, we can
use the simpler ~op1 instead of (BITS_PER_WORD - 1) - op1 in
expand_subword_shift.

So, say on

__attribute__((noipa)) __int128
foo (__int128 a, unsigned k)
{
  return a << k;
}

on aarch64 at -Og the difference is:

 foo:
 	subs	w5, w2, #64
-	lsl	x6, x0, x5
+	lsl	x4, x0, x2
 	lsr	x3, x0, 1
-	mov	w4, 63
-	sub	w4, w4, w2
-	lsr	x3, x3, x4
+	mvn	w6, w2
+	and	w2, w2, 63
+	lsr	x3, x3, x6
 	lsl	x1, x1, x2
 	orr	x1, x3, x1
 	lsl	x0, x0, x2
 	csel	x0, xzr, x0, pl
-	csel	x1, x6, x1, pl
+	csel	x1, x4, x1, pl
 	ret

We could do even better and optimize the and w2, w2, 63 instruction out,
but that is a matter of costs and so IMHO should be handled
incrementally.  For that case consider say:

void corge (int, int, int);

void
qux (int x, int y, int z, int n)
{
  n &= 31;
  corge (x << n, y << n, z >> n);
}

With -O2 -fno-tree-vectorize, on x86_64 one gets

	sarl	%cl, %edx
	sall	%cl, %esi
	sall	%cl, %edi
	jmp	corge

but on aarch64

	and	w3, w3, 31
	lsl	w0, w0, w3
	lsl	w1, w1, w3
	asr	w2, w2, w3
	b	corge

The reason it works on x86 is that its rtx_costs hook recognizes that the
AND in a shift+mask pattern is free.
In the combine dump this shows up as (x86_64, the masking insn 9 feeding
the shift in insn 11):

Trying 9 -> 11:
    9: {r85:SI=r96:SI&0x1f;clobber flags:CC;}
      REG_UNUSED flags:CC
   11: {r91:SI=r87:SI<<r85:SI;clobber flags:CC;}

while on aarch64:

Trying 9 -> 11:
    9: r95:SI=r106:SI&0x1f
      REG_DEAD r106:SI
   11: r101:SI=r104:SI<<r95:SI

	PR middle-end/108803
	* optabs.cc (expand_subword_shift): Add MASK_COUNT argument,
	if true, use ~op1 & (BITS_PER_WORD - 1) instead of
	(BITS_PER_WORD - 1) - op1 as tmp and op1 & (BITS_PER_WORD - 1)
	instead of op1 on the other two shifts.
	(expand_doubleword_shift_condmove): Use
	superword_op1 & (BITS_PER_WORD - 1) instead of superword_op1.
	Pass shift_mask < BITS_PER_WORD - 1 as MASK_COUNT to
	expand_subword_shift.
	(expand_doubleword_shift): Pass false as MASK_COUNT to
	expand_subword_shift.

	* gcc.dg/pr108803.c: New test.

	Jakub

--- gcc/optabs.cc.jj	2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc	2023-02-16 20:44:21.862031535 +0100
@@ -507,7 +507,7 @@ expand_subword_shift (scalar_int_mode op
 		      rtx outof_input, rtx into_input, rtx op1,
 		      rtx outof_target, rtx into_target,
 		      int unsignedp, enum optab_methods methods,
-		      unsigned HOST_WIDE_INT shift_mask)
+		      unsigned HOST_WIDE_INT shift_mask, bool mask_count)
 {
   optab reverse_unsigned_shift, unsigned_shift;
   rtx tmp, carries;
@@ -535,7 +535,7 @@ expand_subword_shift (scalar_int_mode op
 	 are truncated to the mode size.  */
       carries = expand_binop (word_mode, reverse_unsigned_shift,
			      outof_input, const1_rtx, 0, unsignedp, methods);
-      if (shift_mask == BITS_PER_WORD - 1)
+      if (shift_mask == BITS_PER_WORD - 1 || mask_count)
 	{
 	  tmp = immed_wide_int_const (wi::minus_one
				      (GET_MODE_PRECISION (op1_mode)), op1_mode);
@@ -549,6 +549,15 @@ expand_subword_shift (scalar_int_mode op
 	  tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1, 0,
				       true, methods);
 	}
+      if (mask_count)
+	{
+	  rtx tmp2 = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+						     op1_mode), op1_mode);
+	  tmp = simplify_expand_binop (op1_mode, and_optab, tmp, tmp2, 0,
+				       true, methods);
+	  op1 = simplify_expand_binop (op1_mode, and_optab, op1, tmp2, 0,
+				       true, methods);
+	}
     }
   if (tmp == 0 || carries == 0)
     return false;
@@ -596,6 +605,15 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;
 
+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1, op1_mode),
+				      op1_mode);
+      superword_op1
+	= simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+				 0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD
      and INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -621,7 +639,8 @@ expand_doubleword_shift_condmove (scalar
   if (!expand_subword_shift (op1_mode, binoptab,
			     outof_input, into_input, subword_op1,
			     outof_target, into_target,
-			     unsignedp, methods, shift_mask))
+			     unsignedp, methods, shift_mask,
+			     shift_mask < BITS_PER_WORD - 1))
     return false;
 
   /* Select between them.  Do the INTO half first because INTO_SUPERWORD
@@ -742,7 +761,7 @@ expand_doubleword_shift (scalar_int_mode
       return expand_subword_shift (op1_mode, binoptab,
				   outof_input, into_input, op1,
				   outof_target, into_target,
-				   unsignedp, methods, shift_mask);
+				   unsignedp, methods, shift_mask, false);
     }
 
   /* Try using conditional moves to generate straight-line code.  */
@@ -781,7 +800,7 @@ expand_doubleword_shift (scalar_int_mode
       if (!expand_subword_shift (op1_mode, binoptab,
				 outof_input, into_input, op1,
				 outof_target, into_target,
-				 unsignedp, methods, shift_mask))
+				 unsignedp, methods, shift_mask, false))
 	return false;
 
       emit_label (done_label);
--- gcc/testsuite/gcc.dg/pr108803.c.jj	2023-02-16 20:48:53.517032422 +0100
+++ gcc/testsuite/gcc.dg/pr108803.c	2023-02-16 20:48:45.971143498 +0100
@@ -0,0 +1,24 @@
+/* PR middle-end/108803 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-Og" } */
+
+unsigned i, j;
+
+__int128
+foo (__int128 a)
+{
+  a &= 9;
+  unsigned k = __builtin_add_overflow_p (i, j, a);
+  __int128 b = (63 + a) << k | ((63 + a) >> (63 & k));
+  __int128 c = a + b;
+  return c;
+}
+
+int
+main (void)
+{
+  __int128 x = foo (0);
+  if (x != 63)
+    __builtin_abort ();
+  return 0;
+}