From patchwork Mon Apr 29 07:14:18 2024
From: Eric Botcazou <ebotcazou@adacore.com>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Minor tweaks to code computing modular multiplicative inverse
Date: Mon, 29 Apr 2024 09:14:18 +0200
Message-ID: <6040019.lOV4Wx5bFT@fomalhaut>

Hi,

this removes the last parameter of choose_multiplier, which is unused, adds
another assertion and more details to the description and various comments.
The same is done to the closely related invert_mod2n, except that its last
parameter is kept.

Tested on x86-64/Linux, OK for the mainline?

2024-04-29  Eric Botcazou  <ebotcazou@adacore.com>

	* expmed.h (choose_multiplier): Tweak description and remove last
	parameter.
	* expmed.cc (choose_multiplier): Likewise.  Add assertion for the
	third parameter and add details to various comments.
	(invert_mod2n): Tweak description and add assertion for the first
	parameter.
	(expand_divmod): Adjust calls to choose_multiplier.
	* tree-vect-generic.cc (expand_vector_divmod): Likewise.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Likewise.
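As a concrete illustration of the updated description (this sketch is mine,
for illustration only, and not part of the patch): for d = 5 with
n = precision = 32, choose_multiplier yields the well-known multiplier
ml = 0xCCCCCCCD with a post shift of 2 and a zero most significant bit, so
an unsigned division by 5 reduces to a multiply-high and a right shift.
The helper name udiv5 is hypothetical:

  #include <stdint.h>
  #include <assert.h>

  /* Illustrative only: x / 5 == (x * m) / 2**32 >> 2 for every 32-bit x,
     where the n + 1 bit value m = 0xCCCCCCCD (most significant bit clear)
     approximates 2**34 / 5 from above, i.e. K = 34 and the post shift
     receives K - n = 2, matching the new description.  */
  static uint32_t
  udiv5 (uint32_t x)
  {
    uint64_t prod = (uint64_t) x * UINT64_C (0xCCCCCCCD);  /* x * m */
    return (uint32_t) (prod >> 32) >> 2;  /* / 2**n, then the post shift */
  }

  int
  main (void)
  {
    assert (udiv5 (4) == 0);
    assert (udiv5 (5) == 1);
    assert (udiv5 (UINT32_MAX) == UINT32_MAX / 5);
    return 0;
  }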
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 4ec035e4843..60f65c7acc5 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -3689,50 +3689,62 @@ expand_widening_mult (machine_mode mode, rtx op0, rtx op1, rtx target,
 			 unsignedp, OPTAB_LIB_WIDEN);
 }
 
-/* Choose a minimal N + 1 bit approximation to 1/D that can be used to
-   replace division by D, and put the least significant N bits of the result
-   in *MULTIPLIER_PTR and return the most significant bit.
+/* Choose a minimal N + 1 bit approximation to 2**K / D that can be used to
+   replace division by D, put the least significant N bits of the result in
+   *MULTIPLIER_PTR, the value K - N in *POST_SHIFT_PTR, and return the most
+   significant bit.
 
    The width of operations is N (should be <= HOST_BITS_PER_WIDE_INT), the
-   needed precision is in PRECISION (should be <= N).
+   needed precision is PRECISION (should be <= N).
 
-   PRECISION should be as small as possible so this function can choose
-   multiplier more freely.
+   PRECISION should be as small as possible so this function can choose the
+   multiplier more freely.  If PRECISION is <= N - 1, the most significant
+   bit returned by the function will be zero.
 
-   The rounded-up logarithm of D is placed in *lgup_ptr.  A shift count that
-   is to be used for a final right shift is placed in *POST_SHIFT_PTR.
-
-   Using this function, x/D will be equal to (x * m) >> (*POST_SHIFT_PTR),
-   where m is the full HOST_BITS_PER_WIDE_INT + 1 bit multiplier.  */
+   Using this function, x / D is equal to (x*m) / 2**N >> (*POST_SHIFT_PTR),
+   where m is the full N + 1 bit multiplier.  */
 
 unsigned HOST_WIDE_INT
 choose_multiplier (unsigned HOST_WIDE_INT d, int n, int precision,
 		   unsigned HOST_WIDE_INT *multiplier_ptr,
-		   int *post_shift_ptr, int *lgup_ptr)
+		   int *post_shift_ptr)
 {
   int lgup, post_shift;
-  int pow, pow2;
+  int pow1, pow2;
 
-  /* lgup = ceil(log2(divisor)); */
+  /* lgup = ceil(log2(d)) */
+  /* Assuming d > 1, we have d >= 2^(lgup-1) + 1 */
   lgup = ceil_log2 (d);
 
   gcc_assert (lgup <= n);
+  gcc_assert (lgup <= precision);
 
-  pow = n + lgup;
+  pow1 = n + lgup;
   pow2 = n + lgup - precision;
 
-  /* mlow = 2^(N + lgup)/d */
-  wide_int val = wi::set_bit_in_zero (pow, HOST_BITS_PER_DOUBLE_INT);
+  /* mlow = 2^(n + lgup)/d */
+  /* Trivially from above we have mlow < 2^(n+1) */
+  wide_int val = wi::set_bit_in_zero (pow1, HOST_BITS_PER_DOUBLE_INT);
   wide_int mlow = wi::udiv_trunc (val, d);
 
-  /* mhigh = (2^(N + lgup) + 2^(N + lgup - precision))/d */
+  /* mhigh = (2^(n + lgup) + 2^(n + lgup - precision))/d */
+  /* From above we have mhigh < 2^(n+1) assuming lgup <= precision */
+  /* From precision <= n, the difference between the numerators of mhigh and
+     mlow is >= 2^lgup >= d.  Therefore the difference of the quotients in
+     the Euclidean division by d is at least 1, so we have mlow < mhigh and
+     the exact value of 2^(n + lgup)/d lies in the interval [mlow; mhigh).  */
   val |= wi::set_bit_in_zero (pow2, HOST_BITS_PER_DOUBLE_INT);
   wide_int mhigh = wi::udiv_trunc (val, d);
 
-  /* If precision == N, then mlow, mhigh exceed 2^N
-     (but they do not exceed 2^(N+1)).  */
-
-  /* Reduce to lowest terms.  */
+  /* If precision <= n - 1, then the difference between the numerators of
+     mhigh and mlow is >= 2^(lgup + 1) >= 2 * 2^lgup >= 2 * d.  Therefore
+     the difference of the quotients in the Euclidean division by d is at
+     least 2, which means that mhigh and mlow differ by at least one bit
+     not in the last place.  The conclusion is that the first iteration of
+     the loop below completes and shifts mhigh and mlow by 1 bit, which in
+     particular means that mhigh < 2^n, that is to say, the most significant
+     bit in the n + 1 bit value is zero.  */
   for (post_shift = lgup; post_shift > 0; post_shift--)
     {
       unsigned HOST_WIDE_INT ml_lo = wi::extract_uhwi (mlow, 1,
@@ -3747,7 +3759,7 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n, int precision,
     }
 
   *post_shift_ptr = post_shift;
-  *lgup_ptr = lgup;
+
   if (n < HOST_BITS_PER_WIDE_INT)
     {
       unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << n) - 1;
@@ -3761,31 +3773,32 @@ choose_multiplier (unsigned HOST_WIDE_INT d, int n, int precision,
     }
 }
 
-/* Compute the inverse of X mod 2**n, i.e., find Y such that X * Y is
-   congruent to 1 (mod 2**N).  */
+/* Compute the inverse of X mod 2**N, i.e., find Y such that X * Y is congruent
+   to 1 modulo 2**N, assuming that X is odd.  Bézout's lemma guarantees that Y
+   exists for any given positive N.  */
 
 static unsigned HOST_WIDE_INT
 invert_mod2n (unsigned HOST_WIDE_INT x, int n)
 {
-  /* Solve x*y == 1 (mod 2^n), where x is odd.  Return y.  */
+  gcc_assert ((x & 1) == 1);
 
-  /* The algorithm notes that the choice y = x satisfies
-     x*y == 1 mod 2^3, since x is assumed odd.
-     Each iteration doubles the number of bits of significance in y.  */
+  /* The algorithm notes that the choice Y = X satisfies X*Y == 1 mod 2^3,
+     since X is odd.  Then each iteration doubles the number of bits of
+     significance in Y.  */
 
-  unsigned HOST_WIDE_INT mask;
+  const unsigned HOST_WIDE_INT mask
+    = (n == HOST_BITS_PER_WIDE_INT
+       ? HOST_WIDE_INT_M1U
+       : (HOST_WIDE_INT_1U << n) - 1);
   unsigned HOST_WIDE_INT y = x;
   int nbit = 3;
 
-  mask = (n == HOST_BITS_PER_WIDE_INT
-	  ? HOST_WIDE_INT_M1U
-	  : (HOST_WIDE_INT_1U << n) - 1);
-
   while (nbit < n)
    {
      y = y * (2 - x*y) & mask;		/* Modulo 2^N */
      nbit *= 2;
    }
+
  return y;
 }
 
@@ -4443,7 +4456,6 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 	  {
 	    unsigned HOST_WIDE_INT mh, ml;
 	    int pre_shift, post_shift;
-	    int dummy;
 	    wide_int wd = rtx_mode_t (op1, int_mode);
 	    unsigned HOST_WIDE_INT d = wd.to_uhwi ();
 
@@ -4476,10 +4488,9 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 	    else
 	      {
 		/* Find a suitable multiplier and right shift count
-		   instead of multiplying with D.  */
-
+		   instead of directly dividing by D.  */
 		mh = choose_multiplier (d, size, size,
-					&ml, &post_shift, &dummy);
+					&ml, &post_shift);
 
 		/* If the suggested multiplier is more than SIZE bits,
 		   we can do better for even divisors, using an
@@ -4489,7 +4500,7 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 		    pre_shift = ctz_or_zero (d);
 		    mh = choose_multiplier (d >> pre_shift, size,
 					    size - pre_shift,
-					    &ml, &post_shift, &dummy);
+					    &ml, &post_shift);
 		    gcc_assert (!mh);
 		  }
 		else
@@ -4561,7 +4572,7 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 	else			/* TRUNC_DIV, signed */
 	  {
 	    unsigned HOST_WIDE_INT ml;
-	    int lgup, post_shift;
+	    int post_shift;
 	    rtx mlr;
 	    HOST_WIDE_INT d = INTVAL (op1);
 	    unsigned HOST_WIDE_INT abs_d;
@@ -4657,7 +4668,7 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 	    else if (size <= HOST_BITS_PER_WIDE_INT)
 	      {
 		choose_multiplier (abs_d, size, size - 1,
-				   &ml, &post_shift, &lgup);
+				   &ml, &post_shift);
 		if (ml < HOST_WIDE_INT_1U << (size - 1))
 		  {
 		    rtx t1, t2, t3;
@@ -4748,7 +4759,7 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 	scalar_int_mode int_mode = as_a <scalar_int_mode> (compute_mode);
 	int size = GET_MODE_BITSIZE (int_mode);
 	unsigned HOST_WIDE_INT mh, ml;
-	int pre_shift, lgup, post_shift;
+	int pre_shift, post_shift;
 	HOST_WIDE_INT d = INTVAL (op1);
 
 	if (d > 0)
@@ -4778,7 +4789,7 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode,
 		rtx t1, t2, t3, t4;
 
 		mh = choose_multiplier (d, size, size - 1,
-					&ml, &post_shift, &lgup);
+					&ml, &post_shift);
 		gcc_assert (!mh);
 
 		if (post_shift < BITS_PER_WORD
diff --git a/gcc/expmed.h b/gcc/expmed.h
index bf3c4097f33..f5375c84f25 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -694,12 +694,13 @@ extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
 extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
 
-/* Choose a minimal N + 1 bit approximation to 1/D that can be used to
-   replace division by D, and put the least significant N bits of the result
-   in *MULTIPLIER_PTR and return the most significant bit.  */
+/* Choose a minimal N + 1 bit approximation to 2**K / D that can be used to
+   replace division by D, put the least significant N bits of the result in
+   *MULTIPLIER_PTR, the value K - N in *POST_SHIFT_PTR, and return the most
+   significant bit.  */
 extern unsigned HOST_WIDE_INT choose_multiplier (unsigned HOST_WIDE_INT, int,
 						 int, unsigned HOST_WIDE_INT *,
-						 int *, int *);
+						 int *);
 
 #ifdef TREE_CODE
 extern rtx expand_variable_shift (enum tree_code, machine_mode,
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index ab640096ca2..ea0069f7a67 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -551,7 +551,6 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
   int *shift_temps = post_shifts + nunits;
   unsigned HOST_WIDE_INT *mulc = XALLOCAVEC (unsigned HOST_WIDE_INT, nunits);
   int prec = TYPE_PRECISION (TREE_TYPE (type));
-  int dummy_int;
   unsigned int i;
   signop sign_p = TYPE_SIGN (TREE_TYPE (type));
   unsigned HOST_WIDE_INT mask = GET_MODE_MASK (TYPE_MODE (TREE_TYPE (type)));
@@ -609,11 +608,11 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	      continue;
 	    }
 
-	  /* Find a suitable multiplier and right shift count
-	     instead of multiplying with D.  */
-	  mh = choose_multiplier (d, prec, prec, &ml, &post_shift, &dummy_int);
+	  /* Find a suitable multiplier and right shift count instead of
+	     directly dividing by D.  */
+	  mh = choose_multiplier (d, prec, prec, &ml, &post_shift);
 
-	  /* If the suggested multiplier is more than SIZE bits, we can
+	  /* If the suggested multiplier is more than PREC bits, we can
 	     do better for even divisors, using an initial right shift.  */
 	  if ((mh != 0 && (d & 1) == 0)
 	      || (!has_vector_shift && pre_shift != -1))
@@ -655,7 +654,7 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 		}
 
 	      mh = choose_multiplier (d >> pre_shift, prec, prec - pre_shift,
-				      &ml, &post_shift, &dummy_int);
+				      &ml, &post_shift);
 	      gcc_assert (!mh);
 	      pre_shifts[i] = pre_shift;
 	    }
@@ -699,7 +698,7 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	    }
 
 	  choose_multiplier (abs_d, prec, prec - 1, &ml,
-			     &post_shift, &dummy_int);
+			     &post_shift);
 	  if (ml >= HOST_WIDE_INT_1U << (prec - 1))
 	    {
 	      this_mode = 4 + (d < 0);
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 87c2acff386..8e8de5ea3a5 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4535,7 +4535,7 @@ vect_recog_divmod_pattern (vec_info *vinfo,
   enum tree_code rhs_code;
   optab optab;
   tree q, cst;
-  int dummy_int, prec;
+  int prec;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -4795,17 +4795,17 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 	/* FIXME: Can transform this into oprnd0 >= oprnd1 ? 1 : 0.  */
 	return NULL;
 
-      /* Find a suitable multiplier and right shift count
-	 instead of multiplying with D.  */
-      mh = choose_multiplier (d, prec, prec, &ml, &post_shift, &dummy_int);
+      /* Find a suitable multiplier and right shift count instead of
+	 directly dividing by D.  */
+      mh = choose_multiplier (d, prec, prec, &ml, &post_shift);
 
-      /* If the suggested multiplier is more than SIZE bits, we can do better
+      /* If the suggested multiplier is more than PREC bits, we can do better
 	 for even divisors, using an initial right shift.  */
       if (mh != 0 && (d & 1) == 0)
 	{
 	  pre_shift = ctz_or_zero (d);
 	  mh = choose_multiplier (d >> pre_shift, prec, prec - pre_shift,
-				  &ml, &post_shift, &dummy_int);
+				  &ml, &post_shift);
 	  gcc_assert (!mh);
 	}
       else
@@ -4924,7 +4924,7 @@ vect_recog_divmod_pattern (vec_info *vinfo,
 	/* This case is not handled correctly below.  */
 	return NULL;
 
-      choose_multiplier (abs_d, prec, prec - 1, &ml, &post_shift, &dummy_int);
+      choose_multiplier (abs_d, prec, prec - 1, &ml, &post_shift);
       if (ml >= HOST_WIDE_INT_1U << (prec - 1))
 	{
 	  add = true;
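P.S. The iteration in invert_mod2n is the classic Newton-Hensel lifting: if
X*Y == 1 mod 2^k, then X * Y*(2 - X*Y) == 1 mod 2^(2k), and Y = X is a valid
starting point because the square of any odd number is congruent to 1 modulo
8.  Here is a minimal standalone sketch of the same scheme (illustrative
only, not part of the patch; the name invert_mod2_64 is mine), specialized
to N = 64 so that unsigned wrap-around performs the masking:

  #include <stdint.h>
  #include <assert.h>

  /* Illustrative only: invert the odd number X modulo 2**64.  The number
     of correct low-order bits doubles at each step (3, 6, 12, 24, 48, 96),
     so five iterations suffice for 64 bits.  */
  static uint64_t
  invert_mod2_64 (uint64_t x)
  {
    assert ((x & 1) == 1);

    uint64_t y = x;             /* correct modulo 2^3 */
    for (int nbit = 3; nbit < 64; nbit *= 2)
      y = y * (2 - x * y);      /* now correct modulo 2^(2*nbit) */

    return y;
  }

  int
  main (void)
  {
    for (uint64_t x = 1; x < 1001; x += 2)
      assert (x * invert_mod2_64 (x) == 1);
    return 0;
  }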